All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH resend v5 0/11] ceph: add perf metrics support
@ 2020-01-29  8:27 xiubli
  2020-01-29  8:27 ` [PATCH resend v5 01/11] ceph: add global dentry lease metric support xiubli
                   ` (10 more replies)
  0 siblings, 11 replies; 31+ messages in thread
From: xiubli @ 2020-01-29  8:27 UTC (permalink / raw)
  To: jlayton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

Changed in V2:
- add read/write/metadata latency metric support.
- add and send client provided metric flags in client metadata
- addressed the comments from Ilya and merged the 4/4 patch into 3/4.
- addressed all the other comments in v1 series.

Changed in V3:
- addressed Jeff's comments and let's the callers do the metric
counting.
- with some small fixes for the read/write latency
- tested based on the latest testing branch

Changed in V4:
- fix the lock issue

Changed in V5:
- add r_end_stamp for the osdc request
- delete reset metric and move it to metric sysfs
- move ceph_osdc_{read,write}pages to ceph.ko
- use percpu counters instead for read/write/metadata latencies

Changed in V5 resend:
- add the lost patch.

It will send the metrics to the MDSs every second if sending_metrics is enabled, disable as default.


We can get the metrics from the debugfs:

$ cat /sys/kernel/debug/ceph/0c93a60d-5645-4c46-8568-4c8f63db4c7f.client4267/metrics 
item          total       sum_lat(us)     avg_lat(us)
-----------------------------------------------------
read          13          417000          32076
write         42          131205000       3123928
metadata      104         493000          4740

item          total           miss            hit
-------------------------------------------------
d_lease       204             0               918

session       caps            miss            hit
-------------------------------------------------
0             204             213             368218


In the MDS side, we can get the metrics(NOTE: the latency is in
nanosecond):

$ ./bin/ceph fs perf stats | python -m json.tool
{
    "client_metadata": {
        "client.4267": {
            "IP": "v1:192.168.195.165",
            "hostname": "fedora1",
            "mount_point": "N/A",
            "root": "/"
        }
    },
    "counters": [
        "cap_hit"
    ],
    "global_counters": [
        "read_latency",
        "write_latency",
        "metadata_latency",
        "dentry_lease_hit"
    ],
    "global_metrics": {
        "client.4267": [
            [
                0,
                32076923
            ],
            [
                3,
                123928571
            ],
            [
                0,
                4740384
            ],
            [
                918,
                0
            ]
        ]
    },
    "metrics": {
        "delayed_ranks": [],
        "mds.0": {
            "client.4267": [
                [
                    368218,
                    213
                ]
            ]
        }
    }
}


The provided metric flags in client metadata

$./bin/cephfs-journal-tool --rank=1:0 event get --type=SESSION json
Wrote output to JSON file 'dump'
$ cat dump
[ 
    {
        "client instance": "client.4275 v1:192.168.195.165:0/461391971",
        "open": "true",
        "client map version": 1,
        "inos": "[]",
        "inotable version": 0,
        "client_metadata": {
            "client_features": {
                "feature_bits": "0000000000001bff"
            },
            "metric_spec": {
                "metric_flags": {
                    "feature_bits": "000000000000001f"
                }
            },
            "entity_id": "",
            "hostname": "fedora1",
            "kernel_version": "5.5.0-rc2+",
            "root": "/"
        }
    },
[...]



*** BLURB HERE ***

Xiubo Li (11):
  ceph: add global dentry lease metric support
  ceph: add caps perf metric for each session
  ceph: move ceph_osdc_{read,write}pages to ceph.ko
  ceph: add r_end_stamp for the osdc request
  ceph: add global read latency metric support
  ceph: add global write latency metric support
  ceph: add global metadata perf metric support
  ceph: periodically send perf metrics to MDS
  ceph: add CEPH_DEFINE_RW_FUNC helper support
  ceph: add reset metrics support
  ceph: send client provided metric flags in client metadata

 fs/ceph/acl.c                   |   2 +
 fs/ceph/addr.c                  | 106 +++++++++-
 fs/ceph/caps.c                  |  74 +++++++
 fs/ceph/debugfs.c               | 168 ++++++++++++++-
 fs/ceph/dir.c                   |  27 ++-
 fs/ceph/file.c                  |  26 +++
 fs/ceph/mds_client.c            | 350 ++++++++++++++++++++++++++++++--
 fs/ceph/mds_client.h            |  10 +
 fs/ceph/metric.h                | 150 ++++++++++++++
 fs/ceph/quota.c                 |   9 +-
 fs/ceph/super.h                 |  14 ++
 fs/ceph/xattr.c                 |  17 +-
 include/linux/ceph/ceph_fs.h    |   1 +
 include/linux/ceph/debugfs.h    |  14 ++
 include/linux/ceph/osd_client.h |  18 +-
 net/ceph/osd_client.c           |  81 +-------
 16 files changed, 935 insertions(+), 132 deletions(-)
 create mode 100644 fs/ceph/metric.h

-- 
2.21.0

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH resend v5 01/11] ceph: add global dentry lease metric support
  2020-01-29  8:27 [PATCH resend v5 0/11] ceph: add perf metrics support xiubli
@ 2020-01-29  8:27 ` xiubli
  2020-01-29  8:27 ` [PATCH resend v5 02/11] ceph: add caps perf metric for each session xiubli
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: xiubli @ 2020-01-29  8:27 UTC (permalink / raw)
  To: jlayton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

For the dentry lease we will only count the hit/miss info triggered
from the vfs calls, for the cases like request reply handling and
perodically ceph_trim_dentries() we will ignore them.

Currently only the debugfs is support:

The output will be:

item          total           miss            hit
-------------------------------------------------
d_lease       11              7               141

URL: https://tracker.ceph.com/issues/43215
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/debugfs.c    | 32 ++++++++++++++++++++++++++++----
 fs/ceph/dir.c        | 18 ++++++++++++++++--
 fs/ceph/mds_client.c | 37 +++++++++++++++++++++++++++++++++++--
 fs/ceph/mds_client.h |  9 +++++++++
 fs/ceph/super.h      |  1 +
 5 files changed, 89 insertions(+), 8 deletions(-)

diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
index fb7cabd98e7b..40a22da0214a 100644
--- a/fs/ceph/debugfs.c
+++ b/fs/ceph/debugfs.c
@@ -124,6 +124,22 @@ static int mdsc_show(struct seq_file *s, void *p)
 	return 0;
 }
 
+static int metric_show(struct seq_file *s, void *p)
+{
+	struct ceph_fs_client *fsc = s->private;
+	struct ceph_mds_client *mdsc = fsc->mdsc;
+
+	seq_printf(s, "item          total           miss            hit\n");
+	seq_printf(s, "-------------------------------------------------\n");
+
+	seq_printf(s, "%-14s%-16lld%-16lld%lld\n", "d_lease",
+		   atomic64_read(&mdsc->metric.total_dentries),
+		   percpu_counter_sum(&mdsc->metric.d_lease_mis),
+		   percpu_counter_sum(&mdsc->metric.d_lease_hit));
+
+	return 0;
+}
+
 static int caps_show_cb(struct inode *inode, struct ceph_cap *cap, void *p)
 {
 	struct seq_file *s = p;
@@ -220,6 +236,7 @@ static int mds_sessions_show(struct seq_file *s, void *ptr)
 
 CEPH_DEFINE_SHOW_FUNC(mdsmap_show)
 CEPH_DEFINE_SHOW_FUNC(mdsc_show)
+CEPH_DEFINE_SHOW_FUNC(metric_show)
 CEPH_DEFINE_SHOW_FUNC(caps_show)
 CEPH_DEFINE_SHOW_FUNC(mds_sessions_show)
 
@@ -255,6 +272,7 @@ void ceph_fs_debugfs_cleanup(struct ceph_fs_client *fsc)
 	debugfs_remove(fsc->debugfs_mdsmap);
 	debugfs_remove(fsc->debugfs_mds_sessions);
 	debugfs_remove(fsc->debugfs_caps);
+	debugfs_remove(fsc->debugfs_metric);
 	debugfs_remove(fsc->debugfs_mdsc);
 }
 
@@ -295,11 +313,17 @@ void ceph_fs_debugfs_init(struct ceph_fs_client *fsc)
 						fsc,
 						&mdsc_show_fops);
 
+	fsc->debugfs_metric = debugfs_create_file("metrics",
+						  0400,
+						  fsc->client->debugfs_dir,
+						  fsc,
+						  &metric_show_fops);
+
 	fsc->debugfs_caps = debugfs_create_file("caps",
-						   0400,
-						   fsc->client->debugfs_dir,
-						   fsc,
-						   &caps_show_fops);
+						0400,
+						fsc->client->debugfs_dir,
+						fsc,
+						&caps_show_fops);
 }
 
 
diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 10294f07f5f0..658c55b323cc 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -38,6 +38,8 @@ static int __dir_lease_try_check(const struct dentry *dentry);
 static int ceph_d_init(struct dentry *dentry)
 {
 	struct ceph_dentry_info *di;
+	struct ceph_fs_client *fsc = ceph_sb_to_client(dentry->d_sb);
+	struct ceph_mds_client *mdsc = fsc->mdsc;
 
 	di = kmem_cache_zalloc(ceph_dentry_cachep, GFP_KERNEL);
 	if (!di)
@@ -48,6 +50,9 @@ static int ceph_d_init(struct dentry *dentry)
 	di->time = jiffies;
 	dentry->d_fsdata = di;
 	INIT_LIST_HEAD(&di->lease_list);
+
+	atomic64_inc(&mdsc->metric.total_dentries);
+
 	return 0;
 }
 
@@ -1613,6 +1618,7 @@ static int dir_lease_is_valid(struct inode *dir, struct dentry *dentry)
  */
 static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags)
 {
+	struct ceph_mds_client *mdsc;
 	int valid = 0;
 	struct dentry *parent;
 	struct inode *dir, *inode;
@@ -1651,9 +1657,8 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags)
 		}
 	}
 
+	mdsc = ceph_sb_to_client(dir->i_sb)->mdsc;
 	if (!valid) {
-		struct ceph_mds_client *mdsc =
-			ceph_sb_to_client(dir->i_sb)->mdsc;
 		struct ceph_mds_request *req;
 		int op, err;
 		u32 mask;
@@ -1661,6 +1666,8 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags)
 		if (flags & LOOKUP_RCU)
 			return -ECHILD;
 
+		percpu_counter_inc(&mdsc->metric.d_lease_mis);
+
 		op = ceph_snap(dir) == CEPH_SNAPDIR ?
 			CEPH_MDS_OP_LOOKUPSNAP : CEPH_MDS_OP_LOOKUP;
 		req = ceph_mdsc_create_request(mdsc, op, USE_ANY_MDS);
@@ -1692,6 +1699,8 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags)
 			dout("d_revalidate %p lookup result=%d\n",
 			     dentry, err);
 		}
+	} else {
+		percpu_counter_inc(&mdsc->metric.d_lease_hit);
 	}
 
 	dout("d_revalidate %p %s\n", dentry, valid ? "valid" : "invalid");
@@ -1700,6 +1709,7 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags)
 
 	if (!(flags & LOOKUP_RCU))
 		dput(parent);
+
 	return valid;
 }
 
@@ -1734,9 +1744,13 @@ static int ceph_d_delete(const struct dentry *dentry)
 static void ceph_d_release(struct dentry *dentry)
 {
 	struct ceph_dentry_info *di = ceph_dentry(dentry);
+	struct ceph_fs_client *fsc = ceph_sb_to_client(dentry->d_sb);
+	struct ceph_mds_client *mdsc = fsc->mdsc;
 
 	dout("d_release %p\n", dentry);
 
+	atomic64_dec(&mdsc->metric.total_dentries);
+
 	spin_lock(&dentry->d_lock);
 	__dentry_lease_unlist(di);
 	dentry->d_fsdata = NULL;
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 8263f75badfc..a24fd00676b8 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -4158,10 +4158,31 @@ static void delayed_work(struct work_struct *work)
 	schedule_delayed(mdsc);
 }
 
+static int ceph_mdsc_metric_init(struct ceph_client_metric *metric)
+{
+	int ret;
+
+	if (!metric)
+		return -EINVAL;
+
+	atomic64_set(&metric->total_dentries, 0);
+	ret = percpu_counter_init(&metric->d_lease_hit, 0, GFP_KERNEL);
+	if (ret)
+		return ret;
+	ret = percpu_counter_init(&metric->d_lease_mis, 0, GFP_KERNEL);
+	if (ret) {
+		percpu_counter_destroy(&metric->d_lease_hit);
+		return ret;
+	}
+
+	return 0;
+}
+
 int ceph_mdsc_init(struct ceph_fs_client *fsc)
 
 {
 	struct ceph_mds_client *mdsc;
+	int err;
 
 	mdsc = kzalloc(sizeof(struct ceph_mds_client), GFP_NOFS);
 	if (!mdsc)
@@ -4170,8 +4191,8 @@ int ceph_mdsc_init(struct ceph_fs_client *fsc)
 	mutex_init(&mdsc->mutex);
 	mdsc->mdsmap = kzalloc(sizeof(*mdsc->mdsmap), GFP_NOFS);
 	if (!mdsc->mdsmap) {
-		kfree(mdsc);
-		return -ENOMEM;
+		err = -ENOMEM;
+		goto err_mdsc;
 	}
 
 	fsc->mdsc = mdsc;
@@ -4210,6 +4231,9 @@ int ceph_mdsc_init(struct ceph_fs_client *fsc)
 	init_waitqueue_head(&mdsc->cap_flushing_wq);
 	INIT_WORK(&mdsc->cap_reclaim_work, ceph_cap_reclaim_work);
 	atomic_set(&mdsc->cap_reclaim_pending, 0);
+	err = ceph_mdsc_metric_init(&mdsc->metric);
+	if (err)
+		goto err_mdsmap;
 
 	spin_lock_init(&mdsc->dentry_list_lock);
 	INIT_LIST_HEAD(&mdsc->dentry_leases);
@@ -4228,6 +4252,12 @@ int ceph_mdsc_init(struct ceph_fs_client *fsc)
 	strscpy(mdsc->nodename, utsname()->nodename,
 		sizeof(mdsc->nodename));
 	return 0;
+
+err_mdsmap:
+	kfree(mdsc->mdsmap);
+err_mdsc:
+	kfree(mdsc);
+	return err;
 }
 
 /*
@@ -4485,6 +4515,9 @@ void ceph_mdsc_destroy(struct ceph_fs_client *fsc)
 
 	ceph_mdsc_stop(mdsc);
 
+	percpu_counter_destroy(&mdsc->metric.d_lease_mis);
+	percpu_counter_destroy(&mdsc->metric.d_lease_hit);
+
 	fsc->mdsc = NULL;
 	kfree(mdsc);
 	dout("mdsc_destroy %p done\n", mdsc);
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 27a7446e10d3..dd1f417b90eb 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -358,6 +358,13 @@ struct cap_wait {
 	int			want;
 };
 
+/* This is the global metrics */
+struct ceph_client_metric {
+	atomic64_t		total_dentries;
+	struct percpu_counter	d_lease_hit;
+	struct percpu_counter	d_lease_mis;
+};
+
 /*
  * mds client state
  */
@@ -446,6 +453,8 @@ struct ceph_mds_client {
 	struct list_head  dentry_leases;     /* fifo list */
 	struct list_head  dentry_dir_leases; /* lru list */
 
+	struct ceph_client_metric metric;
+
 	spinlock_t		snapid_map_lock;
 	struct rb_root		snapid_map_tree;
 	struct list_head	snapid_map_lru;
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 3ef17dd6491e..7af91628636c 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -128,6 +128,7 @@ struct ceph_fs_client {
 	struct dentry *debugfs_congestion_kb;
 	struct dentry *debugfs_bdi;
 	struct dentry *debugfs_mdsc, *debugfs_mdsmap;
+	struct dentry *debugfs_metric;
 	struct dentry *debugfs_mds_sessions;
 #endif
 
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH resend v5 02/11] ceph: add caps perf metric for each session
  2020-01-29  8:27 [PATCH resend v5 0/11] ceph: add perf metrics support xiubli
  2020-01-29  8:27 ` [PATCH resend v5 01/11] ceph: add global dentry lease metric support xiubli
@ 2020-01-29  8:27 ` xiubli
  2020-01-29 14:21   ` Jeff Layton
  2020-01-29  8:27 ` [PATCH resend v5 03/11] ceph: move ceph_osdc_{read,write}pages to ceph.ko xiubli
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: xiubli @ 2020-01-29  8:27 UTC (permalink / raw)
  To: jlayton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

This will fulfill the caps hit/miss metric for each session. When
checking the "need" mask and if one cap has the subset of the "need"
mask it means hit, or missed.

item          total           miss            hit
-------------------------------------------------
d_lease       295             0               993

session       caps            miss            hit
-------------------------------------------------
0             295             107             4119
1             1               107             9

URL: https://tracker.ceph.com/issues/43215
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/acl.c        |  2 ++
 fs/ceph/addr.c       |  2 ++
 fs/ceph/caps.c       | 74 ++++++++++++++++++++++++++++++++++++++++++++
 fs/ceph/debugfs.c    | 20 ++++++++++++
 fs/ceph/dir.c        |  9 ++++--
 fs/ceph/file.c       |  3 ++
 fs/ceph/mds_client.c | 16 +++++++++-
 fs/ceph/mds_client.h |  3 ++
 fs/ceph/quota.c      |  9 ++++--
 fs/ceph/super.h      | 11 +++++++
 fs/ceph/xattr.c      | 17 ++++++++--
 11 files changed, 158 insertions(+), 8 deletions(-)

diff --git a/fs/ceph/acl.c b/fs/ceph/acl.c
index 26be6520d3fb..58e119e3519f 100644
--- a/fs/ceph/acl.c
+++ b/fs/ceph/acl.c
@@ -22,6 +22,8 @@ static inline void ceph_set_cached_acl(struct inode *inode,
 	struct ceph_inode_info *ci = ceph_inode(inode);
 
 	spin_lock(&ci->i_ceph_lock);
+	__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
+
 	if (__ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 0))
 		set_cached_acl(inode, type, acl);
 	else
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 7ab616601141..29d4513eff8c 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1706,6 +1706,8 @@ int ceph_uninline_data(struct file *filp, struct page *locked_page)
 			err = -ENOMEM;
 			goto out;
 		}
+
+		ceph_caps_metric(ci, CEPH_STAT_CAP_INLINE_DATA);
 		err = __ceph_do_getattr(inode, page,
 					CEPH_STAT_CAP_INLINE_DATA, true);
 		if (err < 0) {
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 7fc87b693ba4..af2e9e826f8c 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -783,6 +783,75 @@ static int __cap_is_valid(struct ceph_cap *cap)
 	return 1;
 }
 
+/*
+ * Counts the cap metric.
+ */
+void __ceph_caps_metric(struct ceph_inode_info *ci, int mask)
+{
+	int have = ci->i_snap_caps;
+	struct ceph_mds_session *s;
+	struct ceph_cap *cap;
+	struct rb_node *p;
+	bool skip_auth = false;
+
+	lockdep_assert_held(&ci->i_ceph_lock);
+
+	if (mask <= 0)
+		return;
+
+	/* Counts the snap caps metric in the auth cap */
+	if (ci->i_auth_cap) {
+		cap = ci->i_auth_cap;
+		if (have) {
+			have |= cap->issued;
+
+			dout("%s %p cap %p issued %s, mask %s\n", __func__,
+			     &ci->vfs_inode, cap, ceph_cap_string(cap->issued),
+			     ceph_cap_string(mask));
+
+			s = ceph_get_mds_session(cap->session);
+			if (s) {
+				if (mask & have)
+					percpu_counter_inc(&s->i_caps_hit);
+				else
+					percpu_counter_inc(&s->i_caps_mis);
+				ceph_put_mds_session(s);
+			}
+			skip_auth = true;
+		}
+	}
+
+	if ((mask & have) == mask)
+		return;
+
+	/* Checks others */
+	for (p = rb_first(&ci->i_caps); p; p = rb_next(p)) {
+		cap = rb_entry(p, struct ceph_cap, ci_node);
+		if (!__cap_is_valid(cap))
+			continue;
+
+		if (skip_auth && cap == ci->i_auth_cap)
+			continue;
+
+		dout("%s %p cap %p issued %s, mask %s\n", __func__,
+		     &ci->vfs_inode, cap, ceph_cap_string(cap->issued),
+		     ceph_cap_string(mask));
+
+		s = ceph_get_mds_session(cap->session);
+		if (s) {
+			if (mask & cap->issued)
+				percpu_counter_inc(&s->i_caps_hit);
+			else
+				percpu_counter_inc(&s->i_caps_mis);
+			ceph_put_mds_session(s);
+		}
+
+		have |= cap->issued;
+		if ((mask & have) == mask)
+			return;
+	}
+}
+
 /*
  * Return set of valid cap bits issued to us.  Note that caps time
  * out, and may be invalidated in bulk if the client session times out
@@ -2746,6 +2815,7 @@ static void check_max_size(struct inode *inode, loff_t endoff)
 int ceph_try_get_caps(struct inode *inode, int need, int want,
 		      bool nonblock, int *got)
 {
+	struct ceph_inode_info *ci = ceph_inode(inode);
 	int ret;
 
 	BUG_ON(need & ~CEPH_CAP_FILE_RD);
@@ -2758,6 +2828,7 @@ int ceph_try_get_caps(struct inode *inode, int need, int want,
 	BUG_ON(want & ~(CEPH_CAP_FILE_CACHE | CEPH_CAP_FILE_LAZYIO |
 			CEPH_CAP_FILE_SHARED | CEPH_CAP_FILE_EXCL |
 			CEPH_CAP_ANY_DIR_OPS));
+	ceph_caps_metric(ci, need | want);
 	ret = try_get_cap_refs(inode, need, want, 0, nonblock, got);
 	return ret == -EAGAIN ? 0 : ret;
 }
@@ -2784,6 +2855,8 @@ int ceph_get_caps(struct file *filp, int need, int want,
 	    fi->filp_gen != READ_ONCE(fsc->filp_gen))
 		return -EBADF;
 
+	ceph_caps_metric(ci, need | want);
+
 	while (true) {
 		if (endoff > 0)
 			check_max_size(inode, endoff);
@@ -2871,6 +2944,7 @@ int ceph_get_caps(struct file *filp, int need, int want,
 			 * getattr request will bring inline data into
 			 * page cache
 			 */
+			ceph_caps_metric(ci, CEPH_STAT_CAP_INLINE_DATA);
 			ret = __ceph_do_getattr(inode, NULL,
 						CEPH_STAT_CAP_INLINE_DATA,
 						true);
diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
index 40a22da0214a..c132fdb40d53 100644
--- a/fs/ceph/debugfs.c
+++ b/fs/ceph/debugfs.c
@@ -128,6 +128,7 @@ static int metric_show(struct seq_file *s, void *p)
 {
 	struct ceph_fs_client *fsc = s->private;
 	struct ceph_mds_client *mdsc = fsc->mdsc;
+	int i;
 
 	seq_printf(s, "item          total           miss            hit\n");
 	seq_printf(s, "-------------------------------------------------\n");
@@ -137,6 +138,25 @@ static int metric_show(struct seq_file *s, void *p)
 		   percpu_counter_sum(&mdsc->metric.d_lease_mis),
 		   percpu_counter_sum(&mdsc->metric.d_lease_hit));
 
+	seq_printf(s, "\n");
+	seq_printf(s, "session       caps            miss            hit\n");
+	seq_printf(s, "-------------------------------------------------\n");
+
+	mutex_lock(&mdsc->mutex);
+	for (i = 0; i < mdsc->max_sessions; i++) {
+		struct ceph_mds_session *session;
+
+		session = __ceph_lookup_mds_session(mdsc, i);
+		if (!session)
+			continue;
+		seq_printf(s, "%-14d%-16d%-16lld%lld\n", i,
+			   session->s_nr_caps,
+			   percpu_counter_sum(&session->i_caps_mis),
+			   percpu_counter_sum(&session->i_caps_hit));
+		ceph_put_mds_session(session);
+	}
+	mutex_unlock(&mdsc->mutex);
+
 	return 0;
 }
 
diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 658c55b323cc..33eb239e09e2 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -313,7 +313,7 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
 	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
 	struct ceph_mds_client *mdsc = fsc->mdsc;
 	int i;
-	int err;
+	int err, ret = -1;
 	unsigned frag = -1;
 	struct ceph_mds_reply_info_parsed *rinfo;
 
@@ -346,13 +346,16 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
 	    !ceph_test_mount_opt(fsc, NOASYNCREADDIR) &&
 	    ceph_snap(inode) != CEPH_SNAPDIR &&
 	    __ceph_dir_is_complete_ordered(ci) &&
-	    __ceph_caps_issued_mask(ci, CEPH_CAP_FILE_SHARED, 1)) {
+	    (ret = __ceph_caps_issued_mask(ci, CEPH_CAP_FILE_SHARED, 1))) {
 		int shared_gen = atomic_read(&ci->i_shared_gen);
+		__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
 		spin_unlock(&ci->i_ceph_lock);
 		err = __dcache_readdir(file, ctx, shared_gen);
 		if (err != -EAGAIN)
 			return err;
 	} else {
+		if (ret != -1)
+			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
 		spin_unlock(&ci->i_ceph_lock);
 	}
 
@@ -757,6 +760,8 @@ static struct dentry *ceph_lookup(struct inode *dir, struct dentry *dentry,
 		struct ceph_dentry_info *di = ceph_dentry(dentry);
 
 		spin_lock(&ci->i_ceph_lock);
+		__ceph_caps_metric(ci, CEPH_CAP_FILE_SHARED);
+
 		dout(" dir %p flags are %d\n", dir, ci->i_ceph_flags);
 		if (strncmp(dentry->d_name.name,
 			    fsc->mount_options->snapdir_name,
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 1e6cdf2dfe90..c78dfbbb7b91 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -384,6 +384,8 @@ int ceph_open(struct inode *inode, struct file *file)
 	 * asynchronously.
 	 */
 	spin_lock(&ci->i_ceph_lock);
+	__ceph_caps_metric(ci, wanted);
+
 	if (__ceph_is_any_real_caps(ci) &&
 	    (((fmode & CEPH_FILE_MODE_WR) == 0) || ci->i_auth_cap)) {
 		int mds_wanted = __ceph_caps_mds_wanted(ci, true);
@@ -1340,6 +1342,7 @@ static ssize_t ceph_read_iter(struct kiocb *iocb, struct iov_iter *to)
 				return -ENOMEM;
 		}
 
+		ceph_caps_metric(ci, CEPH_STAT_CAP_INLINE_DATA);
 		statret = __ceph_do_getattr(inode, page,
 					    CEPH_STAT_CAP_INLINE_DATA, !!page);
 		if (statret < 0) {
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index a24fd00676b8..141c1c03636c 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -558,6 +558,8 @@ void ceph_put_mds_session(struct ceph_mds_session *s)
 	if (refcount_dec_and_test(&s->s_ref)) {
 		if (s->s_auth.authorizer)
 			ceph_auth_destroy_authorizer(s->s_auth.authorizer);
+		percpu_counter_destroy(&s->i_caps_hit);
+		percpu_counter_destroy(&s->i_caps_mis);
 		kfree(s);
 	}
 }
@@ -598,6 +600,7 @@ static struct ceph_mds_session *register_session(struct ceph_mds_client *mdsc,
 						 int mds)
 {
 	struct ceph_mds_session *s;
+	int err;
 
 	if (mds >= mdsc->mdsmap->possible_max_rank)
 		return ERR_PTR(-EINVAL);
@@ -612,8 +615,10 @@ static struct ceph_mds_session *register_session(struct ceph_mds_client *mdsc,
 
 		dout("%s: realloc to %d\n", __func__, newmax);
 		sa = kcalloc(newmax, sizeof(void *), GFP_NOFS);
-		if (!sa)
+		if (!sa) {
+			err = -ENOMEM;
 			goto fail_realloc;
+		}
 		if (mdsc->sessions) {
 			memcpy(sa, mdsc->sessions,
 			       mdsc->max_sessions * sizeof(void *));
@@ -653,6 +658,13 @@ static struct ceph_mds_session *register_session(struct ceph_mds_client *mdsc,
 
 	INIT_LIST_HEAD(&s->s_cap_flushing);
 
+	err = percpu_counter_init(&s->i_caps_hit, 0, GFP_NOFS);
+	if (err)
+		goto fail_realloc;
+	err = percpu_counter_init(&s->i_caps_mis, 0, GFP_NOFS);
+	if (err)
+		goto fail_init;
+
 	mdsc->sessions[mds] = s;
 	atomic_inc(&mdsc->num_sessions);
 	refcount_inc(&s->s_ref);  /* one ref to sessions[], one to caller */
@@ -662,6 +674,8 @@ static struct ceph_mds_session *register_session(struct ceph_mds_client *mdsc,
 
 	return s;
 
+fail_init:
+	percpu_counter_destroy(&s->i_caps_hit);
 fail_realloc:
 	kfree(s);
 	return ERR_PTR(-ENOMEM);
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index dd1f417b90eb..ba74ff74c59c 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -201,6 +201,9 @@ struct ceph_mds_session {
 
 	struct list_head  s_waiting;  /* waiting requests */
 	struct list_head  s_unsafe;   /* unsafe requests */
+
+	struct percpu_counter i_caps_hit;
+	struct percpu_counter i_caps_mis;
 };
 
 /*
diff --git a/fs/ceph/quota.c b/fs/ceph/quota.c
index de56dee60540..4ce2f658e63d 100644
--- a/fs/ceph/quota.c
+++ b/fs/ceph/quota.c
@@ -147,9 +147,14 @@ static struct inode *lookup_quotarealm_inode(struct ceph_mds_client *mdsc,
 		return NULL;
 	}
 	if (qri->inode) {
+		struct ceph_inode_info *ci = ceph_inode(qri->inode);
+		int ret;
+
+		ceph_caps_metric(ci, CEPH_STAT_CAP_INODE);
+
 		/* get caps */
-		int ret = __ceph_do_getattr(qri->inode, NULL,
-					    CEPH_STAT_CAP_INODE, true);
+		ret = __ceph_do_getattr(qri->inode, NULL,
+					CEPH_STAT_CAP_INODE, true);
 		if (ret >= 0)
 			in = qri->inode;
 		else
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 7af91628636c..3f4829222528 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -641,6 +641,14 @@ static inline bool __ceph_is_any_real_caps(struct ceph_inode_info *ci)
 	return !RB_EMPTY_ROOT(&ci->i_caps);
 }
 
+extern void __ceph_caps_metric(struct ceph_inode_info *ci, int mask);
+static inline void ceph_caps_metric(struct ceph_inode_info *ci, int mask)
+{
+	spin_lock(&ci->i_ceph_lock);
+	__ceph_caps_metric(ci, mask);
+	spin_unlock(&ci->i_ceph_lock);
+}
+
 extern int __ceph_caps_issued(struct ceph_inode_info *ci, int *implemented);
 extern int __ceph_caps_issued_mask(struct ceph_inode_info *ci, int mask, int t);
 extern int __ceph_caps_issued_other(struct ceph_inode_info *ci,
@@ -927,6 +935,9 @@ extern int __ceph_do_getattr(struct inode *inode, struct page *locked_page,
 			     int mask, bool force);
 static inline int ceph_do_getattr(struct inode *inode, int mask, bool force)
 {
+	struct ceph_inode_info *ci = ceph_inode(inode);
+
+	ceph_caps_metric(ci, mask);
 	return __ceph_do_getattr(inode, NULL, mask, force);
 }
 extern int ceph_permission(struct inode *inode, int mask);
diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
index d58fa14c1f01..ebd522edb0a8 100644
--- a/fs/ceph/xattr.c
+++ b/fs/ceph/xattr.c
@@ -829,6 +829,7 @@ ssize_t __ceph_getxattr(struct inode *inode, const char *name, void *value,
 	struct ceph_vxattr *vxattr = NULL;
 	int req_mask;
 	ssize_t err;
+	int ret = -1;
 
 	/* let's see if a virtual xattr was requested */
 	vxattr = ceph_match_vxattr(inode, name);
@@ -856,7 +857,9 @@ ssize_t __ceph_getxattr(struct inode *inode, const char *name, void *value,
 
 	if (ci->i_xattrs.version == 0 ||
 	    !((req_mask & CEPH_CAP_XATTR_SHARED) ||
-	      __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1))) {
+	      (ret = __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1)))) {
+		if (ret != -1)
+			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
 		spin_unlock(&ci->i_ceph_lock);
 
 		/* security module gets xattr while filling trace */
@@ -871,6 +874,9 @@ ssize_t __ceph_getxattr(struct inode *inode, const char *name, void *value,
 		if (err)
 			return err;
 		spin_lock(&ci->i_ceph_lock);
+	} else {
+		if (ret != -1)
+			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
 	}
 
 	err = __build_xattrs(inode);
@@ -907,19 +913,24 @@ ssize_t ceph_listxattr(struct dentry *dentry, char *names, size_t size)
 	struct ceph_inode_info *ci = ceph_inode(inode);
 	bool len_only = (size == 0);
 	u32 namelen;
-	int err;
+	int err, ret = -1;
 
 	spin_lock(&ci->i_ceph_lock);
 	dout("listxattr %p ver=%lld index_ver=%lld\n", inode,
 	     ci->i_xattrs.version, ci->i_xattrs.index_version);
 
 	if (ci->i_xattrs.version == 0 ||
-	    !__ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1)) {
+	    !(ret = __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1))) {
+		if (ret != -1)
+			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
 		spin_unlock(&ci->i_ceph_lock);
 		err = ceph_do_getattr(inode, CEPH_STAT_CAP_XATTR, true);
 		if (err)
 			return err;
 		spin_lock(&ci->i_ceph_lock);
+	} else {
+		if (ret != -1)
+			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
 	}
 
 	err = __build_xattrs(inode);
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH resend v5 03/11] ceph: move ceph_osdc_{read,write}pages to ceph.ko
  2020-01-29  8:27 [PATCH resend v5 0/11] ceph: add perf metrics support xiubli
  2020-01-29  8:27 ` [PATCH resend v5 01/11] ceph: add global dentry lease metric support xiubli
  2020-01-29  8:27 ` [PATCH resend v5 02/11] ceph: add caps perf metric for each session xiubli
@ 2020-01-29  8:27 ` xiubli
  2020-02-04 18:38   ` Jeff Layton
  2020-01-29  8:27 ` [PATCH resend v5 04/11] ceph: add r_end_stamp for the osdc request xiubli
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: xiubli @ 2020-01-29  8:27 UTC (permalink / raw)
  To: jlayton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

Since this helpers are only used by cpeh.ko, let's move it to ceph.ko
and rename to _sync_.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/addr.c                  | 86 ++++++++++++++++++++++++++++++++-
 include/linux/ceph/osd_client.h | 17 -------
 net/ceph/osd_client.c           | 79 ------------------------------
 3 files changed, 84 insertions(+), 98 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 29d4513eff8c..20e5ebfff389 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -182,6 +182,47 @@ static int ceph_releasepage(struct page *page, gfp_t g)
 	return !PagePrivate(page);
 }
 
+/*
+ * Read some contiguous pages.  If we cross a stripe boundary, shorten
+ * *plen.  Return number of bytes read, or error.
+ */
+static int ceph_sync_readpages(struct ceph_fs_client *fsc,
+			       struct ceph_vino vino,
+			       struct ceph_file_layout *layout,
+			       u64 off, u64 *plen,
+			       u32 truncate_seq, u64 truncate_size,
+			       struct page **pages, int num_pages,
+			       int page_align)
+{
+	struct ceph_osd_client *osdc = &fsc->client->osdc;
+	struct ceph_osd_request *req;
+	int rc = 0;
+
+	dout("readpages on ino %llx.%llx on %llu~%llu\n", vino.ino,
+	     vino.snap, off, *plen);
+	req = ceph_osdc_new_request(osdc, layout, vino, off, plen, 0, 1,
+				    CEPH_OSD_OP_READ, CEPH_OSD_FLAG_READ,
+				    NULL, truncate_seq, truncate_size,
+				    false);
+	if (IS_ERR(req))
+		return PTR_ERR(req);
+
+	/* it may be a short read due to an object boundary */
+	osd_req_op_extent_osd_data_pages(req, 0,
+				pages, *plen, page_align, false, false);
+
+	dout("readpages  final extent is %llu~%llu (%llu bytes align %d)\n",
+	     off, *plen, *plen, page_align);
+
+	rc = ceph_osdc_start_request(osdc, req, false);
+	if (!rc)
+		rc = ceph_osdc_wait_request(osdc, req);
+
+	ceph_osdc_put_request(req);
+	dout("readpages result %d\n", rc);
+	return rc;
+}
+
 /*
  * read a single page, without unlocking it.
  */
@@ -218,7 +259,7 @@ static int ceph_do_readpage(struct file *filp, struct page *page)
 
 	dout("readpage inode %p file %p page %p index %lu\n",
 	     inode, filp, page, page->index);
-	err = ceph_osdc_readpages(&fsc->client->osdc, ceph_vino(inode),
+	err = ceph_sync_readpages(fsc, ceph_vino(inode),
 				  &ci->i_layout, off, &len,
 				  ci->i_truncate_seq, ci->i_truncate_size,
 				  &page, 1, 0);
@@ -570,6 +611,47 @@ static u64 get_writepages_data_length(struct inode *inode,
 	return end > start ? end - start : 0;
 }
 
+/*
+ * do a synchronous write on N pages
+ */
+static int ceph_sync_writepages(struct ceph_fs_client *fsc,
+				struct ceph_vino vino,
+				struct ceph_file_layout *layout,
+				struct ceph_snap_context *snapc,
+				u64 off, u64 len,
+				u32 truncate_seq, u64 truncate_size,
+				struct timespec64 *mtime,
+				struct page **pages, int num_pages)
+{
+	struct ceph_osd_client *osdc = &fsc->client->osdc;
+	struct ceph_osd_request *req;
+	int rc = 0;
+	int page_align = off & ~PAGE_MASK;
+
+	req = ceph_osdc_new_request(osdc, layout, vino, off, &len, 0, 1,
+				    CEPH_OSD_OP_WRITE, CEPH_OSD_FLAG_WRITE,
+				    snapc, truncate_seq, truncate_size,
+				    true);
+	if (IS_ERR(req))
+		return PTR_ERR(req);
+
+	/* it may be a short write due to an object boundary */
+	osd_req_op_extent_osd_data_pages(req, 0, pages, len, page_align,
+				false, false);
+	dout("writepages %llu~%llu (%llu bytes)\n", off, len, len);
+
+	req->r_mtime = *mtime;
+	rc = ceph_osdc_start_request(osdc, req, true);
+	if (!rc)
+		rc = ceph_osdc_wait_request(osdc, req);
+
+	ceph_osdc_put_request(req);
+	if (rc == 0)
+		rc = len;
+	dout("writepages result %d\n", rc);
+	return rc;
+}
+
 /*
  * Write a single page, but leave the page locked.
  *
@@ -628,7 +710,7 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
 		set_bdi_congested(inode_to_bdi(inode), BLK_RW_ASYNC);
 
 	set_page_writeback(page);
-	err = ceph_osdc_writepages(&fsc->client->osdc, ceph_vino(inode),
+	err = ceph_sync_writepages(fsc, ceph_vino(inode),
 				   &ci->i_layout, snapc, page_off, len,
 				   ceph_wbc.truncate_seq,
 				   ceph_wbc.truncate_size,
diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
index 5a62dbd3f4c2..9d9f745b98a1 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -509,23 +509,6 @@ int ceph_osdc_call(struct ceph_osd_client *osdc,
 		   struct page *req_page, size_t req_len,
 		   struct page **resp_pages, size_t *resp_len);
 
-extern int ceph_osdc_readpages(struct ceph_osd_client *osdc,
-			       struct ceph_vino vino,
-			       struct ceph_file_layout *layout,
-			       u64 off, u64 *plen,
-			       u32 truncate_seq, u64 truncate_size,
-			       struct page **pages, int nr_pages,
-			       int page_align);
-
-extern int ceph_osdc_writepages(struct ceph_osd_client *osdc,
-				struct ceph_vino vino,
-				struct ceph_file_layout *layout,
-				struct ceph_snap_context *sc,
-				u64 off, u64 len,
-				u32 truncate_seq, u64 truncate_size,
-				struct timespec64 *mtime,
-				struct page **pages, int nr_pages);
-
 int ceph_osdc_copy_from(struct ceph_osd_client *osdc,
 			u64 src_snapid, u64 src_version,
 			struct ceph_object_id *src_oid,
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index b68b376d8c2f..8ff2856e2d52 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -5230,85 +5230,6 @@ void ceph_osdc_stop(struct ceph_osd_client *osdc)
 	ceph_msgpool_destroy(&osdc->msgpool_op_reply);
 }
 
-/*
- * Read some contiguous pages.  If we cross a stripe boundary, shorten
- * *plen.  Return number of bytes read, or error.
- */
-int ceph_osdc_readpages(struct ceph_osd_client *osdc,
-			struct ceph_vino vino, struct ceph_file_layout *layout,
-			u64 off, u64 *plen,
-			u32 truncate_seq, u64 truncate_size,
-			struct page **pages, int num_pages, int page_align)
-{
-	struct ceph_osd_request *req;
-	int rc = 0;
-
-	dout("readpages on ino %llx.%llx on %llu~%llu\n", vino.ino,
-	     vino.snap, off, *plen);
-	req = ceph_osdc_new_request(osdc, layout, vino, off, plen, 0, 1,
-				    CEPH_OSD_OP_READ, CEPH_OSD_FLAG_READ,
-				    NULL, truncate_seq, truncate_size,
-				    false);
-	if (IS_ERR(req))
-		return PTR_ERR(req);
-
-	/* it may be a short read due to an object boundary */
-	osd_req_op_extent_osd_data_pages(req, 0,
-				pages, *plen, page_align, false, false);
-
-	dout("readpages  final extent is %llu~%llu (%llu bytes align %d)\n",
-	     off, *plen, *plen, page_align);
-
-	rc = ceph_osdc_start_request(osdc, req, false);
-	if (!rc)
-		rc = ceph_osdc_wait_request(osdc, req);
-
-	ceph_osdc_put_request(req);
-	dout("readpages result %d\n", rc);
-	return rc;
-}
-EXPORT_SYMBOL(ceph_osdc_readpages);
-
-/*
- * do a synchronous write on N pages
- */
-int ceph_osdc_writepages(struct ceph_osd_client *osdc, struct ceph_vino vino,
-			 struct ceph_file_layout *layout,
-			 struct ceph_snap_context *snapc,
-			 u64 off, u64 len,
-			 u32 truncate_seq, u64 truncate_size,
-			 struct timespec64 *mtime,
-			 struct page **pages, int num_pages)
-{
-	struct ceph_osd_request *req;
-	int rc = 0;
-	int page_align = off & ~PAGE_MASK;
-
-	req = ceph_osdc_new_request(osdc, layout, vino, off, &len, 0, 1,
-				    CEPH_OSD_OP_WRITE, CEPH_OSD_FLAG_WRITE,
-				    snapc, truncate_seq, truncate_size,
-				    true);
-	if (IS_ERR(req))
-		return PTR_ERR(req);
-
-	/* it may be a short write due to an object boundary */
-	osd_req_op_extent_osd_data_pages(req, 0, pages, len, page_align,
-				false, false);
-	dout("writepages %llu~%llu (%llu bytes)\n", off, len, len);
-
-	req->r_mtime = *mtime;
-	rc = ceph_osdc_start_request(osdc, req, true);
-	if (!rc)
-		rc = ceph_osdc_wait_request(osdc, req);
-
-	ceph_osdc_put_request(req);
-	if (rc == 0)
-		rc = len;
-	dout("writepages result %d\n", rc);
-	return rc;
-}
-EXPORT_SYMBOL(ceph_osdc_writepages);
-
 static int osd_req_op_copy_from_init(struct ceph_osd_request *req,
 				     u64 src_snapid, u64 src_version,
 				     struct ceph_object_id *src_oid,
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH resend v5 04/11] ceph: add r_end_stamp for the osdc request
  2020-01-29  8:27 [PATCH resend v5 0/11] ceph: add perf metrics support xiubli
                   ` (2 preceding siblings ...)
  2020-01-29  8:27 ` [PATCH resend v5 03/11] ceph: move ceph_osdc_{read,write}pages to ceph.ko xiubli
@ 2020-01-29  8:27 ` xiubli
  2020-02-05 19:14   ` Jeff Layton
  2020-01-29  8:27 ` [PATCH resend v5 05/11] ceph: add global read latency metric support xiubli
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: xiubli @ 2020-01-29  8:27 UTC (permalink / raw)
  To: jlayton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

Grab the osdc requests' end time stamp.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 include/linux/ceph/osd_client.h | 1 +
 net/ceph/osd_client.c           | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
index 9d9f745b98a1..00a449cfc478 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -213,6 +213,7 @@ struct ceph_osd_request {
 	/* internal */
 	unsigned long r_stamp;                /* jiffies, send or check time */
 	unsigned long r_start_stamp;          /* jiffies */
+	unsigned long r_end_stamp;          /* jiffies */
 	int r_attempts;
 	u32 r_map_dne_bound;
 
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 8ff2856e2d52..108c9457d629 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -2389,6 +2389,8 @@ static void finish_request(struct ceph_osd_request *req)
 	WARN_ON(lookup_request_mc(&osdc->map_checks, req->r_tid));
 	dout("%s req %p tid %llu\n", __func__, req, req->r_tid);
 
+	req->r_end_stamp = jiffies;
+
 	if (req->r_osd)
 		unlink_request(req->r_osd, req);
 	atomic_dec(&osdc->num_requests);
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH resend v5 05/11] ceph: add global read latency metric support
  2020-01-29  8:27 [PATCH resend v5 0/11] ceph: add perf metrics support xiubli
                   ` (3 preceding siblings ...)
  2020-01-29  8:27 ` [PATCH resend v5 04/11] ceph: add r_end_stamp for the osdc request xiubli
@ 2020-01-29  8:27 ` xiubli
  2020-02-05 20:15   ` Jeff Layton
  2020-01-29  8:27 ` [PATCH resend v5 06/11] ceph: add global write " xiubli
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: xiubli @ 2020-01-29  8:27 UTC (permalink / raw)
  To: jlayton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

item          total       sum_lat(us)     avg_lat(us)
-----------------------------------------------------
read          73          3590000         49178082

URL: https://tracker.ceph.com/issues/43215
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/addr.c       |  8 ++++++++
 fs/ceph/debugfs.c    | 11 +++++++++++
 fs/ceph/file.c       | 15 +++++++++++++++
 fs/ceph/mds_client.c | 29 +++++++++++++++++++++++------
 fs/ceph/mds_client.h |  9 ++-------
 fs/ceph/metric.h     | 30 ++++++++++++++++++++++++++++++
 6 files changed, 89 insertions(+), 13 deletions(-)
 create mode 100644 fs/ceph/metric.h

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 20e5ebfff389..0435a694370b 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -195,6 +195,7 @@ static int ceph_sync_readpages(struct ceph_fs_client *fsc,
 			       int page_align)
 {
 	struct ceph_osd_client *osdc = &fsc->client->osdc;
+	struct ceph_client_metric *metric = &fsc->mdsc->metric;
 	struct ceph_osd_request *req;
 	int rc = 0;
 
@@ -218,6 +219,8 @@ static int ceph_sync_readpages(struct ceph_fs_client *fsc,
 	if (!rc)
 		rc = ceph_osdc_wait_request(osdc, req);
 
+	ceph_update_read_latency(metric, req, rc);
+
 	ceph_osdc_put_request(req);
 	dout("readpages result %d\n", rc);
 	return rc;
@@ -301,6 +304,8 @@ static int ceph_readpage(struct file *filp, struct page *page)
 static void finish_read(struct ceph_osd_request *req)
 {
 	struct inode *inode = req->r_inode;
+	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
+	struct ceph_client_metric *metric = &fsc->mdsc->metric;
 	struct ceph_osd_data *osd_data;
 	int rc = req->r_result <= 0 ? req->r_result : 0;
 	int bytes = req->r_result >= 0 ? req->r_result : 0;
@@ -338,6 +343,9 @@ static void finish_read(struct ceph_osd_request *req)
 		put_page(page);
 		bytes -= PAGE_SIZE;
 	}
+
+	ceph_update_read_latency(metric, req, rc);
+
 	kfree(osd_data->pages);
 }
 
diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
index c132fdb40d53..f8a32fa335ae 100644
--- a/fs/ceph/debugfs.c
+++ b/fs/ceph/debugfs.c
@@ -128,8 +128,19 @@ static int metric_show(struct seq_file *s, void *p)
 {
 	struct ceph_fs_client *fsc = s->private;
 	struct ceph_mds_client *mdsc = fsc->mdsc;
+	s64 total, sum, avg = 0;
 	int i;
 
+	seq_printf(s, "item          total       sum_lat(us)     avg_lat(us)\n");
+	seq_printf(s, "-----------------------------------------------------\n");
+
+	total = percpu_counter_sum(&mdsc->metric.total_reads);
+	sum = percpu_counter_sum(&mdsc->metric.read_latency_sum);
+	avg = total ? sum / total : 0;
+	seq_printf(s, "%-14s%-12lld%-16lld%lld\n", "read",
+		   total, sum / NSEC_PER_USEC, avg / NSEC_PER_USEC);
+
+	seq_printf(s, "\n");
 	seq_printf(s, "item          total           miss            hit\n");
 	seq_printf(s, "-------------------------------------------------\n");
 
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index c78dfbbb7b91..69288c39229b 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -588,6 +588,7 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
 	struct inode *inode = file_inode(file);
 	struct ceph_inode_info *ci = ceph_inode(inode);
 	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
+	struct ceph_client_metric *metric = &fsc->mdsc->metric;
 	struct ceph_osd_client *osdc = &fsc->client->osdc;
 	ssize_t ret;
 	u64 off = iocb->ki_pos;
@@ -660,6 +661,9 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
 		ret = ceph_osdc_start_request(osdc, req, false);
 		if (!ret)
 			ret = ceph_osdc_wait_request(osdc, req);
+
+		ceph_update_read_latency(metric, req, ret);
+
 		ceph_osdc_put_request(req);
 
 		i_size = i_size_read(inode);
@@ -798,13 +802,20 @@ static void ceph_aio_complete_req(struct ceph_osd_request *req)
 	struct inode *inode = req->r_inode;
 	struct ceph_aio_request *aio_req = req->r_priv;
 	struct ceph_osd_data *osd_data = osd_req_op_extent_osd_data(req, 0);
+	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
+	struct ceph_client_metric *metric = &fsc->mdsc->metric;
 
 	BUG_ON(osd_data->type != CEPH_OSD_DATA_TYPE_BVECS);
 	BUG_ON(!osd_data->num_bvecs);
+	BUG_ON(!aio_req);
 
 	dout("ceph_aio_complete_req %p rc %d bytes %u\n",
 	     inode, rc, osd_data->bvec_pos.iter.bi_size);
 
+	/* r_start_stamp == 0 means the request was not submitted */
+	if (req->r_start_stamp && !aio_req->write)
+		ceph_update_read_latency(metric, req, rc);
+
 	if (rc == -EOLDSNAPC) {
 		struct ceph_aio_work *aio_work;
 		BUG_ON(!aio_req->write);
@@ -933,6 +944,7 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter,
 	struct inode *inode = file_inode(file);
 	struct ceph_inode_info *ci = ceph_inode(inode);
 	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
+	struct ceph_client_metric *metric = &fsc->mdsc->metric;
 	struct ceph_vino vino;
 	struct ceph_osd_request *req;
 	struct bio_vec *bvecs;
@@ -1049,6 +1061,9 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter,
 		if (!ret)
 			ret = ceph_osdc_wait_request(&fsc->client->osdc, req);
 
+		if (!write)
+			ceph_update_read_latency(metric, req, ret);
+
 		size = i_size_read(inode);
 		if (!write) {
 			if (ret == -ENOENT)
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 141c1c03636c..101b51f9f05d 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -4182,14 +4182,29 @@ static int ceph_mdsc_metric_init(struct ceph_client_metric *metric)
 	atomic64_set(&metric->total_dentries, 0);
 	ret = percpu_counter_init(&metric->d_lease_hit, 0, GFP_KERNEL);
 	if (ret)
-		return ret;
+		return ret;;
 	ret = percpu_counter_init(&metric->d_lease_mis, 0, GFP_KERNEL);
-	if (ret) {
-		percpu_counter_destroy(&metric->d_lease_hit);
-		return ret;
-	}
+	if (ret)
+		goto err_dlease_mis;
 
-	return 0;
+	ret = percpu_counter_init(&metric->total_reads, 0, GFP_KERNEL);
+	if (ret)
+		goto err_total_reads;
+
+	ret = percpu_counter_init(&metric->read_latency_sum, 0, GFP_KERNEL);
+	if (ret)
+		goto err_read_latency_sum;
+
+	return ret;
+
+err_read_latency_sum:
+	percpu_counter_destroy(&metric->total_reads);
+err_total_reads:
+	percpu_counter_destroy(&metric->d_lease_mis);
+err_dlease_mis:
+	percpu_counter_destroy(&metric->d_lease_hit);
+
+	return ret;
 }
 
 int ceph_mdsc_init(struct ceph_fs_client *fsc)
@@ -4529,6 +4544,8 @@ void ceph_mdsc_destroy(struct ceph_fs_client *fsc)
 
 	ceph_mdsc_stop(mdsc);
 
+	percpu_counter_destroy(&mdsc->metric.read_latency_sum);
+	percpu_counter_destroy(&mdsc->metric.total_reads);
 	percpu_counter_destroy(&mdsc->metric.d_lease_mis);
 	percpu_counter_destroy(&mdsc->metric.d_lease_hit);
 
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index ba74ff74c59c..574d4e5a5de2 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -16,6 +16,8 @@
 #include <linux/ceph/mdsmap.h>
 #include <linux/ceph/auth.h>
 
+#include "metric.h"
+
 /* The first 8 bits are reserved for old ceph releases */
 enum ceph_feature_type {
 	CEPHFS_FEATURE_MIMIC = 8,
@@ -361,13 +363,6 @@ struct cap_wait {
 	int			want;
 };
 
-/* This is the global metrics */
-struct ceph_client_metric {
-	atomic64_t		total_dentries;
-	struct percpu_counter	d_lease_hit;
-	struct percpu_counter	d_lease_mis;
-};
-
 /*
  * mds client state
  */
diff --git a/fs/ceph/metric.h b/fs/ceph/metric.h
new file mode 100644
index 000000000000..2a7b8f3fe6a4
--- /dev/null
+++ b/fs/ceph/metric.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _FS_CEPH_MDS_METRIC_H
+#define _FS_CEPH_MDS_METRIC_H
+
+#include <linux/ceph/osd_client.h>
+
+/* This is the global metrics */
+struct ceph_client_metric {
+	atomic64_t		total_dentries;
+	struct percpu_counter	d_lease_hit;
+	struct percpu_counter	d_lease_mis;
+
+	struct percpu_counter	total_reads;
+	struct percpu_counter	read_latency_sum;
+};
+
+static inline void ceph_update_read_latency(struct ceph_client_metric *m,
+					    struct ceph_osd_request *req,
+					    int rc)
+{
+	if (!m || !req)
+		return;
+
+	if (rc >= 0 || rc == -ENOENT || rc == -ETIMEDOUT) {
+		s64 latency = req->r_end_stamp - req->r_start_stamp;
+		percpu_counter_inc(&m->total_reads);
+		percpu_counter_add(&m->read_latency_sum, latency);
+	}
+}
+#endif
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH resend v5 06/11] ceph: add global write latency metric support
  2020-01-29  8:27 [PATCH resend v5 0/11] ceph: add perf metrics support xiubli
                   ` (4 preceding siblings ...)
  2020-01-29  8:27 ` [PATCH resend v5 05/11] ceph: add global read latency metric support xiubli
@ 2020-01-29  8:27 ` xiubli
  2020-01-29  8:27 ` [PATCH resend v5 07/11] ceph: add global metadata perf " xiubli
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: xiubli @ 2020-01-29  8:27 UTC (permalink / raw)
  To: jlayton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

item          total       sum_lat(us)     avg_lat(us)
-----------------------------------------------------
write         222         5287750000      23818693

URL: https://tracker.ceph.com/issues/43215
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/addr.c       | 10 ++++++++++
 fs/ceph/debugfs.c    |  6 ++++++
 fs/ceph/file.c       | 14 +++++++++++---
 fs/ceph/mds_client.c | 15 ++++++++++++++-
 fs/ceph/metric.h     | 17 +++++++++++++++++
 5 files changed, 58 insertions(+), 4 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 0435a694370b..74868231f007 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -632,6 +632,7 @@ static int ceph_sync_writepages(struct ceph_fs_client *fsc,
 				struct page **pages, int num_pages)
 {
 	struct ceph_osd_client *osdc = &fsc->client->osdc;
+	struct ceph_client_metric *metric = &fsc->mdsc->metric;
 	struct ceph_osd_request *req;
 	int rc = 0;
 	int page_align = off & ~PAGE_MASK;
@@ -653,6 +654,8 @@ static int ceph_sync_writepages(struct ceph_fs_client *fsc,
 	if (!rc)
 		rc = ceph_osdc_wait_request(osdc, req);
 
+	ceph_update_write_latency(metric, req, rc);
+
 	ceph_osdc_put_request(req);
 	if (rc == 0)
 		rc = len;
@@ -792,6 +795,7 @@ static void writepages_finish(struct ceph_osd_request *req)
 	struct ceph_snap_context *snapc = req->r_snapc;
 	struct address_space *mapping = inode->i_mapping;
 	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
+	struct ceph_client_metric *metric = &fsc->mdsc->metric;
 	bool remove_page;
 
 	dout("writepages_finish %p rc %d\n", inode, rc);
@@ -804,6 +808,8 @@ static void writepages_finish(struct ceph_osd_request *req)
 		ceph_clear_error_write(ci);
 	}
 
+	ceph_update_write_latency(metric, req, rc);
+
 	/*
 	 * We lost the cache cap, need to truncate the page before
 	 * it is unlocked, otherwise we'd truncate it later in the
@@ -1752,6 +1758,7 @@ int ceph_uninline_data(struct file *filp, struct page *locked_page)
 	struct inode *inode = file_inode(filp);
 	struct ceph_inode_info *ci = ceph_inode(inode);
 	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
+	struct ceph_client_metric *metric = &fsc->mdsc->metric;
 	struct ceph_osd_request *req;
 	struct page *page = NULL;
 	u64 len, inline_version;
@@ -1864,6 +1871,9 @@ int ceph_uninline_data(struct file *filp, struct page *locked_page)
 	err = ceph_osdc_start_request(&fsc->client->osdc, req, false);
 	if (!err)
 		err = ceph_osdc_wait_request(&fsc->client->osdc, req);
+
+	ceph_update_write_latency(metric, req, err);
+
 out_put:
 	ceph_osdc_put_request(req);
 	if (err == -ECANCELED)
diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
index f8a32fa335ae..3d27f2e6f556 100644
--- a/fs/ceph/debugfs.c
+++ b/fs/ceph/debugfs.c
@@ -140,6 +140,12 @@ static int metric_show(struct seq_file *s, void *p)
 	seq_printf(s, "%-14s%-12lld%-16lld%lld\n", "read",
 		   total, sum / NSEC_PER_USEC, avg / NSEC_PER_USEC);
 
+	total = percpu_counter_sum(&mdsc->metric.total_writes);
+	sum = percpu_counter_sum(&mdsc->metric.write_latency_sum);
+	avg = total ? sum / total : 0;
+	seq_printf(s, "%-14s%-12lld%-16lld%lld\n", "write",
+		   total, sum / NSEC_PER_USEC, avg / NSEC_PER_USEC);
+
 	seq_printf(s, "\n");
 	seq_printf(s, "item          total           miss            hit\n");
 	seq_printf(s, "-------------------------------------------------\n");
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 69288c39229b..9940eb85eff6 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -813,8 +813,12 @@ static void ceph_aio_complete_req(struct ceph_osd_request *req)
 	     inode, rc, osd_data->bvec_pos.iter.bi_size);
 
 	/* r_start_stamp == 0 means the request was not submitted */
-	if (req->r_start_stamp && !aio_req->write)
-		ceph_update_read_latency(metric, req, rc);
+	if (req->r_start_stamp) {
+		if (aio_req->write)
+			ceph_update_write_latency(metric, req, rc);
+		else
+			ceph_update_read_latency(metric, req, rc);
+	}
 
 	if (rc == -EOLDSNAPC) {
 		struct ceph_aio_work *aio_work;
@@ -1061,7 +1065,9 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter,
 		if (!ret)
 			ret = ceph_osdc_wait_request(&fsc->client->osdc, req);
 
-		if (!write)
+		if (write)
+			ceph_update_write_latency(metric, req, ret);
+		else
 			ceph_update_read_latency(metric, req, ret);
 
 		size = i_size_read(inode);
@@ -1150,6 +1156,7 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
 	struct inode *inode = file_inode(file);
 	struct ceph_inode_info *ci = ceph_inode(inode);
 	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
+	struct ceph_client_metric *metric = &fsc->mdsc->metric;
 	struct ceph_vino vino;
 	struct ceph_osd_request *req;
 	struct page **pages;
@@ -1235,6 +1242,7 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
 		if (!ret)
 			ret = ceph_osdc_wait_request(&fsc->client->osdc, req);
 
+		ceph_update_write_latency(metric, req, ret);
 out:
 		ceph_osdc_put_request(req);
 		if (ret != 0) {
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 101b51f9f05d..d072cab77ab2 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -4195,8 +4195,19 @@ static int ceph_mdsc_metric_init(struct ceph_client_metric *metric)
 	if (ret)
 		goto err_read_latency_sum;
 
-	return ret;
+	ret = percpu_counter_init(&metric->total_writes, 0, GFP_KERNEL);
+	if (ret)
+		goto err_total_writes;
 
+	ret = percpu_counter_init(&metric->write_latency_sum, 0, GFP_KERNEL);
+	if (ret)
+		goto err_write_latency_sum;
+
+	return ret;
+err_write_latency_sum:
+	percpu_counter_destroy(&metric->total_writes);
+err_total_writes:
+	percpu_counter_destroy(&metric->read_latency_sum);
 err_read_latency_sum:
 	percpu_counter_destroy(&metric->total_reads);
 err_total_reads:
@@ -4544,6 +4555,8 @@ void ceph_mdsc_destroy(struct ceph_fs_client *fsc)
 
 	ceph_mdsc_stop(mdsc);
 
+	percpu_counter_destroy(&mdsc->metric.write_latency_sum);
+	percpu_counter_destroy(&mdsc->metric.total_writes);
 	percpu_counter_destroy(&mdsc->metric.read_latency_sum);
 	percpu_counter_destroy(&mdsc->metric.total_reads);
 	percpu_counter_destroy(&mdsc->metric.d_lease_mis);
diff --git a/fs/ceph/metric.h b/fs/ceph/metric.h
index 2a7b8f3fe6a4..49546961eeed 100644
--- a/fs/ceph/metric.h
+++ b/fs/ceph/metric.h
@@ -12,6 +12,9 @@ struct ceph_client_metric {
 
 	struct percpu_counter	total_reads;
 	struct percpu_counter	read_latency_sum;
+
+	struct percpu_counter	total_writes;
+	struct percpu_counter	write_latency_sum;
 };
 
 static inline void ceph_update_read_latency(struct ceph_client_metric *m,
@@ -27,4 +30,18 @@ static inline void ceph_update_read_latency(struct ceph_client_metric *m,
 		percpu_counter_add(&m->read_latency_sum, latency);
 	}
 }
+
+static inline void ceph_update_write_latency(struct ceph_client_metric *m,
+					     struct ceph_osd_request *req,
+					     int rc)
+{
+	if (!m || !req)
+		return;
+
+	if (!rc || rc == -ETIMEDOUT) {
+		s64 latency = req->r_end_stamp - req->r_start_stamp;
+		percpu_counter_inc(&m->total_writes);
+		percpu_counter_add(&m->write_latency_sum, latency);
+	}
+}
 #endif
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH resend v5 07/11] ceph: add global metadata perf metric support
  2020-01-29  8:27 [PATCH resend v5 0/11] ceph: add perf metrics support xiubli
                   ` (5 preceding siblings ...)
  2020-01-29  8:27 ` [PATCH resend v5 06/11] ceph: add global write " xiubli
@ 2020-01-29  8:27 ` xiubli
  2020-01-29  8:27 ` [PATCH resend v5 08/11] ceph: periodically send perf metrics to MDS xiubli
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: xiubli @ 2020-01-29  8:27 UTC (permalink / raw)
  To: jlayton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

item          total       sum_lat(us)     avg_lat(us)
-----------------------------------------------------
metadata      1288        24506000        19026

URL: https://tracker.ceph.com/issues/43215
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/debugfs.c    |  6 ++++++
 fs/ceph/mds_client.c | 25 +++++++++++++++++++++++++
 fs/ceph/metric.h     | 13 +++++++++++++
 3 files changed, 44 insertions(+)

diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
index 3d27f2e6f556..7fd031c18309 100644
--- a/fs/ceph/debugfs.c
+++ b/fs/ceph/debugfs.c
@@ -146,6 +146,12 @@ static int metric_show(struct seq_file *s, void *p)
 	seq_printf(s, "%-14s%-12lld%-16lld%lld\n", "write",
 		   total, sum / NSEC_PER_USEC, avg / NSEC_PER_USEC);
 
+	total = percpu_counter_sum(&mdsc->metric.total_metadatas);
+	sum = percpu_counter_sum(&mdsc->metric.metadata_latency_sum);
+	avg = total ? sum / total : 0;
+	seq_printf(s, "%-14s%-12lld%-16lld%lld\n", "metadata",
+		   total, sum / NSEC_PER_USEC, avg / NSEC_PER_USEC);
+
 	seq_printf(s, "\n");
 	seq_printf(s, "item          total           miss            hit\n");
 	seq_printf(s, "-------------------------------------------------\n");
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index d072cab77ab2..92a933810a79 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2772,6 +2772,12 @@ static int ceph_mdsc_wait_request(struct ceph_mds_client *mdsc,
 		else
 			err = timeleft;  /* killed */
 	}
+
+	if (!err || err == -EIO) {
+		s64 latency = jiffies - req->r_started;
+		ceph_update_metadata_latency(&mdsc->metric, latency);
+	}
+
 	dout("do_request waited, got %d\n", err);
 	mutex_lock(&mdsc->mutex);
 
@@ -3033,6 +3039,11 @@ static void handle_reply(struct ceph_mds_session *session, struct ceph_msg *msg)
 
 	/* kick calling process */
 	complete_request(mdsc, req);
+
+	if (!result || result == -ENOENT) {
+		s64 latency = jiffies - req->r_started;
+		ceph_update_metadata_latency(&mdsc->metric, latency);
+	}
 out:
 	ceph_mdsc_put_request(req);
 	return;
@@ -4203,7 +4214,19 @@ static int ceph_mdsc_metric_init(struct ceph_client_metric *metric)
 	if (ret)
 		goto err_write_latency_sum;
 
+	ret = percpu_counter_init(&metric->total_metadatas, 0, GFP_KERNEL);
+	if (ret)
+		goto err_total_metadatas;
+
+	ret = percpu_counter_init(&metric->metadata_latency_sum, 0, GFP_KERNEL);
+	if (ret)
+		goto err_metadata_latency_sum;
+
 	return ret;
+err_metadata_latency_sum:
+	percpu_counter_destroy(&metric->total_metadatas);
+err_total_metadatas:
+	percpu_counter_destroy(&metric->write_latency_sum);
 err_write_latency_sum:
 	percpu_counter_destroy(&metric->total_writes);
 err_total_writes:
@@ -4555,6 +4578,8 @@ void ceph_mdsc_destroy(struct ceph_fs_client *fsc)
 
 	ceph_mdsc_stop(mdsc);
 
+	percpu_counter_destroy(&mdsc->metric.metadata_latency_sum);
+	percpu_counter_destroy(&mdsc->metric.total_metadatas);
 	percpu_counter_destroy(&mdsc->metric.write_latency_sum);
 	percpu_counter_destroy(&mdsc->metric.total_writes);
 	percpu_counter_destroy(&mdsc->metric.read_latency_sum);
diff --git a/fs/ceph/metric.h b/fs/ceph/metric.h
index 49546961eeed..3cda616ba594 100644
--- a/fs/ceph/metric.h
+++ b/fs/ceph/metric.h
@@ -15,6 +15,9 @@ struct ceph_client_metric {
 
 	struct percpu_counter	total_writes;
 	struct percpu_counter	write_latency_sum;
+
+	struct percpu_counter	total_metadatas;
+	struct percpu_counter	metadata_latency_sum;
 };
 
 static inline void ceph_update_read_latency(struct ceph_client_metric *m,
@@ -44,4 +47,14 @@ static inline void ceph_update_write_latency(struct ceph_client_metric *m,
 		percpu_counter_add(&m->write_latency_sum, latency);
 	}
 }
+
+static inline void ceph_update_metadata_latency(struct ceph_client_metric *m,
+						s64 latency)
+{
+	if (!m)
+		return;
+
+	percpu_counter_inc(&m->total_metadatas);
+	percpu_counter_add(&m->metadata_latency_sum, latency);
+}
 #endif
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH resend v5 08/11] ceph: periodically send perf metrics to MDS
  2020-01-29  8:27 [PATCH resend v5 0/11] ceph: add perf metrics support xiubli
                   ` (6 preceding siblings ...)
  2020-01-29  8:27 ` [PATCH resend v5 07/11] ceph: add global metadata perf " xiubli
@ 2020-01-29  8:27 ` xiubli
  2020-02-05 21:43   ` Jeff Layton
  2020-01-29  8:27 ` [PATCH resend v5 09/11] ceph: add CEPH_DEFINE_RW_FUNC helper support xiubli
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: xiubli @ 2020-01-29  8:27 UTC (permalink / raw)
  To: jlayton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

Add enable/disable sending metrics to MDS debugfs and disabled as
default, if it's enabled the kclient will send metrics every
second.

This will send global dentry lease hit/miss and read/write/metadata
latency metrics and each session's caps hit/miss metric to MDS.

Every time only sends the global metrics once via any availible
session.

URL: https://tracker.ceph.com/issues/43215
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/debugfs.c            |  44 +++++++-
 fs/ceph/mds_client.c         | 201 ++++++++++++++++++++++++++++++++---
 fs/ceph/mds_client.h         |   3 +
 fs/ceph/metric.h             |  76 +++++++++++++
 fs/ceph/super.h              |   1 +
 include/linux/ceph/ceph_fs.h |   1 +
 6 files changed, 307 insertions(+), 19 deletions(-)

diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
index 7fd031c18309..8aae7ecea54a 100644
--- a/fs/ceph/debugfs.c
+++ b/fs/ceph/debugfs.c
@@ -124,6 +124,40 @@ static int mdsc_show(struct seq_file *s, void *p)
 	return 0;
 }
 
+/*
+ * metrics debugfs
+ */
+static int sending_metrics_set(void *data, u64 val)
+{
+	struct ceph_fs_client *fsc = (struct ceph_fs_client *)data;
+	struct ceph_mds_client *mdsc = fsc->mdsc;
+
+	if (val > 1) {
+		pr_err("Invalid sending metrics set value %llu\n", val);
+		return -EINVAL;
+	}
+
+	mutex_lock(&mdsc->mutex);
+	mdsc->sending_metrics = (unsigned int)val;
+	mutex_unlock(&mdsc->mutex);
+
+	return 0;
+}
+
+static int sending_metrics_get(void *data, u64 *val)
+{
+	struct ceph_fs_client *fsc = (struct ceph_fs_client *)data;
+	struct ceph_mds_client *mdsc = fsc->mdsc;
+
+	mutex_lock(&mdsc->mutex);
+	*val = (u64)mdsc->sending_metrics;
+	mutex_unlock(&mdsc->mutex);
+
+	return 0;
+}
+DEFINE_SIMPLE_ATTRIBUTE(sending_metrics_fops, sending_metrics_get,
+			sending_metrics_set, "%llu\n");
+
 static int metric_show(struct seq_file *s, void *p)
 {
 	struct ceph_fs_client *fsc = s->private;
@@ -302,11 +336,9 @@ static int congestion_kb_get(void *data, u64 *val)
 	*val = (u64)fsc->mount_options->congestion_kb;
 	return 0;
 }
-
 DEFINE_SIMPLE_ATTRIBUTE(congestion_kb_fops, congestion_kb_get,
 			congestion_kb_set, "%llu\n");
 
-
 void ceph_fs_debugfs_cleanup(struct ceph_fs_client *fsc)
 {
 	dout("ceph_fs_debugfs_cleanup\n");
@@ -316,6 +348,7 @@ void ceph_fs_debugfs_cleanup(struct ceph_fs_client *fsc)
 	debugfs_remove(fsc->debugfs_mds_sessions);
 	debugfs_remove(fsc->debugfs_caps);
 	debugfs_remove(fsc->debugfs_metric);
+	debugfs_remove(fsc->debugfs_sending_metrics);
 	debugfs_remove(fsc->debugfs_mdsc);
 }
 
@@ -356,6 +389,13 @@ void ceph_fs_debugfs_init(struct ceph_fs_client *fsc)
 						fsc,
 						&mdsc_show_fops);
 
+	fsc->debugfs_sending_metrics =
+			debugfs_create_file("sending_metrics",
+					    0600,
+					    fsc->client->debugfs_dir,
+					    fsc,
+					    &sending_metrics_fops);
+
 	fsc->debugfs_metric = debugfs_create_file("metrics",
 						  0400,
 						  fsc->client->debugfs_dir,
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 92a933810a79..d765804dc855 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -4104,13 +4104,156 @@ static void maybe_recover_session(struct ceph_mds_client *mdsc)
 	ceph_force_reconnect(fsc->sb);
 }
 
+/*
+ * called under s_mutex
+ */
+static bool ceph_mdsc_send_metrics(struct ceph_mds_client *mdsc,
+				   struct ceph_mds_session *s,
+				   bool skip_global)
+{
+	struct ceph_metric_head *head;
+	struct ceph_metric_cap *cap;
+	struct ceph_metric_dentry_lease *lease;
+	struct ceph_metric_read_latency *read;
+	struct ceph_metric_write_latency *write;
+	struct ceph_metric_metadata_latency *meta;
+	struct ceph_msg *msg;
+	struct timespec64 ts;
+	s32 len = sizeof(*head) + sizeof(*cap);
+	s64 sum, total, avg;
+	s32 items = 0;
+
+	if (!mdsc || !s)
+		return false;
+
+	if (!skip_global) {
+		len += sizeof(*lease);
+		len += sizeof(*read);
+		len += sizeof(*write);
+		len += sizeof(*meta);
+	}
+
+	msg = ceph_msg_new(CEPH_MSG_CLIENT_METRICS, len, GFP_NOFS, true);
+	if (!msg) {
+		pr_err("send metrics to mds%d, failed to allocate message\n",
+		       s->s_mds);
+		return false;
+	}
+
+	head = msg->front.iov_base;
+
+	/* encode the cap metric */
+	cap = (struct ceph_metric_cap *)(head + 1);
+	cap->type = cpu_to_le32(CLIENT_METRIC_TYPE_CAP_INFO);
+	cap->ver = 1;
+	cap->campat = 1;
+	cap->data_len = cpu_to_le32(sizeof(*cap) - 10);
+	cap->hit = cpu_to_le64(percpu_counter_sum(&s->i_caps_hit));
+	cap->mis = cpu_to_le64(percpu_counter_sum(&s->i_caps_mis));
+	cap->total = cpu_to_le64(s->s_nr_caps);
+	items++;
+
+	dout("cap metric hit %lld, mis %lld, total caps %lld",
+	     le64_to_cpu(cap->hit), le64_to_cpu(cap->mis),
+	     le64_to_cpu(cap->total));
+
+	/* only send the global once */
+	if (skip_global)
+		goto skip_global;
+
+	/* encode the dentry lease metric */
+	lease = (struct ceph_metric_dentry_lease *)(cap + 1);
+	lease->type = cpu_to_le32(CLIENT_METRIC_TYPE_DENTRY_LEASE);
+	lease->ver = 1;
+	lease->campat = 1;
+	lease->data_len = cpu_to_le32(sizeof(*lease) - 10);
+	lease->hit = cpu_to_le64(percpu_counter_sum(&mdsc->metric.d_lease_hit));
+	lease->mis = cpu_to_le64(percpu_counter_sum(&mdsc->metric.d_lease_mis));
+	lease->total = cpu_to_le64(atomic64_read(&mdsc->metric.total_dentries));
+	items++;
+
+	dout("dentry lease metric hit %lld, mis %lld, total dentries %lld",
+	     le64_to_cpu(lease->hit), le64_to_cpu(lease->mis),
+	     le64_to_cpu(lease->total));
+
+	/* encode the read latency metric */
+	read = (struct ceph_metric_read_latency *)(lease + 1);
+	read->type = cpu_to_le32(CLIENT_METRIC_TYPE_READ_LATENCY);
+	read->ver = 1;
+	read->campat = 1;
+	read->data_len = cpu_to_le32(sizeof(*read) - 10);
+	total = percpu_counter_sum(&mdsc->metric.total_reads),
+	sum = percpu_counter_sum(&mdsc->metric.read_latency_sum);
+	avg = total ? sum / total : 0;
+	ts = ns_to_timespec64(avg);
+	read->sec = cpu_to_le32(ts.tv_sec);
+	read->nsec = cpu_to_le32(ts.tv_nsec);
+	items++;
+
+	dout("read latency metric total %lld, sum lat %lld, avg lat %lld",
+	     total, sum, avg);
+
+	/* encode the write latency metric */
+	write = (struct ceph_metric_write_latency *)(read + 1);
+	write->type = cpu_to_le32(CLIENT_METRIC_TYPE_WRITE_LATENCY);
+	write->ver = 1;
+	write->campat = 1;
+	write->data_len = cpu_to_le32(sizeof(*write) - 10);
+	total = percpu_counter_sum(&mdsc->metric.total_writes),
+	sum = percpu_counter_sum(&mdsc->metric.write_latency_sum);
+	avg = total ? sum / total : 0;
+	ts = ns_to_timespec64(avg);
+	write->sec = cpu_to_le32(ts.tv_sec);
+	write->nsec = cpu_to_le32(ts.tv_nsec);
+	items++;
+
+	dout("write latency metric total %lld, sum lat %lld, avg lat %lld",
+	     total, sum, avg);
+
+	/* encode the metadata latency metric */
+	meta = (struct ceph_metric_metadata_latency *)(write + 1);
+	meta->type = cpu_to_le32(CLIENT_METRIC_TYPE_METADATA_LATENCY);
+	meta->ver = 1;
+	meta->campat = 1;
+	meta->data_len = cpu_to_le32(sizeof(*meta) - 10);
+	total = percpu_counter_sum(&mdsc->metric.total_metadatas),
+	sum = percpu_counter_sum(&mdsc->metric.metadata_latency_sum);
+	avg = total ? sum / total : 0;
+	ts = ns_to_timespec64(avg);
+	meta->sec = cpu_to_le32(ts.tv_sec);
+	meta->nsec = cpu_to_le32(ts.tv_nsec);
+	items++;
+
+	dout("metadata latency metric total %lld, sum lat %lld, avg lat %lld",
+	     total, sum, avg);
+
+skip_global:
+	put_unaligned_le32(items, &head->num);
+	msg->front.iov_len = cpu_to_le32(len);
+	msg->hdr.version = cpu_to_le16(1);
+	msg->hdr.compat_version = cpu_to_le16(1);
+	msg->hdr.front_len = cpu_to_le32(msg->front.iov_len);
+	dout("send metrics to mds%d %p\n", s->s_mds, msg);
+	ceph_con_send(&s->s_con, msg);
+
+	return true;
+}
+
 /*
  * delayed work -- periodically trim expired leases, renew caps with mds
  */
+#define CEPH_WORK_DELAY_DEF 5
 static void schedule_delayed(struct ceph_mds_client *mdsc)
 {
-	int delay = 5;
-	unsigned hz = round_jiffies_relative(HZ * delay);
+	unsigned int hz;
+	int delay = CEPH_WORK_DELAY_DEF;
+
+	mutex_lock(&mdsc->mutex);
+	if (mdsc->sending_metrics)
+		delay = 1;
+	mutex_unlock(&mdsc->mutex);
+
+	hz = round_jiffies_relative(HZ * delay);
 	schedule_delayed_work(&mdsc->delayed_work, hz);
 }
 
@@ -4121,18 +4264,28 @@ static void delayed_work(struct work_struct *work)
 		container_of(work, struct ceph_mds_client, delayed_work.work);
 	int renew_interval;
 	int renew_caps;
+	bool metric_only;
+	bool sending_metrics;
+	bool g_skip = false;
 
 	dout("mdsc delayed_work\n");
 
 	mutex_lock(&mdsc->mutex);
-	renew_interval = mdsc->mdsmap->m_session_timeout >> 2;
-	renew_caps = time_after_eq(jiffies, HZ*renew_interval +
-				   mdsc->last_renew_caps);
-	if (renew_caps)
-		mdsc->last_renew_caps = jiffies;
+	sending_metrics = !!mdsc->sending_metrics;
+	metric_only = mdsc->sending_metrics &&
+		(mdsc->ticks++ % CEPH_WORK_DELAY_DEF);
+
+	if (!metric_only) {
+		renew_interval = mdsc->mdsmap->m_session_timeout >> 2;
+		renew_caps = time_after_eq(jiffies, HZ*renew_interval +
+					   mdsc->last_renew_caps);
+		if (renew_caps)
+			mdsc->last_renew_caps = jiffies;
+	}
 
 	for (i = 0; i < mdsc->max_sessions; i++) {
 		struct ceph_mds_session *s = __ceph_lookup_mds_session(mdsc, i);
+
 		if (!s)
 			continue;
 		if (s->s_state == CEPH_MDS_SESSION_CLOSING) {
@@ -4158,13 +4311,20 @@ static void delayed_work(struct work_struct *work)
 		mutex_unlock(&mdsc->mutex);
 
 		mutex_lock(&s->s_mutex);
-		if (renew_caps)
-			send_renew_caps(mdsc, s);
-		else
-			ceph_con_keepalive(&s->s_con);
-		if (s->s_state == CEPH_MDS_SESSION_OPEN ||
-		    s->s_state == CEPH_MDS_SESSION_HUNG)
-			ceph_send_cap_releases(mdsc, s);
+
+		if (sending_metrics)
+			g_skip = ceph_mdsc_send_metrics(mdsc, s, g_skip);
+
+		if (!metric_only) {
+			if (renew_caps)
+				send_renew_caps(mdsc, s);
+			else
+				ceph_con_keepalive(&s->s_con);
+			if (s->s_state == CEPH_MDS_SESSION_OPEN ||
+					s->s_state == CEPH_MDS_SESSION_HUNG)
+				ceph_send_cap_releases(mdsc, s);
+		}
+
 		mutex_unlock(&s->s_mutex);
 		ceph_put_mds_session(s);
 
@@ -4172,6 +4332,9 @@ static void delayed_work(struct work_struct *work)
 	}
 	mutex_unlock(&mdsc->mutex);
 
+	if (metric_only)
+		goto delay_work;
+
 	ceph_check_delayed_caps(mdsc);
 
 	ceph_queue_cap_reclaim_work(mdsc);
@@ -4180,11 +4343,13 @@ static void delayed_work(struct work_struct *work)
 
 	maybe_recover_session(mdsc);
 
+delay_work:
 	schedule_delayed(mdsc);
 }
 
-static int ceph_mdsc_metric_init(struct ceph_client_metric *metric)
+static int ceph_mdsc_metric_init(struct ceph_mds_client *mdsc)
 {
+	struct ceph_client_metric *metric = &mdsc->metric;
 	int ret;
 
 	if (!metric)
@@ -4222,7 +4387,9 @@ static int ceph_mdsc_metric_init(struct ceph_client_metric *metric)
 	if (ret)
 		goto err_metadata_latency_sum;
 
-	return ret;
+	mdsc->sending_metrics = 0;
+	mdsc->ticks = 0;
+	return 0;
 err_metadata_latency_sum:
 	percpu_counter_destroy(&metric->total_metadatas);
 err_total_metadatas:
@@ -4294,7 +4461,7 @@ int ceph_mdsc_init(struct ceph_fs_client *fsc)
 	init_waitqueue_head(&mdsc->cap_flushing_wq);
 	INIT_WORK(&mdsc->cap_reclaim_work, ceph_cap_reclaim_work);
 	atomic_set(&mdsc->cap_reclaim_pending, 0);
-	err = ceph_mdsc_metric_init(&mdsc->metric);
+	err = ceph_mdsc_metric_init(mdsc);
 	if (err)
 		goto err_mdsmap;
 
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 574d4e5a5de2..a0ece55d987c 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -451,6 +451,9 @@ struct ceph_mds_client {
 	struct list_head  dentry_leases;     /* fifo list */
 	struct list_head  dentry_dir_leases; /* lru list */
 
+	/* metrics */
+	unsigned int		  sending_metrics;
+	unsigned int		  ticks;
 	struct ceph_client_metric metric;
 
 	spinlock_t		snapid_map_lock;
diff --git a/fs/ceph/metric.h b/fs/ceph/metric.h
index 3cda616ba594..352eb753ce25 100644
--- a/fs/ceph/metric.h
+++ b/fs/ceph/metric.h
@@ -4,6 +4,82 @@
 
 #include <linux/ceph/osd_client.h>
 
+enum ceph_metric_type {
+	CLIENT_METRIC_TYPE_CAP_INFO,
+	CLIENT_METRIC_TYPE_READ_LATENCY,
+	CLIENT_METRIC_TYPE_WRITE_LATENCY,
+	CLIENT_METRIC_TYPE_METADATA_LATENCY,
+	CLIENT_METRIC_TYPE_DENTRY_LEASE,
+
+	CLIENT_METRIC_TYPE_MAX = CLIENT_METRIC_TYPE_DENTRY_LEASE,
+};
+
+/* metric caps header */
+struct ceph_metric_cap {
+	__le32 type;     /* ceph metric type */
+
+	__u8  ver;
+	__u8  campat;
+
+	__le32 data_len; /* length of sizeof(hit + mis + total) */
+	__le64 hit;
+	__le64 mis;
+	__le64 total;
+} __attribute__ ((packed));
+
+/* metric dentry lease header */
+struct ceph_metric_dentry_lease {
+	__le32 type;     /* ceph metric type */
+
+	__u8  ver;
+	__u8  campat;
+
+	__le32 data_len; /* length of sizeof(hit + mis + total) */
+	__le64 hit;
+	__le64 mis;
+	__le64 total;
+} __attribute__ ((packed));
+
+/* metric read latency header */
+struct ceph_metric_read_latency {
+	__le32 type;     /* ceph metric type */
+
+	__u8  ver;
+	__u8  campat;
+
+	__le32 data_len; /* length of sizeof(sec + nsec) */
+	__le32 sec;
+	__le32 nsec;
+} __attribute__ ((packed));
+
+/* metric write latency header */
+struct ceph_metric_write_latency {
+	__le32 type;     /* ceph metric type */
+
+	__u8  ver;
+	__u8  campat;
+
+	__le32 data_len; /* length of sizeof(sec + nsec) */
+	__le32 sec;
+	__le32 nsec;
+} __attribute__ ((packed));
+
+/* metric metadata latency header */
+struct ceph_metric_metadata_latency {
+	__le32 type;     /* ceph metric type */
+
+	__u8  ver;
+	__u8  campat;
+
+	__le32 data_len; /* length of sizeof(sec + nsec) */
+	__le32 sec;
+	__le32 nsec;
+} __attribute__ ((packed));
+
+struct ceph_metric_head {
+	__le32 num;	/* the number of metrics will be sent */
+} __attribute__ ((packed));
+
 /* This is the global metrics */
 struct ceph_client_metric {
 	atomic64_t		total_dentries;
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 3f4829222528..a91431e9bdf7 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -128,6 +128,7 @@ struct ceph_fs_client {
 	struct dentry *debugfs_congestion_kb;
 	struct dentry *debugfs_bdi;
 	struct dentry *debugfs_mdsc, *debugfs_mdsmap;
+	struct dentry *debugfs_sending_metrics;
 	struct dentry *debugfs_metric;
 	struct dentry *debugfs_mds_sessions;
 #endif
diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
index a099f60feb7b..6028d3e865e4 100644
--- a/include/linux/ceph/ceph_fs.h
+++ b/include/linux/ceph/ceph_fs.h
@@ -130,6 +130,7 @@ struct ceph_dir_layout {
 #define CEPH_MSG_CLIENT_REQUEST         24
 #define CEPH_MSG_CLIENT_REQUEST_FORWARD 25
 #define CEPH_MSG_CLIENT_REPLY           26
+#define CEPH_MSG_CLIENT_METRICS         29
 #define CEPH_MSG_CLIENT_CAPS            0x310
 #define CEPH_MSG_CLIENT_LEASE           0x311
 #define CEPH_MSG_CLIENT_SNAP            0x312
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH resend v5 09/11] ceph: add CEPH_DEFINE_RW_FUNC helper support
  2020-01-29  8:27 [PATCH resend v5 0/11] ceph: add perf metrics support xiubli
                   ` (7 preceding siblings ...)
  2020-01-29  8:27 ` [PATCH resend v5 08/11] ceph: periodically send perf metrics to MDS xiubli
@ 2020-01-29  8:27 ` xiubli
  2020-01-29  8:27 ` [PATCH resend v5 10/11] ceph: add reset metrics support xiubli
  2020-01-29  8:27 ` [PATCH resend v5 11/11] ceph: send client provided metric flags in client metadata xiubli
  10 siblings, 0 replies; 31+ messages in thread
From: xiubli @ 2020-01-29  8:27 UTC (permalink / raw)
  To: jlayton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

This will support the string store.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 include/linux/ceph/debugfs.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/include/linux/ceph/debugfs.h b/include/linux/ceph/debugfs.h
index cf5e840eec71..b918fb1f6f54 100644
--- a/include/linux/ceph/debugfs.h
+++ b/include/linux/ceph/debugfs.h
@@ -18,6 +18,20 @@ static const struct file_operations name##_fops = {			\
 	.release	= single_release,				\
 };
 
+#define CEPH_DEFINE_RW_FUNC(name)					\
+static int name##_open(struct inode *inode, struct file *file)		\
+{									\
+	return single_open(file, name##_show, inode->i_private);	\
+}									\
+									\
+static const struct file_operations name##_fops = {			\
+	.open		= name##_open,					\
+	.read		= seq_read,					\
+	.write		= name##_store,					\
+	.llseek		= seq_lseek,					\
+	.release	= single_release,				\
+};
+
 /* debugfs.c */
 extern void ceph_debugfs_init(void);
 extern void ceph_debugfs_cleanup(void);
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH resend v5 10/11] ceph: add reset metrics support
  2020-01-29  8:27 [PATCH resend v5 0/11] ceph: add perf metrics support xiubli
                   ` (8 preceding siblings ...)
  2020-01-29  8:27 ` [PATCH resend v5 09/11] ceph: add CEPH_DEFINE_RW_FUNC helper support xiubli
@ 2020-01-29  8:27 ` xiubli
  2020-01-29  8:27 ` [PATCH resend v5 11/11] ceph: send client provided metric flags in client metadata xiubli
  10 siblings, 0 replies; 31+ messages in thread
From: xiubli @ 2020-01-29  8:27 UTC (permalink / raw)
  To: jlayton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

This will reset the most metric counters, except the cap and dentry
total numbers.

Sometimes we need to discard the old metrics and start to get new
metrics.

URL: https://tracker.ceph.com/issues/43215
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/debugfs.c | 53 +++++++++++++++++++++++++++++++++++++++++++++--
 fs/ceph/super.h   |  1 +
 2 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
index 8aae7ecea54a..37ca1efa6b27 100644
--- a/fs/ceph/debugfs.c
+++ b/fs/ceph/debugfs.c
@@ -217,6 +217,55 @@ static int metric_show(struct seq_file *s, void *p)
 	return 0;
 }
 
+static ssize_t metric_store(struct file *file, const char __user *user_buf,
+			    size_t count, loff_t *ppos)
+{
+	struct seq_file *s = file->private_data;
+	struct ceph_fs_client *fsc = s->private;
+	struct ceph_mds_client *mdsc = fsc->mdsc;
+	struct ceph_client_metric *metric = &mdsc->metric;
+	char buf[8];
+	int i;
+
+	if (copy_from_user(buf, user_buf, 8))
+		return -EFAULT;
+
+	if (strcmp(buf, "reset")) {
+		pr_err("Invalid set value '%s', only 'reset' is valid\n", buf);
+		return -EINVAL;
+	}
+
+	percpu_counter_set(&metric->d_lease_hit, 0);
+	percpu_counter_set(&metric->d_lease_mis, 0);
+
+	percpu_counter_set(&metric->read_latency_sum, 0);
+	percpu_counter_set(&metric->total_reads, 0);
+
+	percpu_counter_set(&metric->write_latency_sum, 0);
+	percpu_counter_set(&metric->total_writes, 0);
+
+	percpu_counter_set(&metric->metadata_latency_sum, 0);
+	percpu_counter_set(&metric->total_metadatas, 0);
+
+	mutex_lock(&mdsc->mutex);
+	for (i = 0; i < mdsc->max_sessions; i++) {
+		struct ceph_mds_session *session;
+
+		session = __ceph_lookup_mds_session(mdsc, i);
+		if (!session)
+			continue;
+		percpu_counter_set(&session->i_caps_hit, 0);
+		percpu_counter_set(&session->i_caps_mis, 0);
+		ceph_put_mds_session(session);
+	}
+
+	mutex_unlock(&mdsc->mutex);
+
+	return 0;
+}
+
+CEPH_DEFINE_RW_FUNC(metric)
+
 static int caps_show_cb(struct inode *inode, struct ceph_cap *cap, void *p)
 {
 	struct seq_file *s = p;
@@ -313,7 +362,6 @@ static int mds_sessions_show(struct seq_file *s, void *ptr)
 
 CEPH_DEFINE_SHOW_FUNC(mdsmap_show)
 CEPH_DEFINE_SHOW_FUNC(mdsc_show)
-CEPH_DEFINE_SHOW_FUNC(metric_show)
 CEPH_DEFINE_SHOW_FUNC(caps_show)
 CEPH_DEFINE_SHOW_FUNC(mds_sessions_show)
 
@@ -349,6 +397,7 @@ void ceph_fs_debugfs_cleanup(struct ceph_fs_client *fsc)
 	debugfs_remove(fsc->debugfs_caps);
 	debugfs_remove(fsc->debugfs_metric);
 	debugfs_remove(fsc->debugfs_sending_metrics);
+	debugfs_remove(fsc->debugfs_reset_metrics);
 	debugfs_remove(fsc->debugfs_mdsc);
 }
 
@@ -400,7 +449,7 @@ void ceph_fs_debugfs_init(struct ceph_fs_client *fsc)
 						  0400,
 						  fsc->client->debugfs_dir,
 						  fsc,
-						  &metric_show_fops);
+						  &metric_fops);
 
 	fsc->debugfs_caps = debugfs_create_file("caps",
 						0400,
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index a91431e9bdf7..d24929f1c4bf 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -129,6 +129,7 @@ struct ceph_fs_client {
 	struct dentry *debugfs_bdi;
 	struct dentry *debugfs_mdsc, *debugfs_mdsmap;
 	struct dentry *debugfs_sending_metrics;
+	struct dentry *debugfs_reset_metrics;
 	struct dentry *debugfs_metric;
 	struct dentry *debugfs_mds_sessions;
 #endif
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH resend v5 11/11] ceph: send client provided metric flags in client metadata
  2020-01-29  8:27 [PATCH resend v5 0/11] ceph: add perf metrics support xiubli
                   ` (9 preceding siblings ...)
  2020-01-29  8:27 ` [PATCH resend v5 10/11] ceph: add reset metrics support xiubli
@ 2020-01-29  8:27 ` xiubli
  10 siblings, 0 replies; 31+ messages in thread
From: xiubli @ 2020-01-29  8:27 UTC (permalink / raw)
  To: jlayton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

Will send the metric flags to MDS, currently it supports the cap,
dentry lease, read latency, write latency and metadata latency.

URL: https://tracker.ceph.com/issues/43435
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/mds_client.c | 47 ++++++++++++++++++++++++++++++++++++++++++--
 fs/ceph/metric.h     | 14 +++++++++++++
 2 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index d765804dc855..f9d3acd36656 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -1096,6 +1096,41 @@ static void encode_supported_features(void **p, void *end)
 	}
 }
 
+static const unsigned char metric_bits[] = CEPHFS_METRIC_SPEC_CLIENT_SUPPORTED;
+#define METRIC_BYTES(cnt) (DIV_ROUND_UP((size_t)metric_bits[cnt - 1] + 1, 64) * 8)
+static void encode_metric_spec(void **p, void *end)
+{
+	static const size_t count = ARRAY_SIZE(metric_bits);
+
+	/* header */
+	BUG_ON(*p + 2 > end);
+	ceph_encode_8(p, 1); /* version */
+	ceph_encode_8(p, 1); /* compat */
+
+	if (count > 0) {
+		size_t i;
+		size_t size = METRIC_BYTES(count);
+
+		BUG_ON(*p + 4 + 4 + size > end);
+
+		/* metric spec info length */
+		ceph_encode_32(p, 4 + size);
+
+		/* metric spec */
+		ceph_encode_32(p, size);
+		memset(*p, 0, size);
+		for (i = 0; i < count; i++)
+			((unsigned char*)(*p))[i / 8] |= BIT(metric_bits[i] % 8);
+		*p += size;
+	} else {
+		BUG_ON(*p + 4 + 4 > end);
+		/* metric spec info length */
+		ceph_encode_32(p, 4);
+		/* metric spec */
+		ceph_encode_32(p, 0);
+	}
+}
+
 /*
  * session message, specialization for CEPH_SESSION_REQUEST_OPEN
  * to include additional client metadata fields.
@@ -1135,6 +1170,13 @@ static struct ceph_msg *create_session_open_msg(struct ceph_mds_client *mdsc, u6
 		size = FEATURE_BYTES(count);
 	extra_bytes += 4 + size;
 
+	/* metric spec */
+	size = 0;
+	count = ARRAY_SIZE(metric_bits);
+	if (count > 0)
+		size = METRIC_BYTES(count);
+	extra_bytes += 2 + 4 + 4 + size;
+
 	/* Allocate the message */
 	msg = ceph_msg_new(CEPH_MSG_CLIENT_SESSION, sizeof(*h) + extra_bytes,
 			   GFP_NOFS, false);
@@ -1153,9 +1195,9 @@ static struct ceph_msg *create_session_open_msg(struct ceph_mds_client *mdsc, u6
 	 * Serialize client metadata into waiting buffer space, using
 	 * the format that userspace expects for map<string, string>
 	 *
-	 * ClientSession messages with metadata are v3
+	 * ClientSession messages with metadata are v4
 	 */
-	msg->hdr.version = cpu_to_le16(3);
+	msg->hdr.version = cpu_to_le16(4);
 	msg->hdr.compat_version = cpu_to_le16(1);
 
 	/* The write pointer, following the session_head structure */
@@ -1178,6 +1220,7 @@ static struct ceph_msg *create_session_open_msg(struct ceph_mds_client *mdsc, u6
 	}
 
 	encode_supported_features(&p, end);
+	encode_metric_spec(&p, end);
 	msg->front.iov_len = p - msg->front.iov_base;
 	msg->hdr.front_len = cpu_to_le32(msg->front.iov_len);
 
diff --git a/fs/ceph/metric.h b/fs/ceph/metric.h
index 352eb753ce25..70e0b586b687 100644
--- a/fs/ceph/metric.h
+++ b/fs/ceph/metric.h
@@ -14,6 +14,20 @@ enum ceph_metric_type {
 	CLIENT_METRIC_TYPE_MAX = CLIENT_METRIC_TYPE_DENTRY_LEASE,
 };
 
+/*
+ * This will always have the highest metric bit value
+ * as the last element of the array.
+ */
+#define CEPHFS_METRIC_SPEC_CLIENT_SUPPORTED {	\
+	CLIENT_METRIC_TYPE_CAP_INFO,		\
+	CLIENT_METRIC_TYPE_READ_LATENCY,	\
+	CLIENT_METRIC_TYPE_WRITE_LATENCY,	\
+	CLIENT_METRIC_TYPE_METADATA_LATENCY,	\
+	CLIENT_METRIC_TYPE_DENTRY_LEASE,	\
+						\
+	CLIENT_METRIC_TYPE_MAX,			\
+}
+
 /* metric caps header */
 struct ceph_metric_cap {
 	__le32 type;     /* ceph metric type */
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH resend v5 02/11] ceph: add caps perf metric for each session
  2020-01-29  8:27 ` [PATCH resend v5 02/11] ceph: add caps perf metric for each session xiubli
@ 2020-01-29 14:21   ` Jeff Layton
  2020-01-30  2:22     ` Xiubo Li
  0 siblings, 1 reply; 31+ messages in thread
From: Jeff Layton @ 2020-01-29 14:21 UTC (permalink / raw)
  To: xiubli, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On Wed, 2020-01-29 at 03:27 -0500, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> This will fulfill the caps hit/miss metric for each session. When
> checking the "need" mask and if one cap has the subset of the "need"
> mask it means hit, or missed.
> 
> item          total           miss            hit
> -------------------------------------------------
> d_lease       295             0               993
> 
> session       caps            miss            hit
> -------------------------------------------------
> 0             295             107             4119
> 1             1               107             9
> 
> URL: https://tracker.ceph.com/issues/43215
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  fs/ceph/acl.c        |  2 ++
>  fs/ceph/addr.c       |  2 ++
>  fs/ceph/caps.c       | 74 ++++++++++++++++++++++++++++++++++++++++++++
>  fs/ceph/debugfs.c    | 20 ++++++++++++
>  fs/ceph/dir.c        |  9 ++++--
>  fs/ceph/file.c       |  3 ++
>  fs/ceph/mds_client.c | 16 +++++++++-
>  fs/ceph/mds_client.h |  3 ++
>  fs/ceph/quota.c      |  9 ++++--
>  fs/ceph/super.h      | 11 +++++++
>  fs/ceph/xattr.c      | 17 ++++++++--
>  11 files changed, 158 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/ceph/acl.c b/fs/ceph/acl.c
> index 26be6520d3fb..58e119e3519f 100644
> --- a/fs/ceph/acl.c
> +++ b/fs/ceph/acl.c
> @@ -22,6 +22,8 @@ static inline void ceph_set_cached_acl(struct inode *inode,
>  	struct ceph_inode_info *ci = ceph_inode(inode);
>  
>  	spin_lock(&ci->i_ceph_lock);
> +	__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
> +
>  	if (__ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 0))
>  		set_cached_acl(inode, type, acl);
>  	else
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index 7ab616601141..29d4513eff8c 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -1706,6 +1706,8 @@ int ceph_uninline_data(struct file *filp, struct page *locked_page)
>  			err = -ENOMEM;
>  			goto out;
>  		}
> +
> +		ceph_caps_metric(ci, CEPH_STAT_CAP_INLINE_DATA);

Should a check for inline data really count here?

>  		err = __ceph_do_getattr(inode, page,
>  					CEPH_STAT_CAP_INLINE_DATA, true);
>  		if (err < 0) {
> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> index 7fc87b693ba4..af2e9e826f8c 100644
> --- a/fs/ceph/caps.c
> +++ b/fs/ceph/caps.c
> @@ -783,6 +783,75 @@ static int __cap_is_valid(struct ceph_cap *cap)
>  	return 1;
>  }
>  
> +/*
> + * Counts the cap metric.
> + */

This needs some comments. Specifically, what should this be counting and
how?

> +void __ceph_caps_metric(struct ceph_inode_info *ci, int mask)
> +{
> +	int have = ci->i_snap_caps;
> +	struct ceph_mds_session *s;
> +	struct ceph_cap *cap;
> +	struct rb_node *p;
> +	bool skip_auth = false;
> +
> +	lockdep_assert_held(&ci->i_ceph_lock);
> +
> +	if (mask <= 0)
> +		return;
> +
> +	/* Counts the snap caps metric in the auth cap */
> +	if (ci->i_auth_cap) {
> +		cap = ci->i_auth_cap;
> +		if (have) {
> +			have |= cap->issued;
> +
> +			dout("%s %p cap %p issued %s, mask %s\n", __func__,
> +			     &ci->vfs_inode, cap, ceph_cap_string(cap->issued),
> +			     ceph_cap_string(mask));
> +
> +			s = ceph_get_mds_session(cap->session);
> +			if (s) {
> +				if (mask & have)
> +					percpu_counter_inc(&s->i_caps_hit);
> +				else
> +					percpu_counter_inc(&s->i_caps_mis);
> +				ceph_put_mds_session(s);
> +			}
> +			skip_auth = true;
> +		}
> +	}
> +
> +	if ((mask & have) == mask)
> +		return;
> +
> +	/* Checks others */
> +	for (p = rb_first(&ci->i_caps); p; p = rb_next(p)) {
> +		cap = rb_entry(p, struct ceph_cap, ci_node);
> +		if (!__cap_is_valid(cap))
> +			continue;
> +
> +		if (skip_auth && cap == ci->i_auth_cap)
> +			continue;
> +
> +		dout("%s %p cap %p issued %s, mask %s\n", __func__,
> +		     &ci->vfs_inode, cap, ceph_cap_string(cap->issued),
> +		     ceph_cap_string(mask));
> +
> +		s = ceph_get_mds_session(cap->session);
> +		if (s) {
> +			if (mask & cap->issued)
> +				percpu_counter_inc(&s->i_caps_hit);
> +			else
> +				percpu_counter_inc(&s->i_caps_mis);
> +			ceph_put_mds_session(s);
> +		}
> +
> +		have |= cap->issued;
> +		if ((mask & have) == mask)
> +			return;
> +	}
> +}
> +

I'm trying to understand what happens with the above when more than one
ceph_cap has the same bit set in "issued". For instance:

Suppose we're doing the check for a statx call, and we're trying to get
caps for pAsFsLs. We have two MDS's and they've each granted us caps for
the inode, say:

MDS 0: pAs
MDS 1: pAsLsFs

We check the cap0 first, and consider it a hit, and then we check cap1
and consider it a hit as well. So that seems like it's being double-
counted.

ISTM, that what you really want to do here is logically or all of the
cap->issued fields together, and then check that vs. the mask value, and
count only one hit or miss per inode.

That said, it's not 100% clear what you're counting as a hit or miss
here, so please let me know if I have that wrong.
 
>  /*
>   * Return set of valid cap bits issued to us.  Note that caps time
>   * out, and may be invalidated in bulk if the client session times out
> @@ -2746,6 +2815,7 @@ static void check_max_size(struct inode *inode, loff_t endoff)
>  int ceph_try_get_caps(struct inode *inode, int need, int want,
>  		      bool nonblock, int *got)
>  {
> +	struct ceph_inode_info *ci = ceph_inode(inode);
>  	int ret;
>  
>  	BUG_ON(need & ~CEPH_CAP_FILE_RD);
> @@ -2758,6 +2828,7 @@ int ceph_try_get_caps(struct inode *inode, int need, int want,
>  	BUG_ON(want & ~(CEPH_CAP_FILE_CACHE | CEPH_CAP_FILE_LAZYIO |
>  			CEPH_CAP_FILE_SHARED | CEPH_CAP_FILE_EXCL |
>  			CEPH_CAP_ANY_DIR_OPS));
> +	ceph_caps_metric(ci, need | want);
>  	ret = try_get_cap_refs(inode, need, want, 0, nonblock, got);
>  	return ret == -EAGAIN ? 0 : ret;
>  }
> @@ -2784,6 +2855,8 @@ int ceph_get_caps(struct file *filp, int need, int want,
>  	    fi->filp_gen != READ_ONCE(fsc->filp_gen))
>  		return -EBADF;
>  
> +	ceph_caps_metric(ci, need | want);
> +
>  	while (true) {
>  		if (endoff > 0)
>  			check_max_size(inode, endoff);
> @@ -2871,6 +2944,7 @@ int ceph_get_caps(struct file *filp, int need, int want,
>  			 * getattr request will bring inline data into
>  			 * page cache
>  			 */
> +			ceph_caps_metric(ci, CEPH_STAT_CAP_INLINE_DATA);
>  			ret = __ceph_do_getattr(inode, NULL,
>  						CEPH_STAT_CAP_INLINE_DATA,
>  						true);
> diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
> index 40a22da0214a..c132fdb40d53 100644
> --- a/fs/ceph/debugfs.c
> +++ b/fs/ceph/debugfs.c
> @@ -128,6 +128,7 @@ static int metric_show(struct seq_file *s, void *p)
>  {
>  	struct ceph_fs_client *fsc = s->private;
>  	struct ceph_mds_client *mdsc = fsc->mdsc;
> +	int i;
>  
>  	seq_printf(s, "item          total           miss            hit\n");
>  	seq_printf(s, "-------------------------------------------------\n");
> @@ -137,6 +138,25 @@ static int metric_show(struct seq_file *s, void *p)
>  		   percpu_counter_sum(&mdsc->metric.d_lease_mis),
>  		   percpu_counter_sum(&mdsc->metric.d_lease_hit));
>  
> +	seq_printf(s, "\n");
> +	seq_printf(s, "session       caps            miss            hit\n");
> +	seq_printf(s, "-------------------------------------------------\n");
> +
> +	mutex_lock(&mdsc->mutex);
> +	for (i = 0; i < mdsc->max_sessions; i++) {
> +		struct ceph_mds_session *session;
> +
> +		session = __ceph_lookup_mds_session(mdsc, i);
> +		if (!session)
> +			continue;
> +		seq_printf(s, "%-14d%-16d%-16lld%lld\n", i,
> +			   session->s_nr_caps,
> +			   percpu_counter_sum(&session->i_caps_mis),
> +			   percpu_counter_sum(&session->i_caps_hit));
> +		ceph_put_mds_session(session);
> +	}
> +	mutex_unlock(&mdsc->mutex);
> +
>  	return 0;
>  }
>  
> diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
> index 658c55b323cc..33eb239e09e2 100644
> --- a/fs/ceph/dir.c
> +++ b/fs/ceph/dir.c
> @@ -313,7 +313,7 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
>  	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
>  	struct ceph_mds_client *mdsc = fsc->mdsc;
>  	int i;
> -	int err;
> +	int err, ret = -1;
>  	unsigned frag = -1;
>  	struct ceph_mds_reply_info_parsed *rinfo;
>  
> @@ -346,13 +346,16 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
>  	    !ceph_test_mount_opt(fsc, NOASYNCREADDIR) &&
>  	    ceph_snap(inode) != CEPH_SNAPDIR &&
>  	    __ceph_dir_is_complete_ordered(ci) &&
> -	    __ceph_caps_issued_mask(ci, CEPH_CAP_FILE_SHARED, 1)) {
> +	    (ret = __ceph_caps_issued_mask(ci, CEPH_CAP_FILE_SHARED, 1))) {
>  		int shared_gen = atomic_read(&ci->i_shared_gen);
> +		__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>  		spin_unlock(&ci->i_ceph_lock);
>  		err = __dcache_readdir(file, ctx, shared_gen);
>  		if (err != -EAGAIN)
>  			return err;
>  	} else {
> +		if (ret != -1)
> +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>  		spin_unlock(&ci->i_ceph_lock);
>  	}
>  
> @@ -757,6 +760,8 @@ static struct dentry *ceph_lookup(struct inode *dir, struct dentry *dentry,
>  		struct ceph_dentry_info *di = ceph_dentry(dentry);
>  
>  		spin_lock(&ci->i_ceph_lock);
> +		__ceph_caps_metric(ci, CEPH_CAP_FILE_SHARED);
> +
>  		dout(" dir %p flags are %d\n", dir, ci->i_ceph_flags);
>  		if (strncmp(dentry->d_name.name,
>  			    fsc->mount_options->snapdir_name,
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 1e6cdf2dfe90..c78dfbbb7b91 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -384,6 +384,8 @@ int ceph_open(struct inode *inode, struct file *file)
>  	 * asynchronously.
>  	 */
>  	spin_lock(&ci->i_ceph_lock);
> +	__ceph_caps_metric(ci, wanted);
> +
>  	if (__ceph_is_any_real_caps(ci) &&
>  	    (((fmode & CEPH_FILE_MODE_WR) == 0) || ci->i_auth_cap)) {
>  		int mds_wanted = __ceph_caps_mds_wanted(ci, true);
> @@ -1340,6 +1342,7 @@ static ssize_t ceph_read_iter(struct kiocb *iocb, struct iov_iter *to)
>  				return -ENOMEM;
>  		}
>  
> +		ceph_caps_metric(ci, CEPH_STAT_CAP_INLINE_DATA);
>  		statret = __ceph_do_getattr(inode, page,
>  					    CEPH_STAT_CAP_INLINE_DATA, !!page);
>  		if (statret < 0) {
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index a24fd00676b8..141c1c03636c 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -558,6 +558,8 @@ void ceph_put_mds_session(struct ceph_mds_session *s)
>  	if (refcount_dec_and_test(&s->s_ref)) {
>  		if (s->s_auth.authorizer)
>  			ceph_auth_destroy_authorizer(s->s_auth.authorizer);
> +		percpu_counter_destroy(&s->i_caps_hit);
> +		percpu_counter_destroy(&s->i_caps_mis);
>  		kfree(s);
>  	}
>  }
> @@ -598,6 +600,7 @@ static struct ceph_mds_session *register_session(struct ceph_mds_client *mdsc,
>  						 int mds)
>  {
>  	struct ceph_mds_session *s;
> +	int err;
>  
>  	if (mds >= mdsc->mdsmap->possible_max_rank)
>  		return ERR_PTR(-EINVAL);
> @@ -612,8 +615,10 @@ static struct ceph_mds_session *register_session(struct ceph_mds_client *mdsc,
>  
>  		dout("%s: realloc to %d\n", __func__, newmax);
>  		sa = kcalloc(newmax, sizeof(void *), GFP_NOFS);
> -		if (!sa)
> +		if (!sa) {
> +			err = -ENOMEM;
>  			goto fail_realloc;
> +		}
>  		if (mdsc->sessions) {
>  			memcpy(sa, mdsc->sessions,
>  			       mdsc->max_sessions * sizeof(void *));
> @@ -653,6 +658,13 @@ static struct ceph_mds_session *register_session(struct ceph_mds_client *mdsc,
>  
>  	INIT_LIST_HEAD(&s->s_cap_flushing);
>  
> +	err = percpu_counter_init(&s->i_caps_hit, 0, GFP_NOFS);
> +	if (err)
> +		goto fail_realloc;
> +	err = percpu_counter_init(&s->i_caps_mis, 0, GFP_NOFS);
> +	if (err)
> +		goto fail_init;
> +
>  	mdsc->sessions[mds] = s;
>  	atomic_inc(&mdsc->num_sessions);
>  	refcount_inc(&s->s_ref);  /* one ref to sessions[], one to caller */
> @@ -662,6 +674,8 @@ static struct ceph_mds_session *register_session(struct ceph_mds_client *mdsc,
>  
>  	return s;
>  
> +fail_init:
> +	percpu_counter_destroy(&s->i_caps_hit);
>  fail_realloc:
>  	kfree(s);
>  	return ERR_PTR(-ENOMEM);
> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> index dd1f417b90eb..ba74ff74c59c 100644
> --- a/fs/ceph/mds_client.h
> +++ b/fs/ceph/mds_client.h
> @@ -201,6 +201,9 @@ struct ceph_mds_session {
>  
>  	struct list_head  s_waiting;  /* waiting requests */
>  	struct list_head  s_unsafe;   /* unsafe requests */
> +
> +	struct percpu_counter i_caps_hit;
> +	struct percpu_counter i_caps_mis;
>  };
>  
>  /*
> diff --git a/fs/ceph/quota.c b/fs/ceph/quota.c
> index de56dee60540..4ce2f658e63d 100644
> --- a/fs/ceph/quota.c
> +++ b/fs/ceph/quota.c
> @@ -147,9 +147,14 @@ static struct inode *lookup_quotarealm_inode(struct ceph_mds_client *mdsc,
>  		return NULL;
>  	}
>  	if (qri->inode) {
> +		struct ceph_inode_info *ci = ceph_inode(qri->inode);
> +		int ret;
> +
> +		ceph_caps_metric(ci, CEPH_STAT_CAP_INODE);
> +
>  		/* get caps */
> -		int ret = __ceph_do_getattr(qri->inode, NULL,
> -					    CEPH_STAT_CAP_INODE, true);
> +		ret = __ceph_do_getattr(qri->inode, NULL,
> +					CEPH_STAT_CAP_INODE, true);
>  		if (ret >= 0)
>  			in = qri->inode;
>  		else
> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> index 7af91628636c..3f4829222528 100644
> --- a/fs/ceph/super.h
> +++ b/fs/ceph/super.h
> @@ -641,6 +641,14 @@ static inline bool __ceph_is_any_real_caps(struct ceph_inode_info *ci)
>  	return !RB_EMPTY_ROOT(&ci->i_caps);
>  }
>  
> +extern void __ceph_caps_metric(struct ceph_inode_info *ci, int mask);
> +static inline void ceph_caps_metric(struct ceph_inode_info *ci, int mask)
> +{
> +	spin_lock(&ci->i_ceph_lock);
> +	__ceph_caps_metric(ci, mask);
> +	spin_unlock(&ci->i_ceph_lock);
> +}
> +
>  extern int __ceph_caps_issued(struct ceph_inode_info *ci, int *implemented);
>  extern int __ceph_caps_issued_mask(struct ceph_inode_info *ci, int mask, int t);
>  extern int __ceph_caps_issued_other(struct ceph_inode_info *ci,
> @@ -927,6 +935,9 @@ extern int __ceph_do_getattr(struct inode *inode, struct page *locked_page,
>  			     int mask, bool force);
>  static inline int ceph_do_getattr(struct inode *inode, int mask, bool force)
>  {
> +	struct ceph_inode_info *ci = ceph_inode(inode);
> +
> +	ceph_caps_metric(ci, mask);
>  	return __ceph_do_getattr(inode, NULL, mask, force);
>  }
>  extern int ceph_permission(struct inode *inode, int mask);
> diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
> index d58fa14c1f01..ebd522edb0a8 100644
> --- a/fs/ceph/xattr.c
> +++ b/fs/ceph/xattr.c
> @@ -829,6 +829,7 @@ ssize_t __ceph_getxattr(struct inode *inode, const char *name, void *value,
>  	struct ceph_vxattr *vxattr = NULL;
>  	int req_mask;
>  	ssize_t err;
> +	int ret = -1;
>  
>  	/* let's see if a virtual xattr was requested */
>  	vxattr = ceph_match_vxattr(inode, name);
> @@ -856,7 +857,9 @@ ssize_t __ceph_getxattr(struct inode *inode, const char *name, void *value,
>  
>  	if (ci->i_xattrs.version == 0 ||
>  	    !((req_mask & CEPH_CAP_XATTR_SHARED) ||
> -	      __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1))) {
> +	      (ret = __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1)))) {
> +		if (ret != -1)
> +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>  		spin_unlock(&ci->i_ceph_lock);
>  
>  		/* security module gets xattr while filling trace */
> @@ -871,6 +874,9 @@ ssize_t __ceph_getxattr(struct inode *inode, const char *name, void *value,
>  		if (err)
>  			return err;
>  		spin_lock(&ci->i_ceph_lock);
> +	} else {
> +		if (ret != -1)
> +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>  	}
>  
>  	err = __build_xattrs(inode);
> @@ -907,19 +913,24 @@ ssize_t ceph_listxattr(struct dentry *dentry, char *names, size_t size)
>  	struct ceph_inode_info *ci = ceph_inode(inode);
>  	bool len_only = (size == 0);
>  	u32 namelen;
> -	int err;
> +	int err, ret = -1;
>  
>  	spin_lock(&ci->i_ceph_lock);
>  	dout("listxattr %p ver=%lld index_ver=%lld\n", inode,
>  	     ci->i_xattrs.version, ci->i_xattrs.index_version);
>  
>  	if (ci->i_xattrs.version == 0 ||
> -	    !__ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1)) {
> +	    !(ret = __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1))) {
> +		if (ret != -1)
> +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>  		spin_unlock(&ci->i_ceph_lock);
>  		err = ceph_do_getattr(inode, CEPH_STAT_CAP_XATTR, true);
>  		if (err)
>  			return err;
>  		spin_lock(&ci->i_ceph_lock);
> +	} else {
> +		if (ret != -1)
> +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>  	}
>  
>  	err = __build_xattrs(inode);

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH resend v5 02/11] ceph: add caps perf metric for each session
  2020-01-29 14:21   ` Jeff Layton
@ 2020-01-30  2:22     ` Xiubo Li
  2020-01-30 19:00       ` Jeffrey Layton
  0 siblings, 1 reply; 31+ messages in thread
From: Xiubo Li @ 2020-01-30  2:22 UTC (permalink / raw)
  To: Jeff Layton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On 2020/1/29 22:21, Jeff Layton wrote:
> On Wed, 2020-01-29 at 03:27 -0500, xiubli@redhat.com wrote:
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> This will fulfill the caps hit/miss metric for each session. When
>> checking the "need" mask and if one cap has the subset of the "need"
>> mask it means hit, or missed.
>>
>> item          total           miss            hit
>> -------------------------------------------------
>> d_lease       295             0               993
>>
>> session       caps            miss            hit
>> -------------------------------------------------
>> 0             295             107             4119
>> 1             1               107             9
>>
>> URL: https://tracker.ceph.com/issues/43215
>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>> ---
>>   fs/ceph/acl.c        |  2 ++
>>   fs/ceph/addr.c       |  2 ++
>>   fs/ceph/caps.c       | 74 ++++++++++++++++++++++++++++++++++++++++++++
>>   fs/ceph/debugfs.c    | 20 ++++++++++++
>>   fs/ceph/dir.c        |  9 ++++--
>>   fs/ceph/file.c       |  3 ++
>>   fs/ceph/mds_client.c | 16 +++++++++-
>>   fs/ceph/mds_client.h |  3 ++
>>   fs/ceph/quota.c      |  9 ++++--
>>   fs/ceph/super.h      | 11 +++++++
>>   fs/ceph/xattr.c      | 17 ++++++++--
>>   11 files changed, 158 insertions(+), 8 deletions(-)
>>
>> diff --git a/fs/ceph/acl.c b/fs/ceph/acl.c
>> index 26be6520d3fb..58e119e3519f 100644
>> --- a/fs/ceph/acl.c
>> +++ b/fs/ceph/acl.c
>> @@ -22,6 +22,8 @@ static inline void ceph_set_cached_acl(struct inode *inode,
>>   	struct ceph_inode_info *ci = ceph_inode(inode);
>>   
>>   	spin_lock(&ci->i_ceph_lock);
>> +	__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>> +
>>   	if (__ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 0))
>>   		set_cached_acl(inode, type, acl);
>>   	else
>> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
>> index 7ab616601141..29d4513eff8c 100644
>> --- a/fs/ceph/addr.c
>> +++ b/fs/ceph/addr.c
>> @@ -1706,6 +1706,8 @@ int ceph_uninline_data(struct file *filp, struct page *locked_page)
>>   			err = -ENOMEM;
>>   			goto out;
>>   		}
>> +
>> +		ceph_caps_metric(ci, CEPH_STAT_CAP_INLINE_DATA);
> Should a check for inline data really count here?
Currently all the INLINE_DATA is in 'force' mode, so we can ignore it.
>>   		err = __ceph_do_getattr(inode, page,
>>   					CEPH_STAT_CAP_INLINE_DATA, true);
>>   		if (err < 0) {
>> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
>> index 7fc87b693ba4..af2e9e826f8c 100644
>> --- a/fs/ceph/caps.c
>> +++ b/fs/ceph/caps.c
>> @@ -783,6 +783,75 @@ static int __cap_is_valid(struct ceph_cap *cap)
>>   	return 1;
>>   }
>>   
>> +/*
>> + * Counts the cap metric.
>> + */
> This needs some comments. Specifically, what should this be counting and
> how?

Will add it.

The __ceph_caps_metric() will traverse the inode's i_caps try to get the 
'issued' excepting the 'invoking' caps until it get enough caps in 
'mask'. The i_caps traverse logic is following __ceph_caps_issued_mask().


>> +void __ceph_caps_metric(struct ceph_inode_info *ci, int mask)
>> +{
>> +	int have = ci->i_snap_caps;
>> +	struct ceph_mds_session *s;
>> +	struct ceph_cap *cap;
>> +	struct rb_node *p;
>> +	bool skip_auth = false;
>> +
>> +	lockdep_assert_held(&ci->i_ceph_lock);
>> +
>> +	if (mask <= 0)
>> +		return;
>> +
>> +	/* Counts the snap caps metric in the auth cap */
>> +	if (ci->i_auth_cap) {
>> +		cap = ci->i_auth_cap;
>> +		if (have) {
>> +			have |= cap->issued;
>> +
>> +			dout("%s %p cap %p issued %s, mask %s\n", __func__,
>> +			     &ci->vfs_inode, cap, ceph_cap_string(cap->issued),
>> +			     ceph_cap_string(mask));
>> +
>> +			s = ceph_get_mds_session(cap->session);
>> +			if (s) {
>> +				if (mask & have)
>> +					percpu_counter_inc(&s->i_caps_hit);
>> +				else
>> +					percpu_counter_inc(&s->i_caps_mis);
>> +				ceph_put_mds_session(s);
>> +			}
>> +			skip_auth = true;
>> +		}
>> +	}
>> +
>> +	if ((mask & have) == mask)
>> +		return;
>> +
>> +	/* Checks others */
>> +	for (p = rb_first(&ci->i_caps); p; p = rb_next(p)) {
>> +		cap = rb_entry(p, struct ceph_cap, ci_node);
>> +		if (!__cap_is_valid(cap))
>> +			continue;
>> +
>> +		if (skip_auth && cap == ci->i_auth_cap)
>> +			continue;
>> +
>> +		dout("%s %p cap %p issued %s, mask %s\n", __func__,
>> +		     &ci->vfs_inode, cap, ceph_cap_string(cap->issued),
>> +		     ceph_cap_string(mask));
>> +
>> +		s = ceph_get_mds_session(cap->session);
>> +		if (s) {
>> +			if (mask & cap->issued)
>> +				percpu_counter_inc(&s->i_caps_hit);
>> +			else
>> +				percpu_counter_inc(&s->i_caps_mis);
>> +			ceph_put_mds_session(s);
>> +		}
>> +
>> +		have |= cap->issued;
>> +		if ((mask & have) == mask)
>> +			return;
>> +	}
>> +}
>> +
> I'm trying to understand what happens with the above when more than one
> ceph_cap has the same bit set in "issued". For instance:
>
> Suppose we're doing the check for a statx call, and we're trying to get
> caps for pAsFsLs. We have two MDS's and they've each granted us caps for
> the inode, say:
>
> MDS 0: pAs
> MDS 1: pAsLsFs
>
> We check the cap0 first, and consider it a hit, and then we check cap1
> and consider it a hit as well. So that seems like it's being double-
> counted.

Yeah, it will.

In case2:

MDS 0: pAsFs

MDS 1: pAsLs

For this case and yours both the i_cap0 and i_cap1 are 'hit'.


In case3 :

MDS0: pAsFsLs

MDS1: pAs

Only the i_cap0 is 'hit'.  i_cap1 will not be counted any 'hit' or 'mis'.


In case4:

MDS0: p

MDS1: pAsLsFs

i_cap0 will 'mis' and i_cap1 will 'hit'.


All the logic are the same with __ceph_caps_issued_mask() does.

The 'hit' means to get all the caps in 'mask we have checked the 
i_cap[0~N] and if they have a subset in 'mask', and the 'mis' means we 
have checked the i_cap[0~N] but they do not.

For the i_cap[N+1 ~ M], if we won't touch them because we have already 
gotten enough caps needed in 'mask', they won't count any 'hit' or 'mis'

All in all, the current logic is that the 'hit' means 'touched' and it 
have some of what we needed, and the 'mis' means 'touched' and missed 
any of what we needed.



>
> ISTM, that what you really want to do here is logically or all of the
> cap->issued fields together, and then check that vs. the mask value, and
> count only one hit or miss per inode.
>
> That said, it's not 100% clear what you're counting as a hit or miss
> here, so please let me know if I have that wrong.
>   
>>   /*
>>    * Return set of valid cap bits issued to us.  Note that caps time
>>    * out, and may be invalidated in bulk if the client session times out
>> @@ -2746,6 +2815,7 @@ static void check_max_size(struct inode *inode, loff_t endoff)
>>   int ceph_try_get_caps(struct inode *inode, int need, int want,
>>   		      bool nonblock, int *got)
>>   {
>> +	struct ceph_inode_info *ci = ceph_inode(inode);
>>   	int ret;
>>   
>>   	BUG_ON(need & ~CEPH_CAP_FILE_RD);
>> @@ -2758,6 +2828,7 @@ int ceph_try_get_caps(struct inode *inode, int need, int want,
>>   	BUG_ON(want & ~(CEPH_CAP_FILE_CACHE | CEPH_CAP_FILE_LAZYIO |
>>   			CEPH_CAP_FILE_SHARED | CEPH_CAP_FILE_EXCL |
>>   			CEPH_CAP_ANY_DIR_OPS));
>> +	ceph_caps_metric(ci, need | want);
>>   	ret = try_get_cap_refs(inode, need, want, 0, nonblock, got);
>>   	return ret == -EAGAIN ? 0 : ret;
>>   }
>> @@ -2784,6 +2855,8 @@ int ceph_get_caps(struct file *filp, int need, int want,
>>   	    fi->filp_gen != READ_ONCE(fsc->filp_gen))
>>   		return -EBADF;
>>   
>> +	ceph_caps_metric(ci, need | want);
>> +
>>   	while (true) {
>>   		if (endoff > 0)
>>   			check_max_size(inode, endoff);
>> @@ -2871,6 +2944,7 @@ int ceph_get_caps(struct file *filp, int need, int want,
>>   			 * getattr request will bring inline data into
>>   			 * page cache
>>   			 */
>> +			ceph_caps_metric(ci, CEPH_STAT_CAP_INLINE_DATA);
>>   			ret = __ceph_do_getattr(inode, NULL,
>>   						CEPH_STAT_CAP_INLINE_DATA,
>>   						true);
>> diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
>> index 40a22da0214a..c132fdb40d53 100644
>> --- a/fs/ceph/debugfs.c
>> +++ b/fs/ceph/debugfs.c
>> @@ -128,6 +128,7 @@ static int metric_show(struct seq_file *s, void *p)
>>   {
>>   	struct ceph_fs_client *fsc = s->private;
>>   	struct ceph_mds_client *mdsc = fsc->mdsc;
>> +	int i;
>>   
>>   	seq_printf(s, "item          total           miss            hit\n");
>>   	seq_printf(s, "-------------------------------------------------\n");
>> @@ -137,6 +138,25 @@ static int metric_show(struct seq_file *s, void *p)
>>   		   percpu_counter_sum(&mdsc->metric.d_lease_mis),
>>   		   percpu_counter_sum(&mdsc->metric.d_lease_hit));
>>   
>> +	seq_printf(s, "\n");
>> +	seq_printf(s, "session       caps            miss            hit\n");
>> +	seq_printf(s, "-------------------------------------------------\n");
>> +
>> +	mutex_lock(&mdsc->mutex);
>> +	for (i = 0; i < mdsc->max_sessions; i++) {
>> +		struct ceph_mds_session *session;
>> +
>> +		session = __ceph_lookup_mds_session(mdsc, i);
>> +		if (!session)
>> +			continue;
>> +		seq_printf(s, "%-14d%-16d%-16lld%lld\n", i,
>> +			   session->s_nr_caps,
>> +			   percpu_counter_sum(&session->i_caps_mis),
>> +			   percpu_counter_sum(&session->i_caps_hit));
>> +		ceph_put_mds_session(session);
>> +	}
>> +	mutex_unlock(&mdsc->mutex);
>> +
>>   	return 0;
>>   }
>>   
>> diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
>> index 658c55b323cc..33eb239e09e2 100644
>> --- a/fs/ceph/dir.c
>> +++ b/fs/ceph/dir.c
>> @@ -313,7 +313,7 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
>>   	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
>>   	struct ceph_mds_client *mdsc = fsc->mdsc;
>>   	int i;
>> -	int err;
>> +	int err, ret = -1;
>>   	unsigned frag = -1;
>>   	struct ceph_mds_reply_info_parsed *rinfo;
>>   
>> @@ -346,13 +346,16 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
>>   	    !ceph_test_mount_opt(fsc, NOASYNCREADDIR) &&
>>   	    ceph_snap(inode) != CEPH_SNAPDIR &&
>>   	    __ceph_dir_is_complete_ordered(ci) &&
>> -	    __ceph_caps_issued_mask(ci, CEPH_CAP_FILE_SHARED, 1)) {
>> +	    (ret = __ceph_caps_issued_mask(ci, CEPH_CAP_FILE_SHARED, 1))) {
>>   		int shared_gen = atomic_read(&ci->i_shared_gen);
>> +		__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>   		spin_unlock(&ci->i_ceph_lock);
>>   		err = __dcache_readdir(file, ctx, shared_gen);
>>   		if (err != -EAGAIN)
>>   			return err;
>>   	} else {
>> +		if (ret != -1)
>> +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>   		spin_unlock(&ci->i_ceph_lock);
>>   	}
>>   
>> @@ -757,6 +760,8 @@ static struct dentry *ceph_lookup(struct inode *dir, struct dentry *dentry,
>>   		struct ceph_dentry_info *di = ceph_dentry(dentry);
>>   
>>   		spin_lock(&ci->i_ceph_lock);
>> +		__ceph_caps_metric(ci, CEPH_CAP_FILE_SHARED);
>> +
>>   		dout(" dir %p flags are %d\n", dir, ci->i_ceph_flags);
>>   		if (strncmp(dentry->d_name.name,
>>   			    fsc->mount_options->snapdir_name,
>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>> index 1e6cdf2dfe90..c78dfbbb7b91 100644
>> --- a/fs/ceph/file.c
>> +++ b/fs/ceph/file.c
>> @@ -384,6 +384,8 @@ int ceph_open(struct inode *inode, struct file *file)
>>   	 * asynchronously.
>>   	 */
>>   	spin_lock(&ci->i_ceph_lock);
>> +	__ceph_caps_metric(ci, wanted);
>> +
>>   	if (__ceph_is_any_real_caps(ci) &&
>>   	    (((fmode & CEPH_FILE_MODE_WR) == 0) || ci->i_auth_cap)) {
>>   		int mds_wanted = __ceph_caps_mds_wanted(ci, true);
>> @@ -1340,6 +1342,7 @@ static ssize_t ceph_read_iter(struct kiocb *iocb, struct iov_iter *to)
>>   				return -ENOMEM;
>>   		}
>>   
>> +		ceph_caps_metric(ci, CEPH_STAT_CAP_INLINE_DATA);
>>   		statret = __ceph_do_getattr(inode, page,
>>   					    CEPH_STAT_CAP_INLINE_DATA, !!page);
>>   		if (statret < 0) {
>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>> index a24fd00676b8..141c1c03636c 100644
>> --- a/fs/ceph/mds_client.c
>> +++ b/fs/ceph/mds_client.c
>> @@ -558,6 +558,8 @@ void ceph_put_mds_session(struct ceph_mds_session *s)
>>   	if (refcount_dec_and_test(&s->s_ref)) {
>>   		if (s->s_auth.authorizer)
>>   			ceph_auth_destroy_authorizer(s->s_auth.authorizer);
>> +		percpu_counter_destroy(&s->i_caps_hit);
>> +		percpu_counter_destroy(&s->i_caps_mis);
>>   		kfree(s);
>>   	}
>>   }
>> @@ -598,6 +600,7 @@ static struct ceph_mds_session *register_session(struct ceph_mds_client *mdsc,
>>   						 int mds)
>>   {
>>   	struct ceph_mds_session *s;
>> +	int err;
>>   
>>   	if (mds >= mdsc->mdsmap->possible_max_rank)
>>   		return ERR_PTR(-EINVAL);
>> @@ -612,8 +615,10 @@ static struct ceph_mds_session *register_session(struct ceph_mds_client *mdsc,
>>   
>>   		dout("%s: realloc to %d\n", __func__, newmax);
>>   		sa = kcalloc(newmax, sizeof(void *), GFP_NOFS);
>> -		if (!sa)
>> +		if (!sa) {
>> +			err = -ENOMEM;
>>   			goto fail_realloc;
>> +		}
>>   		if (mdsc->sessions) {
>>   			memcpy(sa, mdsc->sessions,
>>   			       mdsc->max_sessions * sizeof(void *));
>> @@ -653,6 +658,13 @@ static struct ceph_mds_session *register_session(struct ceph_mds_client *mdsc,
>>   
>>   	INIT_LIST_HEAD(&s->s_cap_flushing);
>>   
>> +	err = percpu_counter_init(&s->i_caps_hit, 0, GFP_NOFS);
>> +	if (err)
>> +		goto fail_realloc;
>> +	err = percpu_counter_init(&s->i_caps_mis, 0, GFP_NOFS);
>> +	if (err)
>> +		goto fail_init;
>> +
>>   	mdsc->sessions[mds] = s;
>>   	atomic_inc(&mdsc->num_sessions);
>>   	refcount_inc(&s->s_ref);  /* one ref to sessions[], one to caller */
>> @@ -662,6 +674,8 @@ static struct ceph_mds_session *register_session(struct ceph_mds_client *mdsc,
>>   
>>   	return s;
>>   
>> +fail_init:
>> +	percpu_counter_destroy(&s->i_caps_hit);
>>   fail_realloc:
>>   	kfree(s);
>>   	return ERR_PTR(-ENOMEM);
>> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
>> index dd1f417b90eb..ba74ff74c59c 100644
>> --- a/fs/ceph/mds_client.h
>> +++ b/fs/ceph/mds_client.h
>> @@ -201,6 +201,9 @@ struct ceph_mds_session {
>>   
>>   	struct list_head  s_waiting;  /* waiting requests */
>>   	struct list_head  s_unsafe;   /* unsafe requests */
>> +
>> +	struct percpu_counter i_caps_hit;
>> +	struct percpu_counter i_caps_mis;
>>   };
>>   
>>   /*
>> diff --git a/fs/ceph/quota.c b/fs/ceph/quota.c
>> index de56dee60540..4ce2f658e63d 100644
>> --- a/fs/ceph/quota.c
>> +++ b/fs/ceph/quota.c
>> @@ -147,9 +147,14 @@ static struct inode *lookup_quotarealm_inode(struct ceph_mds_client *mdsc,
>>   		return NULL;
>>   	}
>>   	if (qri->inode) {
>> +		struct ceph_inode_info *ci = ceph_inode(qri->inode);
>> +		int ret;
>> +
>> +		ceph_caps_metric(ci, CEPH_STAT_CAP_INODE);
>> +
>>   		/* get caps */
>> -		int ret = __ceph_do_getattr(qri->inode, NULL,
>> -					    CEPH_STAT_CAP_INODE, true);
>> +		ret = __ceph_do_getattr(qri->inode, NULL,
>> +					CEPH_STAT_CAP_INODE, true);
>>   		if (ret >= 0)
>>   			in = qri->inode;
>>   		else
>> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
>> index 7af91628636c..3f4829222528 100644
>> --- a/fs/ceph/super.h
>> +++ b/fs/ceph/super.h
>> @@ -641,6 +641,14 @@ static inline bool __ceph_is_any_real_caps(struct ceph_inode_info *ci)
>>   	return !RB_EMPTY_ROOT(&ci->i_caps);
>>   }
>>   
>> +extern void __ceph_caps_metric(struct ceph_inode_info *ci, int mask);
>> +static inline void ceph_caps_metric(struct ceph_inode_info *ci, int mask)
>> +{
>> +	spin_lock(&ci->i_ceph_lock);
>> +	__ceph_caps_metric(ci, mask);
>> +	spin_unlock(&ci->i_ceph_lock);
>> +}
>> +
>>   extern int __ceph_caps_issued(struct ceph_inode_info *ci, int *implemented);
>>   extern int __ceph_caps_issued_mask(struct ceph_inode_info *ci, int mask, int t);
>>   extern int __ceph_caps_issued_other(struct ceph_inode_info *ci,
>> @@ -927,6 +935,9 @@ extern int __ceph_do_getattr(struct inode *inode, struct page *locked_page,
>>   			     int mask, bool force);
>>   static inline int ceph_do_getattr(struct inode *inode, int mask, bool force)
>>   {
>> +	struct ceph_inode_info *ci = ceph_inode(inode);
>> +
>> +	ceph_caps_metric(ci, mask);
>>   	return __ceph_do_getattr(inode, NULL, mask, force);
>>   }
>>   extern int ceph_permission(struct inode *inode, int mask);
>> diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
>> index d58fa14c1f01..ebd522edb0a8 100644
>> --- a/fs/ceph/xattr.c
>> +++ b/fs/ceph/xattr.c
>> @@ -829,6 +829,7 @@ ssize_t __ceph_getxattr(struct inode *inode, const char *name, void *value,
>>   	struct ceph_vxattr *vxattr = NULL;
>>   	int req_mask;
>>   	ssize_t err;
>> +	int ret = -1;
>>   
>>   	/* let's see if a virtual xattr was requested */
>>   	vxattr = ceph_match_vxattr(inode, name);
>> @@ -856,7 +857,9 @@ ssize_t __ceph_getxattr(struct inode *inode, const char *name, void *value,
>>   
>>   	if (ci->i_xattrs.version == 0 ||
>>   	    !((req_mask & CEPH_CAP_XATTR_SHARED) ||
>> -	      __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1))) {
>> +	      (ret = __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1)))) {
>> +		if (ret != -1)
>> +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>   		spin_unlock(&ci->i_ceph_lock);
>>   
>>   		/* security module gets xattr while filling trace */
>> @@ -871,6 +874,9 @@ ssize_t __ceph_getxattr(struct inode *inode, const char *name, void *value,
>>   		if (err)
>>   			return err;
>>   		spin_lock(&ci->i_ceph_lock);
>> +	} else {
>> +		if (ret != -1)
>> +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>   	}
>>   
>>   	err = __build_xattrs(inode);
>> @@ -907,19 +913,24 @@ ssize_t ceph_listxattr(struct dentry *dentry, char *names, size_t size)
>>   	struct ceph_inode_info *ci = ceph_inode(inode);
>>   	bool len_only = (size == 0);
>>   	u32 namelen;
>> -	int err;
>> +	int err, ret = -1;
>>   
>>   	spin_lock(&ci->i_ceph_lock);
>>   	dout("listxattr %p ver=%lld index_ver=%lld\n", inode,
>>   	     ci->i_xattrs.version, ci->i_xattrs.index_version);
>>   
>>   	if (ci->i_xattrs.version == 0 ||
>> -	    !__ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1)) {
>> +	    !(ret = __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1))) {
>> +		if (ret != -1)
>> +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>   		spin_unlock(&ci->i_ceph_lock);
>>   		err = ceph_do_getattr(inode, CEPH_STAT_CAP_XATTR, true);
>>   		if (err)
>>   			return err;
>>   		spin_lock(&ci->i_ceph_lock);
>> +	} else {
>> +		if (ret != -1)
>> +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>   	}
>>   
>>   	err = __build_xattrs(inode);

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH resend v5 02/11] ceph: add caps perf metric for each session
  2020-01-30  2:22     ` Xiubo Li
@ 2020-01-30 19:00       ` Jeffrey Layton
  2020-01-31  1:34         ` Xiubo Li
  0 siblings, 1 reply; 31+ messages in thread
From: Jeffrey Layton @ 2020-01-30 19:00 UTC (permalink / raw)
  To: Xiubo Li, Jeff Layton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On Thu, 2020-01-30 at 10:22 +0800, Xiubo Li wrote:
> On 2020/1/29 22:21, Jeff Layton wrote:
> > On Wed, 2020-01-29 at 03:27 -0500, xiubli@redhat.com wrote:
> > > From: Xiubo Li <xiubli@redhat.com>
> > > 
> > > This will fulfill the caps hit/miss metric for each session. When
> > > checking the "need" mask and if one cap has the subset of the
> > > "need"
> > > mask it means hit, or missed.
> > > 
> > > item          total           miss            hit
> > > -------------------------------------------------
> > > d_lease       295             0               993
> > > 
> > > session       caps            miss            hit
> > > -------------------------------------------------
> > > 0             295             107             4119
> > > 1             1               107             9
> > > 
> > > URL: https://tracker.ceph.com/issues/43215
> > > Signed-off-by: Xiubo Li <xiubli@redhat.com>
> > > ---
> > >   fs/ceph/acl.c        |  2 ++
> > >   fs/ceph/addr.c       |  2 ++
> > >   fs/ceph/caps.c       | 74
> > > ++++++++++++++++++++++++++++++++++++++++++++
> > >   fs/ceph/debugfs.c    | 20 ++++++++++++
> > >   fs/ceph/dir.c        |  9 ++++--
> > >   fs/ceph/file.c       |  3 ++
> > >   fs/ceph/mds_client.c | 16 +++++++++-
> > >   fs/ceph/mds_client.h |  3 ++
> > >   fs/ceph/quota.c      |  9 ++++--
> > >   fs/ceph/super.h      | 11 +++++++
> > >   fs/ceph/xattr.c      | 17 ++++++++--
> > >   11 files changed, 158 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/fs/ceph/acl.c b/fs/ceph/acl.c
> > > index 26be6520d3fb..58e119e3519f 100644
> > > --- a/fs/ceph/acl.c
> > > +++ b/fs/ceph/acl.c
> > > @@ -22,6 +22,8 @@ static inline void ceph_set_cached_acl(struct
> > > inode *inode,
> > >   	struct ceph_inode_info *ci = ceph_inode(inode);
> > >   
> > >   	spin_lock(&ci->i_ceph_lock);
> > > +	__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
> > > +
> > >   	if (__ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED,
> > > 0))
> > >   		set_cached_acl(inode, type, acl);
> > >   	else
> > > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> > > index 7ab616601141..29d4513eff8c 100644
> > > --- a/fs/ceph/addr.c
> > > +++ b/fs/ceph/addr.c
> > > @@ -1706,6 +1706,8 @@ int ceph_uninline_data(struct file *filp,
> > > struct page *locked_page)
> > >   			err = -ENOMEM;
> > >   			goto out;
> > >   		}
> > > +
> > > +		ceph_caps_metric(ci, CEPH_STAT_CAP_INLINE_DATA);
> > Should a check for inline data really count here?
> Currently all the INLINE_DATA is in 'force' mode, so we can ignore
> it.
> > >   		err = __ceph_do_getattr(inode, page,
> > >   					CEPH_STAT_CAP_INLINE_DA
> > > TA, true);
> > >   		if (err < 0) {
> > > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> > > index 7fc87b693ba4..af2e9e826f8c 100644
> > > --- a/fs/ceph/caps.c
> > > +++ b/fs/ceph/caps.c
> > > @@ -783,6 +783,75 @@ static int __cap_is_valid(struct ceph_cap
> > > *cap)
> > >   	return 1;
> > >   }
> > >   
> > > +/*
> > > + * Counts the cap metric.
> > > + */
> > This needs some comments. Specifically, what should this be
> > counting and
> > how?
> 
> Will add it.
> 
> The __ceph_caps_metric() will traverse the inode's i_caps try to get
> the 
> 'issued' excepting the 'invoking' caps until it get enough caps in 
> 'mask'. The i_caps traverse logic is following
> __ceph_caps_issued_mask().
> 
> 
> > > +void __ceph_caps_metric(struct ceph_inode_info *ci, int mask)
> > > +{
> > > +	int have = ci->i_snap_caps;
> > > +	struct ceph_mds_session *s;
> > > +	struct ceph_cap *cap;
> > > +	struct rb_node *p;
> > > +	bool skip_auth = false;
> > > +
> > > +	lockdep_assert_held(&ci->i_ceph_lock);
> > > +
> > > +	if (mask <= 0)
> > > +		return;
> > > +
> > > +	/* Counts the snap caps metric in the auth cap */
> > > +	if (ci->i_auth_cap) {
> > > +		cap = ci->i_auth_cap;
> > > +		if (have) {
> > > +			have |= cap->issued;
> > > +
> > > +			dout("%s %p cap %p issued %s, mask %s\n",
> > > __func__,
> > > +			     &ci->vfs_inode, cap, ceph_cap_string(cap-
> > > >issued),
> > > +			     ceph_cap_string(mask));
> > > +
> > > +			s = ceph_get_mds_session(cap->session);
> > > +			if (s) {
> > > +				if (mask & have)
> > > +					percpu_counter_inc(&s-
> > > >i_caps_hit);
> > > +				else
> > > +					percpu_counter_inc(&s-
> > > >i_caps_mis);
> > > +				ceph_put_mds_session(s);
> > > +			}
> > > +			skip_auth = true;
> > > +		}
> > > +	}
> > > +
> > > +	if ((mask & have) == mask)
> > > +		return;
> > > +
> > > +	/* Checks others */
> > > +	for (p = rb_first(&ci->i_caps); p; p = rb_next(p)) {
> > > +		cap = rb_entry(p, struct ceph_cap, ci_node);
> > > +		if (!__cap_is_valid(cap))
> > > +			continue;
> > > +
> > > +		if (skip_auth && cap == ci->i_auth_cap)
> > > +			continue;
> > > +
> > > +		dout("%s %p cap %p issued %s, mask %s\n", __func__,
> > > +		     &ci->vfs_inode, cap, ceph_cap_string(cap->issued),
> > > +		     ceph_cap_string(mask));
> > > +
> > > +		s = ceph_get_mds_session(cap->session);
> > > +		if (s) {
> > > +			if (mask & cap->issued)
> > > +				percpu_counter_inc(&s->i_caps_hit);
> > > +			else
> > > +				percpu_counter_inc(&s->i_caps_mis);
> > > +			ceph_put_mds_session(s);
> > > +		}
> > > +
> > > +		have |= cap->issued;
> > > +		if ((mask & have) == mask)
> > > +			return;
> > > +	}
> > > +}
> > > +
> > I'm trying to understand what happens with the above when more than
> > one
> > ceph_cap has the same bit set in "issued". For instance:
> > 
> > Suppose we're doing the check for a statx call, and we're trying to
> > get
> > caps for pAsFsLs. We have two MDS's and they've each granted us
> > caps for
> > the inode, say:
> > 
> > MDS 0: pAs
> > MDS 1: pAsLsFs
> > 
> > We check the cap0 first, and consider it a hit, and then we check
> > cap1
> > and consider it a hit as well. So that seems like it's being
> > double-
> > counted.
> 
> Yeah, it will.
> 
> In case2:
> 
> MDS 0: pAsFs
> 
> MDS 1: pAsLs
> 
> For this case and yours both the i_cap0 and i_cap1 are 'hit'.
> 
> 
> In case3 :
> 
> MDS0: pAsFsLs
> 
> MDS1: pAs
> 
> Only the i_cap0 is 'hit'.  i_cap1 will not be counted any 'hit' or
> 'mis'.
> 
> 
> In case4:
> 
> MDS0: p
> 
> MDS1: pAsLsFs
> 
> i_cap0 will 'mis' and i_cap1 will 'hit'.
> 
> 
> All the logic are the same with __ceph_caps_issued_mask() does.
> 
> The 'hit' means to get all the caps in 'mask we have checked the 
> i_cap[0~N] and if they have a subset in 'mask', and the 'mis' means
> we 
> have checked the i_cap[0~N] but they do not.
> 
> For the i_cap[N+1 ~ M], if we won't touch them because we have
> already 
> gotten enough caps needed in 'mask', they won't count any 'hit' or
> 'mis'
> 
> All in all, the current logic is that the 'hit' means 'touched' and
> it 
> have some of what we needed, and the 'mis' means 'touched' and
> missed 
> any of what we needed.
> 
> 

That seems sort of arbitrary, given that you're going to get different
results depending on the index of the MDS with the caps. For instance:


MDS0: pAsLsFs
MDS1: pAs

...vs...

MDS0: pAs
MDS1: pAsLsFs

If we assume we're looking for pAsLsFs, then the first scenario will
just end up with 1 hit and the second will give you 2. AFAIU, the two
MDSs are peers, so it really seems like the index should not matter
here.

I'm really struggling to understand how these numbers will be useful.
What, specifically, are we trying to count here and why?

> 
> > ISTM, that what you really want to do here is logically or all of
> > the
> > cap->issued fields together, and then check that vs. the mask
> > value, and
> > count only one hit or miss per inode.
> > 
> > That said, it's not 100% clear what you're counting as a hit or
> > miss
> > here, so please let me know if I have that wrong.
> >   
> > >   /*
> > >    * Return set of valid cap bits issued to us.  Note that caps
> > > time
> > >    * out, and may be invalidated in bulk if the client session
> > > times out
> > > @@ -2746,6 +2815,7 @@ static void check_max_size(struct inode
> > > *inode, loff_t endoff)
> > >   int ceph_try_get_caps(struct inode *inode, int need, int want,
> > >   		      bool nonblock, int *got)
> > >   {
> > > +	struct ceph_inode_info *ci = ceph_inode(inode);
> > >   	int ret;
> > >   
> > >   	BUG_ON(need & ~CEPH_CAP_FILE_RD);
> > > @@ -2758,6 +2828,7 @@ int ceph_try_get_caps(struct inode *inode,
> > > int need, int want,
> > >   	BUG_ON(want & ~(CEPH_CAP_FILE_CACHE |
> > > CEPH_CAP_FILE_LAZYIO |
> > >   			CEPH_CAP_FILE_SHARED |
> > > CEPH_CAP_FILE_EXCL |
> > >   			CEPH_CAP_ANY_DIR_OPS));
> > > +	ceph_caps_metric(ci, need | want);
> > >   	ret = try_get_cap_refs(inode, need, want, 0, nonblock,
> > > got);
> > >   	return ret == -EAGAIN ? 0 : ret;
> > >   }
> > > @@ -2784,6 +2855,8 @@ int ceph_get_caps(struct file *filp, int
> > > need, int want,
> > >   	    fi->filp_gen != READ_ONCE(fsc->filp_gen))
> > >   		return -EBADF;
> > >   
> > > +	ceph_caps_metric(ci, need | want);
> > > +
> > >   	while (true) {
> > >   		if (endoff > 0)
> > >   			check_max_size(inode, endoff);
> > > @@ -2871,6 +2944,7 @@ int ceph_get_caps(struct file *filp, int
> > > need, int want,
> > >   			 * getattr request will bring inline
> > > data into
> > >   			 * page cache
> > >   			 */
> > > +			ceph_caps_metric(ci,
> > > CEPH_STAT_CAP_INLINE_DATA);
> > >   			ret = __ceph_do_getattr(inode, NULL,
> > >   						CEPH_STAT_CAP_I
> > > NLINE_DATA,
> > >   						true);
> > > diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
> > > index 40a22da0214a..c132fdb40d53 100644
> > > --- a/fs/ceph/debugfs.c
> > > +++ b/fs/ceph/debugfs.c
> > > @@ -128,6 +128,7 @@ static int metric_show(struct seq_file *s,
> > > void *p)
> > >   {
> > >   	struct ceph_fs_client *fsc = s->private;
> > >   	struct ceph_mds_client *mdsc = fsc->mdsc;
> > > +	int i;
> > >   
> > >   	seq_printf(s,
> > > "item          total           miss            hit\n");
> > >   	seq_printf(s, "--------------------------------------
> > > -----------\n");
> > > @@ -137,6 +138,25 @@ static int metric_show(struct seq_file *s,
> > > void *p)
> > >   		   percpu_counter_sum(&mdsc-
> > > >metric.d_lease_mis),
> > >   		   percpu_counter_sum(&mdsc-
> > > >metric.d_lease_hit));
> > >   
> > > +	seq_printf(s, "\n");
> > > +	seq_printf(s,
> > > "session       caps            miss            hit\n");
> > > +	seq_printf(s, "----------------------------------------------
> > > ---\n");
> > > +
> > > +	mutex_lock(&mdsc->mutex);
> > > +	for (i = 0; i < mdsc->max_sessions; i++) {
> > > +		struct ceph_mds_session *session;
> > > +
> > > +		session = __ceph_lookup_mds_session(mdsc, i);
> > > +		if (!session)
> > > +			continue;
> > > +		seq_printf(s, "%-14d%-16d%-16lld%lld\n", i,
> > > +			   session->s_nr_caps,
> > > +			   percpu_counter_sum(&session->i_caps_mis),
> > > +			   percpu_counter_sum(&session->i_caps_hit));
> > > +		ceph_put_mds_session(session);
> > > +	}
> > > +	mutex_unlock(&mdsc->mutex);
> > > +
> > >   	return 0;
> > >   }
> > >   
> > > diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
> > > index 658c55b323cc..33eb239e09e2 100644
> > > --- a/fs/ceph/dir.c
> > > +++ b/fs/ceph/dir.c
> > > @@ -313,7 +313,7 @@ static int ceph_readdir(struct file *file,
> > > struct dir_context *ctx)
> > >   	struct ceph_fs_client *fsc =
> > > ceph_inode_to_client(inode);
> > >   	struct ceph_mds_client *mdsc = fsc->mdsc;
> > >   	int i;
> > > -	int err;
> > > +	int err, ret = -1;
> > >   	unsigned frag = -1;
> > >   	struct ceph_mds_reply_info_parsed *rinfo;
> > >   
> > > @@ -346,13 +346,16 @@ static int ceph_readdir(struct file *file,
> > > struct dir_context *ctx)
> > >   	    !ceph_test_mount_opt(fsc, NOASYNCREADDIR) &&
> > >   	    ceph_snap(inode) != CEPH_SNAPDIR &&
> > >   	    __ceph_dir_is_complete_ordered(ci) &&
> > > -	    __ceph_caps_issued_mask(ci, CEPH_CAP_FILE_SHARED, 1)) {
> > > +	    (ret = __ceph_caps_issued_mask(ci, CEPH_CAP_FILE_SHARED,
> > > 1))) {
> > >   		int shared_gen = atomic_read(&ci-
> > > >i_shared_gen);
> > > +		__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
> > >   		spin_unlock(&ci->i_ceph_lock);
> > >   		err = __dcache_readdir(file, ctx, shared_gen);
> > >   		if (err != -EAGAIN)
> > >   			return err;
> > >   	} else {
> > > +		if (ret != -1)
> > > +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
> > >   		spin_unlock(&ci->i_ceph_lock);
> > >   	}
> > >   
> > > @@ -757,6 +760,8 @@ static struct dentry *ceph_lookup(struct
> > > inode *dir, struct dentry *dentry,
> > >   		struct ceph_dentry_info *di =
> > > ceph_dentry(dentry);
> > >   
> > >   		spin_lock(&ci->i_ceph_lock);
> > > +		__ceph_caps_metric(ci, CEPH_CAP_FILE_SHARED);
> > > +
> > >   		dout(" dir %p flags are %d\n", dir, ci-
> > > >i_ceph_flags);
> > >   		if (strncmp(dentry->d_name.name,
> > >   			    fsc->mount_options->snapdir_name,
> > > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> > > index 1e6cdf2dfe90..c78dfbbb7b91 100644
> > > --- a/fs/ceph/file.c
> > > +++ b/fs/ceph/file.c
> > > @@ -384,6 +384,8 @@ int ceph_open(struct inode *inode, struct
> > > file *file)
> > >   	 * asynchronously.
> > >   	 */
> > >   	spin_lock(&ci->i_ceph_lock);
> > > +	__ceph_caps_metric(ci, wanted);
> > > +
> > >   	if (__ceph_is_any_real_caps(ci) &&
> > >   	    (((fmode & CEPH_FILE_MODE_WR) == 0) || ci-
> > > >i_auth_cap)) {
> > >   		int mds_wanted = __ceph_caps_mds_wanted(ci,
> > > true);
> > > @@ -1340,6 +1342,7 @@ static ssize_t ceph_read_iter(struct kiocb
> > > *iocb, struct iov_iter *to)
> > >   				return -ENOMEM;
> > >   		}
> > >   
> > > +		ceph_caps_metric(ci, CEPH_STAT_CAP_INLINE_DATA);
> > >   		statret = __ceph_do_getattr(inode, page,
> > >   					    CEPH_STAT_CAP_INLIN
> > > E_DATA, !!page);
> > >   		if (statret < 0) {
> > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > > index a24fd00676b8..141c1c03636c 100644
> > > --- a/fs/ceph/mds_client.c
> > > +++ b/fs/ceph/mds_client.c
> > > @@ -558,6 +558,8 @@ void ceph_put_mds_session(struct
> > > ceph_mds_session *s)
> > >   	if (refcount_dec_and_test(&s->s_ref)) {
> > >   		if (s->s_auth.authorizer)
> > >   			ceph_auth_destroy_authorizer(s-
> > > >s_auth.authorizer);
> > > +		percpu_counter_destroy(&s->i_caps_hit);
> > > +		percpu_counter_destroy(&s->i_caps_mis);
> > >   		kfree(s);
> > >   	}
> > >   }
> > > @@ -598,6 +600,7 @@ static struct ceph_mds_session
> > > *register_session(struct ceph_mds_client *mdsc,
> > >   						 int mds)
> > >   {
> > >   	struct ceph_mds_session *s;
> > > +	int err;
> > >   
> > >   	if (mds >= mdsc->mdsmap->possible_max_rank)
> > >   		return ERR_PTR(-EINVAL);
> > > @@ -612,8 +615,10 @@ static struct ceph_mds_session
> > > *register_session(struct ceph_mds_client *mdsc,
> > >   
> > >   		dout("%s: realloc to %d\n", __func__, newmax);
> > >   		sa = kcalloc(newmax, sizeof(void *), GFP_NOFS);
> > > -		if (!sa)
> > > +		if (!sa) {
> > > +			err = -ENOMEM;
> > >   			goto fail_realloc;
> > > +		}
> > >   		if (mdsc->sessions) {
> > >   			memcpy(sa, mdsc->sessions,
> > >   			       mdsc->max_sessions * sizeof(void
> > > *));
> > > @@ -653,6 +658,13 @@ static struct ceph_mds_session
> > > *register_session(struct ceph_mds_client *mdsc,
> > >   
> > >   	INIT_LIST_HEAD(&s->s_cap_flushing);
> > >   
> > > +	err = percpu_counter_init(&s->i_caps_hit, 0, GFP_NOFS);
> > > +	if (err)
> > > +		goto fail_realloc;
> > > +	err = percpu_counter_init(&s->i_caps_mis, 0, GFP_NOFS);
> > > +	if (err)
> > > +		goto fail_init;
> > > +
> > >   	mdsc->sessions[mds] = s;
> > >   	atomic_inc(&mdsc->num_sessions);
> > >   	refcount_inc(&s->s_ref);  /* one ref to sessions[], one
> > > to caller */
> > > @@ -662,6 +674,8 @@ static struct ceph_mds_session
> > > *register_session(struct ceph_mds_client *mdsc,
> > >   
> > >   	return s;
> > >   
> > > +fail_init:
> > > +	percpu_counter_destroy(&s->i_caps_hit);
> > >   fail_realloc:
> > >   	kfree(s);
> > >   	return ERR_PTR(-ENOMEM);
> > > diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> > > index dd1f417b90eb..ba74ff74c59c 100644
> > > --- a/fs/ceph/mds_client.h
> > > +++ b/fs/ceph/mds_client.h
> > > @@ -201,6 +201,9 @@ struct ceph_mds_session {
> > >   
> > >   	struct list_head  s_waiting;  /* waiting requests */
> > >   	struct list_head  s_unsafe;   /* unsafe requests */
> > > +
> > > +	struct percpu_counter i_caps_hit;
> > > +	struct percpu_counter i_caps_mis;
> > >   };
> > >   
> > >   /*
> > > diff --git a/fs/ceph/quota.c b/fs/ceph/quota.c
> > > index de56dee60540..4ce2f658e63d 100644
> > > --- a/fs/ceph/quota.c
> > > +++ b/fs/ceph/quota.c
> > > @@ -147,9 +147,14 @@ static struct inode
> > > *lookup_quotarealm_inode(struct ceph_mds_client *mdsc,
> > >   		return NULL;
> > >   	}
> > >   	if (qri->inode) {
> > > +		struct ceph_inode_info *ci = ceph_inode(qri->inode);
> > > +		int ret;
> > > +
> > > +		ceph_caps_metric(ci, CEPH_STAT_CAP_INODE);
> > > +
> > >   		/* get caps */
> > > -		int ret = __ceph_do_getattr(qri->inode, NULL,
> > > -					    CEPH_STAT_CAP_INODE, true);
> > > +		ret = __ceph_do_getattr(qri->inode, NULL,
> > > +					CEPH_STAT_CAP_INODE, true);
> > >   		if (ret >= 0)
> > >   			in = qri->inode;
> > >   		else
> > > diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> > > index 7af91628636c..3f4829222528 100644
> > > --- a/fs/ceph/super.h
> > > +++ b/fs/ceph/super.h
> > > @@ -641,6 +641,14 @@ static inline bool
> > > __ceph_is_any_real_caps(struct ceph_inode_info *ci)
> > >   	return !RB_EMPTY_ROOT(&ci->i_caps);
> > >   }
> > >   
> > > +extern void __ceph_caps_metric(struct ceph_inode_info *ci, int
> > > mask);
> > > +static inline void ceph_caps_metric(struct ceph_inode_info *ci,
> > > int mask)
> > > +{
> > > +	spin_lock(&ci->i_ceph_lock);
> > > +	__ceph_caps_metric(ci, mask);
> > > +	spin_unlock(&ci->i_ceph_lock);
> > > +}
> > > +
> > >   extern int __ceph_caps_issued(struct ceph_inode_info *ci, int
> > > *implemented);
> > >   extern int __ceph_caps_issued_mask(struct ceph_inode_info *ci,
> > > int mask, int t);
> > >   extern int __ceph_caps_issued_other(struct ceph_inode_info *ci,
> > > @@ -927,6 +935,9 @@ extern int __ceph_do_getattr(struct inode
> > > *inode, struct page *locked_page,
> > >   			     int mask, bool force);
> > >   static inline int ceph_do_getattr(struct inode *inode, int
> > > mask, bool force)
> > >   {
> > > +	struct ceph_inode_info *ci = ceph_inode(inode);
> > > +
> > > +	ceph_caps_metric(ci, mask);
> > >   	return __ceph_do_getattr(inode, NULL, mask, force);
> > >   }
> > >   extern int ceph_permission(struct inode *inode, int mask);
> > > diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
> > > index d58fa14c1f01..ebd522edb0a8 100644
> > > --- a/fs/ceph/xattr.c
> > > +++ b/fs/ceph/xattr.c
> > > @@ -829,6 +829,7 @@ ssize_t __ceph_getxattr(struct inode *inode,
> > > const char *name, void *value,
> > >   	struct ceph_vxattr *vxattr = NULL;
> > >   	int req_mask;
> > >   	ssize_t err;
> > > +	int ret = -1;
> > >   
> > >   	/* let's see if a virtual xattr was requested */
> > >   	vxattr = ceph_match_vxattr(inode, name);
> > > @@ -856,7 +857,9 @@ ssize_t __ceph_getxattr(struct inode *inode,
> > > const char *name, void *value,
> > >   
> > >   	if (ci->i_xattrs.version == 0 ||
> > >   	    !((req_mask & CEPH_CAP_XATTR_SHARED) ||
> > > -	      __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1))) {
> > > +	      (ret = __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED,
> > > 1)))) {
> > > +		if (ret != -1)
> > > +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
> > >   		spin_unlock(&ci->i_ceph_lock);
> > >   
> > >   		/* security module gets xattr while filling
> > > trace */
> > > @@ -871,6 +874,9 @@ ssize_t __ceph_getxattr(struct inode *inode,
> > > const char *name, void *value,
> > >   		if (err)
> > >   			return err;
> > >   		spin_lock(&ci->i_ceph_lock);
> > > +	} else {
> > > +		if (ret != -1)
> > > +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
> > >   	}
> > >   
> > >   	err = __build_xattrs(inode);
> > > @@ -907,19 +913,24 @@ ssize_t ceph_listxattr(struct dentry
> > > *dentry, char *names, size_t size)
> > >   	struct ceph_inode_info *ci = ceph_inode(inode);
> > >   	bool len_only = (size == 0);
> > >   	u32 namelen;
> > > -	int err;
> > > +	int err, ret = -1;
> > >   
> > >   	spin_lock(&ci->i_ceph_lock);
> > >   	dout("listxattr %p ver=%lld index_ver=%lld\n", inode,
> > >   	     ci->i_xattrs.version, ci->i_xattrs.index_version);
> > >   
> > >   	if (ci->i_xattrs.version == 0 ||
> > > -	    !__ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1)) {
> > > +	    !(ret = __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED,
> > > 1))) {
> > > +		if (ret != -1)
> > > +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
> > >   		spin_unlock(&ci->i_ceph_lock);
> > >   		err = ceph_do_getattr(inode,
> > > CEPH_STAT_CAP_XATTR, true);
> > >   		if (err)
> > >   			return err;
> > >   		spin_lock(&ci->i_ceph_lock);
> > > +	} else {
> > > +		if (ret != -1)
> > > +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
> > >   	}
> > >   
> > >   	err = __build_xattrs(inode);
> 
> 
-- 
Jeffrey Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH resend v5 02/11] ceph: add caps perf metric for each session
  2020-01-30 19:00       ` Jeffrey Layton
@ 2020-01-31  1:34         ` Xiubo Li
  2020-01-31  9:02           ` Xiubo Li
  0 siblings, 1 reply; 31+ messages in thread
From: Xiubo Li @ 2020-01-31  1:34 UTC (permalink / raw)
  To: Jeffrey Layton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On 2020/1/31 3:00, Jeffrey Layton wrote:
> On Thu, 2020-01-30 at 10:22 +0800, Xiubo Li wrote:
>> On 2020/1/29 22:21, Jeff Layton wrote:
>>> On Wed, 2020-01-29 at 03:27 -0500, xiubli@redhat.com wrote:
>>>> From: Xiubo Li <xiubli@redhat.com>
>>>>
>>>> This will fulfill the caps hit/miss metric for each session. When
>>>> checking the "need" mask and if one cap has the subset of the
>>>> "need"
>>>> mask it means hit, or missed.
>>>>
>>>> item          total           miss            hit
>>>> -------------------------------------------------
>>>> d_lease       295             0               993
>>>>
>>>> session       caps            miss            hit
>>>> -------------------------------------------------
>>>> 0             295             107             4119
>>>> 1             1               107             9
>>>>
>>>> URL: https://tracker.ceph.com/issues/43215
>>>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>>>> ---
>>>>    fs/ceph/acl.c        |  2 ++
>>>>    fs/ceph/addr.c       |  2 ++
>>>>    fs/ceph/caps.c       | 74
>>>> ++++++++++++++++++++++++++++++++++++++++++++
>>>>    fs/ceph/debugfs.c    | 20 ++++++++++++
>>>>    fs/ceph/dir.c        |  9 ++++--
>>>>    fs/ceph/file.c       |  3 ++
>>>>    fs/ceph/mds_client.c | 16 +++++++++-
>>>>    fs/ceph/mds_client.h |  3 ++
>>>>    fs/ceph/quota.c      |  9 ++++--
>>>>    fs/ceph/super.h      | 11 +++++++
>>>>    fs/ceph/xattr.c      | 17 ++++++++--
>>>>    11 files changed, 158 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/fs/ceph/acl.c b/fs/ceph/acl.c
>>>> index 26be6520d3fb..58e119e3519f 100644
>>>> --- a/fs/ceph/acl.c
>>>> +++ b/fs/ceph/acl.c
>>>> @@ -22,6 +22,8 @@ static inline void ceph_set_cached_acl(struct
>>>> inode *inode,
>>>>    	struct ceph_inode_info *ci = ceph_inode(inode);
>>>>    
>>>>    	spin_lock(&ci->i_ceph_lock);
>>>> +	__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>>> +
>>>>    	if (__ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED,
>>>> 0))
>>>>    		set_cached_acl(inode, type, acl);
>>>>    	else
>>>> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
>>>> index 7ab616601141..29d4513eff8c 100644
>>>> --- a/fs/ceph/addr.c
>>>> +++ b/fs/ceph/addr.c
>>>> @@ -1706,6 +1706,8 @@ int ceph_uninline_data(struct file *filp,
>>>> struct page *locked_page)
>>>>    			err = -ENOMEM;
>>>>    			goto out;
>>>>    		}
>>>> +
>>>> +		ceph_caps_metric(ci, CEPH_STAT_CAP_INLINE_DATA);
>>> Should a check for inline data really count here?
>> Currently all the INLINE_DATA is in 'force' mode, so we can ignore
>> it.
>>>>    		err = __ceph_do_getattr(inode, page,
>>>>    					CEPH_STAT_CAP_INLINE_DA
>>>> TA, true);
>>>>    		if (err < 0) {
>>>> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
>>>> index 7fc87b693ba4..af2e9e826f8c 100644
>>>> --- a/fs/ceph/caps.c
>>>> +++ b/fs/ceph/caps.c
>>>> @@ -783,6 +783,75 @@ static int __cap_is_valid(struct ceph_cap
>>>> *cap)
>>>>    	return 1;
>>>>    }
>>>>    
>>>> +/*
>>>> + * Counts the cap metric.
>>>> + */
>>> This needs some comments. Specifically, what should this be
>>> counting and
>>> how?
>> Will add it.
>>
>> The __ceph_caps_metric() will traverse the inode's i_caps try to get
>> the
>> 'issued' excepting the 'invoking' caps until it get enough caps in
>> 'mask'. The i_caps traverse logic is following
>> __ceph_caps_issued_mask().
>>
>>
>>>> +void __ceph_caps_metric(struct ceph_inode_info *ci, int mask)
>>>> +{
>>>> +	int have = ci->i_snap_caps;
>>>> +	struct ceph_mds_session *s;
>>>> +	struct ceph_cap *cap;
>>>> +	struct rb_node *p;
>>>> +	bool skip_auth = false;
>>>> +
>>>> +	lockdep_assert_held(&ci->i_ceph_lock);
>>>> +
>>>> +	if (mask <= 0)
>>>> +		return;
>>>> +
>>>> +	/* Counts the snap caps metric in the auth cap */
>>>> +	if (ci->i_auth_cap) {
>>>> +		cap = ci->i_auth_cap;
>>>> +		if (have) {
>>>> +			have |= cap->issued;
>>>> +
>>>> +			dout("%s %p cap %p issued %s, mask %s\n",
>>>> __func__,
>>>> +			     &ci->vfs_inode, cap, ceph_cap_string(cap-
>>>>> issued),
>>>> +			     ceph_cap_string(mask));
>>>> +
>>>> +			s = ceph_get_mds_session(cap->session);
>>>> +			if (s) {
>>>> +				if (mask & have)
>>>> +					percpu_counter_inc(&s-
>>>>> i_caps_hit);
>>>> +				else
>>>> +					percpu_counter_inc(&s-
>>>>> i_caps_mis);
>>>> +				ceph_put_mds_session(s);
>>>> +			}
>>>> +			skip_auth = true;
>>>> +		}
>>>> +	}
>>>> +
>>>> +	if ((mask & have) == mask)
>>>> +		return;
>>>> +
>>>> +	/* Checks others */
>>>> +	for (p = rb_first(&ci->i_caps); p; p = rb_next(p)) {
>>>> +		cap = rb_entry(p, struct ceph_cap, ci_node);
>>>> +		if (!__cap_is_valid(cap))
>>>> +			continue;
>>>> +
>>>> +		if (skip_auth && cap == ci->i_auth_cap)
>>>> +			continue;
>>>> +
>>>> +		dout("%s %p cap %p issued %s, mask %s\n", __func__,
>>>> +		     &ci->vfs_inode, cap, ceph_cap_string(cap->issued),
>>>> +		     ceph_cap_string(mask));
>>>> +
>>>> +		s = ceph_get_mds_session(cap->session);
>>>> +		if (s) {
>>>> +			if (mask & cap->issued)
>>>> +				percpu_counter_inc(&s->i_caps_hit);
>>>> +			else
>>>> +				percpu_counter_inc(&s->i_caps_mis);
>>>> +			ceph_put_mds_session(s);
>>>> +		}
>>>> +
>>>> +		have |= cap->issued;
>>>> +		if ((mask & have) == mask)
>>>> +			return;
>>>> +	}
>>>> +}
>>>> +
>>> I'm trying to understand what happens with the above when more than
>>> one
>>> ceph_cap has the same bit set in "issued". For instance:
>>>
>>> Suppose we're doing the check for a statx call, and we're trying to
>>> get
>>> caps for pAsFsLs. We have two MDS's and they've each granted us
>>> caps for
>>> the inode, say:
>>>
>>> MDS 0: pAs
>>> MDS 1: pAsLsFs
>>>
>>> We check the cap0 first, and consider it a hit, and then we check
>>> cap1
>>> and consider it a hit as well. So that seems like it's being
>>> double-
>>> counted.
>> Yeah, it will.
>>
>> In case2:
>>
>> MDS 0: pAsFs
>>
>> MDS 1: pAsLs
>>
>> For this case and yours both the i_cap0 and i_cap1 are 'hit'.
>>
>>
>> In case3 :
>>
>> MDS0: pAsFsLs
>>
>> MDS1: pAs
>>
>> Only the i_cap0 is 'hit'.  i_cap1 will not be counted any 'hit' or
>> 'mis'.
>>
>>
>> In case4:
>>
>> MDS0: p
>>
>> MDS1: pAsLsFs
>>
>> i_cap0 will 'mis' and i_cap1 will 'hit'.
>>
>>
>> All the logic are the same with __ceph_caps_issued_mask() does.
>>
>> The 'hit' means to get all the caps in 'mask we have checked the
>> i_cap[0~N] and if they have a subset in 'mask', and the 'mis' means
>> we
>> have checked the i_cap[0~N] but they do not.
>>
>> For the i_cap[N+1 ~ M], if we won't touch them because we have
>> already
>> gotten enough caps needed in 'mask', they won't count any 'hit' or
>> 'mis'
>>
>> All in all, the current logic is that the 'hit' means 'touched' and
>> it
>> have some of what we needed, and the 'mis' means 'touched' and
>> missed
>> any of what we needed.
>>
>>
> That seems sort of arbitrary, given that you're going to get different
> results depending on the index of the MDS with the caps. For instance:
>
>
> MDS0: pAsLsFs
> MDS1: pAs
>
> ...vs...
>
> MDS0: pAs
> MDS1: pAsLsFs
>
> If we assume we're looking for pAsLsFs, then the first scenario will
> just end up with 1 hit and the second will give you 2. AFAIU, the two
> MDSs are peers, so it really seems like the index should not matter
> here.
>
> I'm really struggling to understand how these numbers will be useful.
> What, specifically, are we trying to count here and why?

Maybe we need count the hit/mis only once, the fake code like:

// Case1: check the auth caps first

if (auth_cap & mask == mask) {

     s->hit++;

     return;

}

// Case2: check all the other one by one

for (caps : i_caps) {

     if (caps & mask == mask) {

         s->hit++;

         return;

     }

     c |= caps;

}

// Case3:

if (c & mask == mask)

     s->hit++;

else

     s->mis++;

return;

....

And for the session 's->' here, if one i_cap can hold all the 'mask' 
requested, like the Case1 and Case2 it will be i_cap's corresponding 
session. Or for Case3 we could choose any session.

But the above is still not very graceful of counting the cap metrics too.

IMO, for the cap hit/miss counter should be a global one just like the 
dentry_lease does in [PATCH 01/11], will this make sense ?

Thanks,


>>> ISTM, that what you really want to do here is logically or all of
>>> the
>>> cap->issued fields together, and then check that vs. the mask
>>> value, and
>>> count only one hit or miss per inode.
>>>
>>> That said, it's not 100% clear what you're counting as a hit or
>>> miss
>>> here, so please let me know if I have that wrong.
>>>    
>>>>    /*
>>>>     * Return set of valid cap bits issued to us.  Note that caps
>>>> time
>>>>     * out, and may be invalidated in bulk if the client session
>>>> times out
>>>> @@ -2746,6 +2815,7 @@ static void check_max_size(struct inode
>>>> *inode, loff_t endoff)
>>>>    int ceph_try_get_caps(struct inode *inode, int need, int want,
>>>>    		      bool nonblock, int *got)
>>>>    {
>>>> +	struct ceph_inode_info *ci = ceph_inode(inode);
>>>>    	int ret;
>>>>    
>>>>    	BUG_ON(need & ~CEPH_CAP_FILE_RD);
>>>> @@ -2758,6 +2828,7 @@ int ceph_try_get_caps(struct inode *inode,
>>>> int need, int want,
>>>>    	BUG_ON(want & ~(CEPH_CAP_FILE_CACHE |
>>>> CEPH_CAP_FILE_LAZYIO |
>>>>    			CEPH_CAP_FILE_SHARED |
>>>> CEPH_CAP_FILE_EXCL |
>>>>    			CEPH_CAP_ANY_DIR_OPS));
>>>> +	ceph_caps_metric(ci, need | want);
>>>>    	ret = try_get_cap_refs(inode, need, want, 0, nonblock,
>>>> got);
>>>>    	return ret == -EAGAIN ? 0 : ret;
>>>>    }
>>>> @@ -2784,6 +2855,8 @@ int ceph_get_caps(struct file *filp, int
>>>> need, int want,
>>>>    	    fi->filp_gen != READ_ONCE(fsc->filp_gen))
>>>>    		return -EBADF;
>>>>    
>>>> +	ceph_caps_metric(ci, need | want);
>>>> +
>>>>    	while (true) {
>>>>    		if (endoff > 0)
>>>>    			check_max_size(inode, endoff);
>>>> @@ -2871,6 +2944,7 @@ int ceph_get_caps(struct file *filp, int
>>>> need, int want,
>>>>    			 * getattr request will bring inline
>>>> data into
>>>>    			 * page cache
>>>>    			 */
>>>> +			ceph_caps_metric(ci,
>>>> CEPH_STAT_CAP_INLINE_DATA);
>>>>    			ret = __ceph_do_getattr(inode, NULL,
>>>>    						CEPH_STAT_CAP_I
>>>> NLINE_DATA,
>>>>    						true);
>>>> diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
>>>> index 40a22da0214a..c132fdb40d53 100644
>>>> --- a/fs/ceph/debugfs.c
>>>> +++ b/fs/ceph/debugfs.c
>>>> @@ -128,6 +128,7 @@ static int metric_show(struct seq_file *s,
>>>> void *p)
>>>>    {
>>>>    	struct ceph_fs_client *fsc = s->private;
>>>>    	struct ceph_mds_client *mdsc = fsc->mdsc;
>>>> +	int i;
>>>>    
>>>>    	seq_printf(s,
>>>> "item          total           miss            hit\n");
>>>>    	seq_printf(s, "--------------------------------------
>>>> -----------\n");
>>>> @@ -137,6 +138,25 @@ static int metric_show(struct seq_file *s,
>>>> void *p)
>>>>    		   percpu_counter_sum(&mdsc-
>>>>> metric.d_lease_mis),
>>>>    		   percpu_counter_sum(&mdsc-
>>>>> metric.d_lease_hit));
>>>>    
>>>> +	seq_printf(s, "\n");
>>>> +	seq_printf(s,
>>>> "session       caps            miss            hit\n");
>>>> +	seq_printf(s, "----------------------------------------------
>>>> ---\n");
>>>> +
>>>> +	mutex_lock(&mdsc->mutex);
>>>> +	for (i = 0; i < mdsc->max_sessions; i++) {
>>>> +		struct ceph_mds_session *session;
>>>> +
>>>> +		session = __ceph_lookup_mds_session(mdsc, i);
>>>> +		if (!session)
>>>> +			continue;
>>>> +		seq_printf(s, "%-14d%-16d%-16lld%lld\n", i,
>>>> +			   session->s_nr_caps,
>>>> +			   percpu_counter_sum(&session->i_caps_mis),
>>>> +			   percpu_counter_sum(&session->i_caps_hit));
>>>> +		ceph_put_mds_session(session);
>>>> +	}
>>>> +	mutex_unlock(&mdsc->mutex);
>>>> +
>>>>    	return 0;
>>>>    }
>>>>    
>>>> diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
>>>> index 658c55b323cc..33eb239e09e2 100644
>>>> --- a/fs/ceph/dir.c
>>>> +++ b/fs/ceph/dir.c
>>>> @@ -313,7 +313,7 @@ static int ceph_readdir(struct file *file,
>>>> struct dir_context *ctx)
>>>>    	struct ceph_fs_client *fsc =
>>>> ceph_inode_to_client(inode);
>>>>    	struct ceph_mds_client *mdsc = fsc->mdsc;
>>>>    	int i;
>>>> -	int err;
>>>> +	int err, ret = -1;
>>>>    	unsigned frag = -1;
>>>>    	struct ceph_mds_reply_info_parsed *rinfo;
>>>>    
>>>> @@ -346,13 +346,16 @@ static int ceph_readdir(struct file *file,
>>>> struct dir_context *ctx)
>>>>    	    !ceph_test_mount_opt(fsc, NOASYNCREADDIR) &&
>>>>    	    ceph_snap(inode) != CEPH_SNAPDIR &&
>>>>    	    __ceph_dir_is_complete_ordered(ci) &&
>>>> -	    __ceph_caps_issued_mask(ci, CEPH_CAP_FILE_SHARED, 1)) {
>>>> +	    (ret = __ceph_caps_issued_mask(ci, CEPH_CAP_FILE_SHARED,
>>>> 1))) {
>>>>    		int shared_gen = atomic_read(&ci-
>>>>> i_shared_gen);
>>>> +		__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>>>    		spin_unlock(&ci->i_ceph_lock);
>>>>    		err = __dcache_readdir(file, ctx, shared_gen);
>>>>    		if (err != -EAGAIN)
>>>>    			return err;
>>>>    	} else {
>>>> +		if (ret != -1)
>>>> +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>>>    		spin_unlock(&ci->i_ceph_lock);
>>>>    	}
>>>>    
>>>> @@ -757,6 +760,8 @@ static struct dentry *ceph_lookup(struct
>>>> inode *dir, struct dentry *dentry,
>>>>    		struct ceph_dentry_info *di =
>>>> ceph_dentry(dentry);
>>>>    
>>>>    		spin_lock(&ci->i_ceph_lock);
>>>> +		__ceph_caps_metric(ci, CEPH_CAP_FILE_SHARED);
>>>> +
>>>>    		dout(" dir %p flags are %d\n", dir, ci-
>>>>> i_ceph_flags);
>>>>    		if (strncmp(dentry->d_name.name,
>>>>    			    fsc->mount_options->snapdir_name,
>>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>>>> index 1e6cdf2dfe90..c78dfbbb7b91 100644
>>>> --- a/fs/ceph/file.c
>>>> +++ b/fs/ceph/file.c
>>>> @@ -384,6 +384,8 @@ int ceph_open(struct inode *inode, struct
>>>> file *file)
>>>>    	 * asynchronously.
>>>>    	 */
>>>>    	spin_lock(&ci->i_ceph_lock);
>>>> +	__ceph_caps_metric(ci, wanted);
>>>> +
>>>>    	if (__ceph_is_any_real_caps(ci) &&
>>>>    	    (((fmode & CEPH_FILE_MODE_WR) == 0) || ci-
>>>>> i_auth_cap)) {
>>>>    		int mds_wanted = __ceph_caps_mds_wanted(ci,
>>>> true);
>>>> @@ -1340,6 +1342,7 @@ static ssize_t ceph_read_iter(struct kiocb
>>>> *iocb, struct iov_iter *to)
>>>>    				return -ENOMEM;
>>>>    		}
>>>>    
>>>> +		ceph_caps_metric(ci, CEPH_STAT_CAP_INLINE_DATA);
>>>>    		statret = __ceph_do_getattr(inode, page,
>>>>    					    CEPH_STAT_CAP_INLIN
>>>> E_DATA, !!page);
>>>>    		if (statret < 0) {
>>>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>>>> index a24fd00676b8..141c1c03636c 100644
>>>> --- a/fs/ceph/mds_client.c
>>>> +++ b/fs/ceph/mds_client.c
>>>> @@ -558,6 +558,8 @@ void ceph_put_mds_session(struct
>>>> ceph_mds_session *s)
>>>>    	if (refcount_dec_and_test(&s->s_ref)) {
>>>>    		if (s->s_auth.authorizer)
>>>>    			ceph_auth_destroy_authorizer(s-
>>>>> s_auth.authorizer);
>>>> +		percpu_counter_destroy(&s->i_caps_hit);
>>>> +		percpu_counter_destroy(&s->i_caps_mis);
>>>>    		kfree(s);
>>>>    	}
>>>>    }
>>>> @@ -598,6 +600,7 @@ static struct ceph_mds_session
>>>> *register_session(struct ceph_mds_client *mdsc,
>>>>    						 int mds)
>>>>    {
>>>>    	struct ceph_mds_session *s;
>>>> +	int err;
>>>>    
>>>>    	if (mds >= mdsc->mdsmap->possible_max_rank)
>>>>    		return ERR_PTR(-EINVAL);
>>>> @@ -612,8 +615,10 @@ static struct ceph_mds_session
>>>> *register_session(struct ceph_mds_client *mdsc,
>>>>    
>>>>    		dout("%s: realloc to %d\n", __func__, newmax);
>>>>    		sa = kcalloc(newmax, sizeof(void *), GFP_NOFS);
>>>> -		if (!sa)
>>>> +		if (!sa) {
>>>> +			err = -ENOMEM;
>>>>    			goto fail_realloc;
>>>> +		}
>>>>    		if (mdsc->sessions) {
>>>>    			memcpy(sa, mdsc->sessions,
>>>>    			       mdsc->max_sessions * sizeof(void
>>>> *));
>>>> @@ -653,6 +658,13 @@ static struct ceph_mds_session
>>>> *register_session(struct ceph_mds_client *mdsc,
>>>>    
>>>>    	INIT_LIST_HEAD(&s->s_cap_flushing);
>>>>    
>>>> +	err = percpu_counter_init(&s->i_caps_hit, 0, GFP_NOFS);
>>>> +	if (err)
>>>> +		goto fail_realloc;
>>>> +	err = percpu_counter_init(&s->i_caps_mis, 0, GFP_NOFS);
>>>> +	if (err)
>>>> +		goto fail_init;
>>>> +
>>>>    	mdsc->sessions[mds] = s;
>>>>    	atomic_inc(&mdsc->num_sessions);
>>>>    	refcount_inc(&s->s_ref);  /* one ref to sessions[], one
>>>> to caller */
>>>> @@ -662,6 +674,8 @@ static struct ceph_mds_session
>>>> *register_session(struct ceph_mds_client *mdsc,
>>>>    
>>>>    	return s;
>>>>    
>>>> +fail_init:
>>>> +	percpu_counter_destroy(&s->i_caps_hit);
>>>>    fail_realloc:
>>>>    	kfree(s);
>>>>    	return ERR_PTR(-ENOMEM);
>>>> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
>>>> index dd1f417b90eb..ba74ff74c59c 100644
>>>> --- a/fs/ceph/mds_client.h
>>>> +++ b/fs/ceph/mds_client.h
>>>> @@ -201,6 +201,9 @@ struct ceph_mds_session {
>>>>    
>>>>    	struct list_head  s_waiting;  /* waiting requests */
>>>>    	struct list_head  s_unsafe;   /* unsafe requests */
>>>> +
>>>> +	struct percpu_counter i_caps_hit;
>>>> +	struct percpu_counter i_caps_mis;
>>>>    };
>>>>    
>>>>    /*
>>>> diff --git a/fs/ceph/quota.c b/fs/ceph/quota.c
>>>> index de56dee60540..4ce2f658e63d 100644
>>>> --- a/fs/ceph/quota.c
>>>> +++ b/fs/ceph/quota.c
>>>> @@ -147,9 +147,14 @@ static struct inode
>>>> *lookup_quotarealm_inode(struct ceph_mds_client *mdsc,
>>>>    		return NULL;
>>>>    	}
>>>>    	if (qri->inode) {
>>>> +		struct ceph_inode_info *ci = ceph_inode(qri->inode);
>>>> +		int ret;
>>>> +
>>>> +		ceph_caps_metric(ci, CEPH_STAT_CAP_INODE);
>>>> +
>>>>    		/* get caps */
>>>> -		int ret = __ceph_do_getattr(qri->inode, NULL,
>>>> -					    CEPH_STAT_CAP_INODE, true);
>>>> +		ret = __ceph_do_getattr(qri->inode, NULL,
>>>> +					CEPH_STAT_CAP_INODE, true);
>>>>    		if (ret >= 0)
>>>>    			in = qri->inode;
>>>>    		else
>>>> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
>>>> index 7af91628636c..3f4829222528 100644
>>>> --- a/fs/ceph/super.h
>>>> +++ b/fs/ceph/super.h
>>>> @@ -641,6 +641,14 @@ static inline bool
>>>> __ceph_is_any_real_caps(struct ceph_inode_info *ci)
>>>>    	return !RB_EMPTY_ROOT(&ci->i_caps);
>>>>    }
>>>>    
>>>> +extern void __ceph_caps_metric(struct ceph_inode_info *ci, int
>>>> mask);
>>>> +static inline void ceph_caps_metric(struct ceph_inode_info *ci,
>>>> int mask)
>>>> +{
>>>> +	spin_lock(&ci->i_ceph_lock);
>>>> +	__ceph_caps_metric(ci, mask);
>>>> +	spin_unlock(&ci->i_ceph_lock);
>>>> +}
>>>> +
>>>>    extern int __ceph_caps_issued(struct ceph_inode_info *ci, int
>>>> *implemented);
>>>>    extern int __ceph_caps_issued_mask(struct ceph_inode_info *ci,
>>>> int mask, int t);
>>>>    extern int __ceph_caps_issued_other(struct ceph_inode_info *ci,
>>>> @@ -927,6 +935,9 @@ extern int __ceph_do_getattr(struct inode
>>>> *inode, struct page *locked_page,
>>>>    			     int mask, bool force);
>>>>    static inline int ceph_do_getattr(struct inode *inode, int
>>>> mask, bool force)
>>>>    {
>>>> +	struct ceph_inode_info *ci = ceph_inode(inode);
>>>> +
>>>> +	ceph_caps_metric(ci, mask);
>>>>    	return __ceph_do_getattr(inode, NULL, mask, force);
>>>>    }
>>>>    extern int ceph_permission(struct inode *inode, int mask);
>>>> diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
>>>> index d58fa14c1f01..ebd522edb0a8 100644
>>>> --- a/fs/ceph/xattr.c
>>>> +++ b/fs/ceph/xattr.c
>>>> @@ -829,6 +829,7 @@ ssize_t __ceph_getxattr(struct inode *inode,
>>>> const char *name, void *value,
>>>>    	struct ceph_vxattr *vxattr = NULL;
>>>>    	int req_mask;
>>>>    	ssize_t err;
>>>> +	int ret = -1;
>>>>    
>>>>    	/* let's see if a virtual xattr was requested */
>>>>    	vxattr = ceph_match_vxattr(inode, name);
>>>> @@ -856,7 +857,9 @@ ssize_t __ceph_getxattr(struct inode *inode,
>>>> const char *name, void *value,
>>>>    
>>>>    	if (ci->i_xattrs.version == 0 ||
>>>>    	    !((req_mask & CEPH_CAP_XATTR_SHARED) ||
>>>> -	      __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1))) {
>>>> +	      (ret = __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED,
>>>> 1)))) {
>>>> +		if (ret != -1)
>>>> +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>>>    		spin_unlock(&ci->i_ceph_lock);
>>>>    
>>>>    		/* security module gets xattr while filling
>>>> trace */
>>>> @@ -871,6 +874,9 @@ ssize_t __ceph_getxattr(struct inode *inode,
>>>> const char *name, void *value,
>>>>    		if (err)
>>>>    			return err;
>>>>    		spin_lock(&ci->i_ceph_lock);
>>>> +	} else {
>>>> +		if (ret != -1)
>>>> +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>>>    	}
>>>>    
>>>>    	err = __build_xattrs(inode);
>>>> @@ -907,19 +913,24 @@ ssize_t ceph_listxattr(struct dentry
>>>> *dentry, char *names, size_t size)
>>>>    	struct ceph_inode_info *ci = ceph_inode(inode);
>>>>    	bool len_only = (size == 0);
>>>>    	u32 namelen;
>>>> -	int err;
>>>> +	int err, ret = -1;
>>>>    
>>>>    	spin_lock(&ci->i_ceph_lock);
>>>>    	dout("listxattr %p ver=%lld index_ver=%lld\n", inode,
>>>>    	     ci->i_xattrs.version, ci->i_xattrs.index_version);
>>>>    
>>>>    	if (ci->i_xattrs.version == 0 ||
>>>> -	    !__ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1)) {
>>>> +	    !(ret = __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED,
>>>> 1))) {
>>>> +		if (ret != -1)
>>>> +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>>>    		spin_unlock(&ci->i_ceph_lock);
>>>>    		err = ceph_do_getattr(inode,
>>>> CEPH_STAT_CAP_XATTR, true);
>>>>    		if (err)
>>>>    			return err;
>>>>    		spin_lock(&ci->i_ceph_lock);
>>>> +	} else {
>>>> +		if (ret != -1)
>>>> +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>>>    	}
>>>>    
>>>>    	err = __build_xattrs(inode);
>>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH resend v5 02/11] ceph: add caps perf metric for each session
  2020-01-31  1:34         ` Xiubo Li
@ 2020-01-31  9:02           ` Xiubo Li
  2020-02-04 21:10             ` Jeff Layton
  0 siblings, 1 reply; 31+ messages in thread
From: Xiubo Li @ 2020-01-31  9:02 UTC (permalink / raw)
  To: Jeffrey Layton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On 2020/1/31 9:34, Xiubo Li wrote:
> On 2020/1/31 3:00, Jeffrey Layton wrote:
>> On Thu, 2020-01-30 at 10:22 +0800, Xiubo Li wrote:
>>> On 2020/1/29 22:21, Jeff Layton wrote:
>>>> On Wed, 2020-01-29 at 03:27 -0500, xiubli@redhat.com wrote:
>>>>> From: Xiubo Li <xiubli@redhat.com>
>>>>>
>>>>> This will fulfill the caps hit/miss metric for each session. When
>>>>> checking the "need" mask and if one cap has the subset of the
>>>>> "need"
>>>>> mask it means hit, or missed.
>>>>>
>>>>> item          total           miss            hit
>>>>> -------------------------------------------------
>>>>> d_lease       295             0               993
>>>>>
>>>>> session       caps            miss            hit
>>>>> -------------------------------------------------
>>>>> 0             295             107             4119
>>>>> 1             1               107             9
>>>>>
>>>>> URL: https://tracker.ceph.com/issues/43215
>>>>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>>>>> ---
>>>>>    fs/ceph/acl.c        |  2 ++
>>>>>    fs/ceph/addr.c       |  2 ++
>>>>>    fs/ceph/caps.c       | 74
>>>>> ++++++++++++++++++++++++++++++++++++++++++++
>>>>>    fs/ceph/debugfs.c    | 20 ++++++++++++
>>>>>    fs/ceph/dir.c        |  9 ++++--
>>>>>    fs/ceph/file.c       |  3 ++
>>>>>    fs/ceph/mds_client.c | 16 +++++++++-
>>>>>    fs/ceph/mds_client.h |  3 ++
>>>>>    fs/ceph/quota.c      |  9 ++++--
>>>>>    fs/ceph/super.h      | 11 +++++++
>>>>>    fs/ceph/xattr.c      | 17 ++++++++--
>>>>>    11 files changed, 158 insertions(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/fs/ceph/acl.c b/fs/ceph/acl.c
>>>>> index 26be6520d3fb..58e119e3519f 100644
>>>>> --- a/fs/ceph/acl.c
>>>>> +++ b/fs/ceph/acl.c
>>>>> @@ -22,6 +22,8 @@ static inline void ceph_set_cached_acl(struct
>>>>> inode *inode,
>>>>>        struct ceph_inode_info *ci = ceph_inode(inode);
>>>>>           spin_lock(&ci->i_ceph_lock);
>>>>> +    __ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>>>> +
>>>>>        if (__ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED,
>>>>> 0))
>>>>>            set_cached_acl(inode, type, acl);
>>>>>        else
>>>>> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
>>>>> index 7ab616601141..29d4513eff8c 100644
>>>>> --- a/fs/ceph/addr.c
>>>>> +++ b/fs/ceph/addr.c
>>>>> @@ -1706,6 +1706,8 @@ int ceph_uninline_data(struct file *filp,
>>>>> struct page *locked_page)
>>>>>                err = -ENOMEM;
>>>>>                goto out;
>>>>>            }
>>>>> +
>>>>> +        ceph_caps_metric(ci, CEPH_STAT_CAP_INLINE_DATA);
>>>> Should a check for inline data really count here?
>>> Currently all the INLINE_DATA is in 'force' mode, so we can ignore
>>> it.
>>>>>            err = __ceph_do_getattr(inode, page,
>>>>>                        CEPH_STAT_CAP_INLINE_DA
>>>>> TA, true);
>>>>>            if (err < 0) {
>>>>> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
>>>>> index 7fc87b693ba4..af2e9e826f8c 100644
>>>>> --- a/fs/ceph/caps.c
>>>>> +++ b/fs/ceph/caps.c
>>>>> @@ -783,6 +783,75 @@ static int __cap_is_valid(struct ceph_cap
>>>>> *cap)
>>>>>        return 1;
>>>>>    }
>>>>>    +/*
>>>>> + * Counts the cap metric.
>>>>> + */
>>>> This needs some comments. Specifically, what should this be
>>>> counting and
>>>> how?
>>> Will add it.
>>>
>>> The __ceph_caps_metric() will traverse the inode's i_caps try to get
>>> the
>>> 'issued' excepting the 'invoking' caps until it get enough caps in
>>> 'mask'. The i_caps traverse logic is following
>>> __ceph_caps_issued_mask().
>>>
>>>
>>>>> +void __ceph_caps_metric(struct ceph_inode_info *ci, int mask)
>>>>> +{
>>>>> +    int have = ci->i_snap_caps;
>>>>> +    struct ceph_mds_session *s;
>>>>> +    struct ceph_cap *cap;
>>>>> +    struct rb_node *p;
>>>>> +    bool skip_auth = false;
>>>>> +
>>>>> +    lockdep_assert_held(&ci->i_ceph_lock);
>>>>> +
>>>>> +    if (mask <= 0)
>>>>> +        return;
>>>>> +
>>>>> +    /* Counts the snap caps metric in the auth cap */
>>>>> +    if (ci->i_auth_cap) {
>>>>> +        cap = ci->i_auth_cap;
>>>>> +        if (have) {
>>>>> +            have |= cap->issued;
>>>>> +
>>>>> +            dout("%s %p cap %p issued %s, mask %s\n",
>>>>> __func__,
>>>>> +                 &ci->vfs_inode, cap, ceph_cap_string(cap-
>>>>>> issued),
>>>>> +                 ceph_cap_string(mask));
>>>>> +
>>>>> +            s = ceph_get_mds_session(cap->session);
>>>>> +            if (s) {
>>>>> +                if (mask & have)
>>>>> +                    percpu_counter_inc(&s-
>>>>>> i_caps_hit);
>>>>> +                else
>>>>> +                    percpu_counter_inc(&s-
>>>>>> i_caps_mis);
>>>>> +                ceph_put_mds_session(s);
>>>>> +            }
>>>>> +            skip_auth = true;
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>> +    if ((mask & have) == mask)
>>>>> +        return;
>>>>> +
>>>>> +    /* Checks others */
>>>>> +    for (p = rb_first(&ci->i_caps); p; p = rb_next(p)) {
>>>>> +        cap = rb_entry(p, struct ceph_cap, ci_node);
>>>>> +        if (!__cap_is_valid(cap))
>>>>> +            continue;
>>>>> +
>>>>> +        if (skip_auth && cap == ci->i_auth_cap)
>>>>> +            continue;
>>>>> +
>>>>> +        dout("%s %p cap %p issued %s, mask %s\n", __func__,
>>>>> +             &ci->vfs_inode, cap, ceph_cap_string(cap->issued),
>>>>> +             ceph_cap_string(mask));
>>>>> +
>>>>> +        s = ceph_get_mds_session(cap->session);
>>>>> +        if (s) {
>>>>> +            if (mask & cap->issued)
>>>>> + percpu_counter_inc(&s->i_caps_hit);
>>>>> +            else
>>>>> + percpu_counter_inc(&s->i_caps_mis);
>>>>> +            ceph_put_mds_session(s);
>>>>> +        }
>>>>> +
>>>>> +        have |= cap->issued;
>>>>> +        if ((mask & have) == mask)
>>>>> +            return;
>>>>> +    }
>>>>> +}
>>>>> +
>>>> I'm trying to understand what happens with the above when more than
>>>> one
>>>> ceph_cap has the same bit set in "issued". For instance:
>>>>
>>>> Suppose we're doing the check for a statx call, and we're trying to
>>>> get
>>>> caps for pAsFsLs. We have two MDS's and they've each granted us
>>>> caps for
>>>> the inode, say:
>>>>
>>>> MDS 0: pAs
>>>> MDS 1: pAsLsFs
>>>>
>>>> We check the cap0 first, and consider it a hit, and then we check
>>>> cap1
>>>> and consider it a hit as well. So that seems like it's being
>>>> double-
>>>> counted.
>>> Yeah, it will.
>>>
>>> In case2:
>>>
>>> MDS 0: pAsFs
>>>
>>> MDS 1: pAsLs
>>>
>>> For this case and yours both the i_cap0 and i_cap1 are 'hit'.
>>>
>>>
>>> In case3 :
>>>
>>> MDS0: pAsFsLs
>>>
>>> MDS1: pAs
>>>
>>> Only the i_cap0 is 'hit'.  i_cap1 will not be counted any 'hit' or
>>> 'mis'.
>>>
>>>
>>> In case4:
>>>
>>> MDS0: p
>>>
>>> MDS1: pAsLsFs
>>>
>>> i_cap0 will 'mis' and i_cap1 will 'hit'.
>>>
>>>
>>> All the logic are the same with __ceph_caps_issued_mask() does.
>>>
>>> The 'hit' means to get all the caps in 'mask we have checked the
>>> i_cap[0~N] and if they have a subset in 'mask', and the 'mis' means
>>> we
>>> have checked the i_cap[0~N] but they do not.
>>>
>>> For the i_cap[N+1 ~ M], if we won't touch them because we have
>>> already
>>> gotten enough caps needed in 'mask', they won't count any 'hit' or
>>> 'mis'
>>>
>>> All in all, the current logic is that the 'hit' means 'touched' and
>>> it
>>> have some of what we needed, and the 'mis' means 'touched' and
>>> missed
>>> any of what we needed.
>>>
>>>
>> That seems sort of arbitrary, given that you're going to get different
>> results depending on the index of the MDS with the caps. For instance:
>>
>>
>> MDS0: pAsLsFs
>> MDS1: pAs
>>
>> ...vs...
>>
>> MDS0: pAs
>> MDS1: pAsLsFs
>>
>> If we assume we're looking for pAsLsFs, then the first scenario will
>> just end up with 1 hit and the second will give you 2. AFAIU, the two
>> MDSs are peers, so it really seems like the index should not matter
>> here.
>>
>> I'm really struggling to understand how these numbers will be useful.
>> What, specifically, are we trying to count here and why?
>
> Maybe we need count the hit/mis only once, the fake code like:
>
> // Case1: check the auth caps first
>
> if (auth_cap & mask == mask) {
>
>     s->hit++;
>
>     return;
>
> }
>
> // Case2: check all the other one by one
>
> for (caps : i_caps) {
>
>     if (caps & mask == mask) {
>
>         s->hit++;
>
>         return;
>
>     }
>
>     c |= caps;
>
> }
>
> // Case3:
>
> if (c & mask == mask)
>
>     s->hit++;
>
> else
>
>     s->mis++;
>
> return;
>
> ....
>
> And for the session 's->' here, if one i_cap can hold all the 'mask' 
> requested, like the Case1 and Case2 it will be i_cap's corresponding 
> session. Or for Case3 we could choose any session.
>
> But the above is still not very graceful of counting the cap metrics too.
>
> IMO, for the cap hit/miss counter should be a global one just like the 
> dentry_lease does in [PATCH 01/11], will this make sense ?
>
Currently in fuse client, for each inode it is its auth_cap->session's 
responsibility to do all the cap hit/mis counting if it has a auth_cap, 
or choose any one exist.

Maybe this is one acceptable approach.

> Thanks,
>
>
>>>> ISTM, that what you really want to do here is logically or all of
>>>> the
>>>> cap->issued fields together, and then check that vs. the mask
>>>> value, and
>>>> count only one hit or miss per inode.
>>>>
>>>> That said, it's not 100% clear what you're counting as a hit or
>>>> miss
>>>> here, so please let me know if I have that wrong.
>>>>>    /*
>>>>>     * Return set of valid cap bits issued to us.  Note that caps
>>>>> time
>>>>>     * out, and may be invalidated in bulk if the client session
>>>>> times out
>>>>> @@ -2746,6 +2815,7 @@ static void check_max_size(struct inode
>>>>> *inode, loff_t endoff)
>>>>>    int ceph_try_get_caps(struct inode *inode, int need, int want,
>>>>>                  bool nonblock, int *got)
>>>>>    {
>>>>> +    struct ceph_inode_info *ci = ceph_inode(inode);
>>>>>        int ret;
>>>>>           BUG_ON(need & ~CEPH_CAP_FILE_RD);
>>>>> @@ -2758,6 +2828,7 @@ int ceph_try_get_caps(struct inode *inode,
>>>>> int need, int want,
>>>>>        BUG_ON(want & ~(CEPH_CAP_FILE_CACHE |
>>>>> CEPH_CAP_FILE_LAZYIO |
>>>>>                CEPH_CAP_FILE_SHARED |
>>>>> CEPH_CAP_FILE_EXCL |
>>>>>                CEPH_CAP_ANY_DIR_OPS));
>>>>> +    ceph_caps_metric(ci, need | want);
>>>>>        ret = try_get_cap_refs(inode, need, want, 0, nonblock,
>>>>> got);
>>>>>        return ret == -EAGAIN ? 0 : ret;
>>>>>    }
>>>>> @@ -2784,6 +2855,8 @@ int ceph_get_caps(struct file *filp, int
>>>>> need, int want,
>>>>>            fi->filp_gen != READ_ONCE(fsc->filp_gen))
>>>>>            return -EBADF;
>>>>>    +    ceph_caps_metric(ci, need | want);
>>>>> +
>>>>>        while (true) {
>>>>>            if (endoff > 0)
>>>>>                check_max_size(inode, endoff);
>>>>> @@ -2871,6 +2944,7 @@ int ceph_get_caps(struct file *filp, int
>>>>> need, int want,
>>>>>                 * getattr request will bring inline
>>>>> data into
>>>>>                 * page cache
>>>>>                 */
>>>>> +            ceph_caps_metric(ci,
>>>>> CEPH_STAT_CAP_INLINE_DATA);
>>>>>                ret = __ceph_do_getattr(inode, NULL,
>>>>>                            CEPH_STAT_CAP_I
>>>>> NLINE_DATA,
>>>>>                            true);
>>>>> diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
>>>>> index 40a22da0214a..c132fdb40d53 100644
>>>>> --- a/fs/ceph/debugfs.c
>>>>> +++ b/fs/ceph/debugfs.c
>>>>> @@ -128,6 +128,7 @@ static int metric_show(struct seq_file *s,
>>>>> void *p)
>>>>>    {
>>>>>        struct ceph_fs_client *fsc = s->private;
>>>>>        struct ceph_mds_client *mdsc = fsc->mdsc;
>>>>> +    int i;
>>>>>           seq_printf(s,
>>>>> "item          total           miss            hit\n");
>>>>>        seq_printf(s, "--------------------------------------
>>>>> -----------\n");
>>>>> @@ -137,6 +138,25 @@ static int metric_show(struct seq_file *s,
>>>>> void *p)
>>>>>               percpu_counter_sum(&mdsc-
>>>>>> metric.d_lease_mis),
>>>>>               percpu_counter_sum(&mdsc-
>>>>>> metric.d_lease_hit));
>>>>>    +    seq_printf(s, "\n");
>>>>> +    seq_printf(s,
>>>>> "session       caps            miss            hit\n");
>>>>> +    seq_printf(s, "----------------------------------------------
>>>>> ---\n");
>>>>> +
>>>>> +    mutex_lock(&mdsc->mutex);
>>>>> +    for (i = 0; i < mdsc->max_sessions; i++) {
>>>>> +        struct ceph_mds_session *session;
>>>>> +
>>>>> +        session = __ceph_lookup_mds_session(mdsc, i);
>>>>> +        if (!session)
>>>>> +            continue;
>>>>> +        seq_printf(s, "%-14d%-16d%-16lld%lld\n", i,
>>>>> +               session->s_nr_caps,
>>>>> + percpu_counter_sum(&session->i_caps_mis),
>>>>> + percpu_counter_sum(&session->i_caps_hit));
>>>>> +        ceph_put_mds_session(session);
>>>>> +    }
>>>>> +    mutex_unlock(&mdsc->mutex);
>>>>> +
>>>>>        return 0;
>>>>>    }
>>>>>    diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
>>>>> index 658c55b323cc..33eb239e09e2 100644
>>>>> --- a/fs/ceph/dir.c
>>>>> +++ b/fs/ceph/dir.c
>>>>> @@ -313,7 +313,7 @@ static int ceph_readdir(struct file *file,
>>>>> struct dir_context *ctx)
>>>>>        struct ceph_fs_client *fsc =
>>>>> ceph_inode_to_client(inode);
>>>>>        struct ceph_mds_client *mdsc = fsc->mdsc;
>>>>>        int i;
>>>>> -    int err;
>>>>> +    int err, ret = -1;
>>>>>        unsigned frag = -1;
>>>>>        struct ceph_mds_reply_info_parsed *rinfo;
>>>>>    @@ -346,13 +346,16 @@ static int ceph_readdir(struct file *file,
>>>>> struct dir_context *ctx)
>>>>>            !ceph_test_mount_opt(fsc, NOASYNCREADDIR) &&
>>>>>            ceph_snap(inode) != CEPH_SNAPDIR &&
>>>>>            __ceph_dir_is_complete_ordered(ci) &&
>>>>> -        __ceph_caps_issued_mask(ci, CEPH_CAP_FILE_SHARED, 1)) {
>>>>> +        (ret = __ceph_caps_issued_mask(ci, CEPH_CAP_FILE_SHARED,
>>>>> 1))) {
>>>>>            int shared_gen = atomic_read(&ci-
>>>>>> i_shared_gen);
>>>>> +        __ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>>>>            spin_unlock(&ci->i_ceph_lock);
>>>>>            err = __dcache_readdir(file, ctx, shared_gen);
>>>>>            if (err != -EAGAIN)
>>>>>                return err;
>>>>>        } else {
>>>>> +        if (ret != -1)
>>>>> +            __ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>>>>            spin_unlock(&ci->i_ceph_lock);
>>>>>        }
>>>>>    @@ -757,6 +760,8 @@ static struct dentry *ceph_lookup(struct
>>>>> inode *dir, struct dentry *dentry,
>>>>>            struct ceph_dentry_info *di =
>>>>> ceph_dentry(dentry);
>>>>>               spin_lock(&ci->i_ceph_lock);
>>>>> +        __ceph_caps_metric(ci, CEPH_CAP_FILE_SHARED);
>>>>> +
>>>>>            dout(" dir %p flags are %d\n", dir, ci-
>>>>>> i_ceph_flags);
>>>>>            if (strncmp(dentry->d_name.name,
>>>>>                    fsc->mount_options->snapdir_name,
>>>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>>>>> index 1e6cdf2dfe90..c78dfbbb7b91 100644
>>>>> --- a/fs/ceph/file.c
>>>>> +++ b/fs/ceph/file.c
>>>>> @@ -384,6 +384,8 @@ int ceph_open(struct inode *inode, struct
>>>>> file *file)
>>>>>         * asynchronously.
>>>>>         */
>>>>>        spin_lock(&ci->i_ceph_lock);
>>>>> +    __ceph_caps_metric(ci, wanted);
>>>>> +
>>>>>        if (__ceph_is_any_real_caps(ci) &&
>>>>>            (((fmode & CEPH_FILE_MODE_WR) == 0) || ci-
>>>>>> i_auth_cap)) {
>>>>>            int mds_wanted = __ceph_caps_mds_wanted(ci,
>>>>> true);
>>>>> @@ -1340,6 +1342,7 @@ static ssize_t ceph_read_iter(struct kiocb
>>>>> *iocb, struct iov_iter *to)
>>>>>                    return -ENOMEM;
>>>>>            }
>>>>>    +        ceph_caps_metric(ci, CEPH_STAT_CAP_INLINE_DATA);
>>>>>            statret = __ceph_do_getattr(inode, page,
>>>>>                            CEPH_STAT_CAP_INLIN
>>>>> E_DATA, !!page);
>>>>>            if (statret < 0) {
>>>>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>>>>> index a24fd00676b8..141c1c03636c 100644
>>>>> --- a/fs/ceph/mds_client.c
>>>>> +++ b/fs/ceph/mds_client.c
>>>>> @@ -558,6 +558,8 @@ void ceph_put_mds_session(struct
>>>>> ceph_mds_session *s)
>>>>>        if (refcount_dec_and_test(&s->s_ref)) {
>>>>>            if (s->s_auth.authorizer)
>>>>>                ceph_auth_destroy_authorizer(s-
>>>>>> s_auth.authorizer);
>>>>> +        percpu_counter_destroy(&s->i_caps_hit);
>>>>> +        percpu_counter_destroy(&s->i_caps_mis);
>>>>>            kfree(s);
>>>>>        }
>>>>>    }
>>>>> @@ -598,6 +600,7 @@ static struct ceph_mds_session
>>>>> *register_session(struct ceph_mds_client *mdsc,
>>>>>                             int mds)
>>>>>    {
>>>>>        struct ceph_mds_session *s;
>>>>> +    int err;
>>>>>           if (mds >= mdsc->mdsmap->possible_max_rank)
>>>>>            return ERR_PTR(-EINVAL);
>>>>> @@ -612,8 +615,10 @@ static struct ceph_mds_session
>>>>> *register_session(struct ceph_mds_client *mdsc,
>>>>>               dout("%s: realloc to %d\n", __func__, newmax);
>>>>>            sa = kcalloc(newmax, sizeof(void *), GFP_NOFS);
>>>>> -        if (!sa)
>>>>> +        if (!sa) {
>>>>> +            err = -ENOMEM;
>>>>>                goto fail_realloc;
>>>>> +        }
>>>>>            if (mdsc->sessions) {
>>>>>                memcpy(sa, mdsc->sessions,
>>>>>                       mdsc->max_sessions * sizeof(void
>>>>> *));
>>>>> @@ -653,6 +658,13 @@ static struct ceph_mds_session
>>>>> *register_session(struct ceph_mds_client *mdsc,
>>>>>           INIT_LIST_HEAD(&s->s_cap_flushing);
>>>>>    +    err = percpu_counter_init(&s->i_caps_hit, 0, GFP_NOFS);
>>>>> +    if (err)
>>>>> +        goto fail_realloc;
>>>>> +    err = percpu_counter_init(&s->i_caps_mis, 0, GFP_NOFS);
>>>>> +    if (err)
>>>>> +        goto fail_init;
>>>>> +
>>>>>        mdsc->sessions[mds] = s;
>>>>>        atomic_inc(&mdsc->num_sessions);
>>>>>        refcount_inc(&s->s_ref);  /* one ref to sessions[], one
>>>>> to caller */
>>>>> @@ -662,6 +674,8 @@ static struct ceph_mds_session
>>>>> *register_session(struct ceph_mds_client *mdsc,
>>>>>           return s;
>>>>>    +fail_init:
>>>>> +    percpu_counter_destroy(&s->i_caps_hit);
>>>>>    fail_realloc:
>>>>>        kfree(s);
>>>>>        return ERR_PTR(-ENOMEM);
>>>>> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
>>>>> index dd1f417b90eb..ba74ff74c59c 100644
>>>>> --- a/fs/ceph/mds_client.h
>>>>> +++ b/fs/ceph/mds_client.h
>>>>> @@ -201,6 +201,9 @@ struct ceph_mds_session {
>>>>>           struct list_head  s_waiting;  /* waiting requests */
>>>>>        struct list_head  s_unsafe;   /* unsafe requests */
>>>>> +
>>>>> +    struct percpu_counter i_caps_hit;
>>>>> +    struct percpu_counter i_caps_mis;
>>>>>    };
>>>>>       /*
>>>>> diff --git a/fs/ceph/quota.c b/fs/ceph/quota.c
>>>>> index de56dee60540..4ce2f658e63d 100644
>>>>> --- a/fs/ceph/quota.c
>>>>> +++ b/fs/ceph/quota.c
>>>>> @@ -147,9 +147,14 @@ static struct inode
>>>>> *lookup_quotarealm_inode(struct ceph_mds_client *mdsc,
>>>>>            return NULL;
>>>>>        }
>>>>>        if (qri->inode) {
>>>>> +        struct ceph_inode_info *ci = ceph_inode(qri->inode);
>>>>> +        int ret;
>>>>> +
>>>>> +        ceph_caps_metric(ci, CEPH_STAT_CAP_INODE);
>>>>> +
>>>>>            /* get caps */
>>>>> -        int ret = __ceph_do_getattr(qri->inode, NULL,
>>>>> -                        CEPH_STAT_CAP_INODE, true);
>>>>> +        ret = __ceph_do_getattr(qri->inode, NULL,
>>>>> +                    CEPH_STAT_CAP_INODE, true);
>>>>>            if (ret >= 0)
>>>>>                in = qri->inode;
>>>>>            else
>>>>> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
>>>>> index 7af91628636c..3f4829222528 100644
>>>>> --- a/fs/ceph/super.h
>>>>> +++ b/fs/ceph/super.h
>>>>> @@ -641,6 +641,14 @@ static inline bool
>>>>> __ceph_is_any_real_caps(struct ceph_inode_info *ci)
>>>>>        return !RB_EMPTY_ROOT(&ci->i_caps);
>>>>>    }
>>>>>    +extern void __ceph_caps_metric(struct ceph_inode_info *ci, int
>>>>> mask);
>>>>> +static inline void ceph_caps_metric(struct ceph_inode_info *ci,
>>>>> int mask)
>>>>> +{
>>>>> +    spin_lock(&ci->i_ceph_lock);
>>>>> +    __ceph_caps_metric(ci, mask);
>>>>> +    spin_unlock(&ci->i_ceph_lock);
>>>>> +}
>>>>> +
>>>>>    extern int __ceph_caps_issued(struct ceph_inode_info *ci, int
>>>>> *implemented);
>>>>>    extern int __ceph_caps_issued_mask(struct ceph_inode_info *ci,
>>>>> int mask, int t);
>>>>>    extern int __ceph_caps_issued_other(struct ceph_inode_info *ci,
>>>>> @@ -927,6 +935,9 @@ extern int __ceph_do_getattr(struct inode
>>>>> *inode, struct page *locked_page,
>>>>>                     int mask, bool force);
>>>>>    static inline int ceph_do_getattr(struct inode *inode, int
>>>>> mask, bool force)
>>>>>    {
>>>>> +    struct ceph_inode_info *ci = ceph_inode(inode);
>>>>> +
>>>>> +    ceph_caps_metric(ci, mask);
>>>>>        return __ceph_do_getattr(inode, NULL, mask, force);
>>>>>    }
>>>>>    extern int ceph_permission(struct inode *inode, int mask);
>>>>> diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
>>>>> index d58fa14c1f01..ebd522edb0a8 100644
>>>>> --- a/fs/ceph/xattr.c
>>>>> +++ b/fs/ceph/xattr.c
>>>>> @@ -829,6 +829,7 @@ ssize_t __ceph_getxattr(struct inode *inode,
>>>>> const char *name, void *value,
>>>>>        struct ceph_vxattr *vxattr = NULL;
>>>>>        int req_mask;
>>>>>        ssize_t err;
>>>>> +    int ret = -1;
>>>>>           /* let's see if a virtual xattr was requested */
>>>>>        vxattr = ceph_match_vxattr(inode, name);
>>>>> @@ -856,7 +857,9 @@ ssize_t __ceph_getxattr(struct inode *inode,
>>>>> const char *name, void *value,
>>>>>           if (ci->i_xattrs.version == 0 ||
>>>>>            !((req_mask & CEPH_CAP_XATTR_SHARED) ||
>>>>> -          __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1))) {
>>>>> +          (ret = __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED,
>>>>> 1)))) {
>>>>> +        if (ret != -1)
>>>>> +            __ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>>>>            spin_unlock(&ci->i_ceph_lock);
>>>>>               /* security module gets xattr while filling
>>>>> trace */
>>>>> @@ -871,6 +874,9 @@ ssize_t __ceph_getxattr(struct inode *inode,
>>>>> const char *name, void *value,
>>>>>            if (err)
>>>>>                return err;
>>>>>            spin_lock(&ci->i_ceph_lock);
>>>>> +    } else {
>>>>> +        if (ret != -1)
>>>>> +            __ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>>>>        }
>>>>>           err = __build_xattrs(inode);
>>>>> @@ -907,19 +913,24 @@ ssize_t ceph_listxattr(struct dentry
>>>>> *dentry, char *names, size_t size)
>>>>>        struct ceph_inode_info *ci = ceph_inode(inode);
>>>>>        bool len_only = (size == 0);
>>>>>        u32 namelen;
>>>>> -    int err;
>>>>> +    int err, ret = -1;
>>>>>           spin_lock(&ci->i_ceph_lock);
>>>>>        dout("listxattr %p ver=%lld index_ver=%lld\n", inode,
>>>>>             ci->i_xattrs.version, ci->i_xattrs.index_version);
>>>>>           if (ci->i_xattrs.version == 0 ||
>>>>> -        !__ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1)) {
>>>>> +        !(ret = __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED,
>>>>> 1))) {
>>>>> +        if (ret != -1)
>>>>> +            __ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>>>>            spin_unlock(&ci->i_ceph_lock);
>>>>>            err = ceph_do_getattr(inode,
>>>>> CEPH_STAT_CAP_XATTR, true);
>>>>>            if (err)
>>>>>                return err;
>>>>>            spin_lock(&ci->i_ceph_lock);
>>>>> +    } else {
>>>>> +        if (ret != -1)
>>>>> +            __ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>>>>        }
>>>>>           err = __build_xattrs(inode);
>>>
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH resend v5 03/11] ceph: move ceph_osdc_{read,write}pages to ceph.ko
  2020-01-29  8:27 ` [PATCH resend v5 03/11] ceph: move ceph_osdc_{read,write}pages to ceph.ko xiubli
@ 2020-02-04 18:38   ` Jeff Layton
  0 siblings, 0 replies; 31+ messages in thread
From: Jeff Layton @ 2020-02-04 18:38 UTC (permalink / raw)
  To: xiubli, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On Wed, 2020-01-29 at 03:27 -0500, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> Since this helpers are only used by cpeh.ko, let's move it to ceph.ko
> and rename to _sync_.
> 
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  fs/ceph/addr.c                  | 86 ++++++++++++++++++++++++++++++++-
>  include/linux/ceph/osd_client.h | 17 -------
>  net/ceph/osd_client.c           | 79 ------------------------------
>  3 files changed, 84 insertions(+), 98 deletions(-)
> 
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index 29d4513eff8c..20e5ebfff389 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -182,6 +182,47 @@ static int ceph_releasepage(struct page *page, gfp_t g)
>  	return !PagePrivate(page);
>  }
>  
> +/*
> + * Read some contiguous pages.  If we cross a stripe boundary, shorten
> + * *plen.  Return number of bytes read, or error.
> + */
> +static int ceph_sync_readpages(struct ceph_fs_client *fsc,
> +			       struct ceph_vino vino,
> +			       struct ceph_file_layout *layout,
> +			       u64 off, u64 *plen,
> +			       u32 truncate_seq, u64 truncate_size,
> +			       struct page **pages, int num_pages,
> +			       int page_align)
> +{
> +	struct ceph_osd_client *osdc = &fsc->client->osdc;
> +	struct ceph_osd_request *req;
> +	int rc = 0;
> +
> +	dout("readpages on ino %llx.%llx on %llu~%llu\n", vino.ino,
> +	     vino.snap, off, *plen);
> +	req = ceph_osdc_new_request(osdc, layout, vino, off, plen, 0, 1,
> +				    CEPH_OSD_OP_READ, CEPH_OSD_FLAG_READ,
> +				    NULL, truncate_seq, truncate_size,
> +				    false);
> +	if (IS_ERR(req))
> +		return PTR_ERR(req);
> +
> +	/* it may be a short read due to an object boundary */
> +	osd_req_op_extent_osd_data_pages(req, 0,
> +				pages, *plen, page_align, false, false);
> +
> +	dout("readpages  final extent is %llu~%llu (%llu bytes align %d)\n",
> +	     off, *plen, *plen, page_align);
> +
> +	rc = ceph_osdc_start_request(osdc, req, false);
> +	if (!rc)
> +		rc = ceph_osdc_wait_request(osdc, req);
> +
> +	ceph_osdc_put_request(req);
> +	dout("readpages result %d\n", rc);
> +	return rc;
> +}
> +
>  /*
>   * read a single page, without unlocking it.
>   */
> @@ -218,7 +259,7 @@ static int ceph_do_readpage(struct file *filp, struct page *page)
>  
>  	dout("readpage inode %p file %p page %p index %lu\n",
>  	     inode, filp, page, page->index);
> -	err = ceph_osdc_readpages(&fsc->client->osdc, ceph_vino(inode),
> +	err = ceph_sync_readpages(fsc, ceph_vino(inode),
>  				  &ci->i_layout, off, &len,
>  				  ci->i_truncate_seq, ci->i_truncate_size,
>  				  &page, 1, 0);
> @@ -570,6 +611,47 @@ static u64 get_writepages_data_length(struct inode *inode,
>  	return end > start ? end - start : 0;
>  }
>  
> +/*
> + * do a synchronous write on N pages
> + */
> +static int ceph_sync_writepages(struct ceph_fs_client *fsc,
> +				struct ceph_vino vino,
> +				struct ceph_file_layout *layout,
> +				struct ceph_snap_context *snapc,
> +				u64 off, u64 len,
> +				u32 truncate_seq, u64 truncate_size,
> +				struct timespec64 *mtime,
> +				struct page **pages, int num_pages)
> +{
> +	struct ceph_osd_client *osdc = &fsc->client->osdc;
> +	struct ceph_osd_request *req;
> +	int rc = 0;
> +	int page_align = off & ~PAGE_MASK;
> +
> +	req = ceph_osdc_new_request(osdc, layout, vino, off, &len, 0, 1,
> +				    CEPH_OSD_OP_WRITE, CEPH_OSD_FLAG_WRITE,
> +				    snapc, truncate_seq, truncate_size,
> +				    true);
> +	if (IS_ERR(req))
> +		return PTR_ERR(req);
> +
> +	/* it may be a short write due to an object boundary */
> +	osd_req_op_extent_osd_data_pages(req, 0, pages, len, page_align,
> +				false, false);
> +	dout("writepages %llu~%llu (%llu bytes)\n", off, len, len);
> +
> +	req->r_mtime = *mtime;
> +	rc = ceph_osdc_start_request(osdc, req, true);
> +	if (!rc)
> +		rc = ceph_osdc_wait_request(osdc, req);
> +
> +	ceph_osdc_put_request(req);
> +	if (rc == 0)
> +		rc = len;
> +	dout("writepages result %d\n", rc);
> +	return rc;
> +}
> +
>  /*
>   * Write a single page, but leave the page locked.
>   *
> @@ -628,7 +710,7 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
>  		set_bdi_congested(inode_to_bdi(inode), BLK_RW_ASYNC);
>  
>  	set_page_writeback(page);
> -	err = ceph_osdc_writepages(&fsc->client->osdc, ceph_vino(inode),
> +	err = ceph_sync_writepages(fsc, ceph_vino(inode),
>  				   &ci->i_layout, snapc, page_off, len,
>  				   ceph_wbc.truncate_seq,
>  				   ceph_wbc.truncate_size,
> diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
> index 5a62dbd3f4c2..9d9f745b98a1 100644
> --- a/include/linux/ceph/osd_client.h
> +++ b/include/linux/ceph/osd_client.h
> @@ -509,23 +509,6 @@ int ceph_osdc_call(struct ceph_osd_client *osdc,
>  		   struct page *req_page, size_t req_len,
>  		   struct page **resp_pages, size_t *resp_len);
>  
> -extern int ceph_osdc_readpages(struct ceph_osd_client *osdc,
> -			       struct ceph_vino vino,
> -			       struct ceph_file_layout *layout,
> -			       u64 off, u64 *plen,
> -			       u32 truncate_seq, u64 truncate_size,
> -			       struct page **pages, int nr_pages,
> -			       int page_align);
> -
> -extern int ceph_osdc_writepages(struct ceph_osd_client *osdc,
> -				struct ceph_vino vino,
> -				struct ceph_file_layout *layout,
> -				struct ceph_snap_context *sc,
> -				u64 off, u64 len,
> -				u32 truncate_seq, u64 truncate_size,
> -				struct timespec64 *mtime,
> -				struct page **pages, int nr_pages);
> -
>  int ceph_osdc_copy_from(struct ceph_osd_client *osdc,
>  			u64 src_snapid, u64 src_version,
>  			struct ceph_object_id *src_oid,
> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> index b68b376d8c2f..8ff2856e2d52 100644
> --- a/net/ceph/osd_client.c
> +++ b/net/ceph/osd_client.c
> @@ -5230,85 +5230,6 @@ void ceph_osdc_stop(struct ceph_osd_client *osdc)
>  	ceph_msgpool_destroy(&osdc->msgpool_op_reply);
>  }
>  
> -/*
> - * Read some contiguous pages.  If we cross a stripe boundary, shorten
> - * *plen.  Return number of bytes read, or error.
> - */
> -int ceph_osdc_readpages(struct ceph_osd_client *osdc,
> -			struct ceph_vino vino, struct ceph_file_layout *layout,
> -			u64 off, u64 *plen,
> -			u32 truncate_seq, u64 truncate_size,
> -			struct page **pages, int num_pages, int page_align)
> -{
> -	struct ceph_osd_request *req;
> -	int rc = 0;
> -
> -	dout("readpages on ino %llx.%llx on %llu~%llu\n", vino.ino,
> -	     vino.snap, off, *plen);
> -	req = ceph_osdc_new_request(osdc, layout, vino, off, plen, 0, 1,
> -				    CEPH_OSD_OP_READ, CEPH_OSD_FLAG_READ,
> -				    NULL, truncate_seq, truncate_size,
> -				    false);
> -	if (IS_ERR(req))
> -		return PTR_ERR(req);
> -
> -	/* it may be a short read due to an object boundary */
> -	osd_req_op_extent_osd_data_pages(req, 0,
> -				pages, *plen, page_align, false, false);
> -
> -	dout("readpages  final extent is %llu~%llu (%llu bytes align %d)\n",
> -	     off, *plen, *plen, page_align);
> -
> -	rc = ceph_osdc_start_request(osdc, req, false);
> -	if (!rc)
> -		rc = ceph_osdc_wait_request(osdc, req);
> -
> -	ceph_osdc_put_request(req);
> -	dout("readpages result %d\n", rc);
> -	return rc;
> -}
> -EXPORT_SYMBOL(ceph_osdc_readpages);
> -
> -/*
> - * do a synchronous write on N pages
> - */
> -int ceph_osdc_writepages(struct ceph_osd_client *osdc, struct ceph_vino vino,
> -			 struct ceph_file_layout *layout,
> -			 struct ceph_snap_context *snapc,
> -			 u64 off, u64 len,
> -			 u32 truncate_seq, u64 truncate_size,
> -			 struct timespec64 *mtime,
> -			 struct page **pages, int num_pages)
> -{
> -	struct ceph_osd_request *req;
> -	int rc = 0;
> -	int page_align = off & ~PAGE_MASK;
> -
> -	req = ceph_osdc_new_request(osdc, layout, vino, off, &len, 0, 1,
> -				    CEPH_OSD_OP_WRITE, CEPH_OSD_FLAG_WRITE,
> -				    snapc, truncate_seq, truncate_size,
> -				    true);
> -	if (IS_ERR(req))
> -		return PTR_ERR(req);
> -
> -	/* it may be a short write due to an object boundary */
> -	osd_req_op_extent_osd_data_pages(req, 0, pages, len, page_align,
> -				false, false);
> -	dout("writepages %llu~%llu (%llu bytes)\n", off, len, len);
> -
> -	req->r_mtime = *mtime;
> -	rc = ceph_osdc_start_request(osdc, req, true);
> -	if (!rc)
> -		rc = ceph_osdc_wait_request(osdc, req);
> -
> -	ceph_osdc_put_request(req);
> -	if (rc == 0)
> -		rc = len;
> -	dout("writepages result %d\n", rc);
> -	return rc;
> -}
> -EXPORT_SYMBOL(ceph_osdc_writepages);
> -
>  static int osd_req_op_copy_from_init(struct ceph_osd_request *req,
>  				     u64 src_snapid, u64 src_version,
>  				     struct ceph_object_id *src_oid,

This looks like a nice cleanup. I'll plan to merge this one after a bit
of testing.
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH resend v5 02/11] ceph: add caps perf metric for each session
  2020-01-31  9:02           ` Xiubo Li
@ 2020-02-04 21:10             ` Jeff Layton
  2020-02-05  0:58               ` Xiubo Li
  2020-02-05  7:57               ` Xiubo Li
  0 siblings, 2 replies; 31+ messages in thread
From: Jeff Layton @ 2020-02-04 21:10 UTC (permalink / raw)
  To: Xiubo Li, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On Fri, 2020-01-31 at 17:02 +0800, Xiubo Li wrote:
> On 2020/1/31 9:34, Xiubo Li wrote:
> > On 2020/1/31 3:00, Jeffrey Layton wrote:
> > > That seems sort of arbitrary, given that you're going to get different
> > > results depending on the index of the MDS with the caps. For instance:
> > > 
> > > 
> > > MDS0: pAsLsFs
> > > MDS1: pAs
> > > 
> > > ...vs...
> > > 
> > > MDS0: pAs
> > > MDS1: pAsLsFs
> > > 
> > > If we assume we're looking for pAsLsFs, then the first scenario will
> > > just end up with 1 hit and the second will give you 2. AFAIU, the two
> > > MDSs are peers, so it really seems like the index should not matter
> > > here.
> > > 
> > > I'm really struggling to understand how these numbers will be useful.
> > > What, specifically, are we trying to count here and why?
> > 
> > Maybe we need count the hit/mis only once, the fake code like:
> > 
> > // Case1: check the auth caps first
> > 
> > if (auth_cap & mask == mask) {
> > 
> >     s->hit++;
> > 
> >     return;
> > 
> > }
> > 
> > // Case2: check all the other one by one
> > 
> > for (caps : i_caps) {
> > 
> >     if (caps & mask == mask) {
> > 
> >         s->hit++;
> > 
> >         return;
> > 
> >     }
> > 
> >     c |= caps;
> > 
> > }
> > 
> > // Case3:
> > 
> > if (c & mask == mask)
> > 
> >     s->hit++;
> > 
> > else
> > 
> >     s->mis++;
> > 
> > return;
> > 
> > ....
> > 
> > And for the session 's->' here, if one i_cap can hold all the 'mask' 
> > requested, like the Case1 and Case2 it will be i_cap's corresponding 
> > session. Or for Case3 we could choose any session.
> > 
> > But the above is still not very graceful of counting the cap metrics too.
> > 
> > IMO, for the cap hit/miss counter should be a global one just like the 
> > dentry_lease does in [PATCH 01/11], will this make sense ?
> > 
> Currently in fuse client, for each inode it is its auth_cap->session's 
> responsibility to do all the cap hit/mis counting if it has a auth_cap, 
> or choose any one exist.
> 
> Maybe this is one acceptable approach.

Again, it's not clear to me what you're trying to measure.

Typically, when you're counting hits and misses on a cache, what you
care about is whether you had to wait to fill the cache in order to
proceed. That means a lookup in the case of the dcache, but for this
it's a cap request. If we have a miss, then we're going to ask a single
MDS to resolve it.

To me, it doesn't really make a lot of sense to track this at the
session level since the client deals with cap hits and misses as a union
of the caps for each session. Keeping per-superblock stats makes a lot
more sense in my opinion.

That makes this easy to determine too. You just logically OR all of the
"issued" masks together (and maybe the implemented masks in requests
that allow that), and check whether that covers the mask you need. If it
does, then you have a hit, if not, a miss.

So, to be clear, what we'd be measuring in that case is cap cache checks
per superblock. Is that what you're looking to measure with this?

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH resend v5 02/11] ceph: add caps perf metric for each session
  2020-02-04 21:10             ` Jeff Layton
@ 2020-02-05  0:58               ` Xiubo Li
  2020-02-05  7:57               ` Xiubo Li
  1 sibling, 0 replies; 31+ messages in thread
From: Xiubo Li @ 2020-02-05  0:58 UTC (permalink / raw)
  To: Jeff Layton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On 2020/2/5 5:10, Jeff Layton wrote:
> On Fri, 2020-01-31 at 17:02 +0800, Xiubo Li wrote:
>> On 2020/1/31 9:34, Xiubo Li wrote:
>>> On 2020/1/31 3:00, Jeffrey Layton wrote:
>>>> That seems sort of arbitrary, given that you're going to get different
>>>> results depending on the index of the MDS with the caps. For instance:
>>>>
>>>>
>>>> MDS0: pAsLsFs
>>>> MDS1: pAs
>>>>
>>>> ...vs...
>>>>
>>>> MDS0: pAs
>>>> MDS1: pAsLsFs
>>>>
>>>> If we assume we're looking for pAsLsFs, then the first scenario will
>>>> just end up with 1 hit and the second will give you 2. AFAIU, the two
>>>> MDSs are peers, so it really seems like the index should not matter
>>>> here.
>>>>
>>>> I'm really struggling to understand how these numbers will be useful.
>>>> What, specifically, are we trying to count here and why?
>>> Maybe we need count the hit/mis only once, the fake code like:
>>>
>>> // Case1: check the auth caps first
>>>
>>> if (auth_cap & mask == mask) {
>>>
>>>      s->hit++;
>>>
>>>      return;
>>>
>>> }
>>>
>>> // Case2: check all the other one by one
>>>
>>> for (caps : i_caps) {
>>>
>>>      if (caps & mask == mask) {
>>>
>>>          s->hit++;
>>>
>>>          return;
>>>
>>>      }
>>>
>>>      c |= caps;
>>>
>>> }
>>>
>>> // Case3:
>>>
>>> if (c & mask == mask)
>>>
>>>      s->hit++;
>>>
>>> else
>>>
>>>      s->mis++;
>>>
>>> return;
>>>
>>> ....
>>>
>>> And for the session 's->' here, if one i_cap can hold all the 'mask'
>>> requested, like the Case1 and Case2 it will be i_cap's corresponding
>>> session. Or for Case3 we could choose any session.
>>>
>>> But the above is still not very graceful of counting the cap metrics too.
>>>
>>> IMO, for the cap hit/miss counter should be a global one just like the
>>> dentry_lease does in [PATCH 01/11], will this make sense ?
>>>
>> Currently in fuse client, for each inode it is its auth_cap->session's
>> responsibility to do all the cap hit/mis counting if it has a auth_cap,
>> or choose any one exist.
>>
>> Maybe this is one acceptable approach.
> Again, it's not clear to me what you're trying to measure.
>
> Typically, when you're counting hits and misses on a cache, what you
> care about is whether you had to wait to fill the cache in order to
> proceed. That means a lookup in the case of the dcache, but for this
> it's a cap request. If we have a miss, then we're going to ask a single
> MDS to resolve it.
>
> To me, it doesn't really make a lot of sense to track this at the
> session level since the client deals with cap hits and misses as a union
> of the caps for each session. Keeping per-superblock stats makes a lot
> more sense in my opinion.

This approach will be the same with the others, which are also 
per-superblock or global.

> That makes this easy to determine too. You just logically OR all of the
> "issued" masks together (and maybe the implemented masks in requests
> that allow that), and check whether that covers the mask you need. If it
> does, then you have a hit, if not, a miss.
Yeah, if so it will much easier to measure the hit/miss.
>
> So, to be clear, what we'd be measuring in that case is cap cache checks
> per superblock. Is that what you're looking to measure with this?

Basing per-superblock looks good for me, then there will need some 
change in the ceph side, which is receiving and showing the cap hit/miss 
per-session.

Thanks.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH resend v5 02/11] ceph: add caps perf metric for each session
  2020-02-04 21:10             ` Jeff Layton
  2020-02-05  0:58               ` Xiubo Li
@ 2020-02-05  7:57               ` Xiubo Li
  1 sibling, 0 replies; 31+ messages in thread
From: Xiubo Li @ 2020-02-05  7:57 UTC (permalink / raw)
  To: Jeff Layton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On 2020/2/5 5:10, Jeff Layton wrote:
> On Fri, 2020-01-31 at 17:02 +0800, Xiubo Li wrote:
>> On 2020/1/31 9:34, Xiubo Li wrote:
>>> On 2020/1/31 3:00, Jeffrey Layton wrote:
>>>
[...]
>> Currently in fuse client, for each inode it is its auth_cap->session's
>> responsibility to do all the cap hit/mis counting if it has a auth_cap,
>> or choose any one exist.
>>
>> Maybe this is one acceptable approach.
> Again, it's not clear to me what you're trying to measure.
>
> Typically, when you're counting hits and misses on a cache, what you
> care about is whether you had to wait to fill the cache in order to
> proceed. That means a lookup in the case of the dcache, but for this
> it's a cap request. If we have a miss, then we're going to ask a single
> MDS to resolve it.
>
> To me, it doesn't really make a lot of sense to track this at the
> session level since the client deals with cap hits and misses as a union
> of the caps for each session. Keeping per-superblock stats makes a lot
> more sense in my opinion.
>
> That makes this easy to determine too. You just logically OR all of the
> "issued" masks together (and maybe the implemented masks in requests
> that allow that), and check whether that covers the mask you need. If it
> does, then you have a hit, if not, a miss.
>
> So, to be clear, what we'd be measuring in that case is cap cache checks
> per superblock. Is that what you're looking to measure with this?
>
The following is the new approach:

  65 +/*
  66 + * Counts the cap metric.
  67 + *
  68 + * This will try to traverse all the ci->i_caps, if we can
  69 + * get all the cap 'mask' it will count the hit, or the mis.
  70 + */
  71 +void __ceph_caps_metric(struct ceph_inode_info *ci, int mask)
  72 +{
  73 +       struct ceph_mds_client *mdsc =
  74 + ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
  75 +       struct ceph_client_metric *metric = &mdsc->metric;
  76 +       int issued;
  77 +
  78 +       lockdep_assert_held(&ci->i_ceph_lock);
  79 +
  80 +       if (mask <= 0)
  81 +               return;
  82 +
  83 +       issued = __ceph_caps_issued(ci, NULL);
  84 +
  85 +       if ((mask & issued) == mask)
  86 + percpu_counter_inc(&metric->i_caps_hit);
  87 +       else
  88 + percpu_counter_inc(&metric->i_caps_mis);
  89 +}
  90 +

The cap hit/mis metric are per-superblock, just like the others.

Thanks.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH resend v5 04/11] ceph: add r_end_stamp for the osdc request
  2020-01-29  8:27 ` [PATCH resend v5 04/11] ceph: add r_end_stamp for the osdc request xiubli
@ 2020-02-05 19:14   ` Jeff Layton
  2020-02-06  0:57     ` Xiubo Li
  0 siblings, 1 reply; 31+ messages in thread
From: Jeff Layton @ 2020-02-05 19:14 UTC (permalink / raw)
  To: xiubli, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On Wed, 2020-01-29 at 03:27 -0500, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> Grab the osdc requests' end time stamp.
> 
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  include/linux/ceph/osd_client.h | 1 +
>  net/ceph/osd_client.c           | 2 ++
>  2 files changed, 3 insertions(+)
> 
> diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
> index 9d9f745b98a1..00a449cfc478 100644
> --- a/include/linux/ceph/osd_client.h
> +++ b/include/linux/ceph/osd_client.h
> @@ -213,6 +213,7 @@ struct ceph_osd_request {
>  	/* internal */
>  	unsigned long r_stamp;                /* jiffies, send or check time */
>  	unsigned long r_start_stamp;          /* jiffies */
> +	unsigned long r_end_stamp;          /* jiffies */
>  	int r_attempts;
>  	u32 r_map_dne_bound;
>  
> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> index 8ff2856e2d52..108c9457d629 100644
> --- a/net/ceph/osd_client.c
> +++ b/net/ceph/osd_client.c
> @@ -2389,6 +2389,8 @@ static void finish_request(struct ceph_osd_request *req)
>  	WARN_ON(lookup_request_mc(&osdc->map_checks, req->r_tid));
>  	dout("%s req %p tid %llu\n", __func__, req, req->r_tid);
>  
> +	req->r_end_stamp = jiffies;
> +
>  	if (req->r_osd)
>  		unlink_request(req->r_osd, req);
>  	atomic_dec(&osdc->num_requests);

Maybe fold this patch into #6 in this series? I'd prefer to add the new
field along with its first user.
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH resend v5 05/11] ceph: add global read latency metric support
  2020-01-29  8:27 ` [PATCH resend v5 05/11] ceph: add global read latency metric support xiubli
@ 2020-02-05 20:15   ` Jeff Layton
  2020-02-06  1:24     ` Xiubo Li
  0 siblings, 1 reply; 31+ messages in thread
From: Jeff Layton @ 2020-02-05 20:15 UTC (permalink / raw)
  To: xiubli, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On Wed, 2020-01-29 at 03:27 -0500, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> item          total       sum_lat(us)     avg_lat(us)
> -----------------------------------------------------
> read          73          3590000         49178082
> 
> URL: https://tracker.ceph.com/issues/43215
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  fs/ceph/addr.c       |  8 ++++++++
>  fs/ceph/debugfs.c    | 11 +++++++++++
>  fs/ceph/file.c       | 15 +++++++++++++++
>  fs/ceph/mds_client.c | 29 +++++++++++++++++++++++------
>  fs/ceph/mds_client.h |  9 ++-------
>  fs/ceph/metric.h     | 30 ++++++++++++++++++++++++++++++
>  6 files changed, 89 insertions(+), 13 deletions(-)
>  create mode 100644 fs/ceph/metric.h
> 
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index 20e5ebfff389..0435a694370b 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -195,6 +195,7 @@ static int ceph_sync_readpages(struct ceph_fs_client *fsc,
>  			       int page_align)
>  {
>  	struct ceph_osd_client *osdc = &fsc->client->osdc;
> +	struct ceph_client_metric *metric = &fsc->mdsc->metric;

nit: I think you can drop this variable and just dereference the metric
field directly below where it's used. Ditto in other places where
"metric" is only used once in the function.

>  	struct ceph_osd_request *req;
>  	int rc = 0;
>  
> @@ -218,6 +219,8 @@ static int ceph_sync_readpages(struct ceph_fs_client *fsc,
>  	if (!rc)
>  		rc = ceph_osdc_wait_request(osdc, req);
>  
> +	ceph_update_read_latency(metric, req, rc);
> +
>  	ceph_osdc_put_request(req);
>  	dout("readpages result %d\n", rc);
>  	return rc;
> @@ -301,6 +304,8 @@ static int ceph_readpage(struct file *filp, struct page *page)
>  static void finish_read(struct ceph_osd_request *req)
>  {
>  	struct inode *inode = req->r_inode;
> +	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
> +	struct ceph_client_metric *metric = &fsc->mdsc->metric;
>  	struct ceph_osd_data *osd_data;
>  	int rc = req->r_result <= 0 ? req->r_result : 0;
>  	int bytes = req->r_result >= 0 ? req->r_result : 0;
> @@ -338,6 +343,9 @@ static void finish_read(struct ceph_osd_request *req)
>  		put_page(page);
>  		bytes -= PAGE_SIZE;
>  	}
> +
> +	ceph_update_read_latency(metric, req, rc);
> +
>  	kfree(osd_data->pages);
>  }
>  
> diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
> index c132fdb40d53..f8a32fa335ae 100644
> --- a/fs/ceph/debugfs.c
> +++ b/fs/ceph/debugfs.c
> @@ -128,8 +128,19 @@ static int metric_show(struct seq_file *s, void *p)
>  {
>  	struct ceph_fs_client *fsc = s->private;
>  	struct ceph_mds_client *mdsc = fsc->mdsc;
> +	s64 total, sum, avg = 0;
>  	int i;
>  
> +	seq_printf(s, "item          total       sum_lat(us)     avg_lat(us)\n");
> +	seq_printf(s, "-----------------------------------------------------\n");
> +
> +	total = percpu_counter_sum(&mdsc->metric.total_reads);
> +	sum = percpu_counter_sum(&mdsc->metric.read_latency_sum);
> +	avg = total ? sum / total : 0;
> +	seq_printf(s, "%-14s%-12lld%-16lld%lld\n", "read",
> +		   total, sum / NSEC_PER_USEC, avg / NSEC_PER_USEC);
> +
> +	seq_printf(s, "\n");
>  	seq_printf(s, "item          total           miss            hit\n");
>  	seq_printf(s, "-------------------------------------------------\n");
>  
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index c78dfbbb7b91..69288c39229b 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -588,6 +588,7 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
>  	struct inode *inode = file_inode(file);
>  	struct ceph_inode_info *ci = ceph_inode(inode);
>  	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
> +	struct ceph_client_metric *metric = &fsc->mdsc->metric;
>  	struct ceph_osd_client *osdc = &fsc->client->osdc;
>  	ssize_t ret;
>  	u64 off = iocb->ki_pos;
> @@ -660,6 +661,9 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
>  		ret = ceph_osdc_start_request(osdc, req, false);
>  		if (!ret)
>  			ret = ceph_osdc_wait_request(osdc, req);
> +
> +		ceph_update_read_latency(metric, req, ret);
> +
>  		ceph_osdc_put_request(req);
>  
>  		i_size = i_size_read(inode);
> @@ -798,13 +802,20 @@ static void ceph_aio_complete_req(struct ceph_osd_request *req)
>  	struct inode *inode = req->r_inode;
>  	struct ceph_aio_request *aio_req = req->r_priv;
>  	struct ceph_osd_data *osd_data = osd_req_op_extent_osd_data(req, 0);
> +	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
> +	struct ceph_client_metric *metric = &fsc->mdsc->metric;
>  
>  	BUG_ON(osd_data->type != CEPH_OSD_DATA_TYPE_BVECS);
>  	BUG_ON(!osd_data->num_bvecs);
> +	BUG_ON(!aio_req);
>  
>  	dout("ceph_aio_complete_req %p rc %d bytes %u\n",
>  	     inode, rc, osd_data->bvec_pos.iter.bi_size);
>  
> +	/* r_start_stamp == 0 means the request was not submitted */
> +	if (req->r_start_stamp && !aio_req->write)
> +		ceph_update_read_latency(metric, req, rc);
> +
>  	if (rc == -EOLDSNAPC) {
>  		struct ceph_aio_work *aio_work;
>  		BUG_ON(!aio_req->write);
> @@ -933,6 +944,7 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter,
>  	struct inode *inode = file_inode(file);
>  	struct ceph_inode_info *ci = ceph_inode(inode);
>  	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
> +	struct ceph_client_metric *metric = &fsc->mdsc->metric;
>  	struct ceph_vino vino;
>  	struct ceph_osd_request *req;
>  	struct bio_vec *bvecs;
> @@ -1049,6 +1061,9 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter,
>  		if (!ret)
>  			ret = ceph_osdc_wait_request(&fsc->client->osdc, req);
>  
> +		if (!write)
> +			ceph_update_read_latency(metric, req, ret);
> +
>  		size = i_size_read(inode);
>  		if (!write) {
>  			if (ret == -ENOENT)
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index 141c1c03636c..101b51f9f05d 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -4182,14 +4182,29 @@ static int ceph_mdsc_metric_init(struct ceph_client_metric *metric)
>  	atomic64_set(&metric->total_dentries, 0);
>  	ret = percpu_counter_init(&metric->d_lease_hit, 0, GFP_KERNEL);
>  	if (ret)
> -		return ret;
> +		return ret;;

drop this, please ^^^

>  	ret = percpu_counter_init(&metric->d_lease_mis, 0, GFP_KERNEL);
> -	if (ret) {
> -		percpu_counter_destroy(&metric->d_lease_hit);
> -		return ret;
> -	}
> +	if (ret)
> +		goto err_dlease_mis;
>  
> -	return 0;
> +	ret = percpu_counter_init(&metric->total_reads, 0, GFP_KERNEL);
> +	if (ret)
> +		goto err_total_reads;
> +
> +	ret = percpu_counter_init(&metric->read_latency_sum, 0, GFP_KERNEL);
> +	if (ret)
> +		goto err_read_latency_sum;
> +
> +	return ret;
> +
> +err_read_latency_sum:
> +	percpu_counter_destroy(&metric->total_reads);
> +err_total_reads:
> +	percpu_counter_destroy(&metric->d_lease_mis);
> +err_dlease_mis:
> +	percpu_counter_destroy(&metric->d_lease_hit);
> +
> +	return ret;
>  }
>  
>  int ceph_mdsc_init(struct ceph_fs_client *fsc)
> @@ -4529,6 +4544,8 @@ void ceph_mdsc_destroy(struct ceph_fs_client *fsc)
>  
>  	ceph_mdsc_stop(mdsc);
>  
> +	percpu_counter_destroy(&mdsc->metric.read_latency_sum);
> +	percpu_counter_destroy(&mdsc->metric.total_reads);
>  	percpu_counter_destroy(&mdsc->metric.d_lease_mis);
>  	percpu_counter_destroy(&mdsc->metric.d_lease_hit);
>  
> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> index ba74ff74c59c..574d4e5a5de2 100644
> --- a/fs/ceph/mds_client.h
> +++ b/fs/ceph/mds_client.h
> @@ -16,6 +16,8 @@
>  #include <linux/ceph/mdsmap.h>
>  #include <linux/ceph/auth.h>
>  
> +#include "metric.h"
> +
>  /* The first 8 bits are reserved for old ceph releases */
>  enum ceph_feature_type {
>  	CEPHFS_FEATURE_MIMIC = 8,
> @@ -361,13 +363,6 @@ struct cap_wait {
>  	int			want;
>  };
>  
> -/* This is the global metrics */
> -struct ceph_client_metric {
> -	atomic64_t		total_dentries;
> -	struct percpu_counter	d_lease_hit;
> -	struct percpu_counter	d_lease_mis;
> -};
> -
>  /*
>   * mds client state
>   */
> diff --git a/fs/ceph/metric.h b/fs/ceph/metric.h
> new file mode 100644
> index 000000000000..2a7b8f3fe6a4
> --- /dev/null
> +++ b/fs/ceph/metric.h
> @@ -0,0 +1,30 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _FS_CEPH_MDS_METRIC_H
> +#define _FS_CEPH_MDS_METRIC_H
> +
> +#include <linux/ceph/osd_client.h>
> +
> +/* This is the global metrics */
> +struct ceph_client_metric {
> +	atomic64_t		total_dentries;
> +	struct percpu_counter	d_lease_hit;
> +	struct percpu_counter	d_lease_mis;
> +
> +	struct percpu_counter	total_reads;
> +	struct percpu_counter	read_latency_sum;
> +};
> +
> +static inline void ceph_update_read_latency(struct ceph_client_metric *m,
> +					    struct ceph_osd_request *req,
> +					    int rc)
> +{
> +	if (!m || !req)
> +		return;
> +
> +	if (rc >= 0 || rc == -ENOENT || rc == -ETIMEDOUT) {
> +		s64 latency = req->r_end_stamp - req->r_start_stamp;
> +		percpu_counter_inc(&m->total_reads);
> +		percpu_counter_add(&m->read_latency_sum, latency);
> +	}
> +}
> +#endif

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH resend v5 08/11] ceph: periodically send perf metrics to MDS
  2020-01-29  8:27 ` [PATCH resend v5 08/11] ceph: periodically send perf metrics to MDS xiubli
@ 2020-02-05 21:43   ` Jeff Layton
  2020-02-06  2:36     ` Xiubo Li
  0 siblings, 1 reply; 31+ messages in thread
From: Jeff Layton @ 2020-02-05 21:43 UTC (permalink / raw)
  To: xiubli, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On Wed, 2020-01-29 at 03:27 -0500, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> Add enable/disable sending metrics to MDS debugfs and disabled as
> default, if it's enabled the kclient will send metrics every
> second.
> 
> This will send global dentry lease hit/miss and read/write/metadata
> latency metrics and each session's caps hit/miss metric to MDS.
> 
> Every time only sends the global metrics once via any availible
> session.
> 
> URL: https://tracker.ceph.com/issues/43215
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  fs/ceph/debugfs.c            |  44 +++++++-
>  fs/ceph/mds_client.c         | 201 ++++++++++++++++++++++++++++++++---
>  fs/ceph/mds_client.h         |   3 +
>  fs/ceph/metric.h             |  76 +++++++++++++
>  fs/ceph/super.h              |   1 +
>  include/linux/ceph/ceph_fs.h |   1 +
>  6 files changed, 307 insertions(+), 19 deletions(-)
> 
> diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
> index 7fd031c18309..8aae7ecea54a 100644
> --- a/fs/ceph/debugfs.c
> +++ b/fs/ceph/debugfs.c
> @@ -124,6 +124,40 @@ static int mdsc_show(struct seq_file *s, void *p)
>  	return 0;
>  }
>  
> +/*
> + * metrics debugfs
> + */
> +static int sending_metrics_set(void *data, u64 val)
> +{
> +	struct ceph_fs_client *fsc = (struct ceph_fs_client *)data;
> +	struct ceph_mds_client *mdsc = fsc->mdsc;
> +
> +	if (val > 1) {
> +		pr_err("Invalid sending metrics set value %llu\n", val);
> +		return -EINVAL;
> +	}
> +
> +	mutex_lock(&mdsc->mutex);
> +	mdsc->sending_metrics = (unsigned int)val;

Shouldn't that be a bool cast? Do we even need a cast there?

> +	mutex_unlock(&mdsc->mutex);
> +
> +	return 0;
> +}
> +
> +static int sending_metrics_get(void *data, u64 *val)
> +{
> +	struct ceph_fs_client *fsc = (struct ceph_fs_client *)data;
> +	struct ceph_mds_client *mdsc = fsc->mdsc;
> +
> +	mutex_lock(&mdsc->mutex);
> +	*val = (u64)mdsc->sending_metrics;
> +	mutex_unlock(&mdsc->mutex);
> +
> +	return 0;
> +}
> +DEFINE_SIMPLE_ATTRIBUTE(sending_metrics_fops, sending_metrics_get,
> +			sending_metrics_set, "%llu\n");
> +

I'd like to hear more about how we expect users to use this facility.
This debugfs file doesn't seem consistent with the rest of the UI, and I
imagine if the box reboots you'd have to (manually) re-enable it after
mount, right? Maybe this should be a mount option instead?


>  static int metric_show(struct seq_file *s, void *p)
>  {
>  	struct ceph_fs_client *fsc = s->private;
> @@ -302,11 +336,9 @@ static int congestion_kb_get(void *data, u64 *val)
>  	*val = (u64)fsc->mount_options->congestion_kb;
>  	return 0;
>  }
> -
>  DEFINE_SIMPLE_ATTRIBUTE(congestion_kb_fops, congestion_kb_get,
>  			congestion_kb_set, "%llu\n");
>  
> -
>  void ceph_fs_debugfs_cleanup(struct ceph_fs_client *fsc)
>  {
>  	dout("ceph_fs_debugfs_cleanup\n");
> @@ -316,6 +348,7 @@ void ceph_fs_debugfs_cleanup(struct ceph_fs_client *fsc)
>  	debugfs_remove(fsc->debugfs_mds_sessions);
>  	debugfs_remove(fsc->debugfs_caps);
>  	debugfs_remove(fsc->debugfs_metric);
> +	debugfs_remove(fsc->debugfs_sending_metrics);
>  	debugfs_remove(fsc->debugfs_mdsc);
>  }
>  
> @@ -356,6 +389,13 @@ void ceph_fs_debugfs_init(struct ceph_fs_client *fsc)
>  						fsc,
>  						&mdsc_show_fops);
>  
> +	fsc->debugfs_sending_metrics =
> +			debugfs_create_file("sending_metrics",
> +					    0600,
> +					    fsc->client->debugfs_dir,
> +					    fsc,
> +					    &sending_metrics_fops);
> +
>  	fsc->debugfs_metric = debugfs_create_file("metrics",
>  						  0400,
>  						  fsc->client->debugfs_dir,
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index 92a933810a79..d765804dc855 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -4104,13 +4104,156 @@ static void maybe_recover_session(struct ceph_mds_client *mdsc)
>  	ceph_force_reconnect(fsc->sb);
>  }
>  
> +/*
> + * called under s_mutex
> + */
> +static bool ceph_mdsc_send_metrics(struct ceph_mds_client *mdsc,
> +				   struct ceph_mds_session *s,
> +				   bool skip_global)
> +{
> +	struct ceph_metric_head *head;
> +	struct ceph_metric_cap *cap;
> +	struct ceph_metric_dentry_lease *lease;
> +	struct ceph_metric_read_latency *read;
> +	struct ceph_metric_write_latency *write;
> +	struct ceph_metric_metadata_latency *meta;
> +	struct ceph_msg *msg;
> +	struct timespec64 ts;
> +	s32 len = sizeof(*head) + sizeof(*cap);
> +	s64 sum, total, avg;
> +	s32 items = 0;
> +
> +	if (!mdsc || !s)
> +		return false;
> +
> +	if (!skip_global) {
> +		len += sizeof(*lease);
> +		len += sizeof(*read);
> +		len += sizeof(*write);
> +		len += sizeof(*meta);
> +	}
> +
> +	msg = ceph_msg_new(CEPH_MSG_CLIENT_METRICS, len, GFP_NOFS, true);
> +	if (!msg) {
> +		pr_err("send metrics to mds%d, failed to allocate message\n",
> +		       s->s_mds);
> +		return false;
> +	}
> +
> +	head = msg->front.iov_base;
> +
> +	/* encode the cap metric */
> +	cap = (struct ceph_metric_cap *)(head + 1);
> +	cap->type = cpu_to_le32(CLIENT_METRIC_TYPE_CAP_INFO);
> +	cap->ver = 1;
> +	cap->campat = 1;
> +	cap->data_len = cpu_to_le32(sizeof(*cap) - 10);
> +	cap->hit = cpu_to_le64(percpu_counter_sum(&s->i_caps_hit));
> +	cap->mis = cpu_to_le64(percpu_counter_sum(&s->i_caps_mis));
> +	cap->total = cpu_to_le64(s->s_nr_caps);
> +	items++;
> +
> +	dout("cap metric hit %lld, mis %lld, total caps %lld",
> +	     le64_to_cpu(cap->hit), le64_to_cpu(cap->mis),
> +	     le64_to_cpu(cap->total));
> +
> +	/* only send the global once */
> +	if (skip_global)
> +		goto skip_global;
> +
> +	/* encode the dentry lease metric */
> +	lease = (struct ceph_metric_dentry_lease *)(cap + 1);
> +	lease->type = cpu_to_le32(CLIENT_METRIC_TYPE_DENTRY_LEASE);
> +	lease->ver = 1;
> +	lease->campat = 1;
> +	lease->data_len = cpu_to_le32(sizeof(*lease) - 10);
> +	lease->hit = cpu_to_le64(percpu_counter_sum(&mdsc->metric.d_lease_hit));
> +	lease->mis = cpu_to_le64(percpu_counter_sum(&mdsc->metric.d_lease_mis));
> +	lease->total = cpu_to_le64(atomic64_read(&mdsc->metric.total_dentries));
> +	items++;
> +
> +	dout("dentry lease metric hit %lld, mis %lld, total dentries %lld",
> +	     le64_to_cpu(lease->hit), le64_to_cpu(lease->mis),
> +	     le64_to_cpu(lease->total));
> +
> +	/* encode the read latency metric */
> +	read = (struct ceph_metric_read_latency *)(lease + 1);
> +	read->type = cpu_to_le32(CLIENT_METRIC_TYPE_READ_LATENCY);
> +	read->ver = 1;
> +	read->campat = 1;
> +	read->data_len = cpu_to_le32(sizeof(*read) - 10);
> +	total = percpu_counter_sum(&mdsc->metric.total_reads),
> +	sum = percpu_counter_sum(&mdsc->metric.read_latency_sum);
> +	avg = total ? sum / total : 0;
> +	ts = ns_to_timespec64(avg);
> +	read->sec = cpu_to_le32(ts.tv_sec);
> +	read->nsec = cpu_to_le32(ts.tv_nsec);
> +	items++;
> +
> +	dout("read latency metric total %lld, sum lat %lld, avg lat %lld",
> +	     total, sum, avg);
> +
> +	/* encode the write latency metric */
> +	write = (struct ceph_metric_write_latency *)(read + 1);
> +	write->type = cpu_to_le32(CLIENT_METRIC_TYPE_WRITE_LATENCY);
> +	write->ver = 1;
> +	write->campat = 1;
> +	write->data_len = cpu_to_le32(sizeof(*write) - 10);
> +	total = percpu_counter_sum(&mdsc->metric.total_writes),
> +	sum = percpu_counter_sum(&mdsc->metric.write_latency_sum);
> +	avg = total ? sum / total : 0;
> +	ts = ns_to_timespec64(avg);
> +	write->sec = cpu_to_le32(ts.tv_sec);
> +	write->nsec = cpu_to_le32(ts.tv_nsec);
> +	items++;
> +
> +	dout("write latency metric total %lld, sum lat %lld, avg lat %lld",
> +	     total, sum, avg);
> +
> +	/* encode the metadata latency metric */
> +	meta = (struct ceph_metric_metadata_latency *)(write + 1);
> +	meta->type = cpu_to_le32(CLIENT_METRIC_TYPE_METADATA_LATENCY);
> +	meta->ver = 1;
> +	meta->campat = 1;
> +	meta->data_len = cpu_to_le32(sizeof(*meta) - 10);
> +	total = percpu_counter_sum(&mdsc->metric.total_metadatas),
> +	sum = percpu_counter_sum(&mdsc->metric.metadata_latency_sum);
> +	avg = total ? sum / total : 0;
> +	ts = ns_to_timespec64(avg);
> +	meta->sec = cpu_to_le32(ts.tv_sec);
> +	meta->nsec = cpu_to_le32(ts.tv_nsec);
> +	items++;
> +
> +	dout("metadata latency metric total %lld, sum lat %lld, avg lat %lld",
> +	     total, sum, avg);
> +
> +skip_global:
> +	put_unaligned_le32(items, &head->num);
> +	msg->front.iov_len = cpu_to_le32(len);
> +	msg->hdr.version = cpu_to_le16(1);
> +	msg->hdr.compat_version = cpu_to_le16(1);
> +	msg->hdr.front_len = cpu_to_le32(msg->front.iov_len);
> +	dout("send metrics to mds%d %p\n", s->s_mds, msg);
> +	ceph_con_send(&s->s_con, msg);
> +
> +	return true;
> +}
> +
>  /*
>   * delayed work -- periodically trim expired leases, renew caps with mds
>   */
> +#define CEPH_WORK_DELAY_DEF 5
>  static void schedule_delayed(struct ceph_mds_client *mdsc)
>  {
> -	int delay = 5;
> -	unsigned hz = round_jiffies_relative(HZ * delay);
> +	unsigned int hz;
> +	int delay = CEPH_WORK_DELAY_DEF;
> +
> +	mutex_lock(&mdsc->mutex);
> +	if (mdsc->sending_metrics)
> +		delay = 1;
> +	mutex_unlock(&mdsc->mutex);
> +

The mdsc->mutex is dropped in the callers a little before this is
called, so this is a little too mutex-thrashy. I think you'd be better
off changing this function to be called with the mutex still held.

> +	hz = round_jiffies_relative(HZ * delay);
>  	schedule_delayed_work(&mdsc->delayed_work, hz);
>  }
>  
> @@ -4121,18 +4264,28 @@ static void delayed_work(struct work_struct *work)
>  		container_of(work, struct ceph_mds_client, delayed_work.work);
>  	int renew_interval;
>  	int renew_caps;
> +	bool metric_only;
> +	bool sending_metrics;
> +	bool g_skip = false;
>  
>  	dout("mdsc delayed_work\n");
>  
>  	mutex_lock(&mdsc->mutex);
> -	renew_interval = mdsc->mdsmap->m_session_timeout >> 2;
> -	renew_caps = time_after_eq(jiffies, HZ*renew_interval +
> -				   mdsc->last_renew_caps);
> -	if (renew_caps)
> -		mdsc->last_renew_caps = jiffies;
> +	sending_metrics = !!mdsc->sending_metrics;
> +	metric_only = mdsc->sending_metrics &&
> +		(mdsc->ticks++ % CEPH_WORK_DELAY_DEF);
> +
> +	if (!metric_only) {
> +		renew_interval = mdsc->mdsmap->m_session_timeout >> 2;
> +		renew_caps = time_after_eq(jiffies, HZ*renew_interval +
> +					   mdsc->last_renew_caps);
> +		if (renew_caps)
> +			mdsc->last_renew_caps = jiffies;
> +	}
>  
>  	for (i = 0; i < mdsc->max_sessions; i++) {
>  		struct ceph_mds_session *s = __ceph_lookup_mds_session(mdsc, i);
> +
>  		if (!s)
>  			continue;
>  		if (s->s_state == CEPH_MDS_SESSION_CLOSING) {
> @@ -4158,13 +4311,20 @@ static void delayed_work(struct work_struct *work)
>  		mutex_unlock(&mdsc->mutex);
>  
>  		mutex_lock(&s->s_mutex);
> -		if (renew_caps)
> -			send_renew_caps(mdsc, s);
> -		else
> -			ceph_con_keepalive(&s->s_con);
> -		if (s->s_state == CEPH_MDS_SESSION_OPEN ||
> -		    s->s_state == CEPH_MDS_SESSION_HUNG)
> -			ceph_send_cap_releases(mdsc, s);
> +
> +		if (sending_metrics)
> +			g_skip = ceph_mdsc_send_metrics(mdsc, s, g_skip);
> +
> +		if (!metric_only) {
> +			if (renew_caps)
> +				send_renew_caps(mdsc, s);
> +			else
> +				ceph_con_keepalive(&s->s_con);
> +			if (s->s_state == CEPH_MDS_SESSION_OPEN ||
> +					s->s_state == CEPH_MDS_SESSION_HUNG)
> +				ceph_send_cap_releases(mdsc, s);
> +		}
> +
>  		mutex_unlock(&s->s_mutex);
>  		ceph_put_mds_session(s);
>  
> @@ -4172,6 +4332,9 @@ static void delayed_work(struct work_struct *work)
>  	}
>  	mutex_unlock(&mdsc->mutex);
>  
> +	if (metric_only)
> +		goto delay_work;
> +
>  	ceph_check_delayed_caps(mdsc);
>  
>  	ceph_queue_cap_reclaim_work(mdsc);
> @@ -4180,11 +4343,13 @@ static void delayed_work(struct work_struct *work)
>  
>  	maybe_recover_session(mdsc);
>  
> +delay_work:
>  	schedule_delayed(mdsc);
>  }
>  
> -static int ceph_mdsc_metric_init(struct ceph_client_metric *metric)
> +static int ceph_mdsc_metric_init(struct ceph_mds_client *mdsc)
>  {
> +	struct ceph_client_metric *metric = &mdsc->metric;
>  	int ret;
>  
>  	if (!metric)
> @@ -4222,7 +4387,9 @@ static int ceph_mdsc_metric_init(struct ceph_client_metric *metric)
>  	if (ret)
>  		goto err_metadata_latency_sum;
>  
> -	return ret;
> +	mdsc->sending_metrics = 0;
> +	mdsc->ticks = 0;
> +	return 0;
>  err_metadata_latency_sum:
>  	percpu_counter_destroy(&metric->total_metadatas);
>  err_total_metadatas:
> @@ -4294,7 +4461,7 @@ int ceph_mdsc_init(struct ceph_fs_client *fsc)
>  	init_waitqueue_head(&mdsc->cap_flushing_wq);
>  	INIT_WORK(&mdsc->cap_reclaim_work, ceph_cap_reclaim_work);
>  	atomic_set(&mdsc->cap_reclaim_pending, 0);
> -	err = ceph_mdsc_metric_init(&mdsc->metric);
> +	err = ceph_mdsc_metric_init(mdsc);
>  	if (err)
>  		goto err_mdsmap;
>  
> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> index 574d4e5a5de2..a0ece55d987c 100644
> --- a/fs/ceph/mds_client.h
> +++ b/fs/ceph/mds_client.h
> @@ -451,6 +451,9 @@ struct ceph_mds_client {
>  	struct list_head  dentry_leases;     /* fifo list */
>  	struct list_head  dentry_dir_leases; /* lru list */
>  
> +	/* metrics */
> +	unsigned int		  sending_metrics;
> +	unsigned int		  ticks;
>  	struct ceph_client_metric metric;
>  
>  	spinlock_t		snapid_map_lock;
> diff --git a/fs/ceph/metric.h b/fs/ceph/metric.h
> index 3cda616ba594..352eb753ce25 100644
> --- a/fs/ceph/metric.h
> +++ b/fs/ceph/metric.h
> @@ -4,6 +4,82 @@
>  
>  #include <linux/ceph/osd_client.h>
>  
> +enum ceph_metric_type {
> +	CLIENT_METRIC_TYPE_CAP_INFO,
> +	CLIENT_METRIC_TYPE_READ_LATENCY,
> +	CLIENT_METRIC_TYPE_WRITE_LATENCY,
> +	CLIENT_METRIC_TYPE_METADATA_LATENCY,
> +	CLIENT_METRIC_TYPE_DENTRY_LEASE,
> +
> +	CLIENT_METRIC_TYPE_MAX = CLIENT_METRIC_TYPE_DENTRY_LEASE,
> +};
> +
> +/* metric caps header */
> +struct ceph_metric_cap {
> +	__le32 type;     /* ceph metric type */
> +
> +	__u8  ver;
> +	__u8  campat;

I think you meant "compat" here.

> +
> +	__le32 data_len; /* length of sizeof(hit + mis + total) */
> +	__le64 hit;
> +	__le64 mis;
> +	__le64 total;
> +} __attribute__ ((packed));
> +
> +/* metric dentry lease header */
> +struct ceph_metric_dentry_lease {
> +	__le32 type;     /* ceph metric type */
> +
> +	__u8  ver;
> +	__u8  campat;
> +
> +	__le32 data_len; /* length of sizeof(hit + mis + total) */
> +	__le64 hit;
> +	__le64 mis;
> +	__le64 total;
> +} __attribute__ ((packed));
> +
> +/* metric read latency header */
> +struct ceph_metric_read_latency {
> +	__le32 type;     /* ceph metric type */
> +
> +	__u8  ver;
> +	__u8  campat;
> +
> +	__le32 data_len; /* length of sizeof(sec + nsec) */
> +	__le32 sec;
> +	__le32 nsec;
> +} __attribute__ ((packed));
> +
> +/* metric write latency header */
> +struct ceph_metric_write_latency {
> +	__le32 type;     /* ceph metric type */
> +
> +	__u8  ver;
> +	__u8  campat;
> +
> +	__le32 data_len; /* length of sizeof(sec + nsec) */
> +	__le32 sec;
> +	__le32 nsec;
> +} __attribute__ ((packed));
> +
> +/* metric metadata latency header */
> +struct ceph_metric_metadata_latency {
> +	__le32 type;     /* ceph metric type */
> +
> +	__u8  ver;
> +	__u8  campat;
> +
> +	__le32 data_len; /* length of sizeof(sec + nsec) */
> +	__le32 sec;
> +	__le32 nsec;
> +} __attribute__ ((packed));
> +
> +struct ceph_metric_head {
> +	__le32 num;	/* the number of metrics will be sent */

"the number of metrics that will be sent"

> +} __attribute__ ((packed));
> +
>  /* This is the global metrics */
>  struct ceph_client_metric {
>  	atomic64_t		total_dentries;
> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> index 3f4829222528..a91431e9bdf7 100644
> --- a/fs/ceph/super.h
> +++ b/fs/ceph/super.h
> @@ -128,6 +128,7 @@ struct ceph_fs_client {
>  	struct dentry *debugfs_congestion_kb;
>  	struct dentry *debugfs_bdi;
>  	struct dentry *debugfs_mdsc, *debugfs_mdsmap;
> +	struct dentry *debugfs_sending_metrics;
>  	struct dentry *debugfs_metric;
>  	struct dentry *debugfs_mds_sessions;
>  #endif
> diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
> index a099f60feb7b..6028d3e865e4 100644
> --- a/include/linux/ceph/ceph_fs.h
> +++ b/include/linux/ceph/ceph_fs.h
> @@ -130,6 +130,7 @@ struct ceph_dir_layout {
>  #define CEPH_MSG_CLIENT_REQUEST         24
>  #define CEPH_MSG_CLIENT_REQUEST_FORWARD 25
>  #define CEPH_MSG_CLIENT_REPLY           26
> +#define CEPH_MSG_CLIENT_METRICS         29
>  #define CEPH_MSG_CLIENT_CAPS            0x310
>  #define CEPH_MSG_CLIENT_LEASE           0x311
>  #define CEPH_MSG_CLIENT_SNAP            0x312

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH resend v5 04/11] ceph: add r_end_stamp for the osdc request
  2020-02-05 19:14   ` Jeff Layton
@ 2020-02-06  0:57     ` Xiubo Li
  0 siblings, 0 replies; 31+ messages in thread
From: Xiubo Li @ 2020-02-06  0:57 UTC (permalink / raw)
  To: Jeff Layton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On 2020/2/6 3:14, Jeff Layton wrote:
> On Wed, 2020-01-29 at 03:27 -0500, xiubli@redhat.com wrote:
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> Grab the osdc requests' end time stamp.
>>
>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>> ---
>>   include/linux/ceph/osd_client.h | 1 +
>>   net/ceph/osd_client.c           | 2 ++
>>   2 files changed, 3 insertions(+)
>>
>> diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
>> index 9d9f745b98a1..00a449cfc478 100644
>> --- a/include/linux/ceph/osd_client.h
>> +++ b/include/linux/ceph/osd_client.h
>> @@ -213,6 +213,7 @@ struct ceph_osd_request {
>>   	/* internal */
>>   	unsigned long r_stamp;                /* jiffies, send or check time */
>>   	unsigned long r_start_stamp;          /* jiffies */
>> +	unsigned long r_end_stamp;          /* jiffies */
>>   	int r_attempts;
>>   	u32 r_map_dne_bound;
>>   
>> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
>> index 8ff2856e2d52..108c9457d629 100644
>> --- a/net/ceph/osd_client.c
>> +++ b/net/ceph/osd_client.c
>> @@ -2389,6 +2389,8 @@ static void finish_request(struct ceph_osd_request *req)
>>   	WARN_ON(lookup_request_mc(&osdc->map_checks, req->r_tid));
>>   	dout("%s req %p tid %llu\n", __func__, req, req->r_tid);
>>   
>> +	req->r_end_stamp = jiffies;
>> +
>>   	if (req->r_osd)
>>   		unlink_request(req->r_osd, req);
>>   	atomic_dec(&osdc->num_requests);
> Maybe fold this patch into #6 in this series? I'd prefer to add the new
> field along with its first user.

Sure, will merge it.

Thanks.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH resend v5 05/11] ceph: add global read latency metric support
  2020-02-05 20:15   ` Jeff Layton
@ 2020-02-06  1:24     ` Xiubo Li
  0 siblings, 0 replies; 31+ messages in thread
From: Xiubo Li @ 2020-02-06  1:24 UTC (permalink / raw)
  To: Jeff Layton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On 2020/2/6 4:15, Jeff Layton wrote:
> On Wed, 2020-01-29 at 03:27 -0500, xiubli@redhat.com wrote:
[...]
>> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
>> index 20e5ebfff389..0435a694370b 100644
>> --- a/fs/ceph/addr.c
>> +++ b/fs/ceph/addr.c
>> @@ -195,6 +195,7 @@ static int ceph_sync_readpages(struct ceph_fs_client *fsc,
>>   			       int page_align)
>>   {
>>   	struct ceph_osd_client *osdc = &fsc->client->osdc;
>> +	struct ceph_client_metric *metric = &fsc->mdsc->metric;
> nit: I think you can drop this variable and just dereference the metric
> field directly below where it's used. Ditto in other places where
> "metric" is only used once in the function.

Will fix them all.

>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>> index 141c1c03636c..101b51f9f05d 100644
>> --- a/fs/ceph/mds_client.c
>> +++ b/fs/ceph/mds_client.c
>> @@ -4182,14 +4182,29 @@ static int ceph_mdsc_metric_init(struct ceph_client_metric *metric)
>>   	atomic64_set(&metric->total_dentries, 0);
>>   	ret = percpu_counter_init(&metric->d_lease_hit, 0, GFP_KERNEL);
>>   	if (ret)
>> -		return ret;
>> +		return ret;;
> drop this, please ^^^

Will fix it.

Thanks.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH resend v5 08/11] ceph: periodically send perf metrics to MDS
  2020-02-05 21:43   ` Jeff Layton
@ 2020-02-06  2:36     ` Xiubo Li
  2020-02-06 11:31       ` Jeff Layton
  0 siblings, 1 reply; 31+ messages in thread
From: Xiubo Li @ 2020-02-06  2:36 UTC (permalink / raw)
  To: Jeff Layton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On 2020/2/6 5:43, Jeff Layton wrote:
> On Wed, 2020-01-29 at 03:27 -0500, xiubli@redhat.com wrote:
[...]
>>   
>> +/*
>> + * metrics debugfs
>> + */
>> +static int sending_metrics_set(void *data, u64 val)
>> +{
>> +	struct ceph_fs_client *fsc = (struct ceph_fs_client *)data;
>> +	struct ceph_mds_client *mdsc = fsc->mdsc;
>> +
>> +	if (val > 1) {
>> +		pr_err("Invalid sending metrics set value %llu\n", val);
>> +		return -EINVAL;
>> +	}
>> +
>> +	mutex_lock(&mdsc->mutex);
>> +	mdsc->sending_metrics = (unsigned int)val;
> Shouldn't that be a bool cast? Do we even need a cast there?
Will switch sending_metrics to bool type instead.
>> +	mutex_unlock(&mdsc->mutex);
>> +
>> +	return 0;
>> +}
>> +
>> +static int sending_metrics_get(void *data, u64 *val)
>> +{
>> +	struct ceph_fs_client *fsc = (struct ceph_fs_client *)data;
>> +	struct ceph_mds_client *mdsc = fsc->mdsc;
>> +
>> +	mutex_lock(&mdsc->mutex);
>> +	*val = (u64)mdsc->sending_metrics;
>> +	mutex_unlock(&mdsc->mutex);
>> +
>> +	return 0;
>> +}
>> +DEFINE_SIMPLE_ATTRIBUTE(sending_metrics_fops, sending_metrics_get,
>> +			sending_metrics_set, "%llu\n");
>> +
> I'd like to hear more about how we expect users to use this facility.
> This debugfs file doesn't seem consistent with the rest of the UI, and I
> imagine if the box reboots you'd have to (manually) re-enable it after
> mount, right? Maybe this should be a mount option instead?

A mount option means we must do the unmounting to disable it.

I was thinking with the debugfs file we can do the debug or tuning even 
in the product setups at any time, usually this should be disabled since 
it will send it per second.

Or we could merge the "sending_metric" to "metrics" UI, just writing 
"enable"/"disable" to enable/disable sending the metrics to ceph, and 
just like the "reset" does to clean the metrics.

Then the "/sys/kernel/debug/ceph/XXX.clientYYY/metrics" could be 
writable with:

"reset"  --> to clean and reset the metrics counters

"enable" --> enable sending metrics to ceph cluster

"disable" --> disable sending metrics to ceph cluster

Will this be better ?


[...]
>   /*
>    * delayed work -- periodically trim expired leases, renew caps with mds
>    */
> +#define CEPH_WORK_DELAY_DEF 5
>   static void schedule_delayed(struct ceph_mds_client *mdsc)
>   {
> -	int delay = 5;
> -	unsigned hz = round_jiffies_relative(HZ * delay);
> +	unsigned int hz;
> +	int delay = CEPH_WORK_DELAY_DEF;
> +
> +	mutex_lock(&mdsc->mutex);
> +	if (mdsc->sending_metrics)
> +		delay = 1;
> +	mutex_unlock(&mdsc->mutex);
> +
> The mdsc->mutex is dropped in the callers a little before this is
> called, so this is a little too mutex-thrashy. I think you'd be better
> off changing this function to be called with the mutex still held.

Will fix it.


[...]
>> +/* metric caps header */
>> +struct ceph_metric_cap {
>> +	__le32 type;     /* ceph metric type */
>> +
>> +	__u8  ver;
>> +	__u8  campat;
> I think you meant "compat" here.

Will fix it.


[...]
>> +/* metric metadata latency header */
>> +struct ceph_metric_metadata_latency {
>> +	__le32 type;     /* ceph metric type */
>> +
>> +	__u8  ver;
>> +	__u8  campat;
>> +
>> +	__le32 data_len; /* length of sizeof(sec + nsec) */
>> +	__le32 sec;
>> +	__le32 nsec;
>> +} __attribute__ ((packed));
>> +
>> +struct ceph_metric_head {
>> +	__le32 num;	/* the number of metrics will be sent */
> "the number of metrics that will be sent"

Will fix it.

Thanks,

>> +} __attribute__ ((packed));
>> +
>>   /* This is the global metrics */
>>   struct ceph_client_metric {
>>   	atomic64_t		total_dentries;
>> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
>> index 3f4829222528..a91431e9bdf7 100644
>> --- a/fs/ceph/super.h
>> +++ b/fs/ceph/super.h
>> @@ -128,6 +128,7 @@ struct ceph_fs_client {
>>   	struct dentry *debugfs_congestion_kb;
>>   	struct dentry *debugfs_bdi;
>>   	struct dentry *debugfs_mdsc, *debugfs_mdsmap;
>> +	struct dentry *debugfs_sending_metrics;
>>   	struct dentry *debugfs_metric;
>>   	struct dentry *debugfs_mds_sessions;
>>   #endif
>> diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
>> index a099f60feb7b..6028d3e865e4 100644
>> --- a/include/linux/ceph/ceph_fs.h
>> +++ b/include/linux/ceph/ceph_fs.h
>> @@ -130,6 +130,7 @@ struct ceph_dir_layout {
>>   #define CEPH_MSG_CLIENT_REQUEST         24
>>   #define CEPH_MSG_CLIENT_REQUEST_FORWARD 25
>>   #define CEPH_MSG_CLIENT_REPLY           26
>> +#define CEPH_MSG_CLIENT_METRICS         29
>>   #define CEPH_MSG_CLIENT_CAPS            0x310
>>   #define CEPH_MSG_CLIENT_LEASE           0x311
>>   #define CEPH_MSG_CLIENT_SNAP            0x312

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH resend v5 08/11] ceph: periodically send perf metrics to MDS
  2020-02-06  2:36     ` Xiubo Li
@ 2020-02-06 11:31       ` Jeff Layton
  2020-02-06 12:26         ` Xiubo Li
  0 siblings, 1 reply; 31+ messages in thread
From: Jeff Layton @ 2020-02-06 11:31 UTC (permalink / raw)
  To: Xiubo Li, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On Thu, 2020-02-06 at 10:36 +0800, Xiubo Li wrote:
> On 2020/2/6 5:43, Jeff Layton wrote:
> > On Wed, 2020-01-29 at 03:27 -0500, xiubli@redhat.com wrote:
> [...]
> > > +
> > > +static int sending_metrics_get(void *data, u64 *val)
> > > +{
> > > +	struct ceph_fs_client *fsc = (struct ceph_fs_client *)data;
> > > +	struct ceph_mds_client *mdsc = fsc->mdsc;
> > > +
> > > +	mutex_lock(&mdsc->mutex);
> > > +	*val = (u64)mdsc->sending_metrics;
> > > +	mutex_unlock(&mdsc->mutex);
> > > +
> > > +	return 0;
> > > +}
> > > +DEFINE_SIMPLE_ATTRIBUTE(sending_metrics_fops, sending_metrics_get,
> > > +			sending_metrics_set, "%llu\n");
> > > +
> > I'd like to hear more about how we expect users to use this facility.
> > This debugfs file doesn't seem consistent with the rest of the UI, and I
> > imagine if the box reboots you'd have to (manually) re-enable it after
> > mount, right? Maybe this should be a mount option instead?
> 
> A mount option means we must do the unmounting to disable it.
> 

Technically, no. You could wire it up so that you could enable and
disable it via -o remount. For example:

    # mount -o remount,metrics=disabled

Another option might be a module parameter if this is something that you
really want to be global (and not per-mount or per-session).

> I was thinking with the debugfs file we can do the debug or tuning even 
> in the product setups at any time, usually this should be disabled since 
> it will send it per second.
> 

Meh, one frame per second doesn't seem like it'll add much overhead.

Also, why one update per second? Should that interval be tunable?

> Or we could merge the "sending_metric" to "metrics" UI, just writing 
> "enable"/"disable" to enable/disable sending the metrics to ceph, and 
> just like the "reset" does to clean the metrics.
> 
> Then the "/sys/kernel/debug/ceph/XXX.clientYYY/metrics" could be 
> writable with:
> 
> "reset"  --> to clean and reset the metrics counters
> 
> "enable" --> enable sending metrics to ceph cluster
> 
> "disable" --> disable sending metrics to ceph cluster
> 
> Will this be better ?
> 

I guess it's not clear to me how you intend for this to be used.

A debugfs switch means that this is being enabled and disabled on a per-
session basis. Is the user supposed to turn this on for all, or just one
session? How do they know?

Is this something we expect people to just turn on briefly when they are
experiencing a problem, or is this something that we expect to be turned
on and left on for long periods of time?

If it's the latter then setting up a mount in /etc/fstab is not going to
be sufficient for an admin. She'll have to write a script or something
that goes in after the mount and enables this by writing to debugfs
after rebooting. Yuck.

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH resend v5 08/11] ceph: periodically send perf metrics to MDS
  2020-02-06 11:31       ` Jeff Layton
@ 2020-02-06 12:26         ` Xiubo Li
  2020-02-06 15:21           ` Jeff Layton
  0 siblings, 1 reply; 31+ messages in thread
From: Xiubo Li @ 2020-02-06 12:26 UTC (permalink / raw)
  To: Jeff Layton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On 2020/2/6 19:31, Jeff Layton wrote:
> On Thu, 2020-02-06 at 10:36 +0800, Xiubo Li wrote:
>> On 2020/2/6 5:43, Jeff Layton wrote:
>>> On Wed, 2020-01-29 at 03:27 -0500, xiubli@redhat.com wrote:
>> [...]
>>>> +
>>>> +static int sending_metrics_get(void *data, u64 *val)
>>>> +{
>>>> +	struct ceph_fs_client *fsc = (struct ceph_fs_client *)data;
>>>> +	struct ceph_mds_client *mdsc = fsc->mdsc;
>>>> +
>>>> +	mutex_lock(&mdsc->mutex);
>>>> +	*val = (u64)mdsc->sending_metrics;
>>>> +	mutex_unlock(&mdsc->mutex);
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +DEFINE_SIMPLE_ATTRIBUTE(sending_metrics_fops, sending_metrics_get,
>>>> +			sending_metrics_set, "%llu\n");
>>>> +
>>> I'd like to hear more about how we expect users to use this facility.
>>> This debugfs file doesn't seem consistent with the rest of the UI, and I
>>> imagine if the box reboots you'd have to (manually) re-enable it after
>>> mount, right? Maybe this should be a mount option instead?
>> A mount option means we must do the unmounting to disable it.
>>
> Technically, no. You could wire it up so that you could enable and
> disable it via -o remount. For example:
>
>      # mount -o remount,metrics=disabled

Yeah, this is cool.

>
> Another option might be a module parameter if this is something that you
> really want to be global (and not per-mount or per-session).
>
>> I was thinking with the debugfs file we can do the debug or tuning even
>> in the product setups at any time, usually this should be disabled since
>> it will send it per second.
>>
> Meh, one frame per second doesn't seem like it'll add much overhead.
Okay.
>
> Also, why one update per second? Should that interval be tunable?

Per second just keep it the same with the fuse client.


>> Or we could merge the "sending_metric" to "metrics" UI, just writing
>> "enable"/"disable" to enable/disable sending the metrics to ceph, and
>> just like the "reset" does to clean the metrics.
>>
>> Then the "/sys/kernel/debug/ceph/XXX.clientYYY/metrics" could be
>> writable with:
>>
>> "reset"  --> to clean and reset the metrics counters
>>
>> "enable" --> enable sending metrics to ceph cluster
>>
>> "disable" --> disable sending metrics to ceph cluster
>>
>> Will this be better ?
>>
> I guess it's not clear to me how you intend for this to be used.
>
> A debugfs switch means that this is being enabled and disabled on a per-
> session basis. Is the user supposed to turn this on for all, or just one
> session? How do they know?

Not for all, just per-superblock.

>
> Is this something we expect people to just turn on briefly when they are
> experiencing a problem, or is this something that we expect to be turned
> on and left on for long periods of time?

If this won't add much overhead even per second, let's keep sending the 
metrics to ceph always and the mount option for this switch is not 
needed any more.

And there is already a switch to enable/disable showing the metrics in 
the ceph side, if here add another switch per client, it will be also 
yucky for admins .

Let's make the update interval tunable and per second as default. Maybe 
we should make this as a global UI for all clients ?

Is this okay ?

Thanks.


> If it's the latter then setting up a mount in /etc/fstab is not going to
> be sufficient for an admin. She'll have to write a script or something
> that goes in after the mount and enables this by writing to debugfs
> after rebooting. Yuck.
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH resend v5 08/11] ceph: periodically send perf metrics to MDS
  2020-02-06 12:26         ` Xiubo Li
@ 2020-02-06 15:21           ` Jeff Layton
  2020-02-07  0:37             ` Xiubo Li
  0 siblings, 1 reply; 31+ messages in thread
From: Jeff Layton @ 2020-02-06 15:21 UTC (permalink / raw)
  To: Xiubo Li, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On Thu, 2020-02-06 at 20:26 +0800, Xiubo Li wrote:
> On 2020/2/6 19:31, Jeff Layton wrote:
> > On Thu, 2020-02-06 at 10:36 +0800, Xiubo Li wrote:
> > > On 2020/2/6 5:43, Jeff Layton wrote:
> > > > On Wed, 2020-01-29 at 03:27 -0500, xiubli@redhat.com wrote:
> > > [...]
> > > > > +
> > > > > +static int sending_metrics_get(void *data, u64 *val)
> > > > > +{
> > > > > +	struct ceph_fs_client *fsc = (struct ceph_fs_client *)data;
> > > > > +	struct ceph_mds_client *mdsc = fsc->mdsc;
> > > > > +
> > > > > +	mutex_lock(&mdsc->mutex);
> > > > > +	*val = (u64)mdsc->sending_metrics;
> > > > > +	mutex_unlock(&mdsc->mutex);
> > > > > +
> > > > > +	return 0;
> > > > > +}
> > > > > +DEFINE_SIMPLE_ATTRIBUTE(sending_metrics_fops, sending_metrics_get,
> > > > > +			sending_metrics_set, "%llu\n");
> > > > > +
> > > > I'd like to hear more about how we expect users to use this facility.
> > > > This debugfs file doesn't seem consistent with the rest of the UI, and I
> > > > imagine if the box reboots you'd have to (manually) re-enable it after
> > > > mount, right? Maybe this should be a mount option instead?
> > > A mount option means we must do the unmounting to disable it.
> > > 
> > Technically, no. You could wire it up so that you could enable and
> > disable it via -o remount. For example:
> > 
> >      # mount -o remount,metrics=disabled
> 
> Yeah, this is cool.
> 
> > Another option might be a module parameter if this is something that you
> > really want to be global (and not per-mount or per-session).
> > 
> > > I was thinking with the debugfs file we can do the debug or tuning even
> > > in the product setups at any time, usually this should be disabled since
> > > it will send it per second.
> > > 
> > Meh, one frame per second doesn't seem like it'll add much overhead.
> Okay.
> > Also, why one update per second? Should that interval be tunable?
> 
> Per second just keep it the same with the fuse client.
> 

Ok.

> 
> > > Or we could merge the "sending_metric" to "metrics" UI, just writing
> > > "enable"/"disable" to enable/disable sending the metrics to ceph, and
> > > just like the "reset" does to clean the metrics.
> > > 
> > > Then the "/sys/kernel/debug/ceph/XXX.clientYYY/metrics" could be
> > > writable with:
> > > 
> > > "reset"  --> to clean and reset the metrics counters
> > > 
> > > "enable" --> enable sending metrics to ceph cluster
> > > 
> > > "disable" --> disable sending metrics to ceph cluster
> > > 
> > > Will this be better ?
> > > 
> > I guess it's not clear to me how you intend for this to be used.
> > 
> > A debugfs switch means that this is being enabled and disabled on a per-
> > session basis. Is the user supposed to turn this on for all, or just one
> > session? How do they know?
> 
> Not for all, just per-superblock.
> 

If it's per-superblock, then a debugfs-based switch seems particularly
ill-suited for this, as that's really a per-session interface.

> > Is this something we expect people to just turn on briefly when they are
> > experiencing a problem, or is this something that we expect to be turned
> > on and left on for long periods of time?
> 
> If this won't add much overhead even per second, let's keep sending the 
> metrics to ceph always and the mount option for this switch is not 
> needed any more.
> 

Note that I don't really _know_ that it won't be a problem, just that it
doesn't sound too bad. I think we probably will want some mechanism to
enable/disable this until we have some experience with it in the field.

> And there is already a switch to enable/disable showing the metrics in 
> the ceph side, if here add another switch per client, it will be also 
> yucky for admins .
> 
> Let's make the update interval tunable and per second as default. Maybe 
> we should make this as a global UI for all clients ?
> 

If you want a global setting for the interval that would take effect on
all ceph mounts, then maybe a "metric_send_interval" module parameter
would be best. Make it an unsigned int, and allow the admin to set it to
0 to turn off stats transmission in the client.

We have a well-defined interface for setting module parameters on most
distros (via /etc/modprobe.d/), so that would be better than monkeying
around with debugfs here, IMO.

As to the default, it might be best to have this default to 0 initially.
Once we have more experience with it we could make it default to 1 in a
later release.

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH resend v5 08/11] ceph: periodically send perf metrics to MDS
  2020-02-06 15:21           ` Jeff Layton
@ 2020-02-07  0:37             ` Xiubo Li
  0 siblings, 0 replies; 31+ messages in thread
From: Xiubo Li @ 2020-02-07  0:37 UTC (permalink / raw)
  To: Jeff Layton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On 2020/2/6 23:21, Jeff Layton wrote:
> On Thu, 2020-02-06 at 20:26 +0800, Xiubo Li wrote:
>> On 2020/2/6 19:31, Jeff Layton wrote:
>>> On Thu, 2020-02-06 at 10:36 +0800, Xiubo Li wrote:
>>>> On 2020/2/6 5:43, Jeff Layton wrote:
>>>>> On Wed, 2020-01-29 at 03:27 -0500, xiubli@redhat.com wrote:
>>>> [...]
>>>>>> +
>>>>>> +static int sending_metrics_get(void *data, u64 *val)
>>>>>> +{
>>>>>> +	struct ceph_fs_client *fsc = (struct ceph_fs_client *)data;
>>>>>> +	struct ceph_mds_client *mdsc = fsc->mdsc;
>>>>>> +
>>>>>> +	mutex_lock(&mdsc->mutex);
>>>>>> +	*val = (u64)mdsc->sending_metrics;
>>>>>> +	mutex_unlock(&mdsc->mutex);
>>>>>> +
>>>>>> +	return 0;
>>>>>> +}
>>>>>> +DEFINE_SIMPLE_ATTRIBUTE(sending_metrics_fops, sending_metrics_get,
>>>>>> +			sending_metrics_set, "%llu\n");
>>>>>> +
>>>>> I'd like to hear more about how we expect users to use this facility.
>>>>> This debugfs file doesn't seem consistent with the rest of the UI, and I
>>>>> imagine if the box reboots you'd have to (manually) re-enable it after
>>>>> mount, right? Maybe this should be a mount option instead?
>>>> A mount option means we must do the unmounting to disable it.
>>>>
>>> Technically, no. You could wire it up so that you could enable and
>>> disable it via -o remount. For example:
>>>
>>>       # mount -o remount,metrics=disabled
>> Yeah, this is cool.
>>
>>> Another option might be a module parameter if this is something that you
>>> really want to be global (and not per-mount or per-session).
>>>
>>>> I was thinking with the debugfs file we can do the debug or tuning even
>>>> in the product setups at any time, usually this should be disabled since
>>>> it will send it per second.
>>>>
>>> Meh, one frame per second doesn't seem like it'll add much overhead.
>> Okay.
>>> Also, why one update per second? Should that interval be tunable?
>> Per second just keep it the same with the fuse client.
>>
> Ok.
>
>>>> Or we could merge the "sending_metric" to "metrics" UI, just writing
>>>> "enable"/"disable" to enable/disable sending the metrics to ceph, and
>>>> just like the "reset" does to clean the metrics.
>>>>
>>>> Then the "/sys/kernel/debug/ceph/XXX.clientYYY/metrics" could be
>>>> writable with:
>>>>
>>>> "reset"  --> to clean and reset the metrics counters
>>>>
>>>> "enable" --> enable sending metrics to ceph cluster
>>>>
>>>> "disable" --> disable sending metrics to ceph cluster
>>>>
>>>> Will this be better ?
>>>>
>>> I guess it's not clear to me how you intend for this to be used.
>>>
>>> A debugfs switch means that this is being enabled and disabled on a per-
>>> session basis. Is the user supposed to turn this on for all, or just one
>>> session? How do they know?
>> Not for all, just per-superblock.
>>
> If it's per-superblock, then a debugfs-based switch seems particularly
> ill-suited for this, as that's really a per-session interface.
>
>>> Is this something we expect people to just turn on briefly when they are
>>> experiencing a problem, or is this something that we expect to be turned
>>> on and left on for long periods of time?
>> If this won't add much overhead even per second, let's keep sending the
>> metrics to ceph always and the mount option for this switch is not
>> needed any more.
>>
> Note that I don't really _know_ that it won't be a problem, just that it
> doesn't sound too bad. I think we probably will want some mechanism to
> enable/disable this until we have some experience with it in the field.
>
>> And there is already a switch to enable/disable showing the metrics in
>> the ceph side, if here add another switch per client, it will be also
>> yucky for admins .
>>
>> Let's make the update interval tunable and per second as default. Maybe
>> we should make this as a global UI for all clients ?
>>
> If you want a global setting for the interval that would take effect on
> all ceph mounts, then maybe a "metric_send_interval" module parameter
> would be best. Make it an unsigned int, and allow the admin to set it to
> 0 to turn off stats transmission in the client.
>
> We have a well-defined interface for setting module parameters on most
> distros (via /etc/modprobe.d/), so that would be better than monkeying
> around with debugfs here, IMO.
>
> As to the default, it might be best to have this default to 0 initially.
> Once we have more experience with it we could make it default to 1 in a
> later release.

Yeah, this makes sense.

Let's switch to the module parameter "metric_send_interval", at the same 
time this will also act as a switch.

0 means off, >0 will be the interval.

Thanks,

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2020-02-07  0:38 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-29  8:27 [PATCH resend v5 0/11] ceph: add perf metrics support xiubli
2020-01-29  8:27 ` [PATCH resend v5 01/11] ceph: add global dentry lease metric support xiubli
2020-01-29  8:27 ` [PATCH resend v5 02/11] ceph: add caps perf metric for each session xiubli
2020-01-29 14:21   ` Jeff Layton
2020-01-30  2:22     ` Xiubo Li
2020-01-30 19:00       ` Jeffrey Layton
2020-01-31  1:34         ` Xiubo Li
2020-01-31  9:02           ` Xiubo Li
2020-02-04 21:10             ` Jeff Layton
2020-02-05  0:58               ` Xiubo Li
2020-02-05  7:57               ` Xiubo Li
2020-01-29  8:27 ` [PATCH resend v5 03/11] ceph: move ceph_osdc_{read,write}pages to ceph.ko xiubli
2020-02-04 18:38   ` Jeff Layton
2020-01-29  8:27 ` [PATCH resend v5 04/11] ceph: add r_end_stamp for the osdc request xiubli
2020-02-05 19:14   ` Jeff Layton
2020-02-06  0:57     ` Xiubo Li
2020-01-29  8:27 ` [PATCH resend v5 05/11] ceph: add global read latency metric support xiubli
2020-02-05 20:15   ` Jeff Layton
2020-02-06  1:24     ` Xiubo Li
2020-01-29  8:27 ` [PATCH resend v5 06/11] ceph: add global write " xiubli
2020-01-29  8:27 ` [PATCH resend v5 07/11] ceph: add global metadata perf " xiubli
2020-01-29  8:27 ` [PATCH resend v5 08/11] ceph: periodically send perf metrics to MDS xiubli
2020-02-05 21:43   ` Jeff Layton
2020-02-06  2:36     ` Xiubo Li
2020-02-06 11:31       ` Jeff Layton
2020-02-06 12:26         ` Xiubo Li
2020-02-06 15:21           ` Jeff Layton
2020-02-07  0:37             ` Xiubo Li
2020-01-29  8:27 ` [PATCH resend v5 09/11] ceph: add CEPH_DEFINE_RW_FUNC helper support xiubli
2020-01-29  8:27 ` [PATCH resend v5 10/11] ceph: add reset metrics support xiubli
2020-01-29  8:27 ` [PATCH resend v5 11/11] ceph: send client provided metric flags in client metadata xiubli

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.