* [PATCH resend v5 0/11] ceph: add perf metrics support
@ 2020-01-29  8:27 xiubli
  2020-01-29  8:27 ` [PATCH resend v5 01/11] ceph: add global dentry lease metric support xiubli
                   ` (10 more replies)
  0 siblings, 11 replies; 31+ messages in thread
From: xiubli @ 2020-01-29  8:27 UTC (permalink / raw)
  To: jlayton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

Changed in V2:
- add read/write/metadata latency metric support.
- add and send client provided metric flags in client metadata
- addressed the comments from Ilya and merged the 4/4 patch into 3/4.
- addressed all the other comments in v1 series.

Changed in V3:
- addressed Jeff's comments and let the callers do the metric
counting.
- with some small fixes for the read/write latency
- tested based on the latest testing branch

Changed in V4:
- fix the lock issue

Changed in V5:
- add r_end_stamp for the osdc request
- delete reset metric and move it to metric sysfs
- move ceph_osdc_{read,write}pages to ceph.ko
- use percpu counters instead for read/write/metadata latencies

Changed in V5 resend:
- add the patch missing from the original v5 posting.

The client will send the metrics to the MDSs every second when sending_metrics is enabled; it is disabled by default.
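
To toggle it, a hypothetical example (assuming the sending_metrics knob sits next to the metrics file in the client's debugfs directory; the exact path is not shown in this posting):

$ echo 1 > /sys/kernel/debug/ceph/0c93a60d-5645-4c46-8568-4c8f63db4c7f.client4267/sending_metrics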


We can get the metrics from the debugfs:

$ cat /sys/kernel/debug/ceph/0c93a60d-5645-4c46-8568-4c8f63db4c7f.client4267/metrics 
item          total       sum_lat(us)     avg_lat(us)
-----------------------------------------------------
read          13          417000          32076
write         42          131205000       3123928
metadata      104         493000          4740

item          total           miss            hit
-------------------------------------------------
d_lease       204             0               918

session       caps            miss            hit
-------------------------------------------------
0             204             213             368218


On the MDS side, we can get the metrics (NOTE: the latencies are in
nanoseconds):

$ ./bin/ceph fs perf stats | python -m json.tool
{
    "client_metadata": {
        "client.4267": {
            "IP": "v1:192.168.195.165",
            "hostname": "fedora1",
            "mount_point": "N/A",
            "root": "/"
        }
    },
    "counters": [
        "cap_hit"
    ],
    "global_counters": [
        "read_latency",
        "write_latency",
        "metadata_latency",
        "dentry_lease_hit"
    ],
    "global_metrics": {
        "client.4267": [
            [
                0,
                32076923
            ],
            [
                3,
                123928571
            ],
            [
                0,
                4740384
            ],
            [
                918,
                0
            ]
        ]
    },
    "metrics": {
        "delayed_ranks": [],
        "mds.0": {
            "client.4267": [
                [
                    368218,
                    213
                ]
            ]
        }
    }
}


The metric flags provided in the client metadata:

$ ./bin/cephfs-journal-tool --rank=1:0 event get --type=SESSION json
Wrote output to JSON file 'dump'
$ cat dump
[ 
    {
        "client instance": "client.4275 v1:192.168.195.165:0/461391971",
        "open": "true",
        "client map version": 1,
        "inos": "[]",
        "inotable version": 0,
        "client_metadata": {
            "client_features": {
                "feature_bits": "0000000000001bff"
            },
            "metric_spec": {
                "metric_flags": {
                    "feature_bits": "000000000000001f"
                }
            },
            "entity_id": "",
            "hostname": "fedora1",
            "kernel_version": "5.5.0-rc2+",
            "root": "/"
        }
    },
[...]




Xiubo Li (11):
  ceph: add global dentry lease metric support
  ceph: add caps perf metric for each session
  ceph: move ceph_osdc_{read,write}pages to ceph.ko
  ceph: add r_end_stamp for the osdc request
  ceph: add global read latency metric support
  ceph: add global write latency metric support
  ceph: add global metadata perf metric support
  ceph: periodically send perf metrics to MDS
  ceph: add CEPH_DEFINE_RW_FUNC helper support
  ceph: add reset metrics support
  ceph: send client provided metric flags in client metadata

 fs/ceph/acl.c                   |   2 +
 fs/ceph/addr.c                  | 106 +++++++++-
 fs/ceph/caps.c                  |  74 +++++++
 fs/ceph/debugfs.c               | 168 ++++++++++++++-
 fs/ceph/dir.c                   |  27 ++-
 fs/ceph/file.c                  |  26 +++
 fs/ceph/mds_client.c            | 350 ++++++++++++++++++++++++++++++--
 fs/ceph/mds_client.h            |  10 +
 fs/ceph/metric.h                | 150 ++++++++++++++
 fs/ceph/quota.c                 |   9 +-
 fs/ceph/super.h                 |  14 ++
 fs/ceph/xattr.c                 |  17 +-
 include/linux/ceph/ceph_fs.h    |   1 +
 include/linux/ceph/debugfs.h    |  14 ++
 include/linux/ceph/osd_client.h |  18 +-
 net/ceph/osd_client.c           |  81 +-------
 16 files changed, 935 insertions(+), 132 deletions(-)
 create mode 100644 fs/ceph/metric.h

-- 
2.21.0


* [PATCH resend v5 01/11] ceph: add global dentry lease metric support
  2020-01-29  8:27 [PATCH resend v5 0/11] ceph: add perf metrics support xiubli
@ 2020-01-29  8:27 ` xiubli
  2020-01-29  8:27 ` [PATCH resend v5 02/11] ceph: add caps perf metric for each session xiubli
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: xiubli @ 2020-01-29  8:27 UTC (permalink / raw)
  To: jlayton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

For the dentry lease we will only count the hit/miss info triggered
from the vfs calls; cases like request reply handling and the
periodic ceph_trim_dentries() are intentionally ignored.

Currently only the debugfs interface is supported.

The output will be:

item          total           miss            hit
-------------------------------------------------
d_lease       11              7               141
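
Condensed from the ceph_d_revalidate() hunk below, the accounting
reduces to one counter bump per outcome (simplified: the real hunk
skips the miss count in the LOOKUP_RCU case), plus an atomic64
total_dentries incremented in ceph_d_init() and decremented in
ceph_d_release():

	if (!valid)
		percpu_counter_inc(&mdsc->metric.d_lease_mis);
	else
		percpu_counter_inc(&mdsc->metric.d_lease_hit);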

URL: https://tracker.ceph.com/issues/43215
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/debugfs.c    | 32 ++++++++++++++++++++++++++++----
 fs/ceph/dir.c        | 18 ++++++++++++++++--
 fs/ceph/mds_client.c | 37 +++++++++++++++++++++++++++++++++++--
 fs/ceph/mds_client.h |  9 +++++++++
 fs/ceph/super.h      |  1 +
 5 files changed, 89 insertions(+), 8 deletions(-)

diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
index fb7cabd98e7b..40a22da0214a 100644
--- a/fs/ceph/debugfs.c
+++ b/fs/ceph/debugfs.c
@@ -124,6 +124,22 @@ static int mdsc_show(struct seq_file *s, void *p)
 	return 0;
 }
 
+static int metric_show(struct seq_file *s, void *p)
+{
+	struct ceph_fs_client *fsc = s->private;
+	struct ceph_mds_client *mdsc = fsc->mdsc;
+
+	seq_printf(s, "item          total           miss            hit\n");
+	seq_printf(s, "-------------------------------------------------\n");
+
+	seq_printf(s, "%-14s%-16lld%-16lld%lld\n", "d_lease",
+		   atomic64_read(&mdsc->metric.total_dentries),
+		   percpu_counter_sum(&mdsc->metric.d_lease_mis),
+		   percpu_counter_sum(&mdsc->metric.d_lease_hit));
+
+	return 0;
+}
+
 static int caps_show_cb(struct inode *inode, struct ceph_cap *cap, void *p)
 {
 	struct seq_file *s = p;
@@ -220,6 +236,7 @@ static int mds_sessions_show(struct seq_file *s, void *ptr)
 
 CEPH_DEFINE_SHOW_FUNC(mdsmap_show)
 CEPH_DEFINE_SHOW_FUNC(mdsc_show)
+CEPH_DEFINE_SHOW_FUNC(metric_show)
 CEPH_DEFINE_SHOW_FUNC(caps_show)
 CEPH_DEFINE_SHOW_FUNC(mds_sessions_show)
 
@@ -255,6 +272,7 @@ void ceph_fs_debugfs_cleanup(struct ceph_fs_client *fsc)
 	debugfs_remove(fsc->debugfs_mdsmap);
 	debugfs_remove(fsc->debugfs_mds_sessions);
 	debugfs_remove(fsc->debugfs_caps);
+	debugfs_remove(fsc->debugfs_metric);
 	debugfs_remove(fsc->debugfs_mdsc);
 }
 
@@ -295,11 +313,17 @@ void ceph_fs_debugfs_init(struct ceph_fs_client *fsc)
 						fsc,
 						&mdsc_show_fops);
 
+	fsc->debugfs_metric = debugfs_create_file("metrics",
+						  0400,
+						  fsc->client->debugfs_dir,
+						  fsc,
+						  &metric_show_fops);
+
 	fsc->debugfs_caps = debugfs_create_file("caps",
-						   0400,
-						   fsc->client->debugfs_dir,
-						   fsc,
-						   &caps_show_fops);
+						0400,
+						fsc->client->debugfs_dir,
+						fsc,
+						&caps_show_fops);
 }
 
 
diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 10294f07f5f0..658c55b323cc 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -38,6 +38,8 @@ static int __dir_lease_try_check(const struct dentry *dentry);
 static int ceph_d_init(struct dentry *dentry)
 {
 	struct ceph_dentry_info *di;
+	struct ceph_fs_client *fsc = ceph_sb_to_client(dentry->d_sb);
+	struct ceph_mds_client *mdsc = fsc->mdsc;
 
 	di = kmem_cache_zalloc(ceph_dentry_cachep, GFP_KERNEL);
 	if (!di)
@@ -48,6 +50,9 @@ static int ceph_d_init(struct dentry *dentry)
 	di->time = jiffies;
 	dentry->d_fsdata = di;
 	INIT_LIST_HEAD(&di->lease_list);
+
+	atomic64_inc(&mdsc->metric.total_dentries);
+
 	return 0;
 }
 
@@ -1613,6 +1618,7 @@ static int dir_lease_is_valid(struct inode *dir, struct dentry *dentry)
  */
 static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags)
 {
+	struct ceph_mds_client *mdsc;
 	int valid = 0;
 	struct dentry *parent;
 	struct inode *dir, *inode;
@@ -1651,9 +1657,8 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags)
 		}
 	}
 
+	mdsc = ceph_sb_to_client(dir->i_sb)->mdsc;
 	if (!valid) {
-		struct ceph_mds_client *mdsc =
-			ceph_sb_to_client(dir->i_sb)->mdsc;
 		struct ceph_mds_request *req;
 		int op, err;
 		u32 mask;
@@ -1661,6 +1666,8 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags)
 		if (flags & LOOKUP_RCU)
 			return -ECHILD;
 
+		percpu_counter_inc(&mdsc->metric.d_lease_mis);
+
 		op = ceph_snap(dir) == CEPH_SNAPDIR ?
 			CEPH_MDS_OP_LOOKUPSNAP : CEPH_MDS_OP_LOOKUP;
 		req = ceph_mdsc_create_request(mdsc, op, USE_ANY_MDS);
@@ -1692,6 +1699,8 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags)
 			dout("d_revalidate %p lookup result=%d\n",
 			     dentry, err);
 		}
+	} else {
+		percpu_counter_inc(&mdsc->metric.d_lease_hit);
 	}
 
 	dout("d_revalidate %p %s\n", dentry, valid ? "valid" : "invalid");
@@ -1700,6 +1709,7 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags)
 
 	if (!(flags & LOOKUP_RCU))
 		dput(parent);
+
 	return valid;
 }
 
@@ -1734,9 +1744,13 @@ static int ceph_d_delete(const struct dentry *dentry)
 static void ceph_d_release(struct dentry *dentry)
 {
 	struct ceph_dentry_info *di = ceph_dentry(dentry);
+	struct ceph_fs_client *fsc = ceph_sb_to_client(dentry->d_sb);
+	struct ceph_mds_client *mdsc = fsc->mdsc;
 
 	dout("d_release %p\n", dentry);
 
+	atomic64_dec(&mdsc->metric.total_dentries);
+
 	spin_lock(&dentry->d_lock);
 	__dentry_lease_unlist(di);
 	dentry->d_fsdata = NULL;
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 8263f75badfc..a24fd00676b8 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -4158,10 +4158,31 @@ static void delayed_work(struct work_struct *work)
 	schedule_delayed(mdsc);
 }
 
+static int ceph_mdsc_metric_init(struct ceph_client_metric *metric)
+{
+	int ret;
+
+	if (!metric)
+		return -EINVAL;
+
+	atomic64_set(&metric->total_dentries, 0);
+	ret = percpu_counter_init(&metric->d_lease_hit, 0, GFP_KERNEL);
+	if (ret)
+		return ret;
+	ret = percpu_counter_init(&metric->d_lease_mis, 0, GFP_KERNEL);
+	if (ret) {
+		percpu_counter_destroy(&metric->d_lease_hit);
+		return ret;
+	}
+
+	return 0;
+}
+
 int ceph_mdsc_init(struct ceph_fs_client *fsc)
 
 {
 	struct ceph_mds_client *mdsc;
+	int err;
 
 	mdsc = kzalloc(sizeof(struct ceph_mds_client), GFP_NOFS);
 	if (!mdsc)
@@ -4170,8 +4191,8 @@ int ceph_mdsc_init(struct ceph_fs_client *fsc)
 	mutex_init(&mdsc->mutex);
 	mdsc->mdsmap = kzalloc(sizeof(*mdsc->mdsmap), GFP_NOFS);
 	if (!mdsc->mdsmap) {
-		kfree(mdsc);
-		return -ENOMEM;
+		err = -ENOMEM;
+		goto err_mdsc;
 	}
 
 	fsc->mdsc = mdsc;
@@ -4210,6 +4231,9 @@ int ceph_mdsc_init(struct ceph_fs_client *fsc)
 	init_waitqueue_head(&mdsc->cap_flushing_wq);
 	INIT_WORK(&mdsc->cap_reclaim_work, ceph_cap_reclaim_work);
 	atomic_set(&mdsc->cap_reclaim_pending, 0);
+	err = ceph_mdsc_metric_init(&mdsc->metric);
+	if (err)
+		goto err_mdsmap;
 
 	spin_lock_init(&mdsc->dentry_list_lock);
 	INIT_LIST_HEAD(&mdsc->dentry_leases);
@@ -4228,6 +4252,12 @@ int ceph_mdsc_init(struct ceph_fs_client *fsc)
 	strscpy(mdsc->nodename, utsname()->nodename,
 		sizeof(mdsc->nodename));
 	return 0;
+
+err_mdsmap:
+	kfree(mdsc->mdsmap);
+err_mdsc:
+	kfree(mdsc);
+	return err;
 }
 
 /*
@@ -4485,6 +4515,9 @@ void ceph_mdsc_destroy(struct ceph_fs_client *fsc)
 
 	ceph_mdsc_stop(mdsc);
 
+	percpu_counter_destroy(&mdsc->metric.d_lease_mis);
+	percpu_counter_destroy(&mdsc->metric.d_lease_hit);
+
 	fsc->mdsc = NULL;
 	kfree(mdsc);
 	dout("mdsc_destroy %p done\n", mdsc);
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 27a7446e10d3..dd1f417b90eb 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -358,6 +358,13 @@ struct cap_wait {
 	int			want;
 };
 
+/* This is the global metrics */
+struct ceph_client_metric {
+	atomic64_t		total_dentries;
+	struct percpu_counter	d_lease_hit;
+	struct percpu_counter	d_lease_mis;
+};
+
 /*
  * mds client state
  */
@@ -446,6 +453,8 @@ struct ceph_mds_client {
 	struct list_head  dentry_leases;     /* fifo list */
 	struct list_head  dentry_dir_leases; /* lru list */
 
+	struct ceph_client_metric metric;
+
 	spinlock_t		snapid_map_lock;
 	struct rb_root		snapid_map_tree;
 	struct list_head	snapid_map_lru;
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 3ef17dd6491e..7af91628636c 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -128,6 +128,7 @@ struct ceph_fs_client {
 	struct dentry *debugfs_congestion_kb;
 	struct dentry *debugfs_bdi;
 	struct dentry *debugfs_mdsc, *debugfs_mdsmap;
+	struct dentry *debugfs_metric;
 	struct dentry *debugfs_mds_sessions;
 #endif
 
-- 
2.21.0


* [PATCH resend v5 02/11] ceph: add caps perf metric for each session
  2020-01-29  8:27 [PATCH resend v5 0/11] ceph: add perf metrics support xiubli
  2020-01-29  8:27 ` [PATCH resend v5 01/11] ceph: add global dentry lease metric support xiubli
@ 2020-01-29  8:27 ` xiubli
  2020-01-29 14:21   ` Jeff Layton
  2020-01-29  8:27 ` [PATCH resend v5 03/11] ceph: move ceph_osdc_{read,write}pages to ceph.ko xiubli
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: xiubli @ 2020-01-29  8:27 UTC (permalink / raw)
  To: jlayton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

This fulfills the caps hit/miss metric for each session. When
checking the "need" mask, if a cap issued by a session covers bits of
the "need" mask it counts as a hit for that session, otherwise as a
miss (a condensed sketch follows the sample output below).

item          total           miss            hit
-------------------------------------------------
d_lease       295             0               993

session       caps            miss            hit
-------------------------------------------------
0             295             107             4119
1             1               107             9
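
The accounting boils down to a per-session bit test against the
issued caps; a sketch distilled from the __ceph_caps_metric() hunk
below:

	s = ceph_get_mds_session(cap->session);
	if (s) {
		if (mask & cap->issued)	/* a requested bit is issued: hit */
			percpu_counter_inc(&s->i_caps_hit);
		else			/* nothing from the mask issued: miss */
			percpu_counter_inc(&s->i_caps_mis);
		ceph_put_mds_session(s);
	}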

URL: https://tracker.ceph.com/issues/43215
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/acl.c        |  2 ++
 fs/ceph/addr.c       |  2 ++
 fs/ceph/caps.c       | 74 ++++++++++++++++++++++++++++++++++++++++++++
 fs/ceph/debugfs.c    | 20 ++++++++++++
 fs/ceph/dir.c        |  9 ++++--
 fs/ceph/file.c       |  3 ++
 fs/ceph/mds_client.c | 16 +++++++++-
 fs/ceph/mds_client.h |  3 ++
 fs/ceph/quota.c      |  9 ++++--
 fs/ceph/super.h      | 11 +++++++
 fs/ceph/xattr.c      | 17 ++++++++--
 11 files changed, 158 insertions(+), 8 deletions(-)

diff --git a/fs/ceph/acl.c b/fs/ceph/acl.c
index 26be6520d3fb..58e119e3519f 100644
--- a/fs/ceph/acl.c
+++ b/fs/ceph/acl.c
@@ -22,6 +22,8 @@ static inline void ceph_set_cached_acl(struct inode *inode,
 	struct ceph_inode_info *ci = ceph_inode(inode);
 
 	spin_lock(&ci->i_ceph_lock);
+	__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
+
 	if (__ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 0))
 		set_cached_acl(inode, type, acl);
 	else
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 7ab616601141..29d4513eff8c 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -1706,6 +1706,8 @@ int ceph_uninline_data(struct file *filp, struct page *locked_page)
 			err = -ENOMEM;
 			goto out;
 		}
+
+		ceph_caps_metric(ci, CEPH_STAT_CAP_INLINE_DATA);
 		err = __ceph_do_getattr(inode, page,
 					CEPH_STAT_CAP_INLINE_DATA, true);
 		if (err < 0) {
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 7fc87b693ba4..af2e9e826f8c 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -783,6 +783,75 @@ static int __cap_is_valid(struct ceph_cap *cap)
 	return 1;
 }
 
+/*
+ * Counts the cap metric.
+ */
+void __ceph_caps_metric(struct ceph_inode_info *ci, int mask)
+{
+	int have = ci->i_snap_caps;
+	struct ceph_mds_session *s;
+	struct ceph_cap *cap;
+	struct rb_node *p;
+	bool skip_auth = false;
+
+	lockdep_assert_held(&ci->i_ceph_lock);
+
+	if (mask <= 0)
+		return;
+
+	/* Counts the snap caps metric in the auth cap */
+	if (ci->i_auth_cap) {
+		cap = ci->i_auth_cap;
+		if (have) {
+			have |= cap->issued;
+
+			dout("%s %p cap %p issued %s, mask %s\n", __func__,
+			     &ci->vfs_inode, cap, ceph_cap_string(cap->issued),
+			     ceph_cap_string(mask));
+
+			s = ceph_get_mds_session(cap->session);
+			if (s) {
+				if (mask & have)
+					percpu_counter_inc(&s->i_caps_hit);
+				else
+					percpu_counter_inc(&s->i_caps_mis);
+				ceph_put_mds_session(s);
+			}
+			skip_auth = true;
+		}
+	}
+
+	if ((mask & have) == mask)
+		return;
+
+	/* Checks others */
+	for (p = rb_first(&ci->i_caps); p; p = rb_next(p)) {
+		cap = rb_entry(p, struct ceph_cap, ci_node);
+		if (!__cap_is_valid(cap))
+			continue;
+
+		if (skip_auth && cap == ci->i_auth_cap)
+			continue;
+
+		dout("%s %p cap %p issued %s, mask %s\n", __func__,
+		     &ci->vfs_inode, cap, ceph_cap_string(cap->issued),
+		     ceph_cap_string(mask));
+
+		s = ceph_get_mds_session(cap->session);
+		if (s) {
+			if (mask & cap->issued)
+				percpu_counter_inc(&s->i_caps_hit);
+			else
+				percpu_counter_inc(&s->i_caps_mis);
+			ceph_put_mds_session(s);
+		}
+
+		have |= cap->issued;
+		if ((mask & have) == mask)
+			return;
+	}
+}
+
 /*
  * Return set of valid cap bits issued to us.  Note that caps time
  * out, and may be invalidated in bulk if the client session times out
@@ -2746,6 +2815,7 @@ static void check_max_size(struct inode *inode, loff_t endoff)
 int ceph_try_get_caps(struct inode *inode, int need, int want,
 		      bool nonblock, int *got)
 {
+	struct ceph_inode_info *ci = ceph_inode(inode);
 	int ret;
 
 	BUG_ON(need & ~CEPH_CAP_FILE_RD);
@@ -2758,6 +2828,7 @@ int ceph_try_get_caps(struct inode *inode, int need, int want,
 	BUG_ON(want & ~(CEPH_CAP_FILE_CACHE | CEPH_CAP_FILE_LAZYIO |
 			CEPH_CAP_FILE_SHARED | CEPH_CAP_FILE_EXCL |
 			CEPH_CAP_ANY_DIR_OPS));
+	ceph_caps_metric(ci, need | want);
 	ret = try_get_cap_refs(inode, need, want, 0, nonblock, got);
 	return ret == -EAGAIN ? 0 : ret;
 }
@@ -2784,6 +2855,8 @@ int ceph_get_caps(struct file *filp, int need, int want,
 	    fi->filp_gen != READ_ONCE(fsc->filp_gen))
 		return -EBADF;
 
+	ceph_caps_metric(ci, need | want);
+
 	while (true) {
 		if (endoff > 0)
 			check_max_size(inode, endoff);
@@ -2871,6 +2944,7 @@ int ceph_get_caps(struct file *filp, int need, int want,
 			 * getattr request will bring inline data into
 			 * page cache
 			 */
+			ceph_caps_metric(ci, CEPH_STAT_CAP_INLINE_DATA);
 			ret = __ceph_do_getattr(inode, NULL,
 						CEPH_STAT_CAP_INLINE_DATA,
 						true);
diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
index 40a22da0214a..c132fdb40d53 100644
--- a/fs/ceph/debugfs.c
+++ b/fs/ceph/debugfs.c
@@ -128,6 +128,7 @@ static int metric_show(struct seq_file *s, void *p)
 {
 	struct ceph_fs_client *fsc = s->private;
 	struct ceph_mds_client *mdsc = fsc->mdsc;
+	int i;
 
 	seq_printf(s, "item          total           miss            hit\n");
 	seq_printf(s, "-------------------------------------------------\n");
@@ -137,6 +138,25 @@ static int metric_show(struct seq_file *s, void *p)
 		   percpu_counter_sum(&mdsc->metric.d_lease_mis),
 		   percpu_counter_sum(&mdsc->metric.d_lease_hit));
 
+	seq_printf(s, "\n");
+	seq_printf(s, "session       caps            miss            hit\n");
+	seq_printf(s, "-------------------------------------------------\n");
+
+	mutex_lock(&mdsc->mutex);
+	for (i = 0; i < mdsc->max_sessions; i++) {
+		struct ceph_mds_session *session;
+
+		session = __ceph_lookup_mds_session(mdsc, i);
+		if (!session)
+			continue;
+		seq_printf(s, "%-14d%-16d%-16lld%lld\n", i,
+			   session->s_nr_caps,
+			   percpu_counter_sum(&session->i_caps_mis),
+			   percpu_counter_sum(&session->i_caps_hit));
+		ceph_put_mds_session(session);
+	}
+	mutex_unlock(&mdsc->mutex);
+
 	return 0;
 }
 
diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 658c55b323cc..33eb239e09e2 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -313,7 +313,7 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
 	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
 	struct ceph_mds_client *mdsc = fsc->mdsc;
 	int i;
-	int err;
+	int err, ret = -1;
 	unsigned frag = -1;
 	struct ceph_mds_reply_info_parsed *rinfo;
 
@@ -346,13 +346,16 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
 	    !ceph_test_mount_opt(fsc, NOASYNCREADDIR) &&
 	    ceph_snap(inode) != CEPH_SNAPDIR &&
 	    __ceph_dir_is_complete_ordered(ci) &&
-	    __ceph_caps_issued_mask(ci, CEPH_CAP_FILE_SHARED, 1)) {
+	    (ret = __ceph_caps_issued_mask(ci, CEPH_CAP_FILE_SHARED, 1))) {
 		int shared_gen = atomic_read(&ci->i_shared_gen);
+		__ceph_caps_metric(ci, CEPH_CAP_FILE_SHARED);
 		spin_unlock(&ci->i_ceph_lock);
 		err = __dcache_readdir(file, ctx, shared_gen);
 		if (err != -EAGAIN)
 			return err;
 	} else {
+		if (ret != -1)
+			__ceph_caps_metric(ci, CEPH_CAP_FILE_SHARED);
 		spin_unlock(&ci->i_ceph_lock);
 	}
 
@@ -757,6 +760,8 @@ static struct dentry *ceph_lookup(struct inode *dir, struct dentry *dentry,
 		struct ceph_dentry_info *di = ceph_dentry(dentry);
 
 		spin_lock(&ci->i_ceph_lock);
+		__ceph_caps_metric(ci, CEPH_CAP_FILE_SHARED);
+
 		dout(" dir %p flags are %d\n", dir, ci->i_ceph_flags);
 		if (strncmp(dentry->d_name.name,
 			    fsc->mount_options->snapdir_name,
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 1e6cdf2dfe90..c78dfbbb7b91 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -384,6 +384,8 @@ int ceph_open(struct inode *inode, struct file *file)
 	 * asynchronously.
 	 */
 	spin_lock(&ci->i_ceph_lock);
+	__ceph_caps_metric(ci, wanted);
+
 	if (__ceph_is_any_real_caps(ci) &&
 	    (((fmode & CEPH_FILE_MODE_WR) == 0) || ci->i_auth_cap)) {
 		int mds_wanted = __ceph_caps_mds_wanted(ci, true);
@@ -1340,6 +1342,7 @@ static ssize_t ceph_read_iter(struct kiocb *iocb, struct iov_iter *to)
 				return -ENOMEM;
 		}
 
+		ceph_caps_metric(ci, CEPH_STAT_CAP_INLINE_DATA);
 		statret = __ceph_do_getattr(inode, page,
 					    CEPH_STAT_CAP_INLINE_DATA, !!page);
 		if (statret < 0) {
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index a24fd00676b8..141c1c03636c 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -558,6 +558,8 @@ void ceph_put_mds_session(struct ceph_mds_session *s)
 	if (refcount_dec_and_test(&s->s_ref)) {
 		if (s->s_auth.authorizer)
 			ceph_auth_destroy_authorizer(s->s_auth.authorizer);
+		percpu_counter_destroy(&s->i_caps_hit);
+		percpu_counter_destroy(&s->i_caps_mis);
 		kfree(s);
 	}
 }
@@ -598,6 +600,7 @@ static struct ceph_mds_session *register_session(struct ceph_mds_client *mdsc,
 						 int mds)
 {
 	struct ceph_mds_session *s;
+	int err;
 
 	if (mds >= mdsc->mdsmap->possible_max_rank)
 		return ERR_PTR(-EINVAL);
@@ -612,8 +615,10 @@ static struct ceph_mds_session *register_session(struct ceph_mds_client *mdsc,
 
 		dout("%s: realloc to %d\n", __func__, newmax);
 		sa = kcalloc(newmax, sizeof(void *), GFP_NOFS);
-		if (!sa)
+		if (!sa) {
+			err = -ENOMEM;
 			goto fail_realloc;
+		}
 		if (mdsc->sessions) {
 			memcpy(sa, mdsc->sessions,
 			       mdsc->max_sessions * sizeof(void *));
@@ -653,6 +658,13 @@ static struct ceph_mds_session *register_session(struct ceph_mds_client *mdsc,
 
 	INIT_LIST_HEAD(&s->s_cap_flushing);
 
+	err = percpu_counter_init(&s->i_caps_hit, 0, GFP_NOFS);
+	if (err)
+		goto fail_realloc;
+	err = percpu_counter_init(&s->i_caps_mis, 0, GFP_NOFS);
+	if (err)
+		goto fail_init;
+
 	mdsc->sessions[mds] = s;
 	atomic_inc(&mdsc->num_sessions);
 	refcount_inc(&s->s_ref);  /* one ref to sessions[], one to caller */
@@ -662,6 +674,8 @@ static struct ceph_mds_session *register_session(struct ceph_mds_client *mdsc,
 
 	return s;
 
+fail_init:
+	percpu_counter_destroy(&s->i_caps_hit);
 fail_realloc:
 	kfree(s);
 	return ERR_PTR(-ENOMEM);
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index dd1f417b90eb..ba74ff74c59c 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -201,6 +201,9 @@ struct ceph_mds_session {
 
 	struct list_head  s_waiting;  /* waiting requests */
 	struct list_head  s_unsafe;   /* unsafe requests */
+
+	struct percpu_counter i_caps_hit;
+	struct percpu_counter i_caps_mis;
 };
 
 /*
diff --git a/fs/ceph/quota.c b/fs/ceph/quota.c
index de56dee60540..4ce2f658e63d 100644
--- a/fs/ceph/quota.c
+++ b/fs/ceph/quota.c
@@ -147,9 +147,14 @@ static struct inode *lookup_quotarealm_inode(struct ceph_mds_client *mdsc,
 		return NULL;
 	}
 	if (qri->inode) {
+		struct ceph_inode_info *ci = ceph_inode(qri->inode);
+		int ret;
+
+		ceph_caps_metric(ci, CEPH_STAT_CAP_INODE);
+
 		/* get caps */
-		int ret = __ceph_do_getattr(qri->inode, NULL,
-					    CEPH_STAT_CAP_INODE, true);
+		ret = __ceph_do_getattr(qri->inode, NULL,
+					CEPH_STAT_CAP_INODE, true);
 		if (ret >= 0)
 			in = qri->inode;
 		else
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 7af91628636c..3f4829222528 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -641,6 +641,14 @@ static inline bool __ceph_is_any_real_caps(struct ceph_inode_info *ci)
 	return !RB_EMPTY_ROOT(&ci->i_caps);
 }
 
+extern void __ceph_caps_metric(struct ceph_inode_info *ci, int mask);
+static inline void ceph_caps_metric(struct ceph_inode_info *ci, int mask)
+{
+	spin_lock(&ci->i_ceph_lock);
+	__ceph_caps_metric(ci, mask);
+	spin_unlock(&ci->i_ceph_lock);
+}
+
 extern int __ceph_caps_issued(struct ceph_inode_info *ci, int *implemented);
 extern int __ceph_caps_issued_mask(struct ceph_inode_info *ci, int mask, int t);
 extern int __ceph_caps_issued_other(struct ceph_inode_info *ci,
@@ -927,6 +935,9 @@ extern int __ceph_do_getattr(struct inode *inode, struct page *locked_page,
 			     int mask, bool force);
 static inline int ceph_do_getattr(struct inode *inode, int mask, bool force)
 {
+	struct ceph_inode_info *ci = ceph_inode(inode);
+
+	ceph_caps_metric(ci, mask);
 	return __ceph_do_getattr(inode, NULL, mask, force);
 }
 extern int ceph_permission(struct inode *inode, int mask);
diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
index d58fa14c1f01..ebd522edb0a8 100644
--- a/fs/ceph/xattr.c
+++ b/fs/ceph/xattr.c
@@ -829,6 +829,7 @@ ssize_t __ceph_getxattr(struct inode *inode, const char *name, void *value,
 	struct ceph_vxattr *vxattr = NULL;
 	int req_mask;
 	ssize_t err;
+	int ret = -1;
 
 	/* let's see if a virtual xattr was requested */
 	vxattr = ceph_match_vxattr(inode, name);
@@ -856,7 +857,9 @@ ssize_t __ceph_getxattr(struct inode *inode, const char *name, void *value,
 
 	if (ci->i_xattrs.version == 0 ||
 	    !((req_mask & CEPH_CAP_XATTR_SHARED) ||
-	      __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1))) {
+	      (ret = __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1)))) {
+		if (ret != -1)
+			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
 		spin_unlock(&ci->i_ceph_lock);
 
 		/* security module gets xattr while filling trace */
@@ -871,6 +874,9 @@ ssize_t __ceph_getxattr(struct inode *inode, const char *name, void *value,
 		if (err)
 			return err;
 		spin_lock(&ci->i_ceph_lock);
+	} else {
+		if (ret != -1)
+			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
 	}
 
 	err = __build_xattrs(inode);
@@ -907,19 +913,24 @@ ssize_t ceph_listxattr(struct dentry *dentry, char *names, size_t size)
 	struct ceph_inode_info *ci = ceph_inode(inode);
 	bool len_only = (size == 0);
 	u32 namelen;
-	int err;
+	int err, ret = -1;
 
 	spin_lock(&ci->i_ceph_lock);
 	dout("listxattr %p ver=%lld index_ver=%lld\n", inode,
 	     ci->i_xattrs.version, ci->i_xattrs.index_version);
 
 	if (ci->i_xattrs.version == 0 ||
-	    !__ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1)) {
+	    !(ret = __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1))) {
+		if (ret != -1)
+			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
 		spin_unlock(&ci->i_ceph_lock);
 		err = ceph_do_getattr(inode, CEPH_STAT_CAP_XATTR, true);
 		if (err)
 			return err;
 		spin_lock(&ci->i_ceph_lock);
+	} else {
+		if (ret != -1)
+			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
 	}
 
 	err = __build_xattrs(inode);
-- 
2.21.0


* [PATCH resend v5 03/11] ceph: move ceph_osdc_{read,write}pages to ceph.ko
  2020-01-29  8:27 [PATCH resend v5 0/11] ceph: add perf metrics support xiubli
  2020-01-29  8:27 ` [PATCH resend v5 01/11] ceph: add global dentry lease metric support xiubli
  2020-01-29  8:27 ` [PATCH resend v5 02/11] ceph: add caps perf metric for each session xiubli
@ 2020-01-29  8:27 ` xiubli
  2020-02-04 18:38   ` Jeff Layton
  2020-01-29  8:27 ` [PATCH resend v5 04/11] ceph: add r_end_stamp for the osdc request xiubli
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: xiubli @ 2020-01-29  8:27 UTC (permalink / raw)
  To: jlayton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

Since these helpers are only used by ceph.ko, let's move them into
ceph.ko and rename them to ceph_sync_{read,write}pages.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/addr.c                  | 86 ++++++++++++++++++++++++++++++++-
 include/linux/ceph/osd_client.h | 17 -------
 net/ceph/osd_client.c           | 79 ------------------------------
 3 files changed, 84 insertions(+), 98 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 29d4513eff8c..20e5ebfff389 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -182,6 +182,47 @@ static int ceph_releasepage(struct page *page, gfp_t g)
 	return !PagePrivate(page);
 }
 
+/*
+ * Read some contiguous pages.  If we cross a stripe boundary, shorten
+ * *plen.  Return number of bytes read, or error.
+ */
+static int ceph_sync_readpages(struct ceph_fs_client *fsc,
+			       struct ceph_vino vino,
+			       struct ceph_file_layout *layout,
+			       u64 off, u64 *plen,
+			       u32 truncate_seq, u64 truncate_size,
+			       struct page **pages, int num_pages,
+			       int page_align)
+{
+	struct ceph_osd_client *osdc = &fsc->client->osdc;
+	struct ceph_osd_request *req;
+	int rc = 0;
+
+	dout("readpages on ino %llx.%llx on %llu~%llu\n", vino.ino,
+	     vino.snap, off, *plen);
+	req = ceph_osdc_new_request(osdc, layout, vino, off, plen, 0, 1,
+				    CEPH_OSD_OP_READ, CEPH_OSD_FLAG_READ,
+				    NULL, truncate_seq, truncate_size,
+				    false);
+	if (IS_ERR(req))
+		return PTR_ERR(req);
+
+	/* it may be a short read due to an object boundary */
+	osd_req_op_extent_osd_data_pages(req, 0,
+				pages, *plen, page_align, false, false);
+
+	dout("readpages  final extent is %llu~%llu (%llu bytes align %d)\n",
+	     off, *plen, *plen, page_align);
+
+	rc = ceph_osdc_start_request(osdc, req, false);
+	if (!rc)
+		rc = ceph_osdc_wait_request(osdc, req);
+
+	ceph_osdc_put_request(req);
+	dout("readpages result %d\n", rc);
+	return rc;
+}
+
 /*
  * read a single page, without unlocking it.
  */
@@ -218,7 +259,7 @@ static int ceph_do_readpage(struct file *filp, struct page *page)
 
 	dout("readpage inode %p file %p page %p index %lu\n",
 	     inode, filp, page, page->index);
-	err = ceph_osdc_readpages(&fsc->client->osdc, ceph_vino(inode),
+	err = ceph_sync_readpages(fsc, ceph_vino(inode),
 				  &ci->i_layout, off, &len,
 				  ci->i_truncate_seq, ci->i_truncate_size,
 				  &page, 1, 0);
@@ -570,6 +611,47 @@ static u64 get_writepages_data_length(struct inode *inode,
 	return end > start ? end - start : 0;
 }
 
+/*
+ * do a synchronous write on N pages
+ */
+static int ceph_sync_writepages(struct ceph_fs_client *fsc,
+				struct ceph_vino vino,
+				struct ceph_file_layout *layout,
+				struct ceph_snap_context *snapc,
+				u64 off, u64 len,
+				u32 truncate_seq, u64 truncate_size,
+				struct timespec64 *mtime,
+				struct page **pages, int num_pages)
+{
+	struct ceph_osd_client *osdc = &fsc->client->osdc;
+	struct ceph_osd_request *req;
+	int rc = 0;
+	int page_align = off & ~PAGE_MASK;
+
+	req = ceph_osdc_new_request(osdc, layout, vino, off, &len, 0, 1,
+				    CEPH_OSD_OP_WRITE, CEPH_OSD_FLAG_WRITE,
+				    snapc, truncate_seq, truncate_size,
+				    true);
+	if (IS_ERR(req))
+		return PTR_ERR(req);
+
+	/* it may be a short write due to an object boundary */
+	osd_req_op_extent_osd_data_pages(req, 0, pages, len, page_align,
+				false, false);
+	dout("writepages %llu~%llu (%llu bytes)\n", off, len, len);
+
+	req->r_mtime = *mtime;
+	rc = ceph_osdc_start_request(osdc, req, true);
+	if (!rc)
+		rc = ceph_osdc_wait_request(osdc, req);
+
+	ceph_osdc_put_request(req);
+	if (rc == 0)
+		rc = len;
+	dout("writepages result %d\n", rc);
+	return rc;
+}
+
 /*
  * Write a single page, but leave the page locked.
  *
@@ -628,7 +710,7 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
 		set_bdi_congested(inode_to_bdi(inode), BLK_RW_ASYNC);
 
 	set_page_writeback(page);
-	err = ceph_osdc_writepages(&fsc->client->osdc, ceph_vino(inode),
+	err = ceph_sync_writepages(fsc, ceph_vino(inode),
 				   &ci->i_layout, snapc, page_off, len,
 				   ceph_wbc.truncate_seq,
 				   ceph_wbc.truncate_size,
diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
index 5a62dbd3f4c2..9d9f745b98a1 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -509,23 +509,6 @@ int ceph_osdc_call(struct ceph_osd_client *osdc,
 		   struct page *req_page, size_t req_len,
 		   struct page **resp_pages, size_t *resp_len);
 
-extern int ceph_osdc_readpages(struct ceph_osd_client *osdc,
-			       struct ceph_vino vino,
-			       struct ceph_file_layout *layout,
-			       u64 off, u64 *plen,
-			       u32 truncate_seq, u64 truncate_size,
-			       struct page **pages, int nr_pages,
-			       int page_align);
-
-extern int ceph_osdc_writepages(struct ceph_osd_client *osdc,
-				struct ceph_vino vino,
-				struct ceph_file_layout *layout,
-				struct ceph_snap_context *sc,
-				u64 off, u64 len,
-				u32 truncate_seq, u64 truncate_size,
-				struct timespec64 *mtime,
-				struct page **pages, int nr_pages);
-
 int ceph_osdc_copy_from(struct ceph_osd_client *osdc,
 			u64 src_snapid, u64 src_version,
 			struct ceph_object_id *src_oid,
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index b68b376d8c2f..8ff2856e2d52 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -5230,85 +5230,6 @@ void ceph_osdc_stop(struct ceph_osd_client *osdc)
 	ceph_msgpool_destroy(&osdc->msgpool_op_reply);
 }
 
-/*
- * Read some contiguous pages.  If we cross a stripe boundary, shorten
- * *plen.  Return number of bytes read, or error.
- */
-int ceph_osdc_readpages(struct ceph_osd_client *osdc,
-			struct ceph_vino vino, struct ceph_file_layout *layout,
-			u64 off, u64 *plen,
-			u32 truncate_seq, u64 truncate_size,
-			struct page **pages, int num_pages, int page_align)
-{
-	struct ceph_osd_request *req;
-	int rc = 0;
-
-	dout("readpages on ino %llx.%llx on %llu~%llu\n", vino.ino,
-	     vino.snap, off, *plen);
-	req = ceph_osdc_new_request(osdc, layout, vino, off, plen, 0, 1,
-				    CEPH_OSD_OP_READ, CEPH_OSD_FLAG_READ,
-				    NULL, truncate_seq, truncate_size,
-				    false);
-	if (IS_ERR(req))
-		return PTR_ERR(req);
-
-	/* it may be a short read due to an object boundary */
-	osd_req_op_extent_osd_data_pages(req, 0,
-				pages, *plen, page_align, false, false);
-
-	dout("readpages  final extent is %llu~%llu (%llu bytes align %d)\n",
-	     off, *plen, *plen, page_align);
-
-	rc = ceph_osdc_start_request(osdc, req, false);
-	if (!rc)
-		rc = ceph_osdc_wait_request(osdc, req);
-
-	ceph_osdc_put_request(req);
-	dout("readpages result %d\n", rc);
-	return rc;
-}
-EXPORT_SYMBOL(ceph_osdc_readpages);
-
-/*
- * do a synchronous write on N pages
- */
-int ceph_osdc_writepages(struct ceph_osd_client *osdc, struct ceph_vino vino,
-			 struct ceph_file_layout *layout,
-			 struct ceph_snap_context *snapc,
-			 u64 off, u64 len,
-			 u32 truncate_seq, u64 truncate_size,
-			 struct timespec64 *mtime,
-			 struct page **pages, int num_pages)
-{
-	struct ceph_osd_request *req;
-	int rc = 0;
-	int page_align = off & ~PAGE_MASK;
-
-	req = ceph_osdc_new_request(osdc, layout, vino, off, &len, 0, 1,
-				    CEPH_OSD_OP_WRITE, CEPH_OSD_FLAG_WRITE,
-				    snapc, truncate_seq, truncate_size,
-				    true);
-	if (IS_ERR(req))
-		return PTR_ERR(req);
-
-	/* it may be a short write due to an object boundary */
-	osd_req_op_extent_osd_data_pages(req, 0, pages, len, page_align,
-				false, false);
-	dout("writepages %llu~%llu (%llu bytes)\n", off, len, len);
-
-	req->r_mtime = *mtime;
-	rc = ceph_osdc_start_request(osdc, req, true);
-	if (!rc)
-		rc = ceph_osdc_wait_request(osdc, req);
-
-	ceph_osdc_put_request(req);
-	if (rc == 0)
-		rc = len;
-	dout("writepages result %d\n", rc);
-	return rc;
-}
-EXPORT_SYMBOL(ceph_osdc_writepages);
-
 static int osd_req_op_copy_from_init(struct ceph_osd_request *req,
 				     u64 src_snapid, u64 src_version,
 				     struct ceph_object_id *src_oid,
-- 
2.21.0


* [PATCH resend v5 04/11] ceph: add r_end_stamp for the osdc request
  2020-01-29  8:27 [PATCH resend v5 0/11] ceph: add perf metrics support xiubli
                   ` (2 preceding siblings ...)
  2020-01-29  8:27 ` [PATCH resend v5 03/11] ceph: move ceph_osdc_{read,write}pages to ceph.ko xiubli
@ 2020-01-29  8:27 ` xiubli
  2020-02-05 19:14   ` Jeff Layton
  2020-01-29  8:27 ` [PATCH resend v5 05/11] ceph: add global read latency metric support xiubli
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: xiubli @ 2020-01-29  8:27 UTC (permalink / raw)
  To: jlayton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

Record the end timestamp when an osdc request finishes, mirroring the
existing r_start_stamp.
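
Both stamps are kept in jiffies, so a consumer can derive a
per-request latency from their difference; a minimal sketch
(jiffies_to_nsecs() is the stock helper from <linux/jiffies.h>, not
something added by this series):

	/* elapsed time between submission and completion, in jiffies */
	unsigned long lat = req->r_end_stamp - req->r_start_stamp;
	/* convert to nanoseconds for reporting */
	u64 lat_ns = jiffies_to_nsecs(lat);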

Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 include/linux/ceph/osd_client.h | 1 +
 net/ceph/osd_client.c           | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
index 9d9f745b98a1..00a449cfc478 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -213,6 +213,7 @@ struct ceph_osd_request {
 	/* internal */
 	unsigned long r_stamp;                /* jiffies, send or check time */
 	unsigned long r_start_stamp;          /* jiffies */
+	unsigned long r_end_stamp;          /* jiffies */
 	int r_attempts;
 	u32 r_map_dne_bound;
 
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 8ff2856e2d52..108c9457d629 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -2389,6 +2389,8 @@ static void finish_request(struct ceph_osd_request *req)
 	WARN_ON(lookup_request_mc(&osdc->map_checks, req->r_tid));
 	dout("%s req %p tid %llu\n", __func__, req, req->r_tid);
 
+	req->r_end_stamp = jiffies;
+
 	if (req->r_osd)
 		unlink_request(req->r_osd, req);
 	atomic_dec(&osdc->num_requests);
-- 
2.21.0


* [PATCH resend v5 05/11] ceph: add global read latency metric support
  2020-01-29  8:27 [PATCH resend v5 0/11] ceph: add perf metrics support xiubli
                   ` (3 preceding siblings ...)
  2020-01-29  8:27 ` [PATCH resend v5 04/11] ceph: add r_end_stamp for the osdc request xiubli
@ 2020-01-29  8:27 ` xiubli
  2020-02-05 20:15   ` Jeff Layton
  2020-01-29  8:27 ` [PATCH resend v5 06/11] ceph: add global write " xiubli
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: xiubli @ 2020-01-29  8:27 UTC (permalink / raw)
  To: jlayton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

item          total       sum_lat(us)     avg_lat(us)
-----------------------------------------------------
read          73          3590000         49178082

URL: https://tracker.ceph.com/issues/43215
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/addr.c       |  8 ++++++++
 fs/ceph/debugfs.c    | 11 +++++++++++
 fs/ceph/file.c       | 15 +++++++++++++++
 fs/ceph/mds_client.c | 29 +++++++++++++++++++++++------
 fs/ceph/mds_client.h |  9 ++-------
 fs/ceph/metric.h     | 30 ++++++++++++++++++++++++++++++
 6 files changed, 89 insertions(+), 13 deletions(-)
 create mode 100644 fs/ceph/metric.h

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 20e5ebfff389..0435a694370b 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -195,6 +195,7 @@ static int ceph_sync_readpages(struct ceph_fs_client *fsc,
 			       int page_align)
 {
 	struct ceph_osd_client *osdc = &fsc->client->osdc;
+	struct ceph_client_metric *metric = &fsc->mdsc->metric;
 	struct ceph_osd_request *req;
 	int rc = 0;
 
@@ -218,6 +219,8 @@ static int ceph_sync_readpages(struct ceph_fs_client *fsc,
 	if (!rc)
 		rc = ceph_osdc_wait_request(osdc, req);
 
+	ceph_update_read_latency(metric, req, rc);
+
 	ceph_osdc_put_request(req);
 	dout("readpages result %d\n", rc);
 	return rc;
@@ -301,6 +304,8 @@ static int ceph_readpage(struct file *filp, struct page *page)
 static void finish_read(struct ceph_osd_request *req)
 {
 	struct inode *inode = req->r_inode;
+	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
+	struct ceph_client_metric *metric = &fsc->mdsc->metric;
 	struct ceph_osd_data *osd_data;
 	int rc = req->r_result <= 0 ? req->r_result : 0;
 	int bytes = req->r_result >= 0 ? req->r_result : 0;
@@ -338,6 +343,9 @@ static void finish_read(struct ceph_osd_request *req)
 		put_page(page);
 		bytes -= PAGE_SIZE;
 	}
+
+	ceph_update_read_latency(metric, req, rc);
+
 	kfree(osd_data->pages);
 }
 
diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
index c132fdb40d53..f8a32fa335ae 100644
--- a/fs/ceph/debugfs.c
+++ b/fs/ceph/debugfs.c
@@ -128,8 +128,19 @@ static int metric_show(struct seq_file *s, void *p)
 {
 	struct ceph_fs_client *fsc = s->private;
 	struct ceph_mds_client *mdsc = fsc->mdsc;
+	s64 total, sum, avg = 0;
 	int i;
 
+	seq_printf(s, "item          total       sum_lat(us)     avg_lat(us)\n");
+	seq_printf(s, "-----------------------------------------------------\n");
+
+	total = percpu_counter_sum(&mdsc->metric.total_reads);
+	sum = percpu_counter_sum(&mdsc->metric.read_latency_sum);
+	avg = total ? sum / total : 0;
+	seq_printf(s, "%-14s%-12lld%-16lld%lld\n", "read",
+		   total, sum / NSEC_PER_USEC, avg / NSEC_PER_USEC);
+
+	seq_printf(s, "\n");
 	seq_printf(s, "item          total           miss            hit\n");
 	seq_printf(s, "-------------------------------------------------\n");
 
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index c78dfbbb7b91..69288c39229b 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -588,6 +588,7 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
 	struct inode *inode = file_inode(file);
 	struct ceph_inode_info *ci = ceph_inode(inode);
 	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
+	struct ceph_client_metric *metric = &fsc->mdsc->metric;
 	struct ceph_osd_client *osdc = &fsc->client->osdc;
 	ssize_t ret;
 	u64 off = iocb->ki_pos;
@@ -660,6 +661,9 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
 		ret = ceph_osdc_start_request(osdc, req, false);
 		if (!ret)
 			ret = ceph_osdc_wait_request(osdc, req);
+
+		ceph_update_read_latency(metric, req, ret);
+
 		ceph_osdc_put_request(req);
 
 		i_size = i_size_read(inode);
@@ -798,13 +802,20 @@ static void ceph_aio_complete_req(struct ceph_osd_request *req)
 	struct inode *inode = req->r_inode;
 	struct ceph_aio_request *aio_req = req->r_priv;
 	struct ceph_osd_data *osd_data = osd_req_op_extent_osd_data(req, 0);
+	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
+	struct ceph_client_metric *metric = &fsc->mdsc->metric;
 
 	BUG_ON(osd_data->type != CEPH_OSD_DATA_TYPE_BVECS);
 	BUG_ON(!osd_data->num_bvecs);
+	BUG_ON(!aio_req);
 
 	dout("ceph_aio_complete_req %p rc %d bytes %u\n",
 	     inode, rc, osd_data->bvec_pos.iter.bi_size);
 
+	/* r_start_stamp == 0 means the request was not submitted */
+	if (req->r_start_stamp && !aio_req->write)
+		ceph_update_read_latency(metric, req, rc);
+
 	if (rc == -EOLDSNAPC) {
 		struct ceph_aio_work *aio_work;
 		BUG_ON(!aio_req->write);
@@ -933,6 +944,7 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter,
 	struct inode *inode = file_inode(file);
 	struct ceph_inode_info *ci = ceph_inode(inode);
 	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
+	struct ceph_client_metric *metric = &fsc->mdsc->metric;
 	struct ceph_vino vino;
 	struct ceph_osd_request *req;
 	struct bio_vec *bvecs;
@@ -1049,6 +1061,9 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter,
 		if (!ret)
 			ret = ceph_osdc_wait_request(&fsc->client->osdc, req);
 
+		if (!write)
+			ceph_update_read_latency(metric, req, ret);
+
 		size = i_size_read(inode);
 		if (!write) {
 			if (ret == -ENOENT)
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 141c1c03636c..101b51f9f05d 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -4182,14 +4182,29 @@ static int ceph_mdsc_metric_init(struct ceph_client_metric *metric)
 	atomic64_set(&metric->total_dentries, 0);
 	ret = percpu_counter_init(&metric->d_lease_hit, 0, GFP_KERNEL);
 	if (ret)
-		return ret;
+		return ret;
 	ret = percpu_counter_init(&metric->d_lease_mis, 0, GFP_KERNEL);
-	if (ret) {
-		percpu_counter_destroy(&metric->d_lease_hit);
-		return ret;
-	}
+	if (ret)
+		goto err_dlease_mis;
 
-	return 0;
+	ret = percpu_counter_init(&metric->total_reads, 0, GFP_KERNEL);
+	if (ret)
+		goto err_total_reads;
+
+	ret = percpu_counter_init(&metric->read_latency_sum, 0, GFP_KERNEL);
+	if (ret)
+		goto err_read_latency_sum;
+
+	return ret;
+
+err_read_latency_sum:
+	percpu_counter_destroy(&metric->total_reads);
+err_total_reads:
+	percpu_counter_destroy(&metric->d_lease_mis);
+err_dlease_mis:
+	percpu_counter_destroy(&metric->d_lease_hit);
+
+	return ret;
 }
 
 int ceph_mdsc_init(struct ceph_fs_client *fsc)
@@ -4529,6 +4544,8 @@ void ceph_mdsc_destroy(struct ceph_fs_client *fsc)
 
 	ceph_mdsc_stop(mdsc);
 
+	percpu_counter_destroy(&mdsc->metric.read_latency_sum);
+	percpu_counter_destroy(&mdsc->metric.total_reads);
 	percpu_counter_destroy(&mdsc->metric.d_lease_mis);
 	percpu_counter_destroy(&mdsc->metric.d_lease_hit);
 
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index ba74ff74c59c..574d4e5a5de2 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -16,6 +16,8 @@
 #include <linux/ceph/mdsmap.h>
 #include <linux/ceph/auth.h>
 
+#include "metric.h"
+
 /* The first 8 bits are reserved for old ceph releases */
 enum ceph_feature_type {
 	CEPHFS_FEATURE_MIMIC = 8,
@@ -361,13 +363,6 @@ struct cap_wait {
 	int			want;
 };
 
-/* This is the global metrics */
-struct ceph_client_metric {
-	atomic64_t		total_dentries;
-	struct percpu_counter	d_lease_hit;
-	struct percpu_counter	d_lease_mis;
-};
-
 /*
  * mds client state
  */
diff --git a/fs/ceph/metric.h b/fs/ceph/metric.h
new file mode 100644
index 000000000000..2a7b8f3fe6a4
--- /dev/null
+++ b/fs/ceph/metric.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _FS_CEPH_MDS_METRIC_H
+#define _FS_CEPH_MDS_METRIC_H
+
+#include <linux/ceph/osd_client.h>
+
+/* This is the global metrics */
+struct ceph_client_metric {
+	atomic64_t		total_dentries;
+	struct percpu_counter	d_lease_hit;
+	struct percpu_counter	d_lease_mis;
+
+	struct percpu_counter	total_reads;
+	struct percpu_counter	read_latency_sum;
+};
+
+static inline void ceph_update_read_latency(struct ceph_client_metric *m,
+					    struct ceph_osd_request *req,
+					    int rc)
+{
+	if (!m || !req)
+		return;
+
+	if (rc >= 0 || rc == -ENOENT || rc == -ETIMEDOUT) {
+		s64 latency = req->r_end_stamp - req->r_start_stamp;
+		percpu_counter_inc(&m->total_reads);
+		percpu_counter_add(&m->read_latency_sum, latency);
+	}
+}
+#endif
-- 
2.21.0


* [PATCH resend v5 06/11] ceph: add global write latency metric support
  2020-01-29  8:27 [PATCH resend v5 0/11] ceph: add perf metrics support xiubli
                   ` (4 preceding siblings ...)
  2020-01-29  8:27 ` [PATCH resend v5 05/11] ceph: add global read latency metric support xiubli
@ 2020-01-29  8:27 ` xiubli
  2020-01-29  8:27 ` [PATCH resend v5 07/11] ceph: add global metadata perf " xiubli
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: xiubli @ 2020-01-29  8:27 UTC (permalink / raw)
  To: jlayton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

item          total       sum_lat(us)     avg_lat(us)
-----------------------------------------------------
write         222         5287750000      23818693

URL: https://tracker.ceph.com/issues/43215
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/addr.c       | 10 ++++++++++
 fs/ceph/debugfs.c    |  6 ++++++
 fs/ceph/file.c       | 14 +++++++++++---
 fs/ceph/mds_client.c | 15 ++++++++++++++-
 fs/ceph/metric.h     | 17 +++++++++++++++++
 5 files changed, 58 insertions(+), 4 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 0435a694370b..74868231f007 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -632,6 +632,7 @@ static int ceph_sync_writepages(struct ceph_fs_client *fsc,
 				struct page **pages, int num_pages)
 {
 	struct ceph_osd_client *osdc = &fsc->client->osdc;
+	struct ceph_client_metric *metric = &fsc->mdsc->metric;
 	struct ceph_osd_request *req;
 	int rc = 0;
 	int page_align = off & ~PAGE_MASK;
@@ -653,6 +654,8 @@ static int ceph_sync_writepages(struct ceph_fs_client *fsc,
 	if (!rc)
 		rc = ceph_osdc_wait_request(osdc, req);
 
+	ceph_update_write_latency(metric, req, rc);
+
 	ceph_osdc_put_request(req);
 	if (rc == 0)
 		rc = len;
@@ -792,6 +795,7 @@ static void writepages_finish(struct ceph_osd_request *req)
 	struct ceph_snap_context *snapc = req->r_snapc;
 	struct address_space *mapping = inode->i_mapping;
 	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
+	struct ceph_client_metric *metric = &fsc->mdsc->metric;
 	bool remove_page;
 
 	dout("writepages_finish %p rc %d\n", inode, rc);
@@ -804,6 +808,8 @@ static void writepages_finish(struct ceph_osd_request *req)
 		ceph_clear_error_write(ci);
 	}
 
+	ceph_update_write_latency(metric, req, rc);
+
 	/*
 	 * We lost the cache cap, need to truncate the page before
 	 * it is unlocked, otherwise we'd truncate it later in the
@@ -1752,6 +1758,7 @@ int ceph_uninline_data(struct file *filp, struct page *locked_page)
 	struct inode *inode = file_inode(filp);
 	struct ceph_inode_info *ci = ceph_inode(inode);
 	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
+	struct ceph_client_metric *metric = &fsc->mdsc->metric;
 	struct ceph_osd_request *req;
 	struct page *page = NULL;
 	u64 len, inline_version;
@@ -1864,6 +1871,9 @@ int ceph_uninline_data(struct file *filp, struct page *locked_page)
 	err = ceph_osdc_start_request(&fsc->client->osdc, req, false);
 	if (!err)
 		err = ceph_osdc_wait_request(&fsc->client->osdc, req);
+
+	ceph_update_write_latency(metric, req, err);
+
 out_put:
 	ceph_osdc_put_request(req);
 	if (err == -ECANCELED)
diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
index f8a32fa335ae..3d27f2e6f556 100644
--- a/fs/ceph/debugfs.c
+++ b/fs/ceph/debugfs.c
@@ -140,6 +140,12 @@ static int metric_show(struct seq_file *s, void *p)
 	seq_printf(s, "%-14s%-12lld%-16lld%lld\n", "read",
 		   total, sum / NSEC_PER_USEC, avg / NSEC_PER_USEC);
 
+	total = percpu_counter_sum(&mdsc->metric.total_writes);
+	sum = percpu_counter_sum(&mdsc->metric.write_latency_sum);
+	avg = total ? sum / total : 0;
+	seq_printf(s, "%-14s%-12lld%-16lld%lld\n", "write",
+		   total, sum / NSEC_PER_USEC, avg / NSEC_PER_USEC);
+
 	seq_printf(s, "\n");
 	seq_printf(s, "item          total           miss            hit\n");
 	seq_printf(s, "-------------------------------------------------\n");
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 69288c39229b..9940eb85eff6 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -813,8 +813,12 @@ static void ceph_aio_complete_req(struct ceph_osd_request *req)
 	     inode, rc, osd_data->bvec_pos.iter.bi_size);
 
 	/* r_start_stamp == 0 means the request was not submitted */
-	if (req->r_start_stamp && !aio_req->write)
-		ceph_update_read_latency(metric, req, rc);
+	if (req->r_start_stamp) {
+		if (aio_req->write)
+			ceph_update_write_latency(metric, req, rc);
+		else
+			ceph_update_read_latency(metric, req, rc);
+	}
 
 	if (rc == -EOLDSNAPC) {
 		struct ceph_aio_work *aio_work;
@@ -1061,7 +1065,9 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter,
 		if (!ret)
 			ret = ceph_osdc_wait_request(&fsc->client->osdc, req);
 
-		if (!write)
+		if (write)
+			ceph_update_write_latency(metric, req, ret);
+		else
 			ceph_update_read_latency(metric, req, ret);
 
 		size = i_size_read(inode);
@@ -1150,6 +1156,7 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
 	struct inode *inode = file_inode(file);
 	struct ceph_inode_info *ci = ceph_inode(inode);
 	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
+	struct ceph_client_metric *metric = &fsc->mdsc->metric;
 	struct ceph_vino vino;
 	struct ceph_osd_request *req;
 	struct page **pages;
@@ -1235,6 +1242,7 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
 		if (!ret)
 			ret = ceph_osdc_wait_request(&fsc->client->osdc, req);
 
+		ceph_update_write_latency(metric, req, ret);
 out:
 		ceph_osdc_put_request(req);
 		if (ret != 0) {
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 101b51f9f05d..d072cab77ab2 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -4195,8 +4195,19 @@ static int ceph_mdsc_metric_init(struct ceph_client_metric *metric)
 	if (ret)
 		goto err_read_latency_sum;
 
-	return ret;
+	ret = percpu_counter_init(&metric->total_writes, 0, GFP_KERNEL);
+	if (ret)
+		goto err_total_writes;
 
+	ret = percpu_counter_init(&metric->write_latency_sum, 0, GFP_KERNEL);
+	if (ret)
+		goto err_write_latency_sum;
+
+	return ret;
+err_write_latency_sum:
+	percpu_counter_destroy(&metric->total_writes);
+err_total_writes:
+	percpu_counter_destroy(&metric->read_latency_sum);
 err_read_latency_sum:
 	percpu_counter_destroy(&metric->total_reads);
 err_total_reads:
@@ -4544,6 +4555,8 @@ void ceph_mdsc_destroy(struct ceph_fs_client *fsc)
 
 	ceph_mdsc_stop(mdsc);
 
+	percpu_counter_destroy(&mdsc->metric.write_latency_sum);
+	percpu_counter_destroy(&mdsc->metric.total_writes);
 	percpu_counter_destroy(&mdsc->metric.read_latency_sum);
 	percpu_counter_destroy(&mdsc->metric.total_reads);
 	percpu_counter_destroy(&mdsc->metric.d_lease_mis);
diff --git a/fs/ceph/metric.h b/fs/ceph/metric.h
index 2a7b8f3fe6a4..49546961eeed 100644
--- a/fs/ceph/metric.h
+++ b/fs/ceph/metric.h
@@ -12,6 +12,9 @@ struct ceph_client_metric {
 
 	struct percpu_counter	total_reads;
 	struct percpu_counter	read_latency_sum;
+
+	struct percpu_counter	total_writes;
+	struct percpu_counter	write_latency_sum;
 };
 
 static inline void ceph_update_read_latency(struct ceph_client_metric *m,
@@ -27,4 +30,18 @@ static inline void ceph_update_read_latency(struct ceph_client_metric *m,
 		percpu_counter_add(&m->read_latency_sum, latency);
 	}
 }
+
+static inline void ceph_update_write_latency(struct ceph_client_metric *m,
+					     struct ceph_osd_request *req,
+					     int rc)
+{
+	if (!m || !req)
+		return;
+
+	if (!rc || rc == -ETIMEDOUT) {
+		s64 latency = req->r_end_stamp - req->r_start_stamp;
+		percpu_counter_inc(&m->total_writes);
+		percpu_counter_add(&m->write_latency_sum, latency);
+	}
+}
 #endif
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH resend v5 07/11] ceph: add global metadata perf metric support
  2020-01-29  8:27 [PATCH resend v5 0/11] ceph: add perf metrics support xiubli
                   ` (5 preceding siblings ...)
  2020-01-29  8:27 ` [PATCH resend v5 06/11] ceph: add global write " xiubli
@ 2020-01-29  8:27 ` xiubli
  2020-01-29  8:27 ` [PATCH resend v5 08/11] ceph: periodically send perf metrics to MDS xiubli
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: xiubli @ 2020-01-29  8:27 UTC (permalink / raw)
  To: jlayton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

item          total       sum_lat(us)     avg_lat(us)
-----------------------------------------------------
metadata      1288        24506000        19026

URL: https://tracker.ceph.com/issues/43215
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/debugfs.c    |  6 ++++++
 fs/ceph/mds_client.c | 25 +++++++++++++++++++++++++
 fs/ceph/metric.h     | 13 +++++++++++++
 3 files changed, 44 insertions(+)

diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
index 3d27f2e6f556..7fd031c18309 100644
--- a/fs/ceph/debugfs.c
+++ b/fs/ceph/debugfs.c
@@ -146,6 +146,12 @@ static int metric_show(struct seq_file *s, void *p)
 	seq_printf(s, "%-14s%-12lld%-16lld%lld\n", "write",
 		   total, sum / NSEC_PER_USEC, avg / NSEC_PER_USEC);
 
+	total = percpu_counter_sum(&mdsc->metric.total_metadatas);
+	sum = percpu_counter_sum(&mdsc->metric.metadata_latency_sum);
+	avg = total ? sum / total : 0;
+	seq_printf(s, "%-14s%-12lld%-16lld%lld\n", "metadata",
+		   total, sum / NSEC_PER_USEC, avg / NSEC_PER_USEC);
+
 	seq_printf(s, "\n");
 	seq_printf(s, "item          total           miss            hit\n");
 	seq_printf(s, "-------------------------------------------------\n");
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index d072cab77ab2..92a933810a79 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2772,6 +2772,12 @@ static int ceph_mdsc_wait_request(struct ceph_mds_client *mdsc,
 		else
 			err = timeleft;  /* killed */
 	}
+
+	if (!err || err == -EIO) {
+		s64 latency = jiffies_to_nsecs(jiffies - req->r_started);
+		ceph_update_metadata_latency(&mdsc->metric, latency);
+	}
+
 	dout("do_request waited, got %d\n", err);
 	mutex_lock(&mdsc->mutex);
 
@@ -3033,6 +3039,11 @@ static void handle_reply(struct ceph_mds_session *session, struct ceph_msg *msg)
 
 	/* kick calling process */
 	complete_request(mdsc, req);
+
+	if (!result || result == -ENOENT) {
+		s64 latency = jiffies_to_nsecs(jiffies - req->r_started);
+		ceph_update_metadata_latency(&mdsc->metric, latency);
+	}
 out:
 	ceph_mdsc_put_request(req);
 	return;
@@ -4203,7 +4214,19 @@ static int ceph_mdsc_metric_init(struct ceph_client_metric *metric)
 	if (ret)
 		goto err_write_latency_sum;
 
+	ret = percpu_counter_init(&metric->total_metadatas, 0, GFP_KERNEL);
+	if (ret)
+		goto err_total_metadatas;
+
+	ret = percpu_counter_init(&metric->metadata_latency_sum, 0, GFP_KERNEL);
+	if (ret)
+		goto err_metadata_latency_sum;
+
 	return ret;
+err_metadata_latency_sum:
+	percpu_counter_destroy(&metric->total_metadatas);
+err_total_metadatas:
+	percpu_counter_destroy(&metric->write_latency_sum);
 err_write_latency_sum:
 	percpu_counter_destroy(&metric->total_writes);
 err_total_writes:
@@ -4555,6 +4578,8 @@ void ceph_mdsc_destroy(struct ceph_fs_client *fsc)
 
 	ceph_mdsc_stop(mdsc);
 
+	percpu_counter_destroy(&mdsc->metric.metadata_latency_sum);
+	percpu_counter_destroy(&mdsc->metric.total_metadatas);
 	percpu_counter_destroy(&mdsc->metric.write_latency_sum);
 	percpu_counter_destroy(&mdsc->metric.total_writes);
 	percpu_counter_destroy(&mdsc->metric.read_latency_sum);
diff --git a/fs/ceph/metric.h b/fs/ceph/metric.h
index 49546961eeed..3cda616ba594 100644
--- a/fs/ceph/metric.h
+++ b/fs/ceph/metric.h
@@ -15,6 +15,9 @@ struct ceph_client_metric {
 
 	struct percpu_counter	total_writes;
 	struct percpu_counter	write_latency_sum;
+
+	struct percpu_counter	total_metadatas;
+	struct percpu_counter	metadata_latency_sum;
 };
 
 static inline void ceph_update_read_latency(struct ceph_client_metric *m,
@@ -44,4 +47,14 @@ static inline void ceph_update_write_latency(struct ceph_client_metric *m,
 		percpu_counter_add(&m->write_latency_sum, latency);
 	}
 }
+
+static inline void ceph_update_metadata_latency(struct ceph_client_metric *m,
+						s64 latency)
+{
+	if (!m)
+		return;
+
+	percpu_counter_inc(&m->total_metadatas);
+	percpu_counter_add(&m->metadata_latency_sum, latency);
+}
 #endif
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH resend v5 08/11] ceph: periodically send perf metrics to MDS
  2020-01-29  8:27 [PATCH resend v5 0/11] ceph: add perf metrics support xiubli
                   ` (6 preceding siblings ...)
  2020-01-29  8:27 ` [PATCH resend v5 07/11] ceph: add global metadata perf " xiubli
@ 2020-01-29  8:27 ` xiubli
  2020-02-05 21:43   ` Jeff Layton
  2020-01-29  8:27 ` [PATCH resend v5 09/11] ceph: add CEPH_DEFINE_RW_FUNC helper support xiubli
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: xiubli @ 2020-01-29  8:27 UTC (permalink / raw)
  To: jlayton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

Add a debugfs knob to enable/disable sending metrics to the MDS,
disabled by default. When enabled, the kclient will send its metrics
to the MDSs every second.

This will send the global dentry lease hit/miss and read/write/metadata
latency metrics, plus each session's caps hit/miss metric, to the MDS.

Each tick the global metrics are sent only once, via the first
available session.

URL: https://tracker.ceph.com/issues/43215
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/debugfs.c            |  44 +++++++-
 fs/ceph/mds_client.c         | 201 ++++++++++++++++++++++++++++++++---
 fs/ceph/mds_client.h         |   3 +
 fs/ceph/metric.h             |  76 +++++++++++++
 fs/ceph/super.h              |   1 +
 include/linux/ceph/ceph_fs.h |   1 +
 6 files changed, 307 insertions(+), 19 deletions(-)

diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
index 7fd031c18309..8aae7ecea54a 100644
--- a/fs/ceph/debugfs.c
+++ b/fs/ceph/debugfs.c
@@ -124,6 +124,40 @@ static int mdsc_show(struct seq_file *s, void *p)
 	return 0;
 }
 
+/*
+ * metrics debugfs
+ */
+static int sending_metrics_set(void *data, u64 val)
+{
+	struct ceph_fs_client *fsc = (struct ceph_fs_client *)data;
+	struct ceph_mds_client *mdsc = fsc->mdsc;
+
+	if (val > 1) {
+		pr_err("Invalid sending metrics set value %llu\n", val);
+		return -EINVAL;
+	}
+
+	mutex_lock(&mdsc->mutex);
+	mdsc->sending_metrics = (unsigned int)val;
+	mutex_unlock(&mdsc->mutex);
+
+	return 0;
+}
+
+static int sending_metrics_get(void *data, u64 *val)
+{
+	struct ceph_fs_client *fsc = (struct ceph_fs_client *)data;
+	struct ceph_mds_client *mdsc = fsc->mdsc;
+
+	mutex_lock(&mdsc->mutex);
+	*val = (u64)mdsc->sending_metrics;
+	mutex_unlock(&mdsc->mutex);
+
+	return 0;
+}
+DEFINE_SIMPLE_ATTRIBUTE(sending_metrics_fops, sending_metrics_get,
+			sending_metrics_set, "%llu\n");
+
 static int metric_show(struct seq_file *s, void *p)
 {
 	struct ceph_fs_client *fsc = s->private;
@@ -302,11 +336,9 @@ static int congestion_kb_get(void *data, u64 *val)
 	*val = (u64)fsc->mount_options->congestion_kb;
 	return 0;
 }
-
 DEFINE_SIMPLE_ATTRIBUTE(congestion_kb_fops, congestion_kb_get,
 			congestion_kb_set, "%llu\n");
 
-
 void ceph_fs_debugfs_cleanup(struct ceph_fs_client *fsc)
 {
 	dout("ceph_fs_debugfs_cleanup\n");
@@ -316,6 +348,7 @@ void ceph_fs_debugfs_cleanup(struct ceph_fs_client *fsc)
 	debugfs_remove(fsc->debugfs_mds_sessions);
 	debugfs_remove(fsc->debugfs_caps);
 	debugfs_remove(fsc->debugfs_metric);
+	debugfs_remove(fsc->debugfs_sending_metrics);
 	debugfs_remove(fsc->debugfs_mdsc);
 }
 
@@ -356,6 +389,13 @@ void ceph_fs_debugfs_init(struct ceph_fs_client *fsc)
 						fsc,
 						&mdsc_show_fops);
 
+	fsc->debugfs_sending_metrics =
+			debugfs_create_file("sending_metrics",
+					    0600,
+					    fsc->client->debugfs_dir,
+					    fsc,
+					    &sending_metrics_fops);
+
 	fsc->debugfs_metric = debugfs_create_file("metrics",
 						  0400,
 						  fsc->client->debugfs_dir,
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 92a933810a79..d765804dc855 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -4104,13 +4104,156 @@ static void maybe_recover_session(struct ceph_mds_client *mdsc)
 	ceph_force_reconnect(fsc->sb);
 }
 
+/*
+ * called under s_mutex
+ */
+static bool ceph_mdsc_send_metrics(struct ceph_mds_client *mdsc,
+				   struct ceph_mds_session *s,
+				   bool skip_global)
+{
+	struct ceph_metric_head *head;
+	struct ceph_metric_cap *cap;
+	struct ceph_metric_dentry_lease *lease;
+	struct ceph_metric_read_latency *read;
+	struct ceph_metric_write_latency *write;
+	struct ceph_metric_metadata_latency *meta;
+	struct ceph_msg *msg;
+	struct timespec64 ts;
+	s32 len = sizeof(*head) + sizeof(*cap);
+	s64 sum, total, avg;
+	s32 items = 0;
+
+	if (!mdsc || !s)
+		return false;
+
+	if (!skip_global) {
+		len += sizeof(*lease);
+		len += sizeof(*read);
+		len += sizeof(*write);
+		len += sizeof(*meta);
+	}
+
+	msg = ceph_msg_new(CEPH_MSG_CLIENT_METRICS, len, GFP_NOFS, true);
+	if (!msg) {
+		pr_err("send metrics to mds%d, failed to allocate message\n",
+		       s->s_mds);
+		return false;
+	}
+
+	head = msg->front.iov_base;
+
+	/* encode the cap metric */
+	cap = (struct ceph_metric_cap *)(head + 1);
+	cap->type = cpu_to_le32(CLIENT_METRIC_TYPE_CAP_INFO);
+	cap->ver = 1;
+	cap->compat = 1;
+	cap->data_len = cpu_to_le32(sizeof(*cap) - 10);
+	cap->hit = cpu_to_le64(percpu_counter_sum(&s->i_caps_hit));
+	cap->mis = cpu_to_le64(percpu_counter_sum(&s->i_caps_mis));
+	cap->total = cpu_to_le64(s->s_nr_caps);
+	items++;
+
+	dout("cap metric hit %lld, mis %lld, total caps %lld",
+	     le64_to_cpu(cap->hit), le64_to_cpu(cap->mis),
+	     le64_to_cpu(cap->total));
+
+	/* only send the global metrics once */
+	if (skip_global)
+		goto skip_global;
+
+	/* encode the dentry lease metric */
+	lease = (struct ceph_metric_dentry_lease *)(cap + 1);
+	lease->type = cpu_to_le32(CLIENT_METRIC_TYPE_DENTRY_LEASE);
+	lease->ver = 1;
+	lease->compat = 1;
+	lease->data_len = cpu_to_le32(sizeof(*lease) - 10);
+	lease->hit = cpu_to_le64(percpu_counter_sum(&mdsc->metric.d_lease_hit));
+	lease->mis = cpu_to_le64(percpu_counter_sum(&mdsc->metric.d_lease_mis));
+	lease->total = cpu_to_le64(atomic64_read(&mdsc->metric.total_dentries));
+	items++;
+
+	dout("dentry lease metric hit %lld, mis %lld, total dentries %lld",
+	     le64_to_cpu(lease->hit), le64_to_cpu(lease->mis),
+	     le64_to_cpu(lease->total));
+
+	/* encode the read latency metric */
+	read = (struct ceph_metric_read_latency *)(lease + 1);
+	read->type = cpu_to_le32(CLIENT_METRIC_TYPE_READ_LATENCY);
+	read->ver = 1;
+	read->compat = 1;
+	read->data_len = cpu_to_le32(sizeof(*read) - 10);
+	total = percpu_counter_sum(&mdsc->metric.total_reads);
+	sum = percpu_counter_sum(&mdsc->metric.read_latency_sum);
+	avg = total ? sum / total : 0;
+	ts = ns_to_timespec64(avg);
+	read->sec = cpu_to_le32(ts.tv_sec);
+	read->nsec = cpu_to_le32(ts.tv_nsec);
+	items++;
+
+	dout("read latency metric total %lld, sum lat %lld, avg lat %lld",
+	     total, sum, avg);
+
+	/* encode the write latency metric */
+	write = (struct ceph_metric_write_latency *)(read + 1);
+	write->type = cpu_to_le32(CLIENT_METRIC_TYPE_WRITE_LATENCY);
+	write->ver = 1;
+	write->compat = 1;
+	write->data_len = cpu_to_le32(sizeof(*write) - 10);
+	total = percpu_counter_sum(&mdsc->metric.total_writes);
+	sum = percpu_counter_sum(&mdsc->metric.write_latency_sum);
+	avg = total ? sum / total : 0;
+	ts = ns_to_timespec64(avg);
+	write->sec = cpu_to_le32(ts.tv_sec);
+	write->nsec = cpu_to_le32(ts.tv_nsec);
+	items++;
+
+	dout("write latency metric total %lld, sum lat %lld, avg lat %lld",
+	     total, sum, avg);
+
+	/* encode the metadata latency metric */
+	meta = (struct ceph_metric_metadata_latency *)(write + 1);
+	meta->type = cpu_to_le32(CLIENT_METRIC_TYPE_METADATA_LATENCY);
+	meta->ver = 1;
+	meta->compat = 1;
+	meta->data_len = cpu_to_le32(sizeof(*meta) - 10);
+	total = percpu_counter_sum(&mdsc->metric.total_metadatas);
+	sum = percpu_counter_sum(&mdsc->metric.metadata_latency_sum);
+	avg = total ? sum / total : 0;
+	ts = ns_to_timespec64(avg);
+	meta->sec = cpu_to_le32(ts.tv_sec);
+	meta->nsec = cpu_to_le32(ts.tv_nsec);
+	items++;
+
+	dout("metadata latency metric total %lld, sum lat %lld, avg lat %lld",
+	     total, sum, avg);
+
+skip_global:
+	put_unaligned_le32(items, &head->num);
+	msg->front.iov_len = len;
+	msg->hdr.version = cpu_to_le16(1);
+	msg->hdr.compat_version = cpu_to_le16(1);
+	msg->hdr.front_len = cpu_to_le32(msg->front.iov_len);
+	dout("send metrics to mds%d %p\n", s->s_mds, msg);
+	ceph_con_send(&s->s_con, msg);
+
+	return true;
+}
+
 /*
  * delayed work -- periodically trim expired leases, renew caps with mds
  */
+#define CEPH_WORK_DELAY_DEF 5
 static void schedule_delayed(struct ceph_mds_client *mdsc)
 {
-	int delay = 5;
-	unsigned hz = round_jiffies_relative(HZ * delay);
+	unsigned int hz;
+	int delay = CEPH_WORK_DELAY_DEF;
+
+	mutex_lock(&mdsc->mutex);
+	if (mdsc->sending_metrics)
+		delay = 1;
+	mutex_unlock(&mdsc->mutex);
+
+	hz = round_jiffies_relative(HZ * delay);
 	schedule_delayed_work(&mdsc->delayed_work, hz);
 }
 
@@ -4121,18 +4264,28 @@ static void delayed_work(struct work_struct *work)
 		container_of(work, struct ceph_mds_client, delayed_work.work);
 	int renew_interval;
 	int renew_caps;
+	bool metric_only;
+	bool sending_metrics;
+	bool g_skip = false;
 
 	dout("mdsc delayed_work\n");
 
 	mutex_lock(&mdsc->mutex);
-	renew_interval = mdsc->mdsmap->m_session_timeout >> 2;
-	renew_caps = time_after_eq(jiffies, HZ*renew_interval +
-				   mdsc->last_renew_caps);
-	if (renew_caps)
-		mdsc->last_renew_caps = jiffies;
+	sending_metrics = !!mdsc->sending_metrics;
+	metric_only = mdsc->sending_metrics &&
+		(mdsc->ticks++ % CEPH_WORK_DELAY_DEF);
+
+	if (!metric_only) {
+		renew_interval = mdsc->mdsmap->m_session_timeout >> 2;
+		renew_caps = time_after_eq(jiffies, HZ*renew_interval +
+					   mdsc->last_renew_caps);
+		if (renew_caps)
+			mdsc->last_renew_caps = jiffies;
+	}
 
 	for (i = 0; i < mdsc->max_sessions; i++) {
 		struct ceph_mds_session *s = __ceph_lookup_mds_session(mdsc, i);
+
 		if (!s)
 			continue;
 		if (s->s_state == CEPH_MDS_SESSION_CLOSING) {
@@ -4158,13 +4311,20 @@ static void delayed_work(struct work_struct *work)
 		mutex_unlock(&mdsc->mutex);
 
 		mutex_lock(&s->s_mutex);
-		if (renew_caps)
-			send_renew_caps(mdsc, s);
-		else
-			ceph_con_keepalive(&s->s_con);
-		if (s->s_state == CEPH_MDS_SESSION_OPEN ||
-		    s->s_state == CEPH_MDS_SESSION_HUNG)
-			ceph_send_cap_releases(mdsc, s);
+
+		if (sending_metrics)
+			g_skip = ceph_mdsc_send_metrics(mdsc, s, g_skip);
+
+		if (!metric_only) {
+			if (renew_caps)
+				send_renew_caps(mdsc, s);
+			else
+				ceph_con_keepalive(&s->s_con);
+			if (s->s_state == CEPH_MDS_SESSION_OPEN ||
+					s->s_state == CEPH_MDS_SESSION_HUNG)
+				ceph_send_cap_releases(mdsc, s);
+		}
+
 		mutex_unlock(&s->s_mutex);
 		ceph_put_mds_session(s);
 
@@ -4172,6 +4332,9 @@ static void delayed_work(struct work_struct *work)
 	}
 	mutex_unlock(&mdsc->mutex);
 
+	if (metric_only)
+		goto delay_work;
+
 	ceph_check_delayed_caps(mdsc);
 
 	ceph_queue_cap_reclaim_work(mdsc);
@@ -4180,11 +4343,13 @@ static void delayed_work(struct work_struct *work)
 
 	maybe_recover_session(mdsc);
 
+delay_work:
 	schedule_delayed(mdsc);
 }
 
-static int ceph_mdsc_metric_init(struct ceph_client_metric *metric)
+static int ceph_mdsc_metric_init(struct ceph_mds_client *mdsc)
 {
+	struct ceph_client_metric *metric = &mdsc->metric;
 	int ret;
 
 	if (!metric)
@@ -4222,7 +4387,9 @@ static int ceph_mdsc_metric_init(struct ceph_client_metric *metric)
 	if (ret)
 		goto err_metadata_latency_sum;
 
-	return ret;
+	mdsc->sending_metrics = 0;
+	mdsc->ticks = 0;
+	return 0;
 err_metadata_latency_sum:
 	percpu_counter_destroy(&metric->total_metadatas);
 err_total_metadatas:
@@ -4294,7 +4461,7 @@ int ceph_mdsc_init(struct ceph_fs_client *fsc)
 	init_waitqueue_head(&mdsc->cap_flushing_wq);
 	INIT_WORK(&mdsc->cap_reclaim_work, ceph_cap_reclaim_work);
 	atomic_set(&mdsc->cap_reclaim_pending, 0);
-	err = ceph_mdsc_metric_init(&mdsc->metric);
+	err = ceph_mdsc_metric_init(mdsc);
 	if (err)
 		goto err_mdsmap;
 
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 574d4e5a5de2..a0ece55d987c 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -451,6 +451,9 @@ struct ceph_mds_client {
 	struct list_head  dentry_leases;     /* fifo list */
 	struct list_head  dentry_dir_leases; /* lru list */
 
+	/* metrics */
+	unsigned int		  sending_metrics;
+	unsigned int		  ticks;
 	struct ceph_client_metric metric;
 
 	spinlock_t		snapid_map_lock;
diff --git a/fs/ceph/metric.h b/fs/ceph/metric.h
index 3cda616ba594..352eb753ce25 100644
--- a/fs/ceph/metric.h
+++ b/fs/ceph/metric.h
@@ -4,6 +4,82 @@
 
 #include <linux/ceph/osd_client.h>
 
+enum ceph_metric_type {
+	CLIENT_METRIC_TYPE_CAP_INFO,
+	CLIENT_METRIC_TYPE_READ_LATENCY,
+	CLIENT_METRIC_TYPE_WRITE_LATENCY,
+	CLIENT_METRIC_TYPE_METADATA_LATENCY,
+	CLIENT_METRIC_TYPE_DENTRY_LEASE,
+
+	CLIENT_METRIC_TYPE_MAX = CLIENT_METRIC_TYPE_DENTRY_LEASE,
+};
+
+/* metric caps header */
+struct ceph_metric_cap {
+	__le32 type;     /* ceph metric type */
+
+	__u8  ver;
+	__u8  compat;
+
+	__le32 data_len; /* payload length: sizeof(hit + mis + total) */
+	__le64 hit;
+	__le64 mis;
+	__le64 total;
+} __attribute__ ((packed));
+
+/* metric dentry lease header */
+struct ceph_metric_dentry_lease {
+	__le32 type;     /* ceph metric type */
+
+	__u8  ver;
+	__u8  compat;
+
+	__le32 data_len; /* payload length: sizeof(hit + mis + total) */
+	__le64 hit;
+	__le64 mis;
+	__le64 total;
+} __attribute__ ((packed));
+
+/* metric read latency header */
+struct ceph_metric_read_latency {
+	__le32 type;     /* ceph metric type */
+
+	__u8  ver;
+	__u8  compat;
+
+	__le32 data_len; /* payload length: sizeof(sec + nsec) */
+	__le32 sec;
+	__le32 nsec;
+} __attribute__ ((packed));
+
+/* metric write latency header */
+struct ceph_metric_write_latency {
+	__le32 type;     /* ceph metric type */
+
+	__u8  ver;
+	__u8  compat;
+
+	__le32 data_len; /* payload length: sizeof(sec + nsec) */
+	__le32 sec;
+	__le32 nsec;
+} __attribute__ ((packed));
+
+/* metric metadata latency header */
+struct ceph_metric_metadata_latency {
+	__le32 type;     /* ceph metric type */
+
+	__u8  ver;
+	__u8  compat;
+
+	__le32 data_len; /* payload length: sizeof(sec + nsec) */
+	__le32 sec;
+	__le32 nsec;
+} __attribute__ ((packed));
+
+struct ceph_metric_head {
+	__le32 num;	/* the number of metrics to be sent */
+} __attribute__ ((packed));
+
 /* This is the global metrics */
 struct ceph_client_metric {
 	atomic64_t		total_dentries;
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 3f4829222528..a91431e9bdf7 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -128,6 +128,7 @@ struct ceph_fs_client {
 	struct dentry *debugfs_congestion_kb;
 	struct dentry *debugfs_bdi;
 	struct dentry *debugfs_mdsc, *debugfs_mdsmap;
+	struct dentry *debugfs_sending_metrics;
 	struct dentry *debugfs_metric;
 	struct dentry *debugfs_mds_sessions;
 #endif
diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
index a099f60feb7b..6028d3e865e4 100644
--- a/include/linux/ceph/ceph_fs.h
+++ b/include/linux/ceph/ceph_fs.h
@@ -130,6 +130,7 @@ struct ceph_dir_layout {
 #define CEPH_MSG_CLIENT_REQUEST         24
 #define CEPH_MSG_CLIENT_REQUEST_FORWARD 25
 #define CEPH_MSG_CLIENT_REPLY           26
+#define CEPH_MSG_CLIENT_METRICS         29
 #define CEPH_MSG_CLIENT_CAPS            0x310
 #define CEPH_MSG_CLIENT_LEASE           0x311
 #define CEPH_MSG_CLIENT_SNAP            0x312
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH resend v5 09/11] ceph: add CEPH_DEFINE_RW_FUNC helper support
  2020-01-29  8:27 [PATCH resend v5 0/11] ceph: add perf metrics support xiubli
                   ` (7 preceding siblings ...)
  2020-01-29  8:27 ` [PATCH resend v5 08/11] ceph: periodically send perf metrics to MDS xiubli
@ 2020-01-29  8:27 ` xiubli
  2020-01-29  8:27 ` [PATCH resend v5 10/11] ceph: add reset metrics support xiubli
  2020-01-29  8:27 ` [PATCH resend v5 11/11] ceph: send client provided metric flags in client metadata xiubli
  10 siblings, 0 replies; 31+ messages in thread
From: xiubli @ 2020-01-29  8:27 UTC (permalink / raw)
  To: jlayton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

Add a CEPH_DEFINE_RW_FUNC helper that defines file operations with
both a show and a store callback, so a debugfs file can accept string
input in addition to displaying output.
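
A minimal usage sketch (hypothetical "foo" names, only to show the
expected shape; the macro itself is in the diff below): define a
foo_show() and a foo_store() with the usual seq_file/write signatures,
then instantiate the fops:

    static int foo_value;

    static int foo_show(struct seq_file *s, void *p)
    {
            seq_printf(s, "%d\n", foo_value);
            return 0;
    }

    static ssize_t foo_store(struct file *file, const char __user *user_buf,
                             size_t count, loff_t *ppos)
    {
            /* parse user_buf and update foo_value here */
            return count;
    }

    CEPH_DEFINE_RW_FUNC(foo)    /* emits foo_open() and foo_fops */

The resulting foo_fops can then be passed to debugfs_create_file().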

Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 include/linux/ceph/debugfs.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/include/linux/ceph/debugfs.h b/include/linux/ceph/debugfs.h
index cf5e840eec71..b918fb1f6f54 100644
--- a/include/linux/ceph/debugfs.h
+++ b/include/linux/ceph/debugfs.h
@@ -18,6 +18,20 @@ static const struct file_operations name##_fops = {			\
 	.release	= single_release,				\
 };
 
+#define CEPH_DEFINE_RW_FUNC(name)					\
+static int name##_open(struct inode *inode, struct file *file)		\
+{									\
+	return single_open(file, name##_show, inode->i_private);	\
+}									\
+									\
+static const struct file_operations name##_fops = {			\
+	.open		= name##_open,					\
+	.read		= seq_read,					\
+	.write		= name##_store,					\
+	.llseek		= seq_lseek,					\
+	.release	= single_release,				\
+};
+
 /* debugfs.c */
 extern void ceph_debugfs_init(void);
 extern void ceph_debugfs_cleanup(void);
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH resend v5 10/11] ceph: add reset metrics support
  2020-01-29  8:27 [PATCH resend v5 0/11] ceph: add perf metrics support xiubli
                   ` (8 preceding siblings ...)
  2020-01-29  8:27 ` [PATCH resend v5 09/11] ceph: add CEPH_DEFINE_RW_FUNC helper support xiubli
@ 2020-01-29  8:27 ` xiubli
  2020-01-29  8:27 ` [PATCH resend v5 11/11] ceph: send client provided metric flags in client metadata xiubli
  10 siblings, 0 replies; 31+ messages in thread
From: xiubli @ 2020-01-29  8:27 UTC (permalink / raw)
  To: jlayton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

This will reset most of the metric counters, except the cap and
dentry totals.

Sometimes we need to discard the old metrics and start gathering
fresh ones.
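
For example (illustrative path; per the store handler below, only the
string "reset" is accepted):

$ echo reset > /sys/kernel/debug/ceph/<fsid>.client<id>/metrics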

URL: https://tracker.ceph.com/issues/43215
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/debugfs.c | 53 +++++++++++++++++++++++++++++++++++++++++++++--
 fs/ceph/super.h   |  1 +
 2 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
index 8aae7ecea54a..37ca1efa6b27 100644
--- a/fs/ceph/debugfs.c
+++ b/fs/ceph/debugfs.c
@@ -217,6 +217,55 @@ static int metric_show(struct seq_file *s, void *p)
 	return 0;
 }
 
+static ssize_t metric_store(struct file *file, const char __user *user_buf,
+			    size_t count, loff_t *ppos)
+{
+	struct seq_file *s = file->private_data;
+	struct ceph_fs_client *fsc = s->private;
+	struct ceph_mds_client *mdsc = fsc->mdsc;
+	struct ceph_client_metric *metric = &mdsc->metric;
+	char buf[8];
+	int i;
+
+	if (count >= sizeof(buf))
+		return -EINVAL;
+	if (copy_from_user(buf, user_buf, count))
+		return -EFAULT;
+	buf[count] = '\0';
+	if (strncmp(buf, "reset", strlen("reset")))
+		return -EINVAL;
+
+	percpu_counter_set(&metric->d_lease_hit, 0);
+	percpu_counter_set(&metric->d_lease_mis, 0);
+
+	percpu_counter_set(&metric->read_latency_sum, 0);
+	percpu_counter_set(&metric->total_reads, 0);
+
+	percpu_counter_set(&metric->write_latency_sum, 0);
+	percpu_counter_set(&metric->total_writes, 0);
+
+	percpu_counter_set(&metric->metadata_latency_sum, 0);
+	percpu_counter_set(&metric->total_metadatas, 0);
+
+	mutex_lock(&mdsc->mutex);
+	for (i = 0; i < mdsc->max_sessions; i++) {
+		struct ceph_mds_session *session;
+
+		session = __ceph_lookup_mds_session(mdsc, i);
+		if (!session)
+			continue;
+		percpu_counter_set(&session->i_caps_hit, 0);
+		percpu_counter_set(&session->i_caps_mis, 0);
+		ceph_put_mds_session(session);
+	}
+
+	mutex_unlock(&mdsc->mutex);
+
+	return count;
+}
+
+CEPH_DEFINE_RW_FUNC(metric)
+
 static int caps_show_cb(struct inode *inode, struct ceph_cap *cap, void *p)
 {
 	struct seq_file *s = p;
@@ -313,7 +362,6 @@ static int mds_sessions_show(struct seq_file *s, void *ptr)
 
 CEPH_DEFINE_SHOW_FUNC(mdsmap_show)
 CEPH_DEFINE_SHOW_FUNC(mdsc_show)
-CEPH_DEFINE_SHOW_FUNC(metric_show)
 CEPH_DEFINE_SHOW_FUNC(caps_show)
 CEPH_DEFINE_SHOW_FUNC(mds_sessions_show)
 
@@ -349,6 +397,7 @@ void ceph_fs_debugfs_cleanup(struct ceph_fs_client *fsc)
 	debugfs_remove(fsc->debugfs_caps);
 	debugfs_remove(fsc->debugfs_metric);
 	debugfs_remove(fsc->debugfs_sending_metrics);
+	debugfs_remove(fsc->debugfs_reset_metrics);
 	debugfs_remove(fsc->debugfs_mdsc);
 }
 
@@ -400,7 +449,7 @@ void ceph_fs_debugfs_init(struct ceph_fs_client *fsc)
 						  0400,
 						  fsc->client->debugfs_dir,
 						  fsc,
-						  &metric_show_fops);
+						  &metric_fops);
 
 	fsc->debugfs_caps = debugfs_create_file("caps",
 						0400,
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index a91431e9bdf7..d24929f1c4bf 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -129,6 +129,7 @@ struct ceph_fs_client {
 	struct dentry *debugfs_bdi;
 	struct dentry *debugfs_mdsc, *debugfs_mdsmap;
 	struct dentry *debugfs_sending_metrics;
+	struct dentry *debugfs_reset_metrics;
 	struct dentry *debugfs_metric;
 	struct dentry *debugfs_mds_sessions;
 #endif
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH resend v5 11/11] ceph: send client provided metric flags in client metadata
  2020-01-29  8:27 [PATCH resend v5 0/11] ceph: add perf metrics support xiubli
                   ` (9 preceding siblings ...)
  2020-01-29  8:27 ` [PATCH resend v5 10/11] ceph: add reset metrics support xiubli
@ 2020-01-29  8:27 ` xiubli
  10 siblings, 0 replies; 31+ messages in thread
From: xiubli @ 2020-01-29  8:27 UTC (permalink / raw)
  To: jlayton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

Send the client's supported metric flags to the MDS in the client
metadata. Currently this covers the cap, dentry lease, read latency,
write latency and metadata latency metrics.
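
As a worked example of the encoding (my reading of the macros added
below): CEPHFS_METRIC_SPEC_CLIENT_SUPPORTED has six entries whose
highest bit value is 4, so METRIC_BYTES(6) = DIV_ROUND_UP(4 + 1, 64) * 8
= 8, and the encoded spec is an 8-byte bitmap whose first byte is 0x1f
(bits 0-4 set), preceded by the two 32-bit length fields (12 and 8).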

URL: https://tracker.ceph.com/issues/43435
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/mds_client.c | 47 ++++++++++++++++++++++++++++++++++++++++++--
 fs/ceph/metric.h     | 14 +++++++++++++
 2 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index d765804dc855..f9d3acd36656 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -1096,6 +1096,41 @@ static void encode_supported_features(void **p, void *end)
 	}
 }
 
+static const unsigned char metric_bits[] = CEPHFS_METRIC_SPEC_CLIENT_SUPPORTED;
+#define METRIC_BYTES(cnt) (DIV_ROUND_UP((size_t)metric_bits[cnt - 1] + 1, 64) * 8)
+static void encode_metric_spec(void **p, void *end)
+{
+	static const size_t count = ARRAY_SIZE(metric_bits);
+
+	/* header */
+	BUG_ON(*p + 2 > end);
+	ceph_encode_8(p, 1); /* version */
+	ceph_encode_8(p, 1); /* compat */
+
+	if (count > 0) {
+		size_t i;
+		size_t size = METRIC_BYTES(count);
+
+		BUG_ON(*p + 4 + 4 + size > end);
+
+		/* metric spec info length */
+		ceph_encode_32(p, 4 + size);
+
+		/* metric spec */
+		ceph_encode_32(p, size);
+		memset(*p, 0, size);
+		for (i = 0; i < count; i++)
+			((unsigned char *)(*p))[metric_bits[i] / 8] |= BIT(metric_bits[i] % 8);
+		*p += size;
+	} else {
+		BUG_ON(*p + 4 + 4 > end);
+		/* metric spec info length */
+		ceph_encode_32(p, 4);
+		/* metric spec */
+		ceph_encode_32(p, 0);
+	}
+}
+
 /*
  * session message, specialization for CEPH_SESSION_REQUEST_OPEN
  * to include additional client metadata fields.
@@ -1135,6 +1170,13 @@ static struct ceph_msg *create_session_open_msg(struct ceph_mds_client *mdsc, u6
 		size = FEATURE_BYTES(count);
 	extra_bytes += 4 + size;
 
+	/* metric spec */
+	size = 0;
+	count = ARRAY_SIZE(metric_bits);
+	if (count > 0)
+		size = METRIC_BYTES(count);
+	extra_bytes += 2 + 4 + 4 + size;
+
 	/* Allocate the message */
 	msg = ceph_msg_new(CEPH_MSG_CLIENT_SESSION, sizeof(*h) + extra_bytes,
 			   GFP_NOFS, false);
@@ -1153,9 +1195,9 @@ static struct ceph_msg *create_session_open_msg(struct ceph_mds_client *mdsc, u6
 	 * Serialize client metadata into waiting buffer space, using
 	 * the format that userspace expects for map<string, string>
 	 *
-	 * ClientSession messages with metadata are v3
+	 * ClientSession messages with metadata are v4
 	 */
-	msg->hdr.version = cpu_to_le16(3);
+	msg->hdr.version = cpu_to_le16(4);
 	msg->hdr.compat_version = cpu_to_le16(1);
 
 	/* The write pointer, following the session_head structure */
@@ -1178,6 +1220,7 @@ static struct ceph_msg *create_session_open_msg(struct ceph_mds_client *mdsc, u6
 	}
 
 	encode_supported_features(&p, end);
+	encode_metric_spec(&p, end);
 	msg->front.iov_len = p - msg->front.iov_base;
 	msg->hdr.front_len = cpu_to_le32(msg->front.iov_len);
 
diff --git a/fs/ceph/metric.h b/fs/ceph/metric.h
index 352eb753ce25..70e0b586b687 100644
--- a/fs/ceph/metric.h
+++ b/fs/ceph/metric.h
@@ -14,6 +14,20 @@ enum ceph_metric_type {
 	CLIENT_METRIC_TYPE_MAX = CLIENT_METRIC_TYPE_DENTRY_LEASE,
 };
 
+/*
+ * This will always have the highest metric bit value
+ * as the last element of the array.
+ */
+#define CEPHFS_METRIC_SPEC_CLIENT_SUPPORTED {	\
+	CLIENT_METRIC_TYPE_CAP_INFO,		\
+	CLIENT_METRIC_TYPE_READ_LATENCY,	\
+	CLIENT_METRIC_TYPE_WRITE_LATENCY,	\
+	CLIENT_METRIC_TYPE_METADATA_LATENCY,	\
+	CLIENT_METRIC_TYPE_DENTRY_LEASE,	\
+						\
+	CLIENT_METRIC_TYPE_MAX,			\
+}
+
 /* metric caps header */
 struct ceph_metric_cap {
 	__le32 type;     /* ceph metric type */
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH resend v5 02/11] ceph: add caps perf metric for each session
  2020-01-29  8:27 ` [PATCH resend v5 02/11] ceph: add caps perf metric for each session xiubli
@ 2020-01-29 14:21   ` Jeff Layton
  2020-01-30  2:22     ` Xiubo Li
  0 siblings, 1 reply; 31+ messages in thread
From: Jeff Layton @ 2020-01-29 14:21 UTC (permalink / raw)
  To: xiubli, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On Wed, 2020-01-29 at 03:27 -0500, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> This will fulfill the caps hit/miss metric for each session. When
> checking the "need" mask and if one cap has the subset of the "need"
> mask it means hit, or missed.
> 
> item          total           miss            hit
> -------------------------------------------------
> d_lease       295             0               993
> 
> session       caps            miss            hit
> -------------------------------------------------
> 0             295             107             4119
> 1             1               107             9
> 
> URL: https://tracker.ceph.com/issues/43215
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  fs/ceph/acl.c        |  2 ++
>  fs/ceph/addr.c       |  2 ++
>  fs/ceph/caps.c       | 74 ++++++++++++++++++++++++++++++++++++++++++++
>  fs/ceph/debugfs.c    | 20 ++++++++++++
>  fs/ceph/dir.c        |  9 ++++--
>  fs/ceph/file.c       |  3 ++
>  fs/ceph/mds_client.c | 16 +++++++++-
>  fs/ceph/mds_client.h |  3 ++
>  fs/ceph/quota.c      |  9 ++++--
>  fs/ceph/super.h      | 11 +++++++
>  fs/ceph/xattr.c      | 17 ++++++++--
>  11 files changed, 158 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/ceph/acl.c b/fs/ceph/acl.c
> index 26be6520d3fb..58e119e3519f 100644
> --- a/fs/ceph/acl.c
> +++ b/fs/ceph/acl.c
> @@ -22,6 +22,8 @@ static inline void ceph_set_cached_acl(struct inode *inode,
>  	struct ceph_inode_info *ci = ceph_inode(inode);
>  
>  	spin_lock(&ci->i_ceph_lock);
> +	__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
> +
>  	if (__ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 0))
>  		set_cached_acl(inode, type, acl);
>  	else
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index 7ab616601141..29d4513eff8c 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -1706,6 +1706,8 @@ int ceph_uninline_data(struct file *filp, struct page *locked_page)
>  			err = -ENOMEM;
>  			goto out;
>  		}
> +
> +		ceph_caps_metric(ci, CEPH_STAT_CAP_INLINE_DATA);

Should a check for inline data really count here?

>  		err = __ceph_do_getattr(inode, page,
>  					CEPH_STAT_CAP_INLINE_DATA, true);
>  		if (err < 0) {
> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> index 7fc87b693ba4..af2e9e826f8c 100644
> --- a/fs/ceph/caps.c
> +++ b/fs/ceph/caps.c
> @@ -783,6 +783,75 @@ static int __cap_is_valid(struct ceph_cap *cap)
>  	return 1;
>  }
>  
> +/*
> + * Counts the cap metric.
> + */

This needs some comments. Specifically, what should this be counting and
how?

> +void __ceph_caps_metric(struct ceph_inode_info *ci, int mask)
> +{
> +	int have = ci->i_snap_caps;
> +	struct ceph_mds_session *s;
> +	struct ceph_cap *cap;
> +	struct rb_node *p;
> +	bool skip_auth = false;
> +
> +	lockdep_assert_held(&ci->i_ceph_lock);
> +
> +	if (mask <= 0)
> +		return;
> +
> +	/* Counts the snap caps metric in the auth cap */
> +	if (ci->i_auth_cap) {
> +		cap = ci->i_auth_cap;
> +		if (have) {
> +			have |= cap->issued;
> +
> +			dout("%s %p cap %p issued %s, mask %s\n", __func__,
> +			     &ci->vfs_inode, cap, ceph_cap_string(cap->issued),
> +			     ceph_cap_string(mask));
> +
> +			s = ceph_get_mds_session(cap->session);
> +			if (s) {
> +				if (mask & have)
> +					percpu_counter_inc(&s->i_caps_hit);
> +				else
> +					percpu_counter_inc(&s->i_caps_mis);
> +				ceph_put_mds_session(s);
> +			}
> +			skip_auth = true;
> +		}
> +	}
> +
> +	if ((mask & have) == mask)
> +		return;
> +
> +	/* Checks others */
> +	for (p = rb_first(&ci->i_caps); p; p = rb_next(p)) {
> +		cap = rb_entry(p, struct ceph_cap, ci_node);
> +		if (!__cap_is_valid(cap))
> +			continue;
> +
> +		if (skip_auth && cap == ci->i_auth_cap)
> +			continue;
> +
> +		dout("%s %p cap %p issued %s, mask %s\n", __func__,
> +		     &ci->vfs_inode, cap, ceph_cap_string(cap->issued),
> +		     ceph_cap_string(mask));
> +
> +		s = ceph_get_mds_session(cap->session);
> +		if (s) {
> +			if (mask & cap->issued)
> +				percpu_counter_inc(&s->i_caps_hit);
> +			else
> +				percpu_counter_inc(&s->i_caps_mis);
> +			ceph_put_mds_session(s);
> +		}
> +
> +		have |= cap->issued;
> +		if ((mask & have) == mask)
> +			return;
> +	}
> +}
> +

I'm trying to understand what happens with the above when more than one
ceph_cap has the same bit set in "issued". For instance:

Suppose we're doing the check for a statx call, and we're trying to get
caps for pAsFsLs. We have two MDS's and they've each granted us caps for
the inode, say:

MDS 0: pAs
MDS 1: pAsLsFs

We check the cap0 first, and consider it a hit, and then we check cap1
and consider it a hit as well. So that seems like it's being double-
counted.

ISTM that what you really want to do here is logically OR all of the
cap->issued fields together, then check that vs. the mask value, and
count only one hit or miss per inode.

That said, it's not 100% clear what you're counting as a hit or miss
here, so please let me know if I have that wrong.
 
>  /*
>   * Return set of valid cap bits issued to us.  Note that caps time
>   * out, and may be invalidated in bulk if the client session times out
> @@ -2746,6 +2815,7 @@ static void check_max_size(struct inode *inode, loff_t endoff)
>  int ceph_try_get_caps(struct inode *inode, int need, int want,
>  		      bool nonblock, int *got)
>  {
> +	struct ceph_inode_info *ci = ceph_inode(inode);
>  	int ret;
>  
>  	BUG_ON(need & ~CEPH_CAP_FILE_RD);
> @@ -2758,6 +2828,7 @@ int ceph_try_get_caps(struct inode *inode, int need, int want,
>  	BUG_ON(want & ~(CEPH_CAP_FILE_CACHE | CEPH_CAP_FILE_LAZYIO |
>  			CEPH_CAP_FILE_SHARED | CEPH_CAP_FILE_EXCL |
>  			CEPH_CAP_ANY_DIR_OPS));
> +	ceph_caps_metric(ci, need | want);
>  	ret = try_get_cap_refs(inode, need, want, 0, nonblock, got);
>  	return ret == -EAGAIN ? 0 : ret;
>  }
> @@ -2784,6 +2855,8 @@ int ceph_get_caps(struct file *filp, int need, int want,
>  	    fi->filp_gen != READ_ONCE(fsc->filp_gen))
>  		return -EBADF;
>  
> +	ceph_caps_metric(ci, need | want);
> +
>  	while (true) {
>  		if (endoff > 0)
>  			check_max_size(inode, endoff);
> @@ -2871,6 +2944,7 @@ int ceph_get_caps(struct file *filp, int need, int want,
>  			 * getattr request will bring inline data into
>  			 * page cache
>  			 */
> +			ceph_caps_metric(ci, CEPH_STAT_CAP_INLINE_DATA);
>  			ret = __ceph_do_getattr(inode, NULL,
>  						CEPH_STAT_CAP_INLINE_DATA,
>  						true);
> diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
> index 40a22da0214a..c132fdb40d53 100644
> --- a/fs/ceph/debugfs.c
> +++ b/fs/ceph/debugfs.c
> @@ -128,6 +128,7 @@ static int metric_show(struct seq_file *s, void *p)
>  {
>  	struct ceph_fs_client *fsc = s->private;
>  	struct ceph_mds_client *mdsc = fsc->mdsc;
> +	int i;
>  
>  	seq_printf(s, "item          total           miss            hit\n");
>  	seq_printf(s, "-------------------------------------------------\n");
> @@ -137,6 +138,25 @@ static int metric_show(struct seq_file *s, void *p)
>  		   percpu_counter_sum(&mdsc->metric.d_lease_mis),
>  		   percpu_counter_sum(&mdsc->metric.d_lease_hit));
>  
> +	seq_printf(s, "\n");
> +	seq_printf(s, "session       caps            miss            hit\n");
> +	seq_printf(s, "-------------------------------------------------\n");
> +
> +	mutex_lock(&mdsc->mutex);
> +	for (i = 0; i < mdsc->max_sessions; i++) {
> +		struct ceph_mds_session *session;
> +
> +		session = __ceph_lookup_mds_session(mdsc, i);
> +		if (!session)
> +			continue;
> +		seq_printf(s, "%-14d%-16d%-16lld%lld\n", i,
> +			   session->s_nr_caps,
> +			   percpu_counter_sum(&session->i_caps_mis),
> +			   percpu_counter_sum(&session->i_caps_hit));
> +		ceph_put_mds_session(session);
> +	}
> +	mutex_unlock(&mdsc->mutex);
> +
>  	return 0;
>  }
>  
> diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
> index 658c55b323cc..33eb239e09e2 100644
> --- a/fs/ceph/dir.c
> +++ b/fs/ceph/dir.c
> @@ -313,7 +313,7 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
>  	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
>  	struct ceph_mds_client *mdsc = fsc->mdsc;
>  	int i;
> -	int err;
> +	int err, ret = -1;
>  	unsigned frag = -1;
>  	struct ceph_mds_reply_info_parsed *rinfo;
>  
> @@ -346,13 +346,16 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
>  	    !ceph_test_mount_opt(fsc, NOASYNCREADDIR) &&
>  	    ceph_snap(inode) != CEPH_SNAPDIR &&
>  	    __ceph_dir_is_complete_ordered(ci) &&
> -	    __ceph_caps_issued_mask(ci, CEPH_CAP_FILE_SHARED, 1)) {
> +	    (ret = __ceph_caps_issued_mask(ci, CEPH_CAP_FILE_SHARED, 1))) {
>  		int shared_gen = atomic_read(&ci->i_shared_gen);
> +		__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>  		spin_unlock(&ci->i_ceph_lock);
>  		err = __dcache_readdir(file, ctx, shared_gen);
>  		if (err != -EAGAIN)
>  			return err;
>  	} else {
> +		if (ret != -1)
> +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>  		spin_unlock(&ci->i_ceph_lock);
>  	}
>  
> @@ -757,6 +760,8 @@ static struct dentry *ceph_lookup(struct inode *dir, struct dentry *dentry,
>  		struct ceph_dentry_info *di = ceph_dentry(dentry);
>  
>  		spin_lock(&ci->i_ceph_lock);
> +		__ceph_caps_metric(ci, CEPH_CAP_FILE_SHARED);
> +
>  		dout(" dir %p flags are %d\n", dir, ci->i_ceph_flags);
>  		if (strncmp(dentry->d_name.name,
>  			    fsc->mount_options->snapdir_name,
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 1e6cdf2dfe90..c78dfbbb7b91 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -384,6 +384,8 @@ int ceph_open(struct inode *inode, struct file *file)
>  	 * asynchronously.
>  	 */
>  	spin_lock(&ci->i_ceph_lock);
> +	__ceph_caps_metric(ci, wanted);
> +
>  	if (__ceph_is_any_real_caps(ci) &&
>  	    (((fmode & CEPH_FILE_MODE_WR) == 0) || ci->i_auth_cap)) {
>  		int mds_wanted = __ceph_caps_mds_wanted(ci, true);
> @@ -1340,6 +1342,7 @@ static ssize_t ceph_read_iter(struct kiocb *iocb, struct iov_iter *to)
>  				return -ENOMEM;
>  		}
>  
> +		ceph_caps_metric(ci, CEPH_STAT_CAP_INLINE_DATA);
>  		statret = __ceph_do_getattr(inode, page,
>  					    CEPH_STAT_CAP_INLINE_DATA, !!page);
>  		if (statret < 0) {
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index a24fd00676b8..141c1c03636c 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -558,6 +558,8 @@ void ceph_put_mds_session(struct ceph_mds_session *s)
>  	if (refcount_dec_and_test(&s->s_ref)) {
>  		if (s->s_auth.authorizer)
>  			ceph_auth_destroy_authorizer(s->s_auth.authorizer);
> +		percpu_counter_destroy(&s->i_caps_hit);
> +		percpu_counter_destroy(&s->i_caps_mis);
>  		kfree(s);
>  	}
>  }
> @@ -598,6 +600,7 @@ static struct ceph_mds_session *register_session(struct ceph_mds_client *mdsc,
>  						 int mds)
>  {
>  	struct ceph_mds_session *s;
> +	int err;
>  
>  	if (mds >= mdsc->mdsmap->possible_max_rank)
>  		return ERR_PTR(-EINVAL);
> @@ -612,8 +615,10 @@ static struct ceph_mds_session *register_session(struct ceph_mds_client *mdsc,
>  
>  		dout("%s: realloc to %d\n", __func__, newmax);
>  		sa = kcalloc(newmax, sizeof(void *), GFP_NOFS);
> -		if (!sa)
> +		if (!sa) {
> +			err = -ENOMEM;
>  			goto fail_realloc;
> +		}
>  		if (mdsc->sessions) {
>  			memcpy(sa, mdsc->sessions,
>  			       mdsc->max_sessions * sizeof(void *));
> @@ -653,6 +658,13 @@ static struct ceph_mds_session *register_session(struct ceph_mds_client *mdsc,
>  
>  	INIT_LIST_HEAD(&s->s_cap_flushing);
>  
> +	err = percpu_counter_init(&s->i_caps_hit, 0, GFP_NOFS);
> +	if (err)
> +		goto fail_realloc;
> +	err = percpu_counter_init(&s->i_caps_mis, 0, GFP_NOFS);
> +	if (err)
> +		goto fail_init;
> +
>  	mdsc->sessions[mds] = s;
>  	atomic_inc(&mdsc->num_sessions);
>  	refcount_inc(&s->s_ref);  /* one ref to sessions[], one to caller */
> @@ -662,6 +674,8 @@ static struct ceph_mds_session *register_session(struct ceph_mds_client *mdsc,
>  
>  	return s;
>  
> +fail_init:
> +	percpu_counter_destroy(&s->i_caps_hit);
>  fail_realloc:
>  	kfree(s);
>  	return ERR_PTR(-ENOMEM);
> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> index dd1f417b90eb..ba74ff74c59c 100644
> --- a/fs/ceph/mds_client.h
> +++ b/fs/ceph/mds_client.h
> @@ -201,6 +201,9 @@ struct ceph_mds_session {
>  
>  	struct list_head  s_waiting;  /* waiting requests */
>  	struct list_head  s_unsafe;   /* unsafe requests */
> +
> +	struct percpu_counter i_caps_hit;
> +	struct percpu_counter i_caps_mis;
>  };
>  
>  /*
> diff --git a/fs/ceph/quota.c b/fs/ceph/quota.c
> index de56dee60540..4ce2f658e63d 100644
> --- a/fs/ceph/quota.c
> +++ b/fs/ceph/quota.c
> @@ -147,9 +147,14 @@ static struct inode *lookup_quotarealm_inode(struct ceph_mds_client *mdsc,
>  		return NULL;
>  	}
>  	if (qri->inode) {
> +		struct ceph_inode_info *ci = ceph_inode(qri->inode);
> +		int ret;
> +
> +		ceph_caps_metric(ci, CEPH_STAT_CAP_INODE);
> +
>  		/* get caps */
> -		int ret = __ceph_do_getattr(qri->inode, NULL,
> -					    CEPH_STAT_CAP_INODE, true);
> +		ret = __ceph_do_getattr(qri->inode, NULL,
> +					CEPH_STAT_CAP_INODE, true);
>  		if (ret >= 0)
>  			in = qri->inode;
>  		else
> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> index 7af91628636c..3f4829222528 100644
> --- a/fs/ceph/super.h
> +++ b/fs/ceph/super.h
> @@ -641,6 +641,14 @@ static inline bool __ceph_is_any_real_caps(struct ceph_inode_info *ci)
>  	return !RB_EMPTY_ROOT(&ci->i_caps);
>  }
>  
> +extern void __ceph_caps_metric(struct ceph_inode_info *ci, int mask);
> +static inline void ceph_caps_metric(struct ceph_inode_info *ci, int mask)
> +{
> +	spin_lock(&ci->i_ceph_lock);
> +	__ceph_caps_metric(ci, mask);
> +	spin_unlock(&ci->i_ceph_lock);
> +}
> +
>  extern int __ceph_caps_issued(struct ceph_inode_info *ci, int *implemented);
>  extern int __ceph_caps_issued_mask(struct ceph_inode_info *ci, int mask, int t);
>  extern int __ceph_caps_issued_other(struct ceph_inode_info *ci,
> @@ -927,6 +935,9 @@ extern int __ceph_do_getattr(struct inode *inode, struct page *locked_page,
>  			     int mask, bool force);
>  static inline int ceph_do_getattr(struct inode *inode, int mask, bool force)
>  {
> +	struct ceph_inode_info *ci = ceph_inode(inode);
> +
> +	ceph_caps_metric(ci, mask);
>  	return __ceph_do_getattr(inode, NULL, mask, force);
>  }
>  extern int ceph_permission(struct inode *inode, int mask);
> diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
> index d58fa14c1f01..ebd522edb0a8 100644
> --- a/fs/ceph/xattr.c
> +++ b/fs/ceph/xattr.c
> @@ -829,6 +829,7 @@ ssize_t __ceph_getxattr(struct inode *inode, const char *name, void *value,
>  	struct ceph_vxattr *vxattr = NULL;
>  	int req_mask;
>  	ssize_t err;
> +	int ret = -1;
>  
>  	/* let's see if a virtual xattr was requested */
>  	vxattr = ceph_match_vxattr(inode, name);
> @@ -856,7 +857,9 @@ ssize_t __ceph_getxattr(struct inode *inode, const char *name, void *value,
>  
>  	if (ci->i_xattrs.version == 0 ||
>  	    !((req_mask & CEPH_CAP_XATTR_SHARED) ||
> -	      __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1))) {
> +	      (ret = __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1)))) {
> +		if (ret != -1)
> +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>  		spin_unlock(&ci->i_ceph_lock);
>  
>  		/* security module gets xattr while filling trace */
> @@ -871,6 +874,9 @@ ssize_t __ceph_getxattr(struct inode *inode, const char *name, void *value,
>  		if (err)
>  			return err;
>  		spin_lock(&ci->i_ceph_lock);
> +	} else {
> +		if (ret != -1)
> +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>  	}
>  
>  	err = __build_xattrs(inode);
> @@ -907,19 +913,24 @@ ssize_t ceph_listxattr(struct dentry *dentry, char *names, size_t size)
>  	struct ceph_inode_info *ci = ceph_inode(inode);
>  	bool len_only = (size == 0);
>  	u32 namelen;
> -	int err;
> +	int err, ret = -1;
>  
>  	spin_lock(&ci->i_ceph_lock);
>  	dout("listxattr %p ver=%lld index_ver=%lld\n", inode,
>  	     ci->i_xattrs.version, ci->i_xattrs.index_version);
>  
>  	if (ci->i_xattrs.version == 0 ||
> -	    !__ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1)) {
> +	    !(ret = __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1))) {
> +		if (ret != -1)
> +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>  		spin_unlock(&ci->i_ceph_lock);
>  		err = ceph_do_getattr(inode, CEPH_STAT_CAP_XATTR, true);
>  		if (err)
>  			return err;
>  		spin_lock(&ci->i_ceph_lock);
> +	} else {
> +		if (ret != -1)
> +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>  	}
>  
>  	err = __build_xattrs(inode);

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH resend v5 02/11] ceph: add caps perf metric for each session
  2020-01-29 14:21   ` Jeff Layton
@ 2020-01-30  2:22     ` Xiubo Li
  2020-01-30 19:00       ` Jeffrey Layton
  0 siblings, 1 reply; 31+ messages in thread
From: Xiubo Li @ 2020-01-30  2:22 UTC (permalink / raw)
  To: Jeff Layton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On 2020/1/29 22:21, Jeff Layton wrote:
> On Wed, 2020-01-29 at 03:27 -0500, xiubli@redhat.com wrote:
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> This will fulfill the caps hit/miss metric for each session. When
>> checking the "need" mask and if one cap has the subset of the "need"
>> mask it means hit, or missed.
>>
>> item          total           miss            hit
>> -------------------------------------------------
>> d_lease       295             0               993
>>
>> session       caps            miss            hit
>> -------------------------------------------------
>> 0             295             107             4119
>> 1             1               107             9
>>
>> URL: https://tracker.ceph.com/issues/43215
>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>> ---
>>   fs/ceph/acl.c        |  2 ++
>>   fs/ceph/addr.c       |  2 ++
>>   fs/ceph/caps.c       | 74 ++++++++++++++++++++++++++++++++++++++++++++
>>   fs/ceph/debugfs.c    | 20 ++++++++++++
>>   fs/ceph/dir.c        |  9 ++++--
>>   fs/ceph/file.c       |  3 ++
>>   fs/ceph/mds_client.c | 16 +++++++++-
>>   fs/ceph/mds_client.h |  3 ++
>>   fs/ceph/quota.c      |  9 ++++--
>>   fs/ceph/super.h      | 11 +++++++
>>   fs/ceph/xattr.c      | 17 ++++++++--
>>   11 files changed, 158 insertions(+), 8 deletions(-)
>>
>> diff --git a/fs/ceph/acl.c b/fs/ceph/acl.c
>> index 26be6520d3fb..58e119e3519f 100644
>> --- a/fs/ceph/acl.c
>> +++ b/fs/ceph/acl.c
>> @@ -22,6 +22,8 @@ static inline void ceph_set_cached_acl(struct inode *inode,
>>   	struct ceph_inode_info *ci = ceph_inode(inode);
>>   
>>   	spin_lock(&ci->i_ceph_lock);
>> +	__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>> +
>>   	if (__ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 0))
>>   		set_cached_acl(inode, type, acl);
>>   	else
>> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
>> index 7ab616601141..29d4513eff8c 100644
>> --- a/fs/ceph/addr.c
>> +++ b/fs/ceph/addr.c
>> @@ -1706,6 +1706,8 @@ int ceph_uninline_data(struct file *filp, struct page *locked_page)
>>   			err = -ENOMEM;
>>   			goto out;
>>   		}
>> +
>> +		ceph_caps_metric(ci, CEPH_STAT_CAP_INLINE_DATA);
> Should a check for inline data really count here?
Currently all the INLINE_DATA getattr requests are issued in 'force' mode, so we can ignore this one.
>>   		err = __ceph_do_getattr(inode, page,
>>   					CEPH_STAT_CAP_INLINE_DATA, true);
>>   		if (err < 0) {
>> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
>> index 7fc87b693ba4..af2e9e826f8c 100644
>> --- a/fs/ceph/caps.c
>> +++ b/fs/ceph/caps.c
>> @@ -783,6 +783,75 @@ static int __cap_is_valid(struct ceph_cap *cap)
>>   	return 1;
>>   }
>>   
>> +/*
>> + * Counts the cap metric.
>> + */
> This needs some comments. Specifically, what should this be counting and
> how?

Will add it.

__ceph_caps_metric() will traverse the inode's i_caps, counting a hit
or a miss against each cap it visits (skipping the auth cap if it was
already counted), and accumulating the 'issued' bits until it has
gathered all the bits in 'mask'. The i_caps traversal logic follows
what __ceph_caps_issued_mask() does.


>> +void __ceph_caps_metric(struct ceph_inode_info *ci, int mask)
>> +{
>> +	int have = ci->i_snap_caps;
>> +	struct ceph_mds_session *s;
>> +	struct ceph_cap *cap;
>> +	struct rb_node *p;
>> +	bool skip_auth = false;
>> +
>> +	lockdep_assert_held(&ci->i_ceph_lock);
>> +
>> +	if (mask <= 0)
>> +		return;
>> +
>> +	/* Counts the snap caps metric in the auth cap */
>> +	if (ci->i_auth_cap) {
>> +		cap = ci->i_auth_cap;
>> +		if (have) {
>> +			have |= cap->issued;
>> +
>> +			dout("%s %p cap %p issued %s, mask %s\n", __func__,
>> +			     &ci->vfs_inode, cap, ceph_cap_string(cap->issued),
>> +			     ceph_cap_string(mask));
>> +
>> +			s = ceph_get_mds_session(cap->session);
>> +			if (s) {
>> +				if (mask & have)
>> +					percpu_counter_inc(&s->i_caps_hit);
>> +				else
>> +					percpu_counter_inc(&s->i_caps_mis);
>> +				ceph_put_mds_session(s);
>> +			}
>> +			skip_auth = true;
>> +		}
>> +	}
>> +
>> +	if ((mask & have) == mask)
>> +		return;
>> +
>> +	/* Checks others */
>> +	for (p = rb_first(&ci->i_caps); p; p = rb_next(p)) {
>> +		cap = rb_entry(p, struct ceph_cap, ci_node);
>> +		if (!__cap_is_valid(cap))
>> +			continue;
>> +
>> +		if (skip_auth && cap == ci->i_auth_cap)
>> +			continue;
>> +
>> +		dout("%s %p cap %p issued %s, mask %s\n", __func__,
>> +		     &ci->vfs_inode, cap, ceph_cap_string(cap->issued),
>> +		     ceph_cap_string(mask));
>> +
>> +		s = ceph_get_mds_session(cap->session);
>> +		if (s) {
>> +			if (mask & cap->issued)
>> +				percpu_counter_inc(&s->i_caps_hit);
>> +			else
>> +				percpu_counter_inc(&s->i_caps_mis);
>> +			ceph_put_mds_session(s);
>> +		}
>> +
>> +		have |= cap->issued;
>> +		if ((mask & have) == mask)
>> +			return;
>> +	}
>> +}
>> +
> I'm trying to understand what happens with the above when more than one
> ceph_cap has the same bit set in "issued". For instance:
>
> Suppose we're doing the check for a statx call, and we're trying to get
> caps for pAsFsLs. We have two MDS's and they've each granted us caps for
> the inode, say:
>
> MDS 0: pAs
> MDS 1: pAsLsFs
>
> We check the cap0 first, and consider it a hit, and then we check cap1
> and consider it a hit as well. So that seems like it's being double-
> counted.

Yeah, it will.

In case2:

MDS 0: pAsFs

MDS 1: pAsLs

In this case, as in yours, both i_cap0 and i_cap1 count as 'hit'.


In case3 :

MDS0: pAsFsLs

MDS1: pAs

Only i_cap0 counts as a 'hit'; i_cap1 is not counted as either 'hit' or 'mis'.


In case4:

MDS0: p

MDS1: pAsLsFs

i_cap0 counts a 'mis' and i_cap1 counts a 'hit'.


The logic is the same as what __ceph_caps_issued_mask() does.

A 'hit' means that, while gathering the caps in 'mask', we checked
i_cap[0~N] and the cap had some subset of 'mask'; a 'mis' means we
checked i_cap[0~N] and the cap had none of the bits in 'mask'.

As for i_cap[N+1 ~ M], we won't touch them at all, because we have
already gathered all the caps needed in 'mask', so they won't be
counted as either 'hit' or 'mis'.

All in all, the current logic is: a 'hit' means the cap was touched
and had some of what we needed, and a 'mis' means the cap was touched
and had none of what we needed.
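
If it helps, here is a tiny standalone model of that rule (my own toy
sketch in plain C, not code from the patch; array order stands in for
the rbtree traversal, and the bit values are made up):

    #include <stdio.h>

    struct cap { int session; unsigned issued; };

    /* Visit caps in order: a visited cap counts a hit if it has any bit
     * of 'mask', else a miss; stop once the union of the issued bits
     * covers the whole mask, leaving later caps uncounted. */
    static void count_caps(struct cap *caps, int n, unsigned mask,
                           int *hit, int *mis)
    {
            unsigned have = 0;
            int i;

            for (i = 0; i < n && (mask & have) != mask; i++) {
                    if (mask & caps[i].issued)
                            hit[caps[i].session]++;
                    else
                            mis[caps[i].session]++;
                    have |= caps[i].issued;
            }
    }

    int main(void)
    {
            /* like case4: mds0 has only an unrelated bit, mds1 has all */
            struct cap caps[] = { { 0, 0x1 }, { 1, 0xe } };
            int hit[2] = { 0 }, mis[2] = { 0 };

            count_caps(caps, 2, 0xe, hit, mis);
            printf("mds0: hit=%d mis=%d, mds1: hit=%d mis=%d\n",
                   hit[0], mis[0], hit[1], mis[1]);
            return 0;   /* prints mds0: hit=0 mis=1, mds1: hit=1 mis=0 */
    }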



>
> ISTM that what you really want to do here is logically OR all of the
> cap->issued fields together, then check that vs. the mask value, and
> count only one hit or miss per inode.
>
> That said, it's not 100% clear what you're counting as a hit or miss
> here, so please let me know if I have that wrong.
>   
>>   /*
>>    * Return set of valid cap bits issued to us.  Note that caps time
>>    * out, and may be invalidated in bulk if the client session times out
>> @@ -2746,6 +2815,7 @@ static void check_max_size(struct inode *inode, loff_t endoff)
>>   int ceph_try_get_caps(struct inode *inode, int need, int want,
>>   		      bool nonblock, int *got)
>>   {
>> +	struct ceph_inode_info *ci = ceph_inode(inode);
>>   	int ret;
>>   
>>   	BUG_ON(need & ~CEPH_CAP_FILE_RD);
>> @@ -2758,6 +2828,7 @@ int ceph_try_get_caps(struct inode *inode, int need, int want,
>>   	BUG_ON(want & ~(CEPH_CAP_FILE_CACHE | CEPH_CAP_FILE_LAZYIO |
>>   			CEPH_CAP_FILE_SHARED | CEPH_CAP_FILE_EXCL |
>>   			CEPH_CAP_ANY_DIR_OPS));
>> +	ceph_caps_metric(ci, need | want);
>>   	ret = try_get_cap_refs(inode, need, want, 0, nonblock, got);
>>   	return ret == -EAGAIN ? 0 : ret;
>>   }
>> @@ -2784,6 +2855,8 @@ int ceph_get_caps(struct file *filp, int need, int want,
>>   	    fi->filp_gen != READ_ONCE(fsc->filp_gen))
>>   		return -EBADF;
>>   
>> +	ceph_caps_metric(ci, need | want);
>> +
>>   	while (true) {
>>   		if (endoff > 0)
>>   			check_max_size(inode, endoff);
>> @@ -2871,6 +2944,7 @@ int ceph_get_caps(struct file *filp, int need, int want,
>>   			 * getattr request will bring inline data into
>>   			 * page cache
>>   			 */
>> +			ceph_caps_metric(ci, CEPH_STAT_CAP_INLINE_DATA);
>>   			ret = __ceph_do_getattr(inode, NULL,
>>   						CEPH_STAT_CAP_INLINE_DATA,
>>   						true);
>> diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
>> index 40a22da0214a..c132fdb40d53 100644
>> --- a/fs/ceph/debugfs.c
>> +++ b/fs/ceph/debugfs.c
>> @@ -128,6 +128,7 @@ static int metric_show(struct seq_file *s, void *p)
>>   {
>>   	struct ceph_fs_client *fsc = s->private;
>>   	struct ceph_mds_client *mdsc = fsc->mdsc;
>> +	int i;
>>   
>>   	seq_printf(s, "item          total           miss            hit\n");
>>   	seq_printf(s, "-------------------------------------------------\n");
>> @@ -137,6 +138,25 @@ static int metric_show(struct seq_file *s, void *p)
>>   		   percpu_counter_sum(&mdsc->metric.d_lease_mis),
>>   		   percpu_counter_sum(&mdsc->metric.d_lease_hit));
>>   
>> +	seq_printf(s, "\n");
>> +	seq_printf(s, "session       caps            miss            hit\n");
>> +	seq_printf(s, "-------------------------------------------------\n");
>> +
>> +	mutex_lock(&mdsc->mutex);
>> +	for (i = 0; i < mdsc->max_sessions; i++) {
>> +		struct ceph_mds_session *session;
>> +
>> +		session = __ceph_lookup_mds_session(mdsc, i);
>> +		if (!session)
>> +			continue;
>> +		seq_printf(s, "%-14d%-16d%-16lld%lld\n", i,
>> +			   session->s_nr_caps,
>> +			   percpu_counter_sum(&session->i_caps_mis),
>> +			   percpu_counter_sum(&session->i_caps_hit));
>> +		ceph_put_mds_session(session);
>> +	}
>> +	mutex_unlock(&mdsc->mutex);
>> +
>>   	return 0;
>>   }
>>   
>> diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
>> index 658c55b323cc..33eb239e09e2 100644
>> --- a/fs/ceph/dir.c
>> +++ b/fs/ceph/dir.c
>> @@ -313,7 +313,7 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
>>   	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
>>   	struct ceph_mds_client *mdsc = fsc->mdsc;
>>   	int i;
>> -	int err;
>> +	int err, ret = -1;
>>   	unsigned frag = -1;
>>   	struct ceph_mds_reply_info_parsed *rinfo;
>>   
>> @@ -346,13 +346,16 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
>>   	    !ceph_test_mount_opt(fsc, NOASYNCREADDIR) &&
>>   	    ceph_snap(inode) != CEPH_SNAPDIR &&
>>   	    __ceph_dir_is_complete_ordered(ci) &&
>> -	    __ceph_caps_issued_mask(ci, CEPH_CAP_FILE_SHARED, 1)) {
>> +	    (ret = __ceph_caps_issued_mask(ci, CEPH_CAP_FILE_SHARED, 1))) {
>>   		int shared_gen = atomic_read(&ci->i_shared_gen);
>> +		__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>   		spin_unlock(&ci->i_ceph_lock);
>>   		err = __dcache_readdir(file, ctx, shared_gen);
>>   		if (err != -EAGAIN)
>>   			return err;
>>   	} else {
>> +		if (ret != -1)
>> +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>   		spin_unlock(&ci->i_ceph_lock);
>>   	}
>>   
>> @@ -757,6 +760,8 @@ static struct dentry *ceph_lookup(struct inode *dir, struct dentry *dentry,
>>   		struct ceph_dentry_info *di = ceph_dentry(dentry);
>>   
>>   		spin_lock(&ci->i_ceph_lock);
>> +		__ceph_caps_metric(ci, CEPH_CAP_FILE_SHARED);
>> +
>>   		dout(" dir %p flags are %d\n", dir, ci->i_ceph_flags);
>>   		if (strncmp(dentry->d_name.name,
>>   			    fsc->mount_options->snapdir_name,
>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>> index 1e6cdf2dfe90..c78dfbbb7b91 100644
>> --- a/fs/ceph/file.c
>> +++ b/fs/ceph/file.c
>> @@ -384,6 +384,8 @@ int ceph_open(struct inode *inode, struct file *file)
>>   	 * asynchronously.
>>   	 */
>>   	spin_lock(&ci->i_ceph_lock);
>> +	__ceph_caps_metric(ci, wanted);
>> +
>>   	if (__ceph_is_any_real_caps(ci) &&
>>   	    (((fmode & CEPH_FILE_MODE_WR) == 0) || ci->i_auth_cap)) {
>>   		int mds_wanted = __ceph_caps_mds_wanted(ci, true);
>> @@ -1340,6 +1342,7 @@ static ssize_t ceph_read_iter(struct kiocb *iocb, struct iov_iter *to)
>>   				return -ENOMEM;
>>   		}
>>   
>> +		ceph_caps_metric(ci, CEPH_STAT_CAP_INLINE_DATA);
>>   		statret = __ceph_do_getattr(inode, page,
>>   					    CEPH_STAT_CAP_INLINE_DATA, !!page);
>>   		if (statret < 0) {
>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>> index a24fd00676b8..141c1c03636c 100644
>> --- a/fs/ceph/mds_client.c
>> +++ b/fs/ceph/mds_client.c
>> @@ -558,6 +558,8 @@ void ceph_put_mds_session(struct ceph_mds_session *s)
>>   	if (refcount_dec_and_test(&s->s_ref)) {
>>   		if (s->s_auth.authorizer)
>>   			ceph_auth_destroy_authorizer(s->s_auth.authorizer);
>> +		percpu_counter_destroy(&s->i_caps_hit);
>> +		percpu_counter_destroy(&s->i_caps_mis);
>>   		kfree(s);
>>   	}
>>   }
>> @@ -598,6 +600,7 @@ static struct ceph_mds_session *register_session(struct ceph_mds_client *mdsc,
>>   						 int mds)
>>   {
>>   	struct ceph_mds_session *s;
>> +	int err;
>>   
>>   	if (mds >= mdsc->mdsmap->possible_max_rank)
>>   		return ERR_PTR(-EINVAL);
>> @@ -612,8 +615,10 @@ static struct ceph_mds_session *register_session(struct ceph_mds_client *mdsc,
>>   
>>   		dout("%s: realloc to %d\n", __func__, newmax);
>>   		sa = kcalloc(newmax, sizeof(void *), GFP_NOFS);
>> -		if (!sa)
>> +		if (!sa) {
>> +			err = -ENOMEM;
>>   			goto fail_realloc;
>> +		}
>>   		if (mdsc->sessions) {
>>   			memcpy(sa, mdsc->sessions,
>>   			       mdsc->max_sessions * sizeof(void *));
>> @@ -653,6 +658,13 @@ static struct ceph_mds_session *register_session(struct ceph_mds_client *mdsc,
>>   
>>   	INIT_LIST_HEAD(&s->s_cap_flushing);
>>   
>> +	err = percpu_counter_init(&s->i_caps_hit, 0, GFP_NOFS);
>> +	if (err)
>> +		goto fail_realloc;
>> +	err = percpu_counter_init(&s->i_caps_mis, 0, GFP_NOFS);
>> +	if (err)
>> +		goto fail_init;
>> +
>>   	mdsc->sessions[mds] = s;
>>   	atomic_inc(&mdsc->num_sessions);
>>   	refcount_inc(&s->s_ref);  /* one ref to sessions[], one to caller */
>> @@ -662,6 +674,8 @@ static struct ceph_mds_session *register_session(struct ceph_mds_client *mdsc,
>>   
>>   	return s;
>>   
>> +fail_init:
>> +	percpu_counter_destroy(&s->i_caps_hit);
>>   fail_realloc:
>>   	kfree(s);
>>   	return ERR_PTR(-ENOMEM);
>> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
>> index dd1f417b90eb..ba74ff74c59c 100644
>> --- a/fs/ceph/mds_client.h
>> +++ b/fs/ceph/mds_client.h
>> @@ -201,6 +201,9 @@ struct ceph_mds_session {
>>   
>>   	struct list_head  s_waiting;  /* waiting requests */
>>   	struct list_head  s_unsafe;   /* unsafe requests */
>> +
>> +	struct percpu_counter i_caps_hit;
>> +	struct percpu_counter i_caps_mis;
>>   };
>>   
>>   /*
>> diff --git a/fs/ceph/quota.c b/fs/ceph/quota.c
>> index de56dee60540..4ce2f658e63d 100644
>> --- a/fs/ceph/quota.c
>> +++ b/fs/ceph/quota.c
>> @@ -147,9 +147,14 @@ static struct inode *lookup_quotarealm_inode(struct ceph_mds_client *mdsc,
>>   		return NULL;
>>   	}
>>   	if (qri->inode) {
>> +		struct ceph_inode_info *ci = ceph_inode(qri->inode);
>> +		int ret;
>> +
>> +		ceph_caps_metric(ci, CEPH_STAT_CAP_INODE);
>> +
>>   		/* get caps */
>> -		int ret = __ceph_do_getattr(qri->inode, NULL,
>> -					    CEPH_STAT_CAP_INODE, true);
>> +		ret = __ceph_do_getattr(qri->inode, NULL,
>> +					CEPH_STAT_CAP_INODE, true);
>>   		if (ret >= 0)
>>   			in = qri->inode;
>>   		else
>> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
>> index 7af91628636c..3f4829222528 100644
>> --- a/fs/ceph/super.h
>> +++ b/fs/ceph/super.h
>> @@ -641,6 +641,14 @@ static inline bool __ceph_is_any_real_caps(struct ceph_inode_info *ci)
>>   	return !RB_EMPTY_ROOT(&ci->i_caps);
>>   }
>>   
>> +extern void __ceph_caps_metric(struct ceph_inode_info *ci, int mask);
>> +static inline void ceph_caps_metric(struct ceph_inode_info *ci, int mask)
>> +{
>> +	spin_lock(&ci->i_ceph_lock);
>> +	__ceph_caps_metric(ci, mask);
>> +	spin_unlock(&ci->i_ceph_lock);
>> +}
>> +
>>   extern int __ceph_caps_issued(struct ceph_inode_info *ci, int *implemented);
>>   extern int __ceph_caps_issued_mask(struct ceph_inode_info *ci, int mask, int t);
>>   extern int __ceph_caps_issued_other(struct ceph_inode_info *ci,
>> @@ -927,6 +935,9 @@ extern int __ceph_do_getattr(struct inode *inode, struct page *locked_page,
>>   			     int mask, bool force);
>>   static inline int ceph_do_getattr(struct inode *inode, int mask, bool force)
>>   {
>> +	struct ceph_inode_info *ci = ceph_inode(inode);
>> +
>> +	ceph_caps_metric(ci, mask);
>>   	return __ceph_do_getattr(inode, NULL, mask, force);
>>   }
>>   extern int ceph_permission(struct inode *inode, int mask);
>> diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
>> index d58fa14c1f01..ebd522edb0a8 100644
>> --- a/fs/ceph/xattr.c
>> +++ b/fs/ceph/xattr.c
>> @@ -829,6 +829,7 @@ ssize_t __ceph_getxattr(struct inode *inode, const char *name, void *value,
>>   	struct ceph_vxattr *vxattr = NULL;
>>   	int req_mask;
>>   	ssize_t err;
>> +	int ret = -1;
>>   
>>   	/* let's see if a virtual xattr was requested */
>>   	vxattr = ceph_match_vxattr(inode, name);
>> @@ -856,7 +857,9 @@ ssize_t __ceph_getxattr(struct inode *inode, const char *name, void *value,
>>   
>>   	if (ci->i_xattrs.version == 0 ||
>>   	    !((req_mask & CEPH_CAP_XATTR_SHARED) ||
>> -	      __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1))) {
>> +	      (ret = __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1)))) {
>> +		if (ret != -1)
>> +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>   		spin_unlock(&ci->i_ceph_lock);
>>   
>>   		/* security module gets xattr while filling trace */
>> @@ -871,6 +874,9 @@ ssize_t __ceph_getxattr(struct inode *inode, const char *name, void *value,
>>   		if (err)
>>   			return err;
>>   		spin_lock(&ci->i_ceph_lock);
>> +	} else {
>> +		if (ret != -1)
>> +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>   	}
>>   
>>   	err = __build_xattrs(inode);
>> @@ -907,19 +913,24 @@ ssize_t ceph_listxattr(struct dentry *dentry, char *names, size_t size)
>>   	struct ceph_inode_info *ci = ceph_inode(inode);
>>   	bool len_only = (size == 0);
>>   	u32 namelen;
>> -	int err;
>> +	int err, ret = -1;
>>   
>>   	spin_lock(&ci->i_ceph_lock);
>>   	dout("listxattr %p ver=%lld index_ver=%lld\n", inode,
>>   	     ci->i_xattrs.version, ci->i_xattrs.index_version);
>>   
>>   	if (ci->i_xattrs.version == 0 ||
>> -	    !__ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1)) {
>> +	    !(ret = __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1))) {
>> +		if (ret != -1)
>> +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>   		spin_unlock(&ci->i_ceph_lock);
>>   		err = ceph_do_getattr(inode, CEPH_STAT_CAP_XATTR, true);
>>   		if (err)
>>   			return err;
>>   		spin_lock(&ci->i_ceph_lock);
>> +	} else {
>> +		if (ret != -1)
>> +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>   	}
>>   
>>   	err = __build_xattrs(inode);


* Re: [PATCH resend v5 02/11] ceph: add caps perf metric for each session
  2020-01-30  2:22     ` Xiubo Li
@ 2020-01-30 19:00       ` Jeffrey Layton
  2020-01-31  1:34         ` Xiubo Li
  0 siblings, 1 reply; 31+ messages in thread
From: Jeffrey Layton @ 2020-01-30 19:00 UTC (permalink / raw)
  To: Xiubo Li, Jeff Layton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On Thu, 2020-01-30 at 10:22 +0800, Xiubo Li wrote:
> On 2020/1/29 22:21, Jeff Layton wrote:
> > On Wed, 2020-01-29 at 03:27 -0500, xiubli@redhat.com wrote:
> > > From: Xiubo Li <xiubli@redhat.com>
> > > 
> > > This will fulfill the caps hit/miss metric for each session. When
> > > checking the "need" mask, if one cap has a subset of the "need"
> > > mask it counts as a hit; otherwise it counts as a miss.
> > > 
> > > item          total           miss            hit
> > > -------------------------------------------------
> > > d_lease       295             0               993
> > > 
> > > session       caps            miss            hit
> > > -------------------------------------------------
> > > 0             295             107             4119
> > > 1             1               107             9
> > > 
> > > URL: https://tracker.ceph.com/issues/43215
> > > Signed-off-by: Xiubo Li <xiubli@redhat.com>
> > > ---
> > >   fs/ceph/acl.c        |  2 ++
> > >   fs/ceph/addr.c       |  2 ++
> > >   fs/ceph/caps.c       | 74
> > > ++++++++++++++++++++++++++++++++++++++++++++
> > >   fs/ceph/debugfs.c    | 20 ++++++++++++
> > >   fs/ceph/dir.c        |  9 ++++--
> > >   fs/ceph/file.c       |  3 ++
> > >   fs/ceph/mds_client.c | 16 +++++++++-
> > >   fs/ceph/mds_client.h |  3 ++
> > >   fs/ceph/quota.c      |  9 ++++--
> > >   fs/ceph/super.h      | 11 +++++++
> > >   fs/ceph/xattr.c      | 17 ++++++++--
> > >   11 files changed, 158 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/fs/ceph/acl.c b/fs/ceph/acl.c
> > > index 26be6520d3fb..58e119e3519f 100644
> > > --- a/fs/ceph/acl.c
> > > +++ b/fs/ceph/acl.c
> > > @@ -22,6 +22,8 @@ static inline void ceph_set_cached_acl(struct
> > > inode *inode,
> > >   	struct ceph_inode_info *ci = ceph_inode(inode);
> > >   
> > >   	spin_lock(&ci->i_ceph_lock);
> > > +	__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
> > > +
> > >   	if (__ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED,
> > > 0))
> > >   		set_cached_acl(inode, type, acl);
> > >   	else
> > > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> > > index 7ab616601141..29d4513eff8c 100644
> > > --- a/fs/ceph/addr.c
> > > +++ b/fs/ceph/addr.c
> > > @@ -1706,6 +1706,8 @@ int ceph_uninline_data(struct file *filp,
> > > struct page *locked_page)
> > >   			err = -ENOMEM;
> > >   			goto out;
> > >   		}
> > > +
> > > +		ceph_caps_metric(ci, CEPH_STAT_CAP_INLINE_DATA);
> > Should a check for inline data really count here?
> Currently all the INLINE_DATA is in 'force' mode, so we can ignore
> it.
> > >   		err = __ceph_do_getattr(inode, page,
> > >   					CEPH_STAT_CAP_INLINE_DA
> > > TA, true);
> > >   		if (err < 0) {
> > > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> > > index 7fc87b693ba4..af2e9e826f8c 100644
> > > --- a/fs/ceph/caps.c
> > > +++ b/fs/ceph/caps.c
> > > @@ -783,6 +783,75 @@ static int __cap_is_valid(struct ceph_cap
> > > *cap)
> > >   	return 1;
> > >   }
> > >   
> > > +/*
> > > + * Counts the cap metric.
> > > + */
> > This needs some comments. Specifically, what should this be
> > counting and
> > how?
> 
> Will add it.
> 
> __ceph_caps_metric() will traverse the inode's i_caps, accumulating
> each valid cap's 'issued' bits, until it has gathered enough caps to
> cover 'mask'. The i_caps traversal logic follows
> __ceph_caps_issued_mask().
> 
> 
> > > +void __ceph_caps_metric(struct ceph_inode_info *ci, int mask)
> > > +{
> > > +	int have = ci->i_snap_caps;
> > > +	struct ceph_mds_session *s;
> > > +	struct ceph_cap *cap;
> > > +	struct rb_node *p;
> > > +	bool skip_auth = false;
> > > +
> > > +	lockdep_assert_held(&ci->i_ceph_lock);
> > > +
> > > +	if (mask <= 0)
> > > +		return;
> > > +
> > > +	/* Counts the snap caps metric in the auth cap */
> > > +	if (ci->i_auth_cap) {
> > > +		cap = ci->i_auth_cap;
> > > +		if (have) {
> > > +			have |= cap->issued;
> > > +
> > > +			dout("%s %p cap %p issued %s, mask %s\n",
> > > __func__,
> > > +			     &ci->vfs_inode, cap, ceph_cap_string(cap-
> > > >issued),
> > > +			     ceph_cap_string(mask));
> > > +
> > > +			s = ceph_get_mds_session(cap->session);
> > > +			if (s) {
> > > +				if (mask & have)
> > > +					percpu_counter_inc(&s-
> > > >i_caps_hit);
> > > +				else
> > > +					percpu_counter_inc(&s-
> > > >i_caps_mis);
> > > +				ceph_put_mds_session(s);
> > > +			}
> > > +			skip_auth = true;
> > > +		}
> > > +	}
> > > +
> > > +	if ((mask & have) == mask)
> > > +		return;
> > > +
> > > +	/* Checks others */
> > > +	for (p = rb_first(&ci->i_caps); p; p = rb_next(p)) {
> > > +		cap = rb_entry(p, struct ceph_cap, ci_node);
> > > +		if (!__cap_is_valid(cap))
> > > +			continue;
> > > +
> > > +		if (skip_auth && cap == ci->i_auth_cap)
> > > +			continue;
> > > +
> > > +		dout("%s %p cap %p issued %s, mask %s\n", __func__,
> > > +		     &ci->vfs_inode, cap, ceph_cap_string(cap->issued),
> > > +		     ceph_cap_string(mask));
> > > +
> > > +		s = ceph_get_mds_session(cap->session);
> > > +		if (s) {
> > > +			if (mask & cap->issued)
> > > +				percpu_counter_inc(&s->i_caps_hit);
> > > +			else
> > > +				percpu_counter_inc(&s->i_caps_mis);
> > > +			ceph_put_mds_session(s);
> > > +		}
> > > +
> > > +		have |= cap->issued;
> > > +		if ((mask & have) == mask)
> > > +			return;
> > > +	}
> > > +}
> > > +
> > I'm trying to understand what happens with the above when more than
> > one
> > ceph_cap has the same bit set in "issued". For instance:
> > 
> > Suppose we're doing the check for a statx call, and we're trying to
> > get
> > caps for pAsFsLs. We have two MDS's and they've each granted us
> > caps for
> > the inode, say:
> > 
> > MDS 0: pAs
> > MDS 1: pAsLsFs
> > 
> > We check the cap0 first, and consider it a hit, and then we check
> > cap1
> > and consider it a hit as well. So that seems like it's being
> > double-
> > counted.
> 
> Yeah, it will.
> 
> In case 2:
> 
> MDS0: pAsFs
> MDS1: pAsLs
> 
> For this case, and for yours, both i_cap0 and i_cap1 count a 'hit'.
> 
> In case 3:
> 
> MDS0: pAsFsLs
> MDS1: pAs
> 
> Only i_cap0 counts a 'hit'; i_cap1 is not counted as either a 'hit'
> or a 'mis'.
> 
> In case 4:
> 
> MDS0: p
> MDS1: pAsLsFs
> 
> i_cap0 counts a 'mis' and i_cap1 counts a 'hit'.
> 
> All of this logic is the same as what __ceph_caps_issued_mask() does.
> 
> A 'hit' means that, while gathering all the caps in 'mask', we
> checked i_cap[0~N] and that cap had a subset of 'mask'; a 'mis' means
> we checked it and it had none of 'mask'.
> 
> The remaining i_cap[N+1~M] are never touched, because we have already
> gathered all the caps needed in 'mask', so they count neither a 'hit'
> nor a 'mis'.
> 
> All in all, the current logic is that the 'hit' means 'touched' and
> it has some of what we needed, and the 'mis' means 'touched' and it
> had none of what we needed.
> 
> 

That seems sort of arbitrary, given that you're going to get different
results depending on the index of the MDS with the caps. For instance:


MDS0: pAsLsFs
MDS1: pAs

...vs...

MDS0: pAs
MDS1: pAsLsFs

If we assume we're looking for pAsLsFs, then the first scenario will
just end up with 1 hit and the second will give you 2. AFAIU, the two
MDSs are peers, so it really seems like the index should not matter
here.

I'm really struggling to understand how these numbers will be useful.
What, specifically, are we trying to count here and why?
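
For illustration, here is an untested sketch of what I mean (the
global i_caps_hit/i_caps_mis counters on mdsc->metric are hypothetical
here; they could equally live on the auth cap's session):

/*
 * Untested sketch: OR all the valid caps' issued bits together first,
 * then record a single hit or miss for the inode as a whole, so the
 * result no longer depends on which MDS index holds which caps.
 */
static void ceph_caps_metric_once(struct ceph_mds_client *mdsc,
				  struct ceph_inode_info *ci, int mask)
{
	int have = ci->i_snap_caps;
	struct rb_node *p;

	lockdep_assert_held(&ci->i_ceph_lock);

	for (p = rb_first(&ci->i_caps); p; p = rb_next(p)) {
		struct ceph_cap *cap = rb_entry(p, struct ceph_cap,
						ci_node);

		if (!__cap_is_valid(cap))
			continue;
		have |= cap->issued;
		if ((have & mask) == mask)
			break;
	}

	if ((have & mask) == mask)
		percpu_counter_inc(&mdsc->metric.i_caps_hit);
	else
		percpu_counter_inc(&mdsc->metric.i_caps_mis);
}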

> 
> > ISTM that what you really want to do here is logically OR all of
> > the cap->issued fields together, then check that against the mask
> > value, and count only one hit or miss per inode.
> > 
> > That said, it's not 100% clear what you're counting as a hit or
> > miss
> > here, so please let me know if I have that wrong.
> >   
> > >   /*
> > >    * Return set of valid cap bits issued to us.  Note that caps
> > > time
> > >    * out, and may be invalidated in bulk if the client session
> > > times out
> > > @@ -2746,6 +2815,7 @@ static void check_max_size(struct inode
> > > *inode, loff_t endoff)
> > >   int ceph_try_get_caps(struct inode *inode, int need, int want,
> > >   		      bool nonblock, int *got)
> > >   {
> > > +	struct ceph_inode_info *ci = ceph_inode(inode);
> > >   	int ret;
> > >   
> > >   	BUG_ON(need & ~CEPH_CAP_FILE_RD);
> > > @@ -2758,6 +2828,7 @@ int ceph_try_get_caps(struct inode *inode,
> > > int need, int want,
> > >   	BUG_ON(want & ~(CEPH_CAP_FILE_CACHE |
> > > CEPH_CAP_FILE_LAZYIO |
> > >   			CEPH_CAP_FILE_SHARED |
> > > CEPH_CAP_FILE_EXCL |
> > >   			CEPH_CAP_ANY_DIR_OPS));
> > > +	ceph_caps_metric(ci, need | want);
> > >   	ret = try_get_cap_refs(inode, need, want, 0, nonblock,
> > > got);
> > >   	return ret == -EAGAIN ? 0 : ret;
> > >   }
> > > @@ -2784,6 +2855,8 @@ int ceph_get_caps(struct file *filp, int
> > > need, int want,
> > >   	    fi->filp_gen != READ_ONCE(fsc->filp_gen))
> > >   		return -EBADF;
> > >   
> > > +	ceph_caps_metric(ci, need | want);
> > > +
> > >   	while (true) {
> > >   		if (endoff > 0)
> > >   			check_max_size(inode, endoff);
> > > @@ -2871,6 +2944,7 @@ int ceph_get_caps(struct file *filp, int
> > > need, int want,
> > >   			 * getattr request will bring inline
> > > data into
> > >   			 * page cache
> > >   			 */
> > > +			ceph_caps_metric(ci,
> > > CEPH_STAT_CAP_INLINE_DATA);
> > >   			ret = __ceph_do_getattr(inode, NULL,
> > >   						CEPH_STAT_CAP_I
> > > NLINE_DATA,
> > >   						true);
> > > diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
> > > index 40a22da0214a..c132fdb40d53 100644
> > > --- a/fs/ceph/debugfs.c
> > > +++ b/fs/ceph/debugfs.c
> > > @@ -128,6 +128,7 @@ static int metric_show(struct seq_file *s,
> > > void *p)
> > >   {
> > >   	struct ceph_fs_client *fsc = s->private;
> > >   	struct ceph_mds_client *mdsc = fsc->mdsc;
> > > +	int i;
> > >   
> > >   	seq_printf(s,
> > > "item          total           miss            hit\n");
> > >   	seq_printf(s, "--------------------------------------
> > > -----------\n");
> > > @@ -137,6 +138,25 @@ static int metric_show(struct seq_file *s,
> > > void *p)
> > >   		   percpu_counter_sum(&mdsc-
> > > >metric.d_lease_mis),
> > >   		   percpu_counter_sum(&mdsc-
> > > >metric.d_lease_hit));
> > >   
> > > +	seq_printf(s, "\n");
> > > +	seq_printf(s,
> > > "session       caps            miss            hit\n");
> > > +	seq_printf(s, "----------------------------------------------
> > > ---\n");
> > > +
> > > +	mutex_lock(&mdsc->mutex);
> > > +	for (i = 0; i < mdsc->max_sessions; i++) {
> > > +		struct ceph_mds_session *session;
> > > +
> > > +		session = __ceph_lookup_mds_session(mdsc, i);
> > > +		if (!session)
> > > +			continue;
> > > +		seq_printf(s, "%-14d%-16d%-16lld%lld\n", i,
> > > +			   session->s_nr_caps,
> > > +			   percpu_counter_sum(&session->i_caps_mis),
> > > +			   percpu_counter_sum(&session->i_caps_hit));
> > > +		ceph_put_mds_session(session);
> > > +	}
> > > +	mutex_unlock(&mdsc->mutex);
> > > +
> > >   	return 0;
> > >   }
> > >   
> > > diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
> > > index 658c55b323cc..33eb239e09e2 100644
> > > --- a/fs/ceph/dir.c
> > > +++ b/fs/ceph/dir.c
> > > @@ -313,7 +313,7 @@ static int ceph_readdir(struct file *file,
> > > struct dir_context *ctx)
> > >   	struct ceph_fs_client *fsc =
> > > ceph_inode_to_client(inode);
> > >   	struct ceph_mds_client *mdsc = fsc->mdsc;
> > >   	int i;
> > > -	int err;
> > > +	int err, ret = -1;
> > >   	unsigned frag = -1;
> > >   	struct ceph_mds_reply_info_parsed *rinfo;
> > >   
> > > @@ -346,13 +346,16 @@ static int ceph_readdir(struct file *file,
> > > struct dir_context *ctx)
> > >   	    !ceph_test_mount_opt(fsc, NOASYNCREADDIR) &&
> > >   	    ceph_snap(inode) != CEPH_SNAPDIR &&
> > >   	    __ceph_dir_is_complete_ordered(ci) &&
> > > -	    __ceph_caps_issued_mask(ci, CEPH_CAP_FILE_SHARED, 1)) {
> > > +	    (ret = __ceph_caps_issued_mask(ci, CEPH_CAP_FILE_SHARED,
> > > 1))) {
> > >   		int shared_gen = atomic_read(&ci-
> > > >i_shared_gen);
> > > +		__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
> > >   		spin_unlock(&ci->i_ceph_lock);
> > >   		err = __dcache_readdir(file, ctx, shared_gen);
> > >   		if (err != -EAGAIN)
> > >   			return err;
> > >   	} else {
> > > +		if (ret != -1)
> > > +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
> > >   		spin_unlock(&ci->i_ceph_lock);
> > >   	}
> > >   
> > > @@ -757,6 +760,8 @@ static struct dentry *ceph_lookup(struct
> > > inode *dir, struct dentry *dentry,
> > >   		struct ceph_dentry_info *di =
> > > ceph_dentry(dentry);
> > >   
> > >   		spin_lock(&ci->i_ceph_lock);
> > > +		__ceph_caps_metric(ci, CEPH_CAP_FILE_SHARED);
> > > +
> > >   		dout(" dir %p flags are %d\n", dir, ci-
> > > >i_ceph_flags);
> > >   		if (strncmp(dentry->d_name.name,
> > >   			    fsc->mount_options->snapdir_name,
> > > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> > > index 1e6cdf2dfe90..c78dfbbb7b91 100644
> > > --- a/fs/ceph/file.c
> > > +++ b/fs/ceph/file.c
> > > @@ -384,6 +384,8 @@ int ceph_open(struct inode *inode, struct
> > > file *file)
> > >   	 * asynchronously.
> > >   	 */
> > >   	spin_lock(&ci->i_ceph_lock);
> > > +	__ceph_caps_metric(ci, wanted);
> > > +
> > >   	if (__ceph_is_any_real_caps(ci) &&
> > >   	    (((fmode & CEPH_FILE_MODE_WR) == 0) || ci-
> > > >i_auth_cap)) {
> > >   		int mds_wanted = __ceph_caps_mds_wanted(ci,
> > > true);
> > > @@ -1340,6 +1342,7 @@ static ssize_t ceph_read_iter(struct kiocb
> > > *iocb, struct iov_iter *to)
> > >   				return -ENOMEM;
> > >   		}
> > >   
> > > +		ceph_caps_metric(ci, CEPH_STAT_CAP_INLINE_DATA);
> > >   		statret = __ceph_do_getattr(inode, page,
> > >   					    CEPH_STAT_CAP_INLIN
> > > E_DATA, !!page);
> > >   		if (statret < 0) {
> > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > > index a24fd00676b8..141c1c03636c 100644
> > > --- a/fs/ceph/mds_client.c
> > > +++ b/fs/ceph/mds_client.c
> > > @@ -558,6 +558,8 @@ void ceph_put_mds_session(struct
> > > ceph_mds_session *s)
> > >   	if (refcount_dec_and_test(&s->s_ref)) {
> > >   		if (s->s_auth.authorizer)
> > >   			ceph_auth_destroy_authorizer(s-
> > > >s_auth.authorizer);
> > > +		percpu_counter_destroy(&s->i_caps_hit);
> > > +		percpu_counter_destroy(&s->i_caps_mis);
> > >   		kfree(s);
> > >   	}
> > >   }
> > > @@ -598,6 +600,7 @@ static struct ceph_mds_session
> > > *register_session(struct ceph_mds_client *mdsc,
> > >   						 int mds)
> > >   {
> > >   	struct ceph_mds_session *s;
> > > +	int err;
> > >   
> > >   	if (mds >= mdsc->mdsmap->possible_max_rank)
> > >   		return ERR_PTR(-EINVAL);
> > > @@ -612,8 +615,10 @@ static struct ceph_mds_session
> > > *register_session(struct ceph_mds_client *mdsc,
> > >   
> > >   		dout("%s: realloc to %d\n", __func__, newmax);
> > >   		sa = kcalloc(newmax, sizeof(void *), GFP_NOFS);
> > > -		if (!sa)
> > > +		if (!sa) {
> > > +			err = -ENOMEM;
> > >   			goto fail_realloc;
> > > +		}
> > >   		if (mdsc->sessions) {
> > >   			memcpy(sa, mdsc->sessions,
> > >   			       mdsc->max_sessions * sizeof(void
> > > *));
> > > @@ -653,6 +658,13 @@ static struct ceph_mds_session
> > > *register_session(struct ceph_mds_client *mdsc,
> > >   
> > >   	INIT_LIST_HEAD(&s->s_cap_flushing);
> > >   
> > > +	err = percpu_counter_init(&s->i_caps_hit, 0, GFP_NOFS);
> > > +	if (err)
> > > +		goto fail_realloc;
> > > +	err = percpu_counter_init(&s->i_caps_mis, 0, GFP_NOFS);
> > > +	if (err)
> > > +		goto fail_init;
> > > +
> > >   	mdsc->sessions[mds] = s;
> > >   	atomic_inc(&mdsc->num_sessions);
> > >   	refcount_inc(&s->s_ref);  /* one ref to sessions[], one
> > > to caller */
> > > @@ -662,6 +674,8 @@ static struct ceph_mds_session
> > > *register_session(struct ceph_mds_client *mdsc,
> > >   
> > >   	return s;
> > >   
> > > +fail_init:
> > > +	percpu_counter_destroy(&s->i_caps_hit);
> > >   fail_realloc:
> > >   	kfree(s);
> > >   	return ERR_PTR(-ENOMEM);
> > > diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> > > index dd1f417b90eb..ba74ff74c59c 100644
> > > --- a/fs/ceph/mds_client.h
> > > +++ b/fs/ceph/mds_client.h
> > > @@ -201,6 +201,9 @@ struct ceph_mds_session {
> > >   
> > >   	struct list_head  s_waiting;  /* waiting requests */
> > >   	struct list_head  s_unsafe;   /* unsafe requests */
> > > +
> > > +	struct percpu_counter i_caps_hit;
> > > +	struct percpu_counter i_caps_mis;
> > >   };
> > >   
> > >   /*
> > > diff --git a/fs/ceph/quota.c b/fs/ceph/quota.c
> > > index de56dee60540..4ce2f658e63d 100644
> > > --- a/fs/ceph/quota.c
> > > +++ b/fs/ceph/quota.c
> > > @@ -147,9 +147,14 @@ static struct inode
> > > *lookup_quotarealm_inode(struct ceph_mds_client *mdsc,
> > >   		return NULL;
> > >   	}
> > >   	if (qri->inode) {
> > > +		struct ceph_inode_info *ci = ceph_inode(qri->inode);
> > > +		int ret;
> > > +
> > > +		ceph_caps_metric(ci, CEPH_STAT_CAP_INODE);
> > > +
> > >   		/* get caps */
> > > -		int ret = __ceph_do_getattr(qri->inode, NULL,
> > > -					    CEPH_STAT_CAP_INODE, true);
> > > +		ret = __ceph_do_getattr(qri->inode, NULL,
> > > +					CEPH_STAT_CAP_INODE, true);
> > >   		if (ret >= 0)
> > >   			in = qri->inode;
> > >   		else
> > > diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> > > index 7af91628636c..3f4829222528 100644
> > > --- a/fs/ceph/super.h
> > > +++ b/fs/ceph/super.h
> > > @@ -641,6 +641,14 @@ static inline bool
> > > __ceph_is_any_real_caps(struct ceph_inode_info *ci)
> > >   	return !RB_EMPTY_ROOT(&ci->i_caps);
> > >   }
> > >   
> > > +extern void __ceph_caps_metric(struct ceph_inode_info *ci, int
> > > mask);
> > > +static inline void ceph_caps_metric(struct ceph_inode_info *ci,
> > > int mask)
> > > +{
> > > +	spin_lock(&ci->i_ceph_lock);
> > > +	__ceph_caps_metric(ci, mask);
> > > +	spin_unlock(&ci->i_ceph_lock);
> > > +}
> > > +
> > >   extern int __ceph_caps_issued(struct ceph_inode_info *ci, int
> > > *implemented);
> > >   extern int __ceph_caps_issued_mask(struct ceph_inode_info *ci,
> > > int mask, int t);
> > >   extern int __ceph_caps_issued_other(struct ceph_inode_info *ci,
> > > @@ -927,6 +935,9 @@ extern int __ceph_do_getattr(struct inode
> > > *inode, struct page *locked_page,
> > >   			     int mask, bool force);
> > >   static inline int ceph_do_getattr(struct inode *inode, int
> > > mask, bool force)
> > >   {
> > > +	struct ceph_inode_info *ci = ceph_inode(inode);
> > > +
> > > +	ceph_caps_metric(ci, mask);
> > >   	return __ceph_do_getattr(inode, NULL, mask, force);
> > >   }
> > >   extern int ceph_permission(struct inode *inode, int mask);
> > > diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
> > > index d58fa14c1f01..ebd522edb0a8 100644
> > > --- a/fs/ceph/xattr.c
> > > +++ b/fs/ceph/xattr.c
> > > @@ -829,6 +829,7 @@ ssize_t __ceph_getxattr(struct inode *inode,
> > > const char *name, void *value,
> > >   	struct ceph_vxattr *vxattr = NULL;
> > >   	int req_mask;
> > >   	ssize_t err;
> > > +	int ret = -1;
> > >   
> > >   	/* let's see if a virtual xattr was requested */
> > >   	vxattr = ceph_match_vxattr(inode, name);
> > > @@ -856,7 +857,9 @@ ssize_t __ceph_getxattr(struct inode *inode,
> > > const char *name, void *value,
> > >   
> > >   	if (ci->i_xattrs.version == 0 ||
> > >   	    !((req_mask & CEPH_CAP_XATTR_SHARED) ||
> > > -	      __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1))) {
> > > +	      (ret = __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED,
> > > 1)))) {
> > > +		if (ret != -1)
> > > +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
> > >   		spin_unlock(&ci->i_ceph_lock);
> > >   
> > >   		/* security module gets xattr while filling
> > > trace */
> > > @@ -871,6 +874,9 @@ ssize_t __ceph_getxattr(struct inode *inode,
> > > const char *name, void *value,
> > >   		if (err)
> > >   			return err;
> > >   		spin_lock(&ci->i_ceph_lock);
> > > +	} else {
> > > +		if (ret != -1)
> > > +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
> > >   	}
> > >   
> > >   	err = __build_xattrs(inode);
> > > @@ -907,19 +913,24 @@ ssize_t ceph_listxattr(struct dentry
> > > *dentry, char *names, size_t size)
> > >   	struct ceph_inode_info *ci = ceph_inode(inode);
> > >   	bool len_only = (size == 0);
> > >   	u32 namelen;
> > > -	int err;
> > > +	int err, ret = -1;
> > >   
> > >   	spin_lock(&ci->i_ceph_lock);
> > >   	dout("listxattr %p ver=%lld index_ver=%lld\n", inode,
> > >   	     ci->i_xattrs.version, ci->i_xattrs.index_version);
> > >   
> > >   	if (ci->i_xattrs.version == 0 ||
> > > -	    !__ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1)) {
> > > +	    !(ret = __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED,
> > > 1))) {
> > > +		if (ret != -1)
> > > +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
> > >   		spin_unlock(&ci->i_ceph_lock);
> > >   		err = ceph_do_getattr(inode,
> > > CEPH_STAT_CAP_XATTR, true);
> > >   		if (err)
> > >   			return err;
> > >   		spin_lock(&ci->i_ceph_lock);
> > > +	} else {
> > > +		if (ret != -1)
> > > +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
> > >   	}
> > >   
> > >   	err = __build_xattrs(inode);
> 
> 
-- 
Jeffrey Layton <jlayton@kernel.org>


* Re: [PATCH resend v5 02/11] ceph: add caps perf metric for each session
  2020-01-30 19:00       ` Jeffrey Layton
@ 2020-01-31  1:34         ` Xiubo Li
  2020-01-31  9:02           ` Xiubo Li
  0 siblings, 1 reply; 31+ messages in thread
From: Xiubo Li @ 2020-01-31  1:34 UTC (permalink / raw)
  To: Jeffrey Layton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On 2020/1/31 3:00, Jeffrey Layton wrote:
> On Thu, 2020-01-30 at 10:22 +0800, Xiubo Li wrote:
>> On 2020/1/29 22:21, Jeff Layton wrote:
>>> On Wed, 2020-01-29 at 03:27 -0500, xiubli@redhat.com wrote:
>>>> From: Xiubo Li <xiubli@redhat.com>
>>>>
>>>> This will fulfill the caps hit/miss metric for each session. When
>>>> checking the "need" mask, if one cap has a subset of the "need"
>>>> mask it counts as a hit; otherwise it counts as a miss.
>>>>
>>>> item          total           miss            hit
>>>> -------------------------------------------------
>>>> d_lease       295             0               993
>>>>
>>>> session       caps            miss            hit
>>>> -------------------------------------------------
>>>> 0             295             107             4119
>>>> 1             1               107             9
>>>>
>>>> URL: https://tracker.ceph.com/issues/43215
>>>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>>>> ---
>>>>    fs/ceph/acl.c        |  2 ++
>>>>    fs/ceph/addr.c       |  2 ++
>>>>    fs/ceph/caps.c       | 74
>>>> ++++++++++++++++++++++++++++++++++++++++++++
>>>>    fs/ceph/debugfs.c    | 20 ++++++++++++
>>>>    fs/ceph/dir.c        |  9 ++++--
>>>>    fs/ceph/file.c       |  3 ++
>>>>    fs/ceph/mds_client.c | 16 +++++++++-
>>>>    fs/ceph/mds_client.h |  3 ++
>>>>    fs/ceph/quota.c      |  9 ++++--
>>>>    fs/ceph/super.h      | 11 +++++++
>>>>    fs/ceph/xattr.c      | 17 ++++++++--
>>>>    11 files changed, 158 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/fs/ceph/acl.c b/fs/ceph/acl.c
>>>> index 26be6520d3fb..58e119e3519f 100644
>>>> --- a/fs/ceph/acl.c
>>>> +++ b/fs/ceph/acl.c
>>>> @@ -22,6 +22,8 @@ static inline void ceph_set_cached_acl(struct
>>>> inode *inode,
>>>>    	struct ceph_inode_info *ci = ceph_inode(inode);
>>>>    
>>>>    	spin_lock(&ci->i_ceph_lock);
>>>> +	__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>>> +
>>>>    	if (__ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED,
>>>> 0))
>>>>    		set_cached_acl(inode, type, acl);
>>>>    	else
>>>> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
>>>> index 7ab616601141..29d4513eff8c 100644
>>>> --- a/fs/ceph/addr.c
>>>> +++ b/fs/ceph/addr.c
>>>> @@ -1706,6 +1706,8 @@ int ceph_uninline_data(struct file *filp,
>>>> struct page *locked_page)
>>>>    			err = -ENOMEM;
>>>>    			goto out;
>>>>    		}
>>>> +
>>>> +		ceph_caps_metric(ci, CEPH_STAT_CAP_INLINE_DATA);
>>> Should a check for inline data really count here?
>> Currently all the INLINE_DATA is in 'force' mode, so we can ignore
>> it.
>>>>    		err = __ceph_do_getattr(inode, page,
>>>>    					CEPH_STAT_CAP_INLINE_DA
>>>> TA, true);
>>>>    		if (err < 0) {
>>>> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
>>>> index 7fc87b693ba4..af2e9e826f8c 100644
>>>> --- a/fs/ceph/caps.c
>>>> +++ b/fs/ceph/caps.c
>>>> @@ -783,6 +783,75 @@ static int __cap_is_valid(struct ceph_cap
>>>> *cap)
>>>>    	return 1;
>>>>    }
>>>>    
>>>> +/*
>>>> + * Counts the cap metric.
>>>> + */
>>> This needs some comments. Specifically, what should this be
>>> counting and
>>> how?
>> Will add it.
>>
>> __ceph_caps_metric() will traverse the inode's i_caps, accumulating
>> each valid cap's 'issued' bits, until it has gathered enough caps to
>> cover 'mask'. The i_caps traversal logic follows
>> __ceph_caps_issued_mask().
>>
>>
>>>> +void __ceph_caps_metric(struct ceph_inode_info *ci, int mask)
>>>> +{
>>>> +	int have = ci->i_snap_caps;
>>>> +	struct ceph_mds_session *s;
>>>> +	struct ceph_cap *cap;
>>>> +	struct rb_node *p;
>>>> +	bool skip_auth = false;
>>>> +
>>>> +	lockdep_assert_held(&ci->i_ceph_lock);
>>>> +
>>>> +	if (mask <= 0)
>>>> +		return;
>>>> +
>>>> +	/* Counts the snap caps metric in the auth cap */
>>>> +	if (ci->i_auth_cap) {
>>>> +		cap = ci->i_auth_cap;
>>>> +		if (have) {
>>>> +			have |= cap->issued;
>>>> +
>>>> +			dout("%s %p cap %p issued %s, mask %s\n",
>>>> __func__,
>>>> +			     &ci->vfs_inode, cap, ceph_cap_string(cap-
>>>>> issued),
>>>> +			     ceph_cap_string(mask));
>>>> +
>>>> +			s = ceph_get_mds_session(cap->session);
>>>> +			if (s) {
>>>> +				if (mask & have)
>>>> +					percpu_counter_inc(&s-
>>>>> i_caps_hit);
>>>> +				else
>>>> +					percpu_counter_inc(&s-
>>>>> i_caps_mis);
>>>> +				ceph_put_mds_session(s);
>>>> +			}
>>>> +			skip_auth = true;
>>>> +		}
>>>> +	}
>>>> +
>>>> +	if ((mask & have) == mask)
>>>> +		return;
>>>> +
>>>> +	/* Checks others */
>>>> +	for (p = rb_first(&ci->i_caps); p; p = rb_next(p)) {
>>>> +		cap = rb_entry(p, struct ceph_cap, ci_node);
>>>> +		if (!__cap_is_valid(cap))
>>>> +			continue;
>>>> +
>>>> +		if (skip_auth && cap == ci->i_auth_cap)
>>>> +			continue;
>>>> +
>>>> +		dout("%s %p cap %p issued %s, mask %s\n", __func__,
>>>> +		     &ci->vfs_inode, cap, ceph_cap_string(cap->issued),
>>>> +		     ceph_cap_string(mask));
>>>> +
>>>> +		s = ceph_get_mds_session(cap->session);
>>>> +		if (s) {
>>>> +			if (mask & cap->issued)
>>>> +				percpu_counter_inc(&s->i_caps_hit);
>>>> +			else
>>>> +				percpu_counter_inc(&s->i_caps_mis);
>>>> +			ceph_put_mds_session(s);
>>>> +		}
>>>> +
>>>> +		have |= cap->issued;
>>>> +		if ((mask & have) == mask)
>>>> +			return;
>>>> +	}
>>>> +}
>>>> +
>>> I'm trying to understand what happens with the above when more than
>>> one
>>> ceph_cap has the same bit set in "issued". For instance:
>>>
>>> Suppose we're doing the check for a statx call, and we're trying to
>>> get
>>> caps for pAsFsLs. We have two MDS's and they've each granted us
>>> caps for
>>> the inode, say:
>>>
>>> MDS 0: pAs
>>> MDS 1: pAsLsFs
>>>
>>> We check the cap0 first, and consider it a hit, and then we check
>>> cap1
>>> and consider it a hit as well. So that seems like it's being
>>> double-
>>> counted.
>> Yeah, it will.
>>
>> In case 2:
>>
>> MDS0: pAsFs
>> MDS1: pAsLs
>>
>> For this case, and for yours, both i_cap0 and i_cap1 count a 'hit'.
>>
>> In case 3:
>>
>> MDS0: pAsFsLs
>> MDS1: pAs
>>
>> Only i_cap0 counts a 'hit'; i_cap1 is not counted as either a 'hit'
>> or a 'mis'.
>>
>> In case 4:
>>
>> MDS0: p
>> MDS1: pAsLsFs
>>
>> i_cap0 counts a 'mis' and i_cap1 counts a 'hit'.
>>
>> All of this logic is the same as what __ceph_caps_issued_mask() does.
>>
>> A 'hit' means that, while gathering all the caps in 'mask', we
>> checked i_cap[0~N] and that cap had a subset of 'mask'; a 'mis' means
>> we checked it and it had none of 'mask'.
>>
>> The remaining i_cap[N+1~M] are never touched, because we have already
>> gathered all the caps needed in 'mask', so they count neither a 'hit'
>> nor a 'mis'.
>>
>> All in all, the current logic is that the 'hit' means 'touched' and
>> it has some of what we needed, and the 'mis' means 'touched' and it
>> had none of what we needed.
>>
>>
> That seems sort of arbitrary, given that you're going to get different
> results depending on the index of the MDS with the caps. For instance:
>
>
> MDS0: pAsLsFs
> MDS1: pAs
>
> ...vs...
>
> MDS0: pAs
> MDS1: pAsLsFs
>
> If we assume we're looking for pAsLsFs, then the first scenario will
> just end up with 1 hit and the second will give you 2. AFAIU, the two
> MDSs are peers, so it really seems like the index should not matter
> here.
>
> I'm really struggling to understand how these numbers will be useful.
> What, specifically, are we trying to count here and why?

Maybe we need to count the hit/mis only once; in fake code it would
look like:

// Case1: check the auth cap first
if ((auth_cap & mask) == mask) {
    s->hit++;
    return;
}

// Case2: check all the others one by one
for (caps : i_caps) {
    if ((caps & mask) == mask) {
        s->hit++;
        return;
    }
    c |= caps;
}

// Case3: fall back to the union of all the issued caps
if ((c & mask) == mask)
    s->hit++;
else
    s->mis++;
return;

As for which session 's' gets the count here: if one i_cap can hold
all of the requested 'mask', as in Case1 and Case2, it will be that
i_cap's corresponding session. For Case3 we could choose any session.

But the above is still not a very graceful way of counting the cap
metrics either.

IMO, the cap hit/miss counters should be global ones, just like the
dentry lease counters in [PATCH 01/11]. Would that make sense?
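
Roughly like this (a sketch only; the struct layout and helper name
below are illustrative, mirroring the d_lease counters):

/* Global cap hit/mis counters next to the d_lease ones from
 * [PATCH 01/11]. */
struct ceph_client_metric {
	struct percpu_counter d_lease_hit;
	struct percpu_counter d_lease_mis;

	struct percpu_counter i_caps_hit;
	struct percpu_counter i_caps_mis;
};

/* Bump the counters once per mask check instead of once per
 * session cap. */
static inline void ceph_update_cap_metric(struct ceph_mds_client *mdsc,
					  bool hit)
{
	if (hit)
		percpu_counter_inc(&mdsc->metric.i_caps_hit);
	else
		percpu_counter_inc(&mdsc->metric.i_caps_mis);
}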

Thanks,


>>> ISTM that what you really want to do here is logically OR all of
>>> the cap->issued fields together, then check that against the mask
>>> value, and count only one hit or miss per inode.
>>>
>>> That said, it's not 100% clear what you're counting as a hit or
>>> miss
>>> here, so please let me know if I have that wrong.
>>>    
>>>>    /*
>>>>     * Return set of valid cap bits issued to us.  Note that caps
>>>> time
>>>>     * out, and may be invalidated in bulk if the client session
>>>> times out
>>>> @@ -2746,6 +2815,7 @@ static void check_max_size(struct inode
>>>> *inode, loff_t endoff)
>>>>    int ceph_try_get_caps(struct inode *inode, int need, int want,
>>>>    		      bool nonblock, int *got)
>>>>    {
>>>> +	struct ceph_inode_info *ci = ceph_inode(inode);
>>>>    	int ret;
>>>>    
>>>>    	BUG_ON(need & ~CEPH_CAP_FILE_RD);
>>>> @@ -2758,6 +2828,7 @@ int ceph_try_get_caps(struct inode *inode,
>>>> int need, int want,
>>>>    	BUG_ON(want & ~(CEPH_CAP_FILE_CACHE |
>>>> CEPH_CAP_FILE_LAZYIO |
>>>>    			CEPH_CAP_FILE_SHARED |
>>>> CEPH_CAP_FILE_EXCL |
>>>>    			CEPH_CAP_ANY_DIR_OPS));
>>>> +	ceph_caps_metric(ci, need | want);
>>>>    	ret = try_get_cap_refs(inode, need, want, 0, nonblock,
>>>> got);
>>>>    	return ret == -EAGAIN ? 0 : ret;
>>>>    }
>>>> @@ -2784,6 +2855,8 @@ int ceph_get_caps(struct file *filp, int
>>>> need, int want,
>>>>    	    fi->filp_gen != READ_ONCE(fsc->filp_gen))
>>>>    		return -EBADF;
>>>>    
>>>> +	ceph_caps_metric(ci, need | want);
>>>> +
>>>>    	while (true) {
>>>>    		if (endoff > 0)
>>>>    			check_max_size(inode, endoff);
>>>> @@ -2871,6 +2944,7 @@ int ceph_get_caps(struct file *filp, int
>>>> need, int want,
>>>>    			 * getattr request will bring inline
>>>> data into
>>>>    			 * page cache
>>>>    			 */
>>>> +			ceph_caps_metric(ci,
>>>> CEPH_STAT_CAP_INLINE_DATA);
>>>>    			ret = __ceph_do_getattr(inode, NULL,
>>>>    						CEPH_STAT_CAP_I
>>>> NLINE_DATA,
>>>>    						true);
>>>> diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
>>>> index 40a22da0214a..c132fdb40d53 100644
>>>> --- a/fs/ceph/debugfs.c
>>>> +++ b/fs/ceph/debugfs.c
>>>> @@ -128,6 +128,7 @@ static int metric_show(struct seq_file *s,
>>>> void *p)
>>>>    {
>>>>    	struct ceph_fs_client *fsc = s->private;
>>>>    	struct ceph_mds_client *mdsc = fsc->mdsc;
>>>> +	int i;
>>>>    
>>>>    	seq_printf(s,
>>>> "item          total           miss            hit\n");
>>>>    	seq_printf(s, "--------------------------------------
>>>> -----------\n");
>>>> @@ -137,6 +138,25 @@ static int metric_show(struct seq_file *s,
>>>> void *p)
>>>>    		   percpu_counter_sum(&mdsc-
>>>>> metric.d_lease_mis),
>>>>    		   percpu_counter_sum(&mdsc-
>>>>> metric.d_lease_hit));
>>>>    
>>>> +	seq_printf(s, "\n");
>>>> +	seq_printf(s,
>>>> "session       caps            miss            hit\n");
>>>> +	seq_printf(s, "----------------------------------------------
>>>> ---\n");
>>>> +
>>>> +	mutex_lock(&mdsc->mutex);
>>>> +	for (i = 0; i < mdsc->max_sessions; i++) {
>>>> +		struct ceph_mds_session *session;
>>>> +
>>>> +		session = __ceph_lookup_mds_session(mdsc, i);
>>>> +		if (!session)
>>>> +			continue;
>>>> +		seq_printf(s, "%-14d%-16d%-16lld%lld\n", i,
>>>> +			   session->s_nr_caps,
>>>> +			   percpu_counter_sum(&session->i_caps_mis),
>>>> +			   percpu_counter_sum(&session->i_caps_hit));
>>>> +		ceph_put_mds_session(session);
>>>> +	}
>>>> +	mutex_unlock(&mdsc->mutex);
>>>> +
>>>>    	return 0;
>>>>    }
>>>>    
>>>> diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
>>>> index 658c55b323cc..33eb239e09e2 100644
>>>> --- a/fs/ceph/dir.c
>>>> +++ b/fs/ceph/dir.c
>>>> @@ -313,7 +313,7 @@ static int ceph_readdir(struct file *file,
>>>> struct dir_context *ctx)
>>>>    	struct ceph_fs_client *fsc =
>>>> ceph_inode_to_client(inode);
>>>>    	struct ceph_mds_client *mdsc = fsc->mdsc;
>>>>    	int i;
>>>> -	int err;
>>>> +	int err, ret = -1;
>>>>    	unsigned frag = -1;
>>>>    	struct ceph_mds_reply_info_parsed *rinfo;
>>>>    
>>>> @@ -346,13 +346,16 @@ static int ceph_readdir(struct file *file,
>>>> struct dir_context *ctx)
>>>>    	    !ceph_test_mount_opt(fsc, NOASYNCREADDIR) &&
>>>>    	    ceph_snap(inode) != CEPH_SNAPDIR &&
>>>>    	    __ceph_dir_is_complete_ordered(ci) &&
>>>> -	    __ceph_caps_issued_mask(ci, CEPH_CAP_FILE_SHARED, 1)) {
>>>> +	    (ret = __ceph_caps_issued_mask(ci, CEPH_CAP_FILE_SHARED,
>>>> 1))) {
>>>>    		int shared_gen = atomic_read(&ci-
>>>>> i_shared_gen);
>>>> +		__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>>>    		spin_unlock(&ci->i_ceph_lock);
>>>>    		err = __dcache_readdir(file, ctx, shared_gen);
>>>>    		if (err != -EAGAIN)
>>>>    			return err;
>>>>    	} else {
>>>> +		if (ret != -1)
>>>> +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>>>    		spin_unlock(&ci->i_ceph_lock);
>>>>    	}
>>>>    
>>>> @@ -757,6 +760,8 @@ static struct dentry *ceph_lookup(struct
>>>> inode *dir, struct dentry *dentry,
>>>>    		struct ceph_dentry_info *di =
>>>> ceph_dentry(dentry);
>>>>    
>>>>    		spin_lock(&ci->i_ceph_lock);
>>>> +		__ceph_caps_metric(ci, CEPH_CAP_FILE_SHARED);
>>>> +
>>>>    		dout(" dir %p flags are %d\n", dir, ci-
>>>>> i_ceph_flags);
>>>>    		if (strncmp(dentry->d_name.name,
>>>>    			    fsc->mount_options->snapdir_name,
>>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>>>> index 1e6cdf2dfe90..c78dfbbb7b91 100644
>>>> --- a/fs/ceph/file.c
>>>> +++ b/fs/ceph/file.c
>>>> @@ -384,6 +384,8 @@ int ceph_open(struct inode *inode, struct
>>>> file *file)
>>>>    	 * asynchronously.
>>>>    	 */
>>>>    	spin_lock(&ci->i_ceph_lock);
>>>> +	__ceph_caps_metric(ci, wanted);
>>>> +
>>>>    	if (__ceph_is_any_real_caps(ci) &&
>>>>    	    (((fmode & CEPH_FILE_MODE_WR) == 0) || ci-
>>>>> i_auth_cap)) {
>>>>    		int mds_wanted = __ceph_caps_mds_wanted(ci,
>>>> true);
>>>> @@ -1340,6 +1342,7 @@ static ssize_t ceph_read_iter(struct kiocb
>>>> *iocb, struct iov_iter *to)
>>>>    				return -ENOMEM;
>>>>    		}
>>>>    
>>>> +		ceph_caps_metric(ci, CEPH_STAT_CAP_INLINE_DATA);
>>>>    		statret = __ceph_do_getattr(inode, page,
>>>>    					    CEPH_STAT_CAP_INLIN
>>>> E_DATA, !!page);
>>>>    		if (statret < 0) {
>>>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>>>> index a24fd00676b8..141c1c03636c 100644
>>>> --- a/fs/ceph/mds_client.c
>>>> +++ b/fs/ceph/mds_client.c
>>>> @@ -558,6 +558,8 @@ void ceph_put_mds_session(struct
>>>> ceph_mds_session *s)
>>>>    	if (refcount_dec_and_test(&s->s_ref)) {
>>>>    		if (s->s_auth.authorizer)
>>>>    			ceph_auth_destroy_authorizer(s-
>>>>> s_auth.authorizer);
>>>> +		percpu_counter_destroy(&s->i_caps_hit);
>>>> +		percpu_counter_destroy(&s->i_caps_mis);
>>>>    		kfree(s);
>>>>    	}
>>>>    }
>>>> @@ -598,6 +600,7 @@ static struct ceph_mds_session
>>>> *register_session(struct ceph_mds_client *mdsc,
>>>>    						 int mds)
>>>>    {
>>>>    	struct ceph_mds_session *s;
>>>> +	int err;
>>>>    
>>>>    	if (mds >= mdsc->mdsmap->possible_max_rank)
>>>>    		return ERR_PTR(-EINVAL);
>>>> @@ -612,8 +615,10 @@ static struct ceph_mds_session
>>>> *register_session(struct ceph_mds_client *mdsc,
>>>>    
>>>>    		dout("%s: realloc to %d\n", __func__, newmax);
>>>>    		sa = kcalloc(newmax, sizeof(void *), GFP_NOFS);
>>>> -		if (!sa)
>>>> +		if (!sa) {
>>>> +			err = -ENOMEM;
>>>>    			goto fail_realloc;
>>>> +		}
>>>>    		if (mdsc->sessions) {
>>>>    			memcpy(sa, mdsc->sessions,
>>>>    			       mdsc->max_sessions * sizeof(void
>>>> *));
>>>> @@ -653,6 +658,13 @@ static struct ceph_mds_session
>>>> *register_session(struct ceph_mds_client *mdsc,
>>>>    
>>>>    	INIT_LIST_HEAD(&s->s_cap_flushing);
>>>>    
>>>> +	err = percpu_counter_init(&s->i_caps_hit, 0, GFP_NOFS);
>>>> +	if (err)
>>>> +		goto fail_realloc;
>>>> +	err = percpu_counter_init(&s->i_caps_mis, 0, GFP_NOFS);
>>>> +	if (err)
>>>> +		goto fail_init;
>>>> +
>>>>    	mdsc->sessions[mds] = s;
>>>>    	atomic_inc(&mdsc->num_sessions);
>>>>    	refcount_inc(&s->s_ref);  /* one ref to sessions[], one
>>>> to caller */
>>>> @@ -662,6 +674,8 @@ static struct ceph_mds_session
>>>> *register_session(struct ceph_mds_client *mdsc,
>>>>    
>>>>    	return s;
>>>>    
>>>> +fail_init:
>>>> +	percpu_counter_destroy(&s->i_caps_hit);
>>>>    fail_realloc:
>>>>    	kfree(s);
>>>>    	return ERR_PTR(-ENOMEM);
>>>> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
>>>> index dd1f417b90eb..ba74ff74c59c 100644
>>>> --- a/fs/ceph/mds_client.h
>>>> +++ b/fs/ceph/mds_client.h
>>>> @@ -201,6 +201,9 @@ struct ceph_mds_session {
>>>>    
>>>>    	struct list_head  s_waiting;  /* waiting requests */
>>>>    	struct list_head  s_unsafe;   /* unsafe requests */
>>>> +
>>>> +	struct percpu_counter i_caps_hit;
>>>> +	struct percpu_counter i_caps_mis;
>>>>    };
>>>>    
>>>>    /*
>>>> diff --git a/fs/ceph/quota.c b/fs/ceph/quota.c
>>>> index de56dee60540..4ce2f658e63d 100644
>>>> --- a/fs/ceph/quota.c
>>>> +++ b/fs/ceph/quota.c
>>>> @@ -147,9 +147,14 @@ static struct inode
>>>> *lookup_quotarealm_inode(struct ceph_mds_client *mdsc,
>>>>    		return NULL;
>>>>    	}
>>>>    	if (qri->inode) {
>>>> +		struct ceph_inode_info *ci = ceph_inode(qri->inode);
>>>> +		int ret;
>>>> +
>>>> +		ceph_caps_metric(ci, CEPH_STAT_CAP_INODE);
>>>> +
>>>>    		/* get caps */
>>>> -		int ret = __ceph_do_getattr(qri->inode, NULL,
>>>> -					    CEPH_STAT_CAP_INODE, true);
>>>> +		ret = __ceph_do_getattr(qri->inode, NULL,
>>>> +					CEPH_STAT_CAP_INODE, true);
>>>>    		if (ret >= 0)
>>>>    			in = qri->inode;
>>>>    		else
>>>> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
>>>> index 7af91628636c..3f4829222528 100644
>>>> --- a/fs/ceph/super.h
>>>> +++ b/fs/ceph/super.h
>>>> @@ -641,6 +641,14 @@ static inline bool
>>>> __ceph_is_any_real_caps(struct ceph_inode_info *ci)
>>>>    	return !RB_EMPTY_ROOT(&ci->i_caps);
>>>>    }
>>>>    
>>>> +extern void __ceph_caps_metric(struct ceph_inode_info *ci, int
>>>> mask);
>>>> +static inline void ceph_caps_metric(struct ceph_inode_info *ci,
>>>> int mask)
>>>> +{
>>>> +	spin_lock(&ci->i_ceph_lock);
>>>> +	__ceph_caps_metric(ci, mask);
>>>> +	spin_unlock(&ci->i_ceph_lock);
>>>> +}
>>>> +
>>>>    extern int __ceph_caps_issued(struct ceph_inode_info *ci, int
>>>> *implemented);
>>>>    extern int __ceph_caps_issued_mask(struct ceph_inode_info *ci,
>>>> int mask, int t);
>>>>    extern int __ceph_caps_issued_other(struct ceph_inode_info *ci,
>>>> @@ -927,6 +935,9 @@ extern int __ceph_do_getattr(struct inode
>>>> *inode, struct page *locked_page,
>>>>    			     int mask, bool force);
>>>>    static inline int ceph_do_getattr(struct inode *inode, int
>>>> mask, bool force)
>>>>    {
>>>> +	struct ceph_inode_info *ci = ceph_inode(inode);
>>>> +
>>>> +	ceph_caps_metric(ci, mask);
>>>>    	return __ceph_do_getattr(inode, NULL, mask, force);
>>>>    }
>>>>    extern int ceph_permission(struct inode *inode, int mask);
>>>> diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
>>>> index d58fa14c1f01..ebd522edb0a8 100644
>>>> --- a/fs/ceph/xattr.c
>>>> +++ b/fs/ceph/xattr.c
>>>> @@ -829,6 +829,7 @@ ssize_t __ceph_getxattr(struct inode *inode,
>>>> const char *name, void *value,
>>>>    	struct ceph_vxattr *vxattr = NULL;
>>>>    	int req_mask;
>>>>    	ssize_t err;
>>>> +	int ret = -1;
>>>>    
>>>>    	/* let's see if a virtual xattr was requested */
>>>>    	vxattr = ceph_match_vxattr(inode, name);
>>>> @@ -856,7 +857,9 @@ ssize_t __ceph_getxattr(struct inode *inode,
>>>> const char *name, void *value,
>>>>    
>>>>    	if (ci->i_xattrs.version == 0 ||
>>>>    	    !((req_mask & CEPH_CAP_XATTR_SHARED) ||
>>>> -	      __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1))) {
>>>> +	      (ret = __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED,
>>>> 1)))) {
>>>> +		if (ret != -1)
>>>> +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>>>    		spin_unlock(&ci->i_ceph_lock);
>>>>    
>>>>    		/* security module gets xattr while filling
>>>> trace */
>>>> @@ -871,6 +874,9 @@ ssize_t __ceph_getxattr(struct inode *inode,
>>>> const char *name, void *value,
>>>>    		if (err)
>>>>    			return err;
>>>>    		spin_lock(&ci->i_ceph_lock);
>>>> +	} else {
>>>> +		if (ret != -1)
>>>> +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>>>    	}
>>>>    
>>>>    	err = __build_xattrs(inode);
>>>> @@ -907,19 +913,24 @@ ssize_t ceph_listxattr(struct dentry
>>>> *dentry, char *names, size_t size)
>>>>    	struct ceph_inode_info *ci = ceph_inode(inode);
>>>>    	bool len_only = (size == 0);
>>>>    	u32 namelen;
>>>> -	int err;
>>>> +	int err, ret = -1;
>>>>    
>>>>    	spin_lock(&ci->i_ceph_lock);
>>>>    	dout("listxattr %p ver=%lld index_ver=%lld\n", inode,
>>>>    	     ci->i_xattrs.version, ci->i_xattrs.index_version);
>>>>    
>>>>    	if (ci->i_xattrs.version == 0 ||
>>>> -	    !__ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1)) {
>>>> +	    !(ret = __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED,
>>>> 1))) {
>>>> +		if (ret != -1)
>>>> +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>>>    		spin_unlock(&ci->i_ceph_lock);
>>>>    		err = ceph_do_getattr(inode,
>>>> CEPH_STAT_CAP_XATTR, true);
>>>>    		if (err)
>>>>    			return err;
>>>>    		spin_lock(&ci->i_ceph_lock);
>>>> +	} else {
>>>> +		if (ret != -1)
>>>> +			__ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>>>    	}
>>>>    
>>>>    	err = __build_xattrs(inode);
>>


* Re: [PATCH resend v5 02/11] ceph: add caps perf metric for each session
  2020-01-31  1:34         ` Xiubo Li
@ 2020-01-31  9:02           ` Xiubo Li
  2020-02-04 21:10             ` Jeff Layton
  0 siblings, 1 reply; 31+ messages in thread
From: Xiubo Li @ 2020-01-31  9:02 UTC (permalink / raw)
  To: Jeffrey Layton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On 2020/1/31 9:34, Xiubo Li wrote:
> On 2020/1/31 3:00, Jeffrey Layton wrote:
>> On Thu, 2020-01-30 at 10:22 +0800, Xiubo Li wrote:
>>> On 2020/1/29 22:21, Jeff Layton wrote:
>>>> On Wed, 2020-01-29 at 03:27 -0500, xiubli@redhat.com wrote:
>>>>> From: Xiubo Li <xiubli@redhat.com>
>>>>>
>>>>> This will fulfill the caps hit/miss metric for each session. When
>>>>> checking the "need" mask, if one cap has a subset of the "need"
>>>>> mask it counts as a hit; otherwise it counts as a miss.
>>>>>
>>>>> item          total           miss            hit
>>>>> -------------------------------------------------
>>>>> d_lease       295             0               993
>>>>>
>>>>> session       caps            miss            hit
>>>>> -------------------------------------------------
>>>>> 0             295             107             4119
>>>>> 1             1               107             9
>>>>>
>>>>> URL: https://tracker.ceph.com/issues/43215
>>>>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>>>>> ---
>>>>>    fs/ceph/acl.c        |  2 ++
>>>>>    fs/ceph/addr.c       |  2 ++
>>>>>    fs/ceph/caps.c       | 74
>>>>> ++++++++++++++++++++++++++++++++++++++++++++
>>>>>    fs/ceph/debugfs.c    | 20 ++++++++++++
>>>>>    fs/ceph/dir.c        |  9 ++++--
>>>>>    fs/ceph/file.c       |  3 ++
>>>>>    fs/ceph/mds_client.c | 16 +++++++++-
>>>>>    fs/ceph/mds_client.h |  3 ++
>>>>>    fs/ceph/quota.c      |  9 ++++--
>>>>>    fs/ceph/super.h      | 11 +++++++
>>>>>    fs/ceph/xattr.c      | 17 ++++++++--
>>>>>    11 files changed, 158 insertions(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/fs/ceph/acl.c b/fs/ceph/acl.c
>>>>> index 26be6520d3fb..58e119e3519f 100644
>>>>> --- a/fs/ceph/acl.c
>>>>> +++ b/fs/ceph/acl.c
>>>>> @@ -22,6 +22,8 @@ static inline void ceph_set_cached_acl(struct
>>>>> inode *inode,
>>>>>        struct ceph_inode_info *ci = ceph_inode(inode);
>>>>>           spin_lock(&ci->i_ceph_lock);
>>>>> +    __ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>>>> +
>>>>>        if (__ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED,
>>>>> 0))
>>>>>            set_cached_acl(inode, type, acl);
>>>>>        else
>>>>> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
>>>>> index 7ab616601141..29d4513eff8c 100644
>>>>> --- a/fs/ceph/addr.c
>>>>> +++ b/fs/ceph/addr.c
>>>>> @@ -1706,6 +1706,8 @@ int ceph_uninline_data(struct file *filp,
>>>>> struct page *locked_page)
>>>>>                err = -ENOMEM;
>>>>>                goto out;
>>>>>            }
>>>>> +
>>>>> +        ceph_caps_metric(ci, CEPH_STAT_CAP_INLINE_DATA);
>>>> Should a check for inline data really count here?
>>> Currently all the INLINE_DATA is in 'force' mode, so we can ignore
>>> it.
>>>>>            err = __ceph_do_getattr(inode, page,
>>>>>                        CEPH_STAT_CAP_INLINE_DA
>>>>> TA, true);
>>>>>            if (err < 0) {
>>>>> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
>>>>> index 7fc87b693ba4..af2e9e826f8c 100644
>>>>> --- a/fs/ceph/caps.c
>>>>> +++ b/fs/ceph/caps.c
>>>>> @@ -783,6 +783,75 @@ static int __cap_is_valid(struct ceph_cap
>>>>> *cap)
>>>>>        return 1;
>>>>>    }
>>>>>    +/*
>>>>> + * Counts the cap metric.
>>>>> + */
>>>> This needs some comments. Specifically, what should this be
>>>> counting and
>>>> how?
>>> Will add it.
>>>
>>> __ceph_caps_metric() will traverse the inode's i_caps, accumulating
>>> each valid cap's 'issued' bits, until it has gathered enough caps to
>>> cover 'mask'. The i_caps traversal logic follows
>>> __ceph_caps_issued_mask().
>>>
>>>
>>>>> +void __ceph_caps_metric(struct ceph_inode_info *ci, int mask)
>>>>> +{
>>>>> +    int have = ci->i_snap_caps;
>>>>> +    struct ceph_mds_session *s;
>>>>> +    struct ceph_cap *cap;
>>>>> +    struct rb_node *p;
>>>>> +    bool skip_auth = false;
>>>>> +
>>>>> +    lockdep_assert_held(&ci->i_ceph_lock);
>>>>> +
>>>>> +    if (mask <= 0)
>>>>> +        return;
>>>>> +
>>>>> +    /* Counts the snap caps metric in the auth cap */
>>>>> +    if (ci->i_auth_cap) {
>>>>> +        cap = ci->i_auth_cap;
>>>>> +        if (have) {
>>>>> +            have |= cap->issued;
>>>>> +
>>>>> +            dout("%s %p cap %p issued %s, mask %s\n",
>>>>> __func__,
>>>>> +                 &ci->vfs_inode, cap, ceph_cap_string(cap-
>>>>>> issued),
>>>>> +                 ceph_cap_string(mask));
>>>>> +
>>>>> +            s = ceph_get_mds_session(cap->session);
>>>>> +            if (s) {
>>>>> +                if (mask & have)
>>>>> +                    percpu_counter_inc(&s->i_caps_hit);
>>>>> +                else
>>>>> +                    percpu_counter_inc(&s->i_caps_mis);
>>>>> +                ceph_put_mds_session(s);
>>>>> +            }
>>>>> +            skip_auth = true;
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>> +    if ((mask & have) == mask)
>>>>> +        return;
>>>>> +
>>>>> +    /* Checks others */
>>>>> +    for (p = rb_first(&ci->i_caps); p; p = rb_next(p)) {
>>>>> +        cap = rb_entry(p, struct ceph_cap, ci_node);
>>>>> +        if (!__cap_is_valid(cap))
>>>>> +            continue;
>>>>> +
>>>>> +        if (skip_auth && cap == ci->i_auth_cap)
>>>>> +            continue;
>>>>> +
>>>>> +        dout("%s %p cap %p issued %s, mask %s\n", __func__,
>>>>> +             &ci->vfs_inode, cap, ceph_cap_string(cap->issued),
>>>>> +             ceph_cap_string(mask));
>>>>> +
>>>>> +        s = ceph_get_mds_session(cap->session);
>>>>> +        if (s) {
>>>>> +            if (mask & cap->issued)
>>>>> +                percpu_counter_inc(&s->i_caps_hit);
>>>>> +            else
>>>>> +                percpu_counter_inc(&s->i_caps_mis);
>>>>> +            ceph_put_mds_session(s);
>>>>> +        }
>>>>> +
>>>>> +        have |= cap->issued;
>>>>> +        if ((mask & have) == mask)
>>>>> +            return;
>>>>> +    }
>>>>> +}
>>>>> +
>>>> I'm trying to understand what happens with the above when more than one
>>>> ceph_cap has the same bit set in "issued". For instance:
>>>>
>>>> Suppose we're doing the check for a statx call, and we're trying to get
>>>> caps for pAsFsLs. We have two MDS's and they've each granted us caps for
>>>> the inode, say:
>>>>
>>>> MDS 0: pAs
>>>> MDS 1: pAsLsFs
>>>>
>>>> We check the cap0 first, and consider it a hit, and then we check cap1
>>>> and consider it a hit as well. So that seems like it's being
>>>> double-counted.
>>> Yeah, it will.
>>>
>>> In case2:
>>>
>>> MDS 0: pAsFs
>>>
>>> MDS 1: pAsLs
>>>
>>> For this case and yours, both i_cap0 and i_cap1 count a 'hit'.
>>>
>>>
>>> In case3 :
>>>
>>> MDS0: pAsFsLs
>>>
>>> MDS1: pAs
>>>
>>> Only i_cap0 is a 'hit'; i_cap1 will not be counted as either a 'hit'
>>> or a 'mis'.
>>>
>>>
>>> In case4:
>>>
>>> MDS0: p
>>>
>>> MDS1: pAsLsFs
>>>
>>> i_cap0 will count a 'mis' and i_cap1 will count a 'hit'.
>>>
>>>
>>> All the logic is the same as what __ceph_caps_issued_mask() does.
>>>
>>> The 'hit' means that, to get all the caps in 'mask', we have checked
>>> i_cap[0~N] and they had some subset of 'mask'; the 'mis' means we have
>>> checked the i_cap[0~N] but they did not.
>>>
>>> For the i_cap[N+1 ~ M], we won't touch them because we have already
>>> gotten enough of the caps needed in 'mask', so they won't count any
>>> 'hit' or 'mis'.
>>>
>>> All in all, the current logic is that a 'hit' means 'touched, and it
>>> had some of what we needed', and a 'mis' means 'touched, and it had
>>> none of what we needed'.
>>>
>>>
>> That seems sort of arbitrary, given that you're going to get different
>> results depending on the index of the MDS with the caps. For instance:
>>
>>
>> MDS0: pAsLsFs
>> MDS1: pAs
>>
>> ...vs...
>>
>> MDS0: pAs
>> MDS1: pAsLsFs
>>
>> If we assume we're looking for pAsLsFs, then the first scenario will
>> just end up with 1 hit and the second will give you 2. AFAIU, the two
>> MDSs are peers, so it really seems like the index should not matter
>> here.
>>
>> I'm really struggling to understand how these numbers will be useful.
>> What, specifically, are we trying to count here and why?
>
> Maybe we need to count the hit/mis only once; the fake code would look like:
>
> // Case1: check the auth caps first
>
> if ((auth_cap & mask) == mask) {
>     s->hit++;
>     return;
> }
>
> // Case2: check all the others one by one
>
> for (caps : i_caps) {
>     if ((caps & mask) == mask) {
>         s->hit++;
>         return;
>     }
>     c |= caps;
> }
>
> // Case3:
>
> if ((c & mask) == mask)
>     s->hit++;
> else
>     s->mis++;
>
> return;
>
> ....
>
> And for the session 's->' here: if one i_cap can hold all of the
> requested 'mask', as in Case1 and Case2, it will be that i_cap's
> corresponding session; for Case3 we could choose any session.
>
> But the above is still not a very graceful way of counting the cap metrics either.
>
> IMO, the cap hit/miss counter should be a global one, just like the
> dentry_lease one in [PATCH 01/11]. Will this make sense?
>
Currently in the fuse client, for each inode it is the auth_cap->session's
responsibility to do all the cap hit/mis counting if the inode has an
auth_cap; otherwise any existing session is chosen.

Maybe this is an acceptable approach.
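
A rough sketch of that (untested; it just reuses the per-session percpu
counters added in this patch):

void __ceph_caps_metric(struct ceph_inode_info *ci, int mask)
{
	struct ceph_cap *cap = ci->i_auth_cap;
	struct ceph_mds_session *s;
	int issued = __ceph_caps_issued(ci, NULL);

	lockdep_assert_held(&ci->i_ceph_lock);

	/* charge the auth cap's session, or the first existing one */
	if (!cap) {
		struct rb_node *p = rb_first(&ci->i_caps);

		if (!p)
			return;
		cap = rb_entry(p, struct ceph_cap, ci_node);
	}

	s = ceph_get_mds_session(cap->session);
	if (!s)
		return;
	/* count one hit or mis per check */
	if ((mask & issued) == mask)
		percpu_counter_inc(&s->i_caps_hit);
	else
		percpu_counter_inc(&s->i_caps_mis);
	ceph_put_mds_session(s);
}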

> Thanks,
>
>
>>>> ISTM that what you really want to do here is logically OR all of the
>>>> cap->issued fields together, and then check that vs. the mask value,
>>>> and count only one hit or miss per inode.
>>>>
>>>> That said, it's not 100% clear what you're counting as a hit or miss
>>>> here, so please let me know if I have that wrong.
>>>>>    /*
>>>>>     * Return set of valid cap bits issued to us.  Note that caps
>>>>> time
>>>>>     * out, and may be invalidated in bulk if the client session
>>>>> times out
>>>>> @@ -2746,6 +2815,7 @@ static void check_max_size(struct inode
>>>>> *inode, loff_t endoff)
>>>>>    int ceph_try_get_caps(struct inode *inode, int need, int want,
>>>>>                  bool nonblock, int *got)
>>>>>    {
>>>>> +    struct ceph_inode_info *ci = ceph_inode(inode);
>>>>>        int ret;
>>>>>           BUG_ON(need & ~CEPH_CAP_FILE_RD);
>>>>> @@ -2758,6 +2828,7 @@ int ceph_try_get_caps(struct inode *inode,
>>>>> int need, int want,
>>>>>        BUG_ON(want & ~(CEPH_CAP_FILE_CACHE |
>>>>> CEPH_CAP_FILE_LAZYIO |
>>>>>                CEPH_CAP_FILE_SHARED |
>>>>> CEPH_CAP_FILE_EXCL |
>>>>>                CEPH_CAP_ANY_DIR_OPS));
>>>>> +    ceph_caps_metric(ci, need | want);
>>>>>        ret = try_get_cap_refs(inode, need, want, 0, nonblock,
>>>>> got);
>>>>>        return ret == -EAGAIN ? 0 : ret;
>>>>>    }
>>>>> @@ -2784,6 +2855,8 @@ int ceph_get_caps(struct file *filp, int
>>>>> need, int want,
>>>>>            fi->filp_gen != READ_ONCE(fsc->filp_gen))
>>>>>            return -EBADF;
>>>>>    +    ceph_caps_metric(ci, need | want);
>>>>> +
>>>>>        while (true) {
>>>>>            if (endoff > 0)
>>>>>                check_max_size(inode, endoff);
>>>>> @@ -2871,6 +2944,7 @@ int ceph_get_caps(struct file *filp, int
>>>>> need, int want,
>>>>>                 * getattr request will bring inline
>>>>> data into
>>>>>                 * page cache
>>>>>                 */
>>>>> +            ceph_caps_metric(ci,
>>>>> CEPH_STAT_CAP_INLINE_DATA);
>>>>>                ret = __ceph_do_getattr(inode, NULL,
>>>>>                            CEPH_STAT_CAP_INLINE_DATA,
>>>>>                            true);
>>>>> diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
>>>>> index 40a22da0214a..c132fdb40d53 100644
>>>>> --- a/fs/ceph/debugfs.c
>>>>> +++ b/fs/ceph/debugfs.c
>>>>> @@ -128,6 +128,7 @@ static int metric_show(struct seq_file *s,
>>>>> void *p)
>>>>>    {
>>>>>        struct ceph_fs_client *fsc = s->private;
>>>>>        struct ceph_mds_client *mdsc = fsc->mdsc;
>>>>> +    int i;
>>>>>           seq_printf(s, "item          total           miss            hit\n");
>>>>>        seq_printf(s, "-------------------------------------------------\n");
>>>>> @@ -137,6 +138,25 @@ static int metric_show(struct seq_file *s,
>>>>> void *p)
>>>>>               percpu_counter_sum(&mdsc->metric.d_lease_mis),
>>>>>               percpu_counter_sum(&mdsc->metric.d_lease_hit));
>>>>>    +    seq_printf(s, "\n");
>>>>> +    seq_printf(s, "session       caps            miss            hit\n");
>>>>> +    seq_printf(s, "-------------------------------------------------\n");
>>>>> +
>>>>> +    mutex_lock(&mdsc->mutex);
>>>>> +    for (i = 0; i < mdsc->max_sessions; i++) {
>>>>> +        struct ceph_mds_session *session;
>>>>> +
>>>>> +        session = __ceph_lookup_mds_session(mdsc, i);
>>>>> +        if (!session)
>>>>> +            continue;
>>>>> +        seq_printf(s, "%-14d%-16d%-16lld%lld\n", i,
>>>>> +               session->s_nr_caps,
>>>>> +               percpu_counter_sum(&session->i_caps_mis),
>>>>> +               percpu_counter_sum(&session->i_caps_hit));
>>>>> +        ceph_put_mds_session(session);
>>>>> +    }
>>>>> +    mutex_unlock(&mdsc->mutex);
>>>>> +
>>>>>        return 0;
>>>>>    }
>>>>>    diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
>>>>> index 658c55b323cc..33eb239e09e2 100644
>>>>> --- a/fs/ceph/dir.c
>>>>> +++ b/fs/ceph/dir.c
>>>>> @@ -313,7 +313,7 @@ static int ceph_readdir(struct file *file,
>>>>> struct dir_context *ctx)
>>>>>        struct ceph_fs_client *fsc =
>>>>> ceph_inode_to_client(inode);
>>>>>        struct ceph_mds_client *mdsc = fsc->mdsc;
>>>>>        int i;
>>>>> -    int err;
>>>>> +    int err, ret = -1;
>>>>>        unsigned frag = -1;
>>>>>        struct ceph_mds_reply_info_parsed *rinfo;
>>>>>    @@ -346,13 +346,16 @@ static int ceph_readdir(struct file *file,
>>>>> struct dir_context *ctx)
>>>>>            !ceph_test_mount_opt(fsc, NOASYNCREADDIR) &&
>>>>>            ceph_snap(inode) != CEPH_SNAPDIR &&
>>>>>            __ceph_dir_is_complete_ordered(ci) &&
>>>>> -        __ceph_caps_issued_mask(ci, CEPH_CAP_FILE_SHARED, 1)) {
>>>>> +        (ret = __ceph_caps_issued_mask(ci, CEPH_CAP_FILE_SHARED,
>>>>> 1))) {
>>>>>            int shared_gen = atomic_read(&ci->i_shared_gen);
>>>>> +        __ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>>>>            spin_unlock(&ci->i_ceph_lock);
>>>>>            err = __dcache_readdir(file, ctx, shared_gen);
>>>>>            if (err != -EAGAIN)
>>>>>                return err;
>>>>>        } else {
>>>>> +        if (ret != -1)
>>>>> +            __ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>>>>            spin_unlock(&ci->i_ceph_lock);
>>>>>        }
>>>>>    @@ -757,6 +760,8 @@ static struct dentry *ceph_lookup(struct
>>>>> inode *dir, struct dentry *dentry,
>>>>>            struct ceph_dentry_info *di =
>>>>> ceph_dentry(dentry);
>>>>>               spin_lock(&ci->i_ceph_lock);
>>>>> +        __ceph_caps_metric(ci, CEPH_CAP_FILE_SHARED);
>>>>> +
>>>>>            dout(" dir %p flags are %d\n", dir, ci-
>>>>>> i_ceph_flags);
>>>>>            if (strncmp(dentry->d_name.name,
>>>>>                    fsc->mount_options->snapdir_name,
>>>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>>>>> index 1e6cdf2dfe90..c78dfbbb7b91 100644
>>>>> --- a/fs/ceph/file.c
>>>>> +++ b/fs/ceph/file.c
>>>>> @@ -384,6 +384,8 @@ int ceph_open(struct inode *inode, struct
>>>>> file *file)
>>>>>         * asynchronously.
>>>>>         */
>>>>>        spin_lock(&ci->i_ceph_lock);
>>>>> +    __ceph_caps_metric(ci, wanted);
>>>>> +
>>>>>        if (__ceph_is_any_real_caps(ci) &&
>>>>>            (((fmode & CEPH_FILE_MODE_WR) == 0) || ci->i_auth_cap)) {
>>>>>            int mds_wanted = __ceph_caps_mds_wanted(ci, true);
>>>>> @@ -1340,6 +1342,7 @@ static ssize_t ceph_read_iter(struct kiocb
>>>>> *iocb, struct iov_iter *to)
>>>>>                    return -ENOMEM;
>>>>>            }
>>>>>    +        ceph_caps_metric(ci, CEPH_STAT_CAP_INLINE_DATA);
>>>>>            statret = __ceph_do_getattr(inode, page,
>>>>>                            CEPH_STAT_CAP_INLINE_DATA, !!page);
>>>>>            if (statret < 0) {
>>>>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>>>>> index a24fd00676b8..141c1c03636c 100644
>>>>> --- a/fs/ceph/mds_client.c
>>>>> +++ b/fs/ceph/mds_client.c
>>>>> @@ -558,6 +558,8 @@ void ceph_put_mds_session(struct
>>>>> ceph_mds_session *s)
>>>>>        if (refcount_dec_and_test(&s->s_ref)) {
>>>>>            if (s->s_auth.authorizer)
>>>>>                ceph_auth_destroy_authorizer(s->s_auth.authorizer);
>>>>> +        percpu_counter_destroy(&s->i_caps_hit);
>>>>> +        percpu_counter_destroy(&s->i_caps_mis);
>>>>>            kfree(s);
>>>>>        }
>>>>>    }
>>>>> @@ -598,6 +600,7 @@ static struct ceph_mds_session
>>>>> *register_session(struct ceph_mds_client *mdsc,
>>>>>                             int mds)
>>>>>    {
>>>>>        struct ceph_mds_session *s;
>>>>> +    int err;
>>>>>           if (mds >= mdsc->mdsmap->possible_max_rank)
>>>>>            return ERR_PTR(-EINVAL);
>>>>> @@ -612,8 +615,10 @@ static struct ceph_mds_session
>>>>> *register_session(struct ceph_mds_client *mdsc,
>>>>>               dout("%s: realloc to %d\n", __func__, newmax);
>>>>>            sa = kcalloc(newmax, sizeof(void *), GFP_NOFS);
>>>>> -        if (!sa)
>>>>> +        if (!sa) {
>>>>> +            err = -ENOMEM;
>>>>>                goto fail_realloc;
>>>>> +        }
>>>>>            if (mdsc->sessions) {
>>>>>                memcpy(sa, mdsc->sessions,
>>>>>                       mdsc->max_sessions * sizeof(void
>>>>> *));
>>>>> @@ -653,6 +658,13 @@ static struct ceph_mds_session
>>>>> *register_session(struct ceph_mds_client *mdsc,
>>>>>           INIT_LIST_HEAD(&s->s_cap_flushing);
>>>>>    +    err = percpu_counter_init(&s->i_caps_hit, 0, GFP_NOFS);
>>>>> +    if (err)
>>>>> +        goto fail_realloc;
>>>>> +    err = percpu_counter_init(&s->i_caps_mis, 0, GFP_NOFS);
>>>>> +    if (err)
>>>>> +        goto fail_init;
>>>>> +
>>>>>        mdsc->sessions[mds] = s;
>>>>>        atomic_inc(&mdsc->num_sessions);
>>>>>        refcount_inc(&s->s_ref);  /* one ref to sessions[], one
>>>>> to caller */
>>>>> @@ -662,6 +674,8 @@ static struct ceph_mds_session
>>>>> *register_session(struct ceph_mds_client *mdsc,
>>>>>           return s;
>>>>>    +fail_init:
>>>>> +    percpu_counter_destroy(&s->i_caps_hit);
>>>>>    fail_realloc:
>>>>>        kfree(s);
>>>>>        return ERR_PTR(-ENOMEM);
>>>>> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
>>>>> index dd1f417b90eb..ba74ff74c59c 100644
>>>>> --- a/fs/ceph/mds_client.h
>>>>> +++ b/fs/ceph/mds_client.h
>>>>> @@ -201,6 +201,9 @@ struct ceph_mds_session {
>>>>>           struct list_head  s_waiting;  /* waiting requests */
>>>>>        struct list_head  s_unsafe;   /* unsafe requests */
>>>>> +
>>>>> +    struct percpu_counter i_caps_hit;
>>>>> +    struct percpu_counter i_caps_mis;
>>>>>    };
>>>>>       /*
>>>>> diff --git a/fs/ceph/quota.c b/fs/ceph/quota.c
>>>>> index de56dee60540..4ce2f658e63d 100644
>>>>> --- a/fs/ceph/quota.c
>>>>> +++ b/fs/ceph/quota.c
>>>>> @@ -147,9 +147,14 @@ static struct inode
>>>>> *lookup_quotarealm_inode(struct ceph_mds_client *mdsc,
>>>>>            return NULL;
>>>>>        }
>>>>>        if (qri->inode) {
>>>>> +        struct ceph_inode_info *ci = ceph_inode(qri->inode);
>>>>> +        int ret;
>>>>> +
>>>>> +        ceph_caps_metric(ci, CEPH_STAT_CAP_INODE);
>>>>> +
>>>>>            /* get caps */
>>>>> -        int ret = __ceph_do_getattr(qri->inode, NULL,
>>>>> -                        CEPH_STAT_CAP_INODE, true);
>>>>> +        ret = __ceph_do_getattr(qri->inode, NULL,
>>>>> +                    CEPH_STAT_CAP_INODE, true);
>>>>>            if (ret >= 0)
>>>>>                in = qri->inode;
>>>>>            else
>>>>> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
>>>>> index 7af91628636c..3f4829222528 100644
>>>>> --- a/fs/ceph/super.h
>>>>> +++ b/fs/ceph/super.h
>>>>> @@ -641,6 +641,14 @@ static inline bool
>>>>> __ceph_is_any_real_caps(struct ceph_inode_info *ci)
>>>>>        return !RB_EMPTY_ROOT(&ci->i_caps);
>>>>>    }
>>>>>    +extern void __ceph_caps_metric(struct ceph_inode_info *ci, int
>>>>> mask);
>>>>> +static inline void ceph_caps_metric(struct ceph_inode_info *ci,
>>>>> int mask)
>>>>> +{
>>>>> +    spin_lock(&ci->i_ceph_lock);
>>>>> +    __ceph_caps_metric(ci, mask);
>>>>> +    spin_unlock(&ci->i_ceph_lock);
>>>>> +}
>>>>> +
>>>>>    extern int __ceph_caps_issued(struct ceph_inode_info *ci, int
>>>>> *implemented);
>>>>>    extern int __ceph_caps_issued_mask(struct ceph_inode_info *ci,
>>>>> int mask, int t);
>>>>>    extern int __ceph_caps_issued_other(struct ceph_inode_info *ci,
>>>>> @@ -927,6 +935,9 @@ extern int __ceph_do_getattr(struct inode
>>>>> *inode, struct page *locked_page,
>>>>>                     int mask, bool force);
>>>>>    static inline int ceph_do_getattr(struct inode *inode, int
>>>>> mask, bool force)
>>>>>    {
>>>>> +    struct ceph_inode_info *ci = ceph_inode(inode);
>>>>> +
>>>>> +    ceph_caps_metric(ci, mask);
>>>>>        return __ceph_do_getattr(inode, NULL, mask, force);
>>>>>    }
>>>>>    extern int ceph_permission(struct inode *inode, int mask);
>>>>> diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
>>>>> index d58fa14c1f01..ebd522edb0a8 100644
>>>>> --- a/fs/ceph/xattr.c
>>>>> +++ b/fs/ceph/xattr.c
>>>>> @@ -829,6 +829,7 @@ ssize_t __ceph_getxattr(struct inode *inode,
>>>>> const char *name, void *value,
>>>>>        struct ceph_vxattr *vxattr = NULL;
>>>>>        int req_mask;
>>>>>        ssize_t err;
>>>>> +    int ret = -1;
>>>>>           /* let's see if a virtual xattr was requested */
>>>>>        vxattr = ceph_match_vxattr(inode, name);
>>>>> @@ -856,7 +857,9 @@ ssize_t __ceph_getxattr(struct inode *inode,
>>>>> const char *name, void *value,
>>>>>           if (ci->i_xattrs.version == 0 ||
>>>>>            !((req_mask & CEPH_CAP_XATTR_SHARED) ||
>>>>> -          __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1))) {
>>>>> +          (ret = __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED,
>>>>> 1)))) {
>>>>> +        if (ret != -1)
>>>>> +            __ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>>>>            spin_unlock(&ci->i_ceph_lock);
>>>>>               /* security module gets xattr while filling
>>>>> trace */
>>>>> @@ -871,6 +874,9 @@ ssize_t __ceph_getxattr(struct inode *inode,
>>>>> const char *name, void *value,
>>>>>            if (err)
>>>>>                return err;
>>>>>            spin_lock(&ci->i_ceph_lock);
>>>>> +    } else {
>>>>> +        if (ret != -1)
>>>>> +            __ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>>>>        }
>>>>>           err = __build_xattrs(inode);
>>>>> @@ -907,19 +913,24 @@ ssize_t ceph_listxattr(struct dentry
>>>>> *dentry, char *names, size_t size)
>>>>>        struct ceph_inode_info *ci = ceph_inode(inode);
>>>>>        bool len_only = (size == 0);
>>>>>        u32 namelen;
>>>>> -    int err;
>>>>> +    int err, ret = -1;
>>>>>           spin_lock(&ci->i_ceph_lock);
>>>>>        dout("listxattr %p ver=%lld index_ver=%lld\n", inode,
>>>>>             ci->i_xattrs.version, ci->i_xattrs.index_version);
>>>>>           if (ci->i_xattrs.version == 0 ||
>>>>> -        !__ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED, 1)) {
>>>>> +        !(ret = __ceph_caps_issued_mask(ci, CEPH_CAP_XATTR_SHARED,
>>>>> 1))) {
>>>>> +        if (ret != -1)
>>>>> +            __ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>>>>            spin_unlock(&ci->i_ceph_lock);
>>>>>            err = ceph_do_getattr(inode,
>>>>> CEPH_STAT_CAP_XATTR, true);
>>>>>            if (err)
>>>>>                return err;
>>>>>            spin_lock(&ci->i_ceph_lock);
>>>>> +    } else {
>>>>> +        if (ret != -1)
>>>>> +            __ceph_caps_metric(ci, CEPH_CAP_XATTR_SHARED);
>>>>>        }
>>>>>           err = __build_xattrs(inode);
>>>
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH resend v5 03/11] ceph: move ceph_osdc_{read,write}pages to ceph.ko
  2020-01-29  8:27 ` [PATCH resend v5 03/11] ceph: move ceph_osdc_{read,write}pages to ceph.ko xiubli
@ 2020-02-04 18:38   ` Jeff Layton
  0 siblings, 0 replies; 31+ messages in thread
From: Jeff Layton @ 2020-02-04 18:38 UTC (permalink / raw)
  To: xiubli, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On Wed, 2020-01-29 at 03:27 -0500, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> Since these helpers are only used by ceph.ko, let's move them into ceph.ko
> and rename them to _sync_.
> 
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  fs/ceph/addr.c                  | 86 ++++++++++++++++++++++++++++++++-
>  include/linux/ceph/osd_client.h | 17 -------
>  net/ceph/osd_client.c           | 79 ------------------------------
>  3 files changed, 84 insertions(+), 98 deletions(-)
> 
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index 29d4513eff8c..20e5ebfff389 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -182,6 +182,47 @@ static int ceph_releasepage(struct page *page, gfp_t g)
>  	return !PagePrivate(page);
>  }
>  
> +/*
> + * Read some contiguous pages.  If we cross a stripe boundary, shorten
> + * *plen.  Return number of bytes read, or error.
> + */
> +static int ceph_sync_readpages(struct ceph_fs_client *fsc,
> +			       struct ceph_vino vino,
> +			       struct ceph_file_layout *layout,
> +			       u64 off, u64 *plen,
> +			       u32 truncate_seq, u64 truncate_size,
> +			       struct page **pages, int num_pages,
> +			       int page_align)
> +{
> +	struct ceph_osd_client *osdc = &fsc->client->osdc;
> +	struct ceph_osd_request *req;
> +	int rc = 0;
> +
> +	dout("readpages on ino %llx.%llx on %llu~%llu\n", vino.ino,
> +	     vino.snap, off, *plen);
> +	req = ceph_osdc_new_request(osdc, layout, vino, off, plen, 0, 1,
> +				    CEPH_OSD_OP_READ, CEPH_OSD_FLAG_READ,
> +				    NULL, truncate_seq, truncate_size,
> +				    false);
> +	if (IS_ERR(req))
> +		return PTR_ERR(req);
> +
> +	/* it may be a short read due to an object boundary */
> +	osd_req_op_extent_osd_data_pages(req, 0,
> +				pages, *plen, page_align, false, false);
> +
> +	dout("readpages  final extent is %llu~%llu (%llu bytes align %d)\n",
> +	     off, *plen, *plen, page_align);
> +
> +	rc = ceph_osdc_start_request(osdc, req, false);
> +	if (!rc)
> +		rc = ceph_osdc_wait_request(osdc, req);
> +
> +	ceph_osdc_put_request(req);
> +	dout("readpages result %d\n", rc);
> +	return rc;
> +}
> +
>  /*
>   * read a single page, without unlocking it.
>   */
> @@ -218,7 +259,7 @@ static int ceph_do_readpage(struct file *filp, struct page *page)
>  
>  	dout("readpage inode %p file %p page %p index %lu\n",
>  	     inode, filp, page, page->index);
> -	err = ceph_osdc_readpages(&fsc->client->osdc, ceph_vino(inode),
> +	err = ceph_sync_readpages(fsc, ceph_vino(inode),
>  				  &ci->i_layout, off, &len,
>  				  ci->i_truncate_seq, ci->i_truncate_size,
>  				  &page, 1, 0);
> @@ -570,6 +611,47 @@ static u64 get_writepages_data_length(struct inode *inode,
>  	return end > start ? end - start : 0;
>  }
>  
> +/*
> + * do a synchronous write on N pages
> + */
> +static int ceph_sync_writepages(struct ceph_fs_client *fsc,
> +				struct ceph_vino vino,
> +				struct ceph_file_layout *layout,
> +				struct ceph_snap_context *snapc,
> +				u64 off, u64 len,
> +				u32 truncate_seq, u64 truncate_size,
> +				struct timespec64 *mtime,
> +				struct page **pages, int num_pages)
> +{
> +	struct ceph_osd_client *osdc = &fsc->client->osdc;
> +	struct ceph_osd_request *req;
> +	int rc = 0;
> +	int page_align = off & ~PAGE_MASK;
> +
> +	req = ceph_osdc_new_request(osdc, layout, vino, off, &len, 0, 1,
> +				    CEPH_OSD_OP_WRITE, CEPH_OSD_FLAG_WRITE,
> +				    snapc, truncate_seq, truncate_size,
> +				    true);
> +	if (IS_ERR(req))
> +		return PTR_ERR(req);
> +
> +	/* it may be a short write due to an object boundary */
> +	osd_req_op_extent_osd_data_pages(req, 0, pages, len, page_align,
> +				false, false);
> +	dout("writepages %llu~%llu (%llu bytes)\n", off, len, len);
> +
> +	req->r_mtime = *mtime;
> +	rc = ceph_osdc_start_request(osdc, req, true);
> +	if (!rc)
> +		rc = ceph_osdc_wait_request(osdc, req);
> +
> +	ceph_osdc_put_request(req);
> +	if (rc == 0)
> +		rc = len;
> +	dout("writepages result %d\n", rc);
> +	return rc;
> +}
> +
>  /*
>   * Write a single page, but leave the page locked.
>   *
> @@ -628,7 +710,7 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
>  		set_bdi_congested(inode_to_bdi(inode), BLK_RW_ASYNC);
>  
>  	set_page_writeback(page);
> -	err = ceph_osdc_writepages(&fsc->client->osdc, ceph_vino(inode),
> +	err = ceph_sync_writepages(fsc, ceph_vino(inode),
>  				   &ci->i_layout, snapc, page_off, len,
>  				   ceph_wbc.truncate_seq,
>  				   ceph_wbc.truncate_size,
> diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
> index 5a62dbd3f4c2..9d9f745b98a1 100644
> --- a/include/linux/ceph/osd_client.h
> +++ b/include/linux/ceph/osd_client.h
> @@ -509,23 +509,6 @@ int ceph_osdc_call(struct ceph_osd_client *osdc,
>  		   struct page *req_page, size_t req_len,
>  		   struct page **resp_pages, size_t *resp_len);
>  
> -extern int ceph_osdc_readpages(struct ceph_osd_client *osdc,
> -			       struct ceph_vino vino,
> -			       struct ceph_file_layout *layout,
> -			       u64 off, u64 *plen,
> -			       u32 truncate_seq, u64 truncate_size,
> -			       struct page **pages, int nr_pages,
> -			       int page_align);
> -
> -extern int ceph_osdc_writepages(struct ceph_osd_client *osdc,
> -				struct ceph_vino vino,
> -				struct ceph_file_layout *layout,
> -				struct ceph_snap_context *sc,
> -				u64 off, u64 len,
> -				u32 truncate_seq, u64 truncate_size,
> -				struct timespec64 *mtime,
> -				struct page **pages, int nr_pages);
> -
>  int ceph_osdc_copy_from(struct ceph_osd_client *osdc,
>  			u64 src_snapid, u64 src_version,
>  			struct ceph_object_id *src_oid,
> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> index b68b376d8c2f..8ff2856e2d52 100644
> --- a/net/ceph/osd_client.c
> +++ b/net/ceph/osd_client.c
> @@ -5230,85 +5230,6 @@ void ceph_osdc_stop(struct ceph_osd_client *osdc)
>  	ceph_msgpool_destroy(&osdc->msgpool_op_reply);
>  }
>  
> -/*
> - * Read some contiguous pages.  If we cross a stripe boundary, shorten
> - * *plen.  Return number of bytes read, or error.
> - */
> -int ceph_osdc_readpages(struct ceph_osd_client *osdc,
> -			struct ceph_vino vino, struct ceph_file_layout *layout,
> -			u64 off, u64 *plen,
> -			u32 truncate_seq, u64 truncate_size,
> -			struct page **pages, int num_pages, int page_align)
> -{
> -	struct ceph_osd_request *req;
> -	int rc = 0;
> -
> -	dout("readpages on ino %llx.%llx on %llu~%llu\n", vino.ino,
> -	     vino.snap, off, *plen);
> -	req = ceph_osdc_new_request(osdc, layout, vino, off, plen, 0, 1,
> -				    CEPH_OSD_OP_READ, CEPH_OSD_FLAG_READ,
> -				    NULL, truncate_seq, truncate_size,
> -				    false);
> -	if (IS_ERR(req))
> -		return PTR_ERR(req);
> -
> -	/* it may be a short read due to an object boundary */
> -	osd_req_op_extent_osd_data_pages(req, 0,
> -				pages, *plen, page_align, false, false);
> -
> -	dout("readpages  final extent is %llu~%llu (%llu bytes align %d)\n",
> -	     off, *plen, *plen, page_align);
> -
> -	rc = ceph_osdc_start_request(osdc, req, false);
> -	if (!rc)
> -		rc = ceph_osdc_wait_request(osdc, req);
> -
> -	ceph_osdc_put_request(req);
> -	dout("readpages result %d\n", rc);
> -	return rc;
> -}
> -EXPORT_SYMBOL(ceph_osdc_readpages);
> -
> -/*
> - * do a synchronous write on N pages
> - */
> -int ceph_osdc_writepages(struct ceph_osd_client *osdc, struct ceph_vino vino,
> -			 struct ceph_file_layout *layout,
> -			 struct ceph_snap_context *snapc,
> -			 u64 off, u64 len,
> -			 u32 truncate_seq, u64 truncate_size,
> -			 struct timespec64 *mtime,
> -			 struct page **pages, int num_pages)
> -{
> -	struct ceph_osd_request *req;
> -	int rc = 0;
> -	int page_align = off & ~PAGE_MASK;
> -
> -	req = ceph_osdc_new_request(osdc, layout, vino, off, &len, 0, 1,
> -				    CEPH_OSD_OP_WRITE, CEPH_OSD_FLAG_WRITE,
> -				    snapc, truncate_seq, truncate_size,
> -				    true);
> -	if (IS_ERR(req))
> -		return PTR_ERR(req);
> -
> -	/* it may be a short write due to an object boundary */
> -	osd_req_op_extent_osd_data_pages(req, 0, pages, len, page_align,
> -				false, false);
> -	dout("writepages %llu~%llu (%llu bytes)\n", off, len, len);
> -
> -	req->r_mtime = *mtime;
> -	rc = ceph_osdc_start_request(osdc, req, true);
> -	if (!rc)
> -		rc = ceph_osdc_wait_request(osdc, req);
> -
> -	ceph_osdc_put_request(req);
> -	if (rc == 0)
> -		rc = len;
> -	dout("writepages result %d\n", rc);
> -	return rc;
> -}
> -EXPORT_SYMBOL(ceph_osdc_writepages);
> -
>  static int osd_req_op_copy_from_init(struct ceph_osd_request *req,
>  				     u64 src_snapid, u64 src_version,
>  				     struct ceph_object_id *src_oid,

This looks like a nice cleanup. I'll plan to merge this one after a bit
of testing.
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH resend v5 02/11] ceph: add caps perf metric for each session
  2020-01-31  9:02           ` Xiubo Li
@ 2020-02-04 21:10             ` Jeff Layton
  2020-02-05  0:58               ` Xiubo Li
  2020-02-05  7:57               ` Xiubo Li
  0 siblings, 2 replies; 31+ messages in thread
From: Jeff Layton @ 2020-02-04 21:10 UTC (permalink / raw)
  To: Xiubo Li, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On Fri, 2020-01-31 at 17:02 +0800, Xiubo Li wrote:
> On 2020/1/31 9:34, Xiubo Li wrote:
> > On 2020/1/31 3:00, Jeffrey Layton wrote:
> > > That seems sort of arbitrary, given that you're going to get different
> > > results depending on the index of the MDS with the caps. For instance:
> > > 
> > > 
> > > MDS0: pAsLsFs
> > > MDS1: pAs
> > > 
> > > ...vs...
> > > 
> > > MDS0: pAs
> > > MDS1: pAsLsFs
> > > 
> > > If we assume we're looking for pAsLsFs, then the first scenario will
> > > just end up with 1 hit and the second will give you 2. AFAIU, the two
> > > MDSs are peers, so it really seems like the index should not matter
> > > here.
> > > 
> > > I'm really struggling to understand how these numbers will be useful.
> > > What, specifically, are we trying to count here and why?
> > 
> > Maybe we need to count the hit/mis only once; the fake code would look like:
> >
> > // Case1: check the auth caps first
> >
> > if ((auth_cap & mask) == mask) {
> >     s->hit++;
> >     return;
> > }
> >
> > // Case2: check all the others one by one
> >
> > for (caps : i_caps) {
> >     if ((caps & mask) == mask) {
> >         s->hit++;
> >         return;
> >     }
> >     c |= caps;
> > }
> >
> > // Case3:
> >
> > if ((c & mask) == mask)
> >     s->hit++;
> > else
> >     s->mis++;
> >
> > return;
> >
> > ....
> > 
> > And for the session 's->' here: if one i_cap can hold all of the
> > requested 'mask', as in Case1 and Case2, it will be that i_cap's
> > corresponding session; for Case3 we could choose any session.
> > 
> > But the above is still not a very graceful way of counting the cap metrics either.
> > 
> > IMO, the cap hit/miss counter should be a global one, just like the
> > dentry_lease one in [PATCH 01/11]. Will this make sense?
> > 
> Currently in the fuse client, for each inode it is the auth_cap->session's
> responsibility to do all the cap hit/mis counting if the inode has an
> auth_cap; otherwise any existing session is chosen.
> 
> Maybe this is an acceptable approach.

Again, it's not clear to me what you're trying to measure.

Typically, when you're counting hits and misses on a cache, what you
care about is whether you had to wait to fill the cache in order to
proceed. That means a lookup in the case of the dcache, but for this
it's a cap request. If we have a miss, then we're going to ask a single
MDS to resolve it.

To me, it doesn't really make a lot of sense to track this at the
session level since the client deals with cap hits and misses as a union
of the caps for each session. Keeping per-superblock stats makes a lot
more sense in my opinion.

That makes this easy to determine too. You just logically OR all of the
"issued" masks together (and maybe the implemented masks in requests
that allow that), and check whether that covers the mask you need. If it
does, then you have a hit, if not, a miss.
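
Something like this, roughly (an untested sketch; it ignores invalid caps
and the implemented masks, and assumes a pair of per-sb percpu counters,
i_caps_hit/i_caps_mis, living on the client metric struct):

	int issued = 0;
	struct rb_node *p;

	for (p = rb_first(&ci->i_caps); p; p = rb_next(p)) {
		struct ceph_cap *cap = rb_entry(p, struct ceph_cap, ci_node);

		issued |= cap->issued;
	}
	/* one hit or miss per check, independent of the MDS index */
	if ((mask & issued) == mask)
		percpu_counter_inc(&metric->i_caps_hit);
	else
		percpu_counter_inc(&metric->i_caps_mis);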

So, to be clear, what we'd be measuring in that case is cap cache checks
per superblock. Is that what you're looking to measure with this?

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH resend v5 02/11] ceph: add caps perf metric for each session
  2020-02-04 21:10             ` Jeff Layton
@ 2020-02-05  0:58               ` Xiubo Li
  2020-02-05  7:57               ` Xiubo Li
  1 sibling, 0 replies; 31+ messages in thread
From: Xiubo Li @ 2020-02-05  0:58 UTC (permalink / raw)
  To: Jeff Layton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On 2020/2/5 5:10, Jeff Layton wrote:
> On Fri, 2020-01-31 at 17:02 +0800, Xiubo Li wrote:
>> On 2020/1/31 9:34, Xiubo Li wrote:
>>> On 2020/1/31 3:00, Jeffrey Layton wrote:
>>>> That seems sort of arbitrary, given that you're going to get different
>>>> results depending on the index of the MDS with the caps. For instance:
>>>>
>>>>
>>>> MDS0: pAsLsFs
>>>> MDS1: pAs
>>>>
>>>> ...vs...
>>>>
>>>> MDS0: pAs
>>>> MDS1: pAsLsFs
>>>>
>>>> If we assume we're looking for pAsLsFs, then the first scenario will
>>>> just end up with 1 hit and the second will give you 2. AFAIU, the two
>>>> MDSs are peers, so it really seems like the index should not matter
>>>> here.
>>>>
>>>> I'm really struggling to understand how these numbers will be useful.
>>>> What, specifically, are we trying to count here and why?
>>> Maybe we need to count the hit/mis only once; the fake code would look like:
>>>
>>> // Case1: check the auth caps first
>>>
>>> if ((auth_cap & mask) == mask) {
>>>     s->hit++;
>>>     return;
>>> }
>>>
>>> // Case2: check all the others one by one
>>>
>>> for (caps : i_caps) {
>>>     if ((caps & mask) == mask) {
>>>         s->hit++;
>>>         return;
>>>     }
>>>     c |= caps;
>>> }
>>>
>>> // Case3:
>>>
>>> if ((c & mask) == mask)
>>>     s->hit++;
>>> else
>>>     s->mis++;
>>>
>>> return;
>>>
>>> ....
>>>
>>> And for the session 's->' here: if one i_cap can hold all of the
>>> requested 'mask', as in Case1 and Case2, it will be that i_cap's
>>> corresponding session; for Case3 we could choose any session.
>>>
>>> But the above is still not a very graceful way of counting the cap metrics either.
>>>
>>> IMO, the cap hit/miss counter should be a global one, just like the
>>> dentry_lease one in [PATCH 01/11]. Will this make sense?
>>>
>> Currently in the fuse client, for each inode it is the auth_cap->session's
>> responsibility to do all the cap hit/mis counting if the inode has an
>> auth_cap; otherwise any existing session is chosen.
>>
>> Maybe this is an acceptable approach.
> Again, it's not clear to me what you're trying to measure.
>
> Typically, when you're counting hits and misses on a cache, what you
> care about is whether you had to wait to fill the cache in order to
> proceed. That means a lookup in the case of the dcache, but for this
> it's a cap request. If we have a miss, then we're going to ask a single
> MDS to resolve it.
>
> To me, it doesn't really make a lot of sense to track this at the
> session level since the client deals with cap hits and misses as a union
> of the caps for each session. Keeping per-superblock stats makes a lot
> more sense in my opinion.

This approach will be the same as the others, which are also
per-superblock or global.

> That makes this easy to determine too. You just logically OR all of the
> "issued" masks together (and maybe the implemented masks in requests
> that allow that), and check whether that covers the mask you need. If it
> does, then you have a hit, if not, a miss.
Yeah, if so it will be much easier to measure the hit/miss.
>
> So, to be clear, what we'd be measuring in that case is cap cache checks
> per superblock. Is that what you're looking to measure with this?

Making it per-superblock looks good to me. Then some change will be
needed on the ceph side, which currently receives and shows the cap
hit/miss per-session.

Thanks.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH resend v5 02/11] ceph: add caps perf metric for each session
  2020-02-04 21:10             ` Jeff Layton
  2020-02-05  0:58               ` Xiubo Li
@ 2020-02-05  7:57               ` Xiubo Li
  1 sibling, 0 replies; 31+ messages in thread
From: Xiubo Li @ 2020-02-05  7:57 UTC (permalink / raw)
  To: Jeff Layton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On 2020/2/5 5:10, Jeff Layton wrote:
> On Fri, 2020-01-31 at 17:02 +0800, Xiubo Li wrote:
>> On 2020/1/31 9:34, Xiubo Li wrote:
>>> On 2020/1/31 3:00, Jeffrey Layton wrote:
>>>
[...]
>> Currently in the fuse client, for each inode it is the auth_cap->session's
>> responsibility to do all the cap hit/mis counting if the inode has an
>> auth_cap; otherwise any existing session is chosen.
>>
>> Maybe this is an acceptable approach.
> Again, it's not clear to me what you're trying to measure.
>
> Typically, when you're counting hits and misses on a cache, what you
> care about is whether you had to wait to fill the cache in order to
> proceed. That means a lookup in the case of the dcache, but for this
> it's a cap request. If we have a miss, then we're going to ask a single
> MDS to resolve it.
>
> To me, it doesn't really make a lot of sense to track this at the
> session level since the client deals with cap hits and misses as a union
> of the caps for each session. Keeping per-superblock stats makes a lot
> more sense in my opinion.
>
> That makes this easy to determine too. You just logically OR all of the
> "issued" masks together (and maybe the implemented masks in requests
> that allow that), and check whether that covers the mask you need. If it
> does, then you have a hit, if not, a miss.
>
> So, to be clear, what we'd be measuring in that case is cap cache checks
> per superblock. Is that what you're looking to measure with this?
>
The following is the new approach:

+/*
+ * Counts the cap metric.
+ *
+ * This will try to traverse all the ci->i_caps; if we can
+ * get all the caps in 'mask' it will count a hit, else a mis.
+ */
+void __ceph_caps_metric(struct ceph_inode_info *ci, int mask)
+{
+       struct ceph_mds_client *mdsc =
+               ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
+       struct ceph_client_metric *metric = &mdsc->metric;
+       int issued;
+
+       lockdep_assert_held(&ci->i_ceph_lock);
+
+       if (mask <= 0)
+               return;
+
+       issued = __ceph_caps_issued(ci, NULL);
+
+       if ((mask & issued) == mask)
+               percpu_counter_inc(&metric->i_caps_hit);
+       else
+               percpu_counter_inc(&metric->i_caps_mis);
+}
+

The cap hit/mis metrics are per-superblock, just like the others.
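
With this, your earlier scenario (MDS0: pAs, MDS1: pAsLsFs, checking for
pAsLsFs) counts exactly one hit no matter which MDS holds which caps,
since __ceph_caps_issued() ORs the issued masks of all the inode's caps
together before the check.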

Thanks.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH resend v5 04/11] ceph: add r_end_stamp for the osdc request
  2020-01-29  8:27 ` [PATCH resend v5 04/11] ceph: add r_end_stamp for the osdc request xiubli
@ 2020-02-05 19:14   ` Jeff Layton
  2020-02-06  0:57     ` Xiubo Li
  0 siblings, 1 reply; 31+ messages in thread
From: Jeff Layton @ 2020-02-05 19:14 UTC (permalink / raw)
  To: xiubli, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On Wed, 2020-01-29 at 03:27 -0500, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> Grab the osdc request's end timestamp.
> 
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  include/linux/ceph/osd_client.h | 1 +
>  net/ceph/osd_client.c           | 2 ++
>  2 files changed, 3 insertions(+)
> 
> diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
> index 9d9f745b98a1..00a449cfc478 100644
> --- a/include/linux/ceph/osd_client.h
> +++ b/include/linux/ceph/osd_client.h
> @@ -213,6 +213,7 @@ struct ceph_osd_request {
>  	/* internal */
>  	unsigned long r_stamp;                /* jiffies, send or check time */
>  	unsigned long r_start_stamp;          /* jiffies */
> +	unsigned long r_end_stamp;          /* jiffies */
>  	int r_attempts;
>  	u32 r_map_dne_bound;
>  
> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> index 8ff2856e2d52..108c9457d629 100644
> --- a/net/ceph/osd_client.c
> +++ b/net/ceph/osd_client.c
> @@ -2389,6 +2389,8 @@ static void finish_request(struct ceph_osd_request *req)
>  	WARN_ON(lookup_request_mc(&osdc->map_checks, req->r_tid));
>  	dout("%s req %p tid %llu\n", __func__, req, req->r_tid);
>  
> +	req->r_end_stamp = jiffies;
> +
>  	if (req->r_osd)
>  		unlink_request(req->r_osd, req);
>  	atomic_dec(&osdc->num_requests);

Maybe fold this patch into #6 in this series? I'd prefer to add the new
field along with its first user.
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH resend v5 05/11] ceph: add global read latency metric support
  2020-01-29  8:27 ` [PATCH resend v5 05/11] ceph: add global read latency metric support xiubli
@ 2020-02-05 20:15   ` Jeff Layton
  2020-02-06  1:24     ` Xiubo Li
  0 siblings, 1 reply; 31+ messages in thread
From: Jeff Layton @ 2020-02-05 20:15 UTC (permalink / raw)
  To: xiubli, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On Wed, 2020-01-29 at 03:27 -0500, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> item          total       sum_lat(us)     avg_lat(us)
> -----------------------------------------------------
> read          73          3590000         49178
> 
> URL: https://tracker.ceph.com/issues/43215
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  fs/ceph/addr.c       |  8 ++++++++
>  fs/ceph/debugfs.c    | 11 +++++++++++
>  fs/ceph/file.c       | 15 +++++++++++++++
>  fs/ceph/mds_client.c | 29 +++++++++++++++++++++++------
>  fs/ceph/mds_client.h |  9 ++-------
>  fs/ceph/metric.h     | 30 ++++++++++++++++++++++++++++++
>  6 files changed, 89 insertions(+), 13 deletions(-)
>  create mode 100644 fs/ceph/metric.h
> 
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index 20e5ebfff389..0435a694370b 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -195,6 +195,7 @@ static int ceph_sync_readpages(struct ceph_fs_client *fsc,
>  			       int page_align)
>  {
>  	struct ceph_osd_client *osdc = &fsc->client->osdc;
> +	struct ceph_client_metric *metric = &fsc->mdsc->metric;

nit: I think you can drop this variable and just dereference the metric
field directly below where it's used. Ditto in other places where
"metric" is only used once in the function.

>  	struct ceph_osd_request *req;
>  	int rc = 0;
>  
> @@ -218,6 +219,8 @@ static int ceph_sync_readpages(struct ceph_fs_client *fsc,
>  	if (!rc)
>  		rc = ceph_osdc_wait_request(osdc, req);
>  
> +	ceph_update_read_latency(metric, req, rc);
> +
>  	ceph_osdc_put_request(req);
>  	dout("readpages result %d\n", rc);
>  	return rc;
> @@ -301,6 +304,8 @@ static int ceph_readpage(struct file *filp, struct page *page)
>  static void finish_read(struct ceph_osd_request *req)
>  {
>  	struct inode *inode = req->r_inode;
> +	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
> +	struct ceph_client_metric *metric = &fsc->mdsc->metric;
>  	struct ceph_osd_data *osd_data;
>  	int rc = req->r_result <= 0 ? req->r_result : 0;
>  	int bytes = req->r_result >= 0 ? req->r_result : 0;
> @@ -338,6 +343,9 @@ static void finish_read(struct ceph_osd_request *req)
>  		put_page(page);
>  		bytes -= PAGE_SIZE;
>  	}
> +
> +	ceph_update_read_latency(metric, req, rc);
> +
>  	kfree(osd_data->pages);
>  }
>  
> diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
> index c132fdb40d53..f8a32fa335ae 100644
> --- a/fs/ceph/debugfs.c
> +++ b/fs/ceph/debugfs.c
> @@ -128,8 +128,19 @@ static int metric_show(struct seq_file *s, void *p)
>  {
>  	struct ceph_fs_client *fsc = s->private;
>  	struct ceph_mds_client *mdsc = fsc->mdsc;
> +	s64 total, sum, avg = 0;
>  	int i;
>  
> +	seq_printf(s, "item          total       sum_lat(us)     avg_lat(us)\n");
> +	seq_printf(s, "-----------------------------------------------------\n");
> +
> +	total = percpu_counter_sum(&mdsc->metric.total_reads);
> +	sum = percpu_counter_sum(&mdsc->metric.read_latency_sum);
> +	avg = total ? sum / total : 0;
> +	seq_printf(s, "%-14s%-12lld%-16lld%lld\n", "read",
> +		   total, sum / NSEC_PER_USEC, avg / NSEC_PER_USEC);
> +
> +	seq_printf(s, "\n");
>  	seq_printf(s, "item          total           miss            hit\n");
>  	seq_printf(s, "-------------------------------------------------\n");
>  
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index c78dfbbb7b91..69288c39229b 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -588,6 +588,7 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
>  	struct inode *inode = file_inode(file);
>  	struct ceph_inode_info *ci = ceph_inode(inode);
>  	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
> +	struct ceph_client_metric *metric = &fsc->mdsc->metric;
>  	struct ceph_osd_client *osdc = &fsc->client->osdc;
>  	ssize_t ret;
>  	u64 off = iocb->ki_pos;
> @@ -660,6 +661,9 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to,
>  		ret = ceph_osdc_start_request(osdc, req, false);
>  		if (!ret)
>  			ret = ceph_osdc_wait_request(osdc, req);
> +
> +		ceph_update_read_latency(metric, req, ret);
> +
>  		ceph_osdc_put_request(req);
>  
>  		i_size = i_size_read(inode);
> @@ -798,13 +802,20 @@ static void ceph_aio_complete_req(struct ceph_osd_request *req)
>  	struct inode *inode = req->r_inode;
>  	struct ceph_aio_request *aio_req = req->r_priv;
>  	struct ceph_osd_data *osd_data = osd_req_op_extent_osd_data(req, 0);
> +	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
> +	struct ceph_client_metric *metric = &fsc->mdsc->metric;
>  
>  	BUG_ON(osd_data->type != CEPH_OSD_DATA_TYPE_BVECS);
>  	BUG_ON(!osd_data->num_bvecs);
> +	BUG_ON(!aio_req);
>  
>  	dout("ceph_aio_complete_req %p rc %d bytes %u\n",
>  	     inode, rc, osd_data->bvec_pos.iter.bi_size);
>  
> +	/* r_start_stamp == 0 means the request was not submitted */
> +	if (req->r_start_stamp && !aio_req->write)
> +		ceph_update_read_latency(metric, req, rc);
> +
>  	if (rc == -EOLDSNAPC) {
>  		struct ceph_aio_work *aio_work;
>  		BUG_ON(!aio_req->write);
> @@ -933,6 +944,7 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter,
>  	struct inode *inode = file_inode(file);
>  	struct ceph_inode_info *ci = ceph_inode(inode);
>  	struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
> +	struct ceph_client_metric *metric = &fsc->mdsc->metric;
>  	struct ceph_vino vino;
>  	struct ceph_osd_request *req;
>  	struct bio_vec *bvecs;
> @@ -1049,6 +1061,9 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter,
>  		if (!ret)
>  			ret = ceph_osdc_wait_request(&fsc->client->osdc, req);
>  
> +		if (!write)
> +			ceph_update_read_latency(metric, req, ret);
> +
>  		size = i_size_read(inode);
>  		if (!write) {
>  			if (ret == -ENOENT)
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index 141c1c03636c..101b51f9f05d 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -4182,14 +4182,29 @@ static int ceph_mdsc_metric_init(struct ceph_client_metric *metric)
>  	atomic64_set(&metric->total_dentries, 0);
>  	ret = percpu_counter_init(&metric->d_lease_hit, 0, GFP_KERNEL);
>  	if (ret)
> -		return ret;
> +		return ret;;

drop this, please ^^^

>  	ret = percpu_counter_init(&metric->d_lease_mis, 0, GFP_KERNEL);
> -	if (ret) {
> -		percpu_counter_destroy(&metric->d_lease_hit);
> -		return ret;
> -	}
> +	if (ret)
> +		goto err_dlease_mis;
>  
> -	return 0;
> +	ret = percpu_counter_init(&metric->total_reads, 0, GFP_KERNEL);
> +	if (ret)
> +		goto err_total_reads;
> +
> +	ret = percpu_counter_init(&metric->read_latency_sum, 0, GFP_KERNEL);
> +	if (ret)
> +		goto err_read_latency_sum;
> +
> +	return ret;
> +
> +err_read_latency_sum:
> +	percpu_counter_destroy(&metric->total_reads);
> +err_total_reads:
> +	percpu_counter_destroy(&metric->d_lease_mis);
> +err_dlease_mis:
> +	percpu_counter_destroy(&metric->d_lease_hit);
> +
> +	return ret;
>  }
>  
>  int ceph_mdsc_init(struct ceph_fs_client *fsc)
> @@ -4529,6 +4544,8 @@ void ceph_mdsc_destroy(struct ceph_fs_client *fsc)
>  
>  	ceph_mdsc_stop(mdsc);
>  
> +	percpu_counter_destroy(&mdsc->metric.read_latency_sum);
> +	percpu_counter_destroy(&mdsc->metric.total_reads);
>  	percpu_counter_destroy(&mdsc->metric.d_lease_mis);
>  	percpu_counter_destroy(&mdsc->metric.d_lease_hit);
>  
> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> index ba74ff74c59c..574d4e5a5de2 100644
> --- a/fs/ceph/mds_client.h
> +++ b/fs/ceph/mds_client.h
> @@ -16,6 +16,8 @@
>  #include <linux/ceph/mdsmap.h>
>  #include <linux/ceph/auth.h>
>  
> +#include "metric.h"
> +
>  /* The first 8 bits are reserved for old ceph releases */
>  enum ceph_feature_type {
>  	CEPHFS_FEATURE_MIMIC = 8,
> @@ -361,13 +363,6 @@ struct cap_wait {
>  	int			want;
>  };
>  
> -/* This is the global metrics */
> -struct ceph_client_metric {
> -	atomic64_t		total_dentries;
> -	struct percpu_counter	d_lease_hit;
> -	struct percpu_counter	d_lease_mis;
> -};
> -
>  /*
>   * mds client state
>   */
> diff --git a/fs/ceph/metric.h b/fs/ceph/metric.h
> new file mode 100644
> index 000000000000..2a7b8f3fe6a4
> --- /dev/null
> +++ b/fs/ceph/metric.h
> @@ -0,0 +1,30 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _FS_CEPH_MDS_METRIC_H
> +#define _FS_CEPH_MDS_METRIC_H
> +
> +#include <linux/ceph/osd_client.h>
> +
> +/* This is the global metrics */
> +struct ceph_client_metric {
> +	atomic64_t		total_dentries;
> +	struct percpu_counter	d_lease_hit;
> +	struct percpu_counter	d_lease_mis;
> +
> +	struct percpu_counter	total_reads;
> +	struct percpu_counter	read_latency_sum;
> +};
> +
> +static inline void ceph_update_read_latency(struct ceph_client_metric *m,
> +					    struct ceph_osd_request *req,
> +					    int rc)
> +{
> +	if (!m || !req)
> +		return;
> +
> +	if (rc >= 0 || rc == -ENOENT || rc == -ETIMEDOUT) {
> +		s64 latency = req->r_end_stamp - req->r_start_stamp;
> +		percpu_counter_inc(&m->total_reads);
> +		percpu_counter_add(&m->read_latency_sum, latency);
> +	}
> +}
> +#endif

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH resend v5 08/11] ceph: periodically send perf metrics to MDS
  2020-01-29  8:27 ` [PATCH resend v5 08/11] ceph: periodically send perf metrics to MDS xiubli
@ 2020-02-05 21:43   ` Jeff Layton
  2020-02-06  2:36     ` Xiubo Li
  0 siblings, 1 reply; 31+ messages in thread
From: Jeff Layton @ 2020-02-05 21:43 UTC (permalink / raw)
  To: xiubli, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On Wed, 2020-01-29 at 03:27 -0500, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> Add a debugfs file to enable/disable sending metrics to the MDS,
> disabled by default. If it's enabled, the kclient will send metrics
> every second.
> 
> This will send global dentry lease hit/miss and read/write/metadata
> latency metrics and each session's caps hit/miss metric to MDS.
> 
> Each time, the global metrics are only sent once, via any available
> session.
> 
> URL: https://tracker.ceph.com/issues/43215
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  fs/ceph/debugfs.c            |  44 +++++++-
>  fs/ceph/mds_client.c         | 201 ++++++++++++++++++++++++++++++++---
>  fs/ceph/mds_client.h         |   3 +
>  fs/ceph/metric.h             |  76 +++++++++++++
>  fs/ceph/super.h              |   1 +
>  include/linux/ceph/ceph_fs.h |   1 +
>  6 files changed, 307 insertions(+), 19 deletions(-)
> 
> diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
> index 7fd031c18309..8aae7ecea54a 100644
> --- a/fs/ceph/debugfs.c
> +++ b/fs/ceph/debugfs.c
> @@ -124,6 +124,40 @@ static int mdsc_show(struct seq_file *s, void *p)
>  	return 0;
>  }
>  
> +/*
> + * metrics debugfs
> + */
> +static int sending_metrics_set(void *data, u64 val)
> +{
> +	struct ceph_fs_client *fsc = (struct ceph_fs_client *)data;
> +	struct ceph_mds_client *mdsc = fsc->mdsc;
> +
> +	if (val > 1) {
> +		pr_err("Invalid sending metrics set value %llu\n", val);
> +		return -EINVAL;
> +	}
> +
> +	mutex_lock(&mdsc->mutex);
> +	mdsc->sending_metrics = (unsigned int)val;

Shouldn't that be a bool cast? Do we even need a cast there?
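
e.g. if the field becomes a bool, this could simply be:

	mdsc->sending_metrics = val;

...given the check above already rejects anything greater than 1.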

> +	mutex_unlock(&mdsc->mutex);
> +
> +	return 0;
> +}
> +
> +static int sending_metrics_get(void *data, u64 *val)
> +{
> +	struct ceph_fs_client *fsc = (struct ceph_fs_client *)data;
> +	struct ceph_mds_client *mdsc = fsc->mdsc;
> +
> +	mutex_lock(&mdsc->mutex);
> +	*val = (u64)mdsc->sending_metrics;
> +	mutex_unlock(&mdsc->mutex);
> +
> +	return 0;
> +}
> +DEFINE_SIMPLE_ATTRIBUTE(sending_metrics_fops, sending_metrics_get,
> +			sending_metrics_set, "%llu\n");
> +

I'd like to hear more about how we expect users to use this facility.
This debugfs file doesn't seem consistent with the rest of the UI, and I
imagine if the box reboots you'd have to (manually) re-enable it after
mount, right? Maybe this should be a mount option instead?
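
As it stands, you'd have to redo something like

	echo 1 > /sys/kernel/debug/ceph/<fsid>.client<id>/sending_metrics

after every mount. (The mount option is just a hypothetical alternative;
I don't have a name in mind.)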


>  static int metric_show(struct seq_file *s, void *p)
>  {
>  	struct ceph_fs_client *fsc = s->private;
> @@ -302,11 +336,9 @@ static int congestion_kb_get(void *data, u64 *val)
>  	*val = (u64)fsc->mount_options->congestion_kb;
>  	return 0;
>  }
> -
>  DEFINE_SIMPLE_ATTRIBUTE(congestion_kb_fops, congestion_kb_get,
>  			congestion_kb_set, "%llu\n");
>  
> -
>  void ceph_fs_debugfs_cleanup(struct ceph_fs_client *fsc)
>  {
>  	dout("ceph_fs_debugfs_cleanup\n");
> @@ -316,6 +348,7 @@ void ceph_fs_debugfs_cleanup(struct ceph_fs_client *fsc)
>  	debugfs_remove(fsc->debugfs_mds_sessions);
>  	debugfs_remove(fsc->debugfs_caps);
>  	debugfs_remove(fsc->debugfs_metric);
> +	debugfs_remove(fsc->debugfs_sending_metrics);
>  	debugfs_remove(fsc->debugfs_mdsc);
>  }
>  
> @@ -356,6 +389,13 @@ void ceph_fs_debugfs_init(struct ceph_fs_client *fsc)
>  						fsc,
>  						&mdsc_show_fops);
>  
> +	fsc->debugfs_sending_metrics =
> +			debugfs_create_file("sending_metrics",
> +					    0600,
> +					    fsc->client->debugfs_dir,
> +					    fsc,
> +					    &sending_metrics_fops);
> +
>  	fsc->debugfs_metric = debugfs_create_file("metrics",
>  						  0400,
>  						  fsc->client->debugfs_dir,
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index 92a933810a79..d765804dc855 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -4104,13 +4104,156 @@ static void maybe_recover_session(struct ceph_mds_client *mdsc)
>  	ceph_force_reconnect(fsc->sb);
>  }
>  
> +/*
> + * called under s_mutex
> + */
> +static bool ceph_mdsc_send_metrics(struct ceph_mds_client *mdsc,
> +				   struct ceph_mds_session *s,
> +				   bool skip_global)
> +{
> +	struct ceph_metric_head *head;
> +	struct ceph_metric_cap *cap;
> +	struct ceph_metric_dentry_lease *lease;
> +	struct ceph_metric_read_latency *read;
> +	struct ceph_metric_write_latency *write;
> +	struct ceph_metric_metadata_latency *meta;
> +	struct ceph_msg *msg;
> +	struct timespec64 ts;
> +	s32 len = sizeof(*head) + sizeof(*cap);
> +	s64 sum, total, avg;
> +	s32 items = 0;
> +
> +	if (!mdsc || !s)
> +		return false;
> +
> +	if (!skip_global) {
> +		len += sizeof(*lease);
> +		len += sizeof(*read);
> +		len += sizeof(*write);
> +		len += sizeof(*meta);
> +	}
> +
> +	msg = ceph_msg_new(CEPH_MSG_CLIENT_METRICS, len, GFP_NOFS, true);
> +	if (!msg) {
> +		pr_err("send metrics to mds%d, failed to allocate message\n",
> +		       s->s_mds);
> +		return false;
> +	}
> +
> +	head = msg->front.iov_base;
> +
> +	/* encode the cap metric */
> +	cap = (struct ceph_metric_cap *)(head + 1);
> +	cap->type = cpu_to_le32(CLIENT_METRIC_TYPE_CAP_INFO);
> +	cap->ver = 1;
> +	cap->campat = 1;
> +	cap->data_len = cpu_to_le32(sizeof(*cap) - 10);
> +	cap->hit = cpu_to_le64(percpu_counter_sum(&s->i_caps_hit));
> +	cap->mis = cpu_to_le64(percpu_counter_sum(&s->i_caps_mis));
> +	cap->total = cpu_to_le64(s->s_nr_caps);
> +	items++;
> +
> +	dout("cap metric hit %lld, mis %lld, total caps %lld",
> +	     le64_to_cpu(cap->hit), le64_to_cpu(cap->mis),
> +	     le64_to_cpu(cap->total));
> +
> +	/* only send the global once */
> +	if (skip_global)
> +		goto skip_global;
> +
> +	/* encode the dentry lease metric */
> +	lease = (struct ceph_metric_dentry_lease *)(cap + 1);
> +	lease->type = cpu_to_le32(CLIENT_METRIC_TYPE_DENTRY_LEASE);
> +	lease->ver = 1;
> +	lease->campat = 1;
> +	lease->data_len = cpu_to_le32(sizeof(*lease) - 10);
> +	lease->hit = cpu_to_le64(percpu_counter_sum(&mdsc->metric.d_lease_hit));
> +	lease->mis = cpu_to_le64(percpu_counter_sum(&mdsc->metric.d_lease_mis));
> +	lease->total = cpu_to_le64(atomic64_read(&mdsc->metric.total_dentries));
> +	items++;
> +
> +	dout("dentry lease metric hit %lld, mis %lld, total dentries %lld",
> +	     le64_to_cpu(lease->hit), le64_to_cpu(lease->mis),
> +	     le64_to_cpu(lease->total));
> +
> +	/* encode the read latency metric */
> +	read = (struct ceph_metric_read_latency *)(lease + 1);
> +	read->type = cpu_to_le32(CLIENT_METRIC_TYPE_READ_LATENCY);
> +	read->ver = 1;
> +	read->campat = 1;
> +	read->data_len = cpu_to_le32(sizeof(*read) - 10);
> +	total = percpu_counter_sum(&mdsc->metric.total_reads),
> +	sum = percpu_counter_sum(&mdsc->metric.read_latency_sum);
> +	avg = total ? sum / total : 0;
> +	ts = ns_to_timespec64(avg);
> +	read->sec = cpu_to_le32(ts.tv_sec);
> +	read->nsec = cpu_to_le32(ts.tv_nsec);
> +	items++;
> +
> +	dout("read latency metric total %lld, sum lat %lld, avg lat %lld",
> +	     total, sum, avg);
> +
> +	/* encode the write latency metric */
> +	write = (struct ceph_metric_write_latency *)(read + 1);
> +	write->type = cpu_to_le32(CLIENT_METRIC_TYPE_WRITE_LATENCY);
> +	write->ver = 1;
> +	write->campat = 1;
> +	write->data_len = cpu_to_le32(sizeof(*write) - 10);
> +	total = percpu_counter_sum(&mdsc->metric.total_writes);
> +	sum = percpu_counter_sum(&mdsc->metric.write_latency_sum);
> +	avg = total ? sum / total : 0;
> +	ts = ns_to_timespec64(avg);
> +	write->sec = cpu_to_le32(ts.tv_sec);
> +	write->nsec = cpu_to_le32(ts.tv_nsec);
> +	items++;
> +
> +	dout("write latency metric total %lld, sum lat %lld, avg lat %lld",
> +	     total, sum, avg);
> +
> +	/* encode the metadata latency metric */
> +	meta = (struct ceph_metric_metadata_latency *)(write + 1);
> +	meta->type = cpu_to_le32(CLIENT_METRIC_TYPE_METADATA_LATENCY);
> +	meta->ver = 1;
> +	meta->campat = 1;
> +	meta->data_len = cpu_to_le32(sizeof(*meta) - 10);
> +	total = percpu_counter_sum(&mdsc->metric.total_metadatas);
> +	sum = percpu_counter_sum(&mdsc->metric.metadata_latency_sum);
> +	avg = total ? sum / total : 0;
> +	ts = ns_to_timespec64(avg);
> +	meta->sec = cpu_to_le32(ts.tv_sec);
> +	meta->nsec = cpu_to_le32(ts.tv_nsec);
> +	items++;
> +
> +	dout("metadata latency metric total %lld, sum lat %lld, avg lat %lld",
> +	     total, sum, avg);
> +
> +skip_global:
> +	put_unaligned_le32(items, &head->num);
> +	msg->front.iov_len = len;
> +	msg->hdr.version = cpu_to_le16(1);
> +	msg->hdr.compat_version = cpu_to_le16(1);
> +	msg->hdr.front_len = cpu_to_le32(msg->front.iov_len);
> +	dout("send metrics to mds%d %p\n", s->s_mds, msg);
> +	ceph_con_send(&s->s_con, msg);
> +
> +	return true;
> +}
> +
>  /*
>   * delayed work -- periodically trim expired leases, renew caps with mds
>   */
> +#define CEPH_WORK_DELAY_DEF 5
>  static void schedule_delayed(struct ceph_mds_client *mdsc)
>  {
> -	int delay = 5;
> -	unsigned hz = round_jiffies_relative(HZ * delay);
> +	unsigned int hz;
> +	int delay = CEPH_WORK_DELAY_DEF;
> +
> +	mutex_lock(&mdsc->mutex);
> +	if (mdsc->sending_metrics)
> +		delay = 1;
> +	mutex_unlock(&mdsc->mutex);
> +

The mdsc->mutex is dropped in the callers a little before this is
called, so this is a little too mutex-thrashy. I think you'd be better
off changing this function to be called with the mutex still held.
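
Something like this, perhaps (an untested sketch, just to illustrate; it
assumes every caller of schedule_delayed() still holds mdsc->mutex at
that point):

	static void schedule_delayed(struct ceph_mds_client *mdsc)
	{
		unsigned long delay = CEPH_WORK_DELAY_DEF;

		lockdep_assert_held(&mdsc->mutex);

		/* poll every second while metrics sending is enabled */
		if (mdsc->sending_metrics)
			delay = 1;

		schedule_delayed_work(&mdsc->delayed_work,
				      round_jiffies_relative(HZ * delay));
	}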

> +	hz = round_jiffies_relative(HZ * delay);
>  	schedule_delayed_work(&mdsc->delayed_work, hz);
>  }
>  
> @@ -4121,18 +4264,28 @@ static void delayed_work(struct work_struct *work)
>  		container_of(work, struct ceph_mds_client, delayed_work.work);
>  	int renew_interval;
>  	int renew_caps;
> +	bool metric_only;
> +	bool sending_metrics;
> +	bool g_skip = false;
>  
>  	dout("mdsc delayed_work\n");
>  
>  	mutex_lock(&mdsc->mutex);
> -	renew_interval = mdsc->mdsmap->m_session_timeout >> 2;
> -	renew_caps = time_after_eq(jiffies, HZ*renew_interval +
> -				   mdsc->last_renew_caps);
> -	if (renew_caps)
> -		mdsc->last_renew_caps = jiffies;
> +	sending_metrics = !!mdsc->sending_metrics;
> +	metric_only = mdsc->sending_metrics &&
> +		(mdsc->ticks++ % CEPH_WORK_DELAY_DEF);
> +
> +	if (!metric_only) {
> +		renew_interval = mdsc->mdsmap->m_session_timeout >> 2;
> +		renew_caps = time_after_eq(jiffies, HZ*renew_interval +
> +					   mdsc->last_renew_caps);
> +		if (renew_caps)
> +			mdsc->last_renew_caps = jiffies;
> +	}
>  
>  	for (i = 0; i < mdsc->max_sessions; i++) {
>  		struct ceph_mds_session *s = __ceph_lookup_mds_session(mdsc, i);
> +
>  		if (!s)
>  			continue;
>  		if (s->s_state == CEPH_MDS_SESSION_CLOSING) {
> @@ -4158,13 +4311,20 @@ static void delayed_work(struct work_struct *work)
>  		mutex_unlock(&mdsc->mutex);
>  
>  		mutex_lock(&s->s_mutex);
> -		if (renew_caps)
> -			send_renew_caps(mdsc, s);
> -		else
> -			ceph_con_keepalive(&s->s_con);
> -		if (s->s_state == CEPH_MDS_SESSION_OPEN ||
> -		    s->s_state == CEPH_MDS_SESSION_HUNG)
> -			ceph_send_cap_releases(mdsc, s);
> +
> +		if (sending_metrics)
> +			g_skip = ceph_mdsc_send_metrics(mdsc, s, g_skip);
> +
> +		if (!metric_only) {
> +			if (renew_caps)
> +				send_renew_caps(mdsc, s);
> +			else
> +				ceph_con_keepalive(&s->s_con);
> +			if (s->s_state == CEPH_MDS_SESSION_OPEN ||
> +					s->s_state == CEPH_MDS_SESSION_HUNG)
> +				ceph_send_cap_releases(mdsc, s);
> +		}
> +
>  		mutex_unlock(&s->s_mutex);
>  		ceph_put_mds_session(s);
>  
> @@ -4172,6 +4332,9 @@ static void delayed_work(struct work_struct *work)
>  	}
>  	mutex_unlock(&mdsc->mutex);
>  
> +	if (metric_only)
> +		goto delay_work;
> +
>  	ceph_check_delayed_caps(mdsc);
>  
>  	ceph_queue_cap_reclaim_work(mdsc);
> @@ -4180,11 +4343,13 @@ static void delayed_work(struct work_struct *work)
>  
>  	maybe_recover_session(mdsc);
>  
> +delay_work:
>  	schedule_delayed(mdsc);
>  }
>  
> -static int ceph_mdsc_metric_init(struct ceph_client_metric *metric)
> +static int ceph_mdsc_metric_init(struct ceph_mds_client *mdsc)
>  {
> +	struct ceph_client_metric *metric = &mdsc->metric;
>  	int ret;
>  
>  	if (!metric)
> @@ -4222,7 +4387,9 @@ static int ceph_mdsc_metric_init(struct ceph_client_metric *metric)
>  	if (ret)
>  		goto err_metadata_latency_sum;
>  
> -	return ret;
> +	mdsc->sending_metrics = 0;
> +	mdsc->ticks = 0;
> +	return 0;
>  err_metadata_latency_sum:
>  	percpu_counter_destroy(&metric->total_metadatas);
>  err_total_metadatas:
> @@ -4294,7 +4461,7 @@ int ceph_mdsc_init(struct ceph_fs_client *fsc)
>  	init_waitqueue_head(&mdsc->cap_flushing_wq);
>  	INIT_WORK(&mdsc->cap_reclaim_work, ceph_cap_reclaim_work);
>  	atomic_set(&mdsc->cap_reclaim_pending, 0);
> -	err = ceph_mdsc_metric_init(&mdsc->metric);
> +	err = ceph_mdsc_metric_init(mdsc);
>  	if (err)
>  		goto err_mdsmap;
>  
> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> index 574d4e5a5de2..a0ece55d987c 100644
> --- a/fs/ceph/mds_client.h
> +++ b/fs/ceph/mds_client.h
> @@ -451,6 +451,9 @@ struct ceph_mds_client {
>  	struct list_head  dentry_leases;     /* fifo list */
>  	struct list_head  dentry_dir_leases; /* lru list */
>  
> +	/* metrics */
> +	unsigned int		  sending_metrics;
> +	unsigned int		  ticks;
>  	struct ceph_client_metric metric;
>  
>  	spinlock_t		snapid_map_lock;
> diff --git a/fs/ceph/metric.h b/fs/ceph/metric.h
> index 3cda616ba594..352eb753ce25 100644
> --- a/fs/ceph/metric.h
> +++ b/fs/ceph/metric.h
> @@ -4,6 +4,82 @@
>  
>  #include <linux/ceph/osd_client.h>
>  
> +enum ceph_metric_type {
> +	CLIENT_METRIC_TYPE_CAP_INFO,
> +	CLIENT_METRIC_TYPE_READ_LATENCY,
> +	CLIENT_METRIC_TYPE_WRITE_LATENCY,
> +	CLIENT_METRIC_TYPE_METADATA_LATENCY,
> +	CLIENT_METRIC_TYPE_DENTRY_LEASE,
> +
> +	CLIENT_METRIC_TYPE_MAX = CLIENT_METRIC_TYPE_DENTRY_LEASE,
> +};
> +
> +/* metric caps header */
> +struct ceph_metric_cap {
> +	__le32 type;     /* ceph metric type */
> +
> +	__u8  ver;
> +	__u8  campat;

I think you meant "compat" here.

> +
> +	__le32 data_len; /* length of sizeof(hit + mis + total) */
> +	__le64 hit;
> +	__le64 mis;
> +	__le64 total;
> +} __attribute__ ((packed));
> +
> +/* metric dentry lease header */
> +struct ceph_metric_dentry_lease {
> +	__le32 type;     /* ceph metric type */
> +
> +	__u8  ver;
> +	__u8  campat;
> +
> +	__le32 data_len; /* length of sizeof(hit + mis + total) */
> +	__le64 hit;
> +	__le64 mis;
> +	__le64 total;
> +} __attribute__ ((packed));
> +
> +/* metric read latency header */
> +struct ceph_metric_read_latency {
> +	__le32 type;     /* ceph metric type */
> +
> +	__u8  ver;
> +	__u8  campat;
> +
> +	__le32 data_len; /* length of sizeof(sec + nsec) */
> +	__le32 sec;
> +	__le32 nsec;
> +} __attribute__ ((packed));
> +
> +/* metric write latency header */
> +struct ceph_metric_write_latency {
> +	__le32 type;     /* ceph metric type */
> +
> +	__u8  ver;
> +	__u8  campat;
> +
> +	__le32 data_len; /* length of sizeof(sec + nsec) */
> +	__le32 sec;
> +	__le32 nsec;
> +} __attribute__ ((packed));
> +
> +/* metric metadata latency header */
> +struct ceph_metric_metadata_latency {
> +	__le32 type;     /* ceph metric type */
> +
> +	__u8  ver;
> +	__u8  campat;
> +
> +	__le32 data_len; /* length of sizeof(sec + nsec) */
> +	__le32 sec;
> +	__le32 nsec;
> +} __attribute__ ((packed));
> +
> +struct ceph_metric_head {
> +	__le32 num;	/* the number of metrics will be sent */

"the number of metrics that will be sent"

> +} __attribute__ ((packed));
> +
>  /* This is the global metrics */
>  struct ceph_client_metric {
>  	atomic64_t		total_dentries;
> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> index 3f4829222528..a91431e9bdf7 100644
> --- a/fs/ceph/super.h
> +++ b/fs/ceph/super.h
> @@ -128,6 +128,7 @@ struct ceph_fs_client {
>  	struct dentry *debugfs_congestion_kb;
>  	struct dentry *debugfs_bdi;
>  	struct dentry *debugfs_mdsc, *debugfs_mdsmap;
> +	struct dentry *debugfs_sending_metrics;
>  	struct dentry *debugfs_metric;
>  	struct dentry *debugfs_mds_sessions;
>  #endif
> diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
> index a099f60feb7b..6028d3e865e4 100644
> --- a/include/linux/ceph/ceph_fs.h
> +++ b/include/linux/ceph/ceph_fs.h
> @@ -130,6 +130,7 @@ struct ceph_dir_layout {
>  #define CEPH_MSG_CLIENT_REQUEST         24
>  #define CEPH_MSG_CLIENT_REQUEST_FORWARD 25
>  #define CEPH_MSG_CLIENT_REPLY           26
> +#define CEPH_MSG_CLIENT_METRICS         29
>  #define CEPH_MSG_CLIENT_CAPS            0x310
>  #define CEPH_MSG_CLIENT_LEASE           0x311
>  #define CEPH_MSG_CLIENT_SNAP            0x312

-- 
Jeff Layton <jlayton@kernel.org>


* Re: [PATCH resend v5 04/11] ceph: add r_end_stamp for the osdc request
  2020-02-05 19:14   ` Jeff Layton
@ 2020-02-06  0:57     ` Xiubo Li
  0 siblings, 0 replies; 31+ messages in thread
From: Xiubo Li @ 2020-02-06  0:57 UTC (permalink / raw)
  To: Jeff Layton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On 2020/2/6 3:14, Jeff Layton wrote:
> On Wed, 2020-01-29 at 03:27 -0500, xiubli@redhat.com wrote:
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> Grab the osdc requests' end time stamp.
>>
>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>> ---
>>   include/linux/ceph/osd_client.h | 1 +
>>   net/ceph/osd_client.c           | 2 ++
>>   2 files changed, 3 insertions(+)
>>
>> diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
>> index 9d9f745b98a1..00a449cfc478 100644
>> --- a/include/linux/ceph/osd_client.h
>> +++ b/include/linux/ceph/osd_client.h
>> @@ -213,6 +213,7 @@ struct ceph_osd_request {
>>   	/* internal */
>>   	unsigned long r_stamp;                /* jiffies, send or check time */
>>   	unsigned long r_start_stamp;          /* jiffies */
>> +	unsigned long r_end_stamp;          /* jiffies */
>>   	int r_attempts;
>>   	u32 r_map_dne_bound;
>>   
>> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
>> index 8ff2856e2d52..108c9457d629 100644
>> --- a/net/ceph/osd_client.c
>> +++ b/net/ceph/osd_client.c
>> @@ -2389,6 +2389,8 @@ static void finish_request(struct ceph_osd_request *req)
>>   	WARN_ON(lookup_request_mc(&osdc->map_checks, req->r_tid));
>>   	dout("%s req %p tid %llu\n", __func__, req, req->r_tid);
>>   
>> +	req->r_end_stamp = jiffies;
>> +
>>   	if (req->r_osd)
>>   		unlink_request(req->r_osd, req);
>>   	atomic_dec(&osdc->num_requests);
> Maybe fold this patch into #6 in this series? I'd prefer to add the new
> field along with its first user.

Sure, will merge it.

Thanks.


* Re: [PATCH resend v5 05/11] ceph: add global read latency metric support
  2020-02-05 20:15   ` Jeff Layton
@ 2020-02-06  1:24     ` Xiubo Li
  0 siblings, 0 replies; 31+ messages in thread
From: Xiubo Li @ 2020-02-06  1:24 UTC (permalink / raw)
  To: Jeff Layton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On 2020/2/6 4:15, Jeff Layton wrote:
> On Wed, 2020-01-29 at 03:27 -0500, xiubli@redhat.com wrote:
[...]
>> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
>> index 20e5ebfff389..0435a694370b 100644
>> --- a/fs/ceph/addr.c
>> +++ b/fs/ceph/addr.c
>> @@ -195,6 +195,7 @@ static int ceph_sync_readpages(struct ceph_fs_client *fsc,
>>   			       int page_align)
>>   {
>>   	struct ceph_osd_client *osdc = &fsc->client->osdc;
>> +	struct ceph_client_metric *metric = &fsc->mdsc->metric;
> nit: I think you can drop this variable and just dereference the metric
> field directly below where it's used. Ditto in other places where
> "metric" is only used once in the function.

Will fix them all.

>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>> index 141c1c03636c..101b51f9f05d 100644
>> --- a/fs/ceph/mds_client.c
>> +++ b/fs/ceph/mds_client.c
>> @@ -4182,14 +4182,29 @@ static int ceph_mdsc_metric_init(struct ceph_client_metric *metric)
>>   	atomic64_set(&metric->total_dentries, 0);
>>   	ret = percpu_counter_init(&metric->d_lease_hit, 0, GFP_KERNEL);
>>   	if (ret)
>> -		return ret;
>> +		return ret;;
> drop this, please ^^^

Will fix it.

Thanks.


* Re: [PATCH resend v5 08/11] ceph: periodically send perf metrics to MDS
  2020-02-05 21:43   ` Jeff Layton
@ 2020-02-06  2:36     ` Xiubo Li
  2020-02-06 11:31       ` Jeff Layton
  0 siblings, 1 reply; 31+ messages in thread
From: Xiubo Li @ 2020-02-06  2:36 UTC (permalink / raw)
  To: Jeff Layton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On 2020/2/6 5:43, Jeff Layton wrote:
> On Wed, 2020-01-29 at 03:27 -0500, xiubli@redhat.com wrote:
[...]
>>   
>> +/*
>> + * metrics debugfs
>> + */
>> +static int sending_metrics_set(void *data, u64 val)
>> +{
>> +	struct ceph_fs_client *fsc = (struct ceph_fs_client *)data;
>> +	struct ceph_mds_client *mdsc = fsc->mdsc;
>> +
>> +	if (val > 1) {
>> +		pr_err("Invalid sending metrics set value %llu\n", val);
>> +		return -EINVAL;
>> +	}
>> +
>> +	mutex_lock(&mdsc->mutex);
>> +	mdsc->sending_metrics = (unsigned int)val;
> Shouldn't that be a bool cast? Do we even need a cast there?
Will switch sending_metrics to bool type instead.
>> +	mutex_unlock(&mdsc->mutex);
>> +
>> +	return 0;
>> +}
>> +
>> +static int sending_metrics_get(void *data, u64 *val)
>> +{
>> +	struct ceph_fs_client *fsc = (struct ceph_fs_client *)data;
>> +	struct ceph_mds_client *mdsc = fsc->mdsc;
>> +
>> +	mutex_lock(&mdsc->mutex);
>> +	*val = (u64)mdsc->sending_metrics;
>> +	mutex_unlock(&mdsc->mutex);
>> +
>> +	return 0;
>> +}
>> +DEFINE_SIMPLE_ATTRIBUTE(sending_metrics_fops, sending_metrics_get,
>> +			sending_metrics_set, "%llu\n");
>> +
> I'd like to hear more about how we expect users to use this facility.
> This debugfs file doesn't seem consistent with the rest of the UI, and I
> imagine if the box reboots you'd have to (manually) re-enable it after
> mount, right? Maybe this should be a mount option instead?

A mount option would mean we must unmount to disable it.

I was thinking that with the debugfs file we could do debugging or tuning
even in production setups at any time; usually this should be disabled,
since it sends the metrics every second.

Or we could merge the "sending_metrics" switch into the "metrics" UI:
writing "enable"/"disable" would enable or disable sending the metrics
to ceph, just as "reset" cleans the metrics.

Then the "/sys/kernel/debug/ceph/XXX.clientYYY/metrics" could be 
writable with:

"reset"  --> to clean and reset the metrics counters

"enable" --> enable sending metrics to ceph cluster

"disable" --> disable sending metrics to ceph cluster

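For example, the usage would then look like this (hypothetical, just to
illustrate the proposal):

    $ echo enable > /sys/kernel/debug/ceph/XXX.clientYYY/metrics
    ... run the workload, watch the counters ...
    $ echo reset > /sys/kernel/debug/ceph/XXX.clientYYY/metrics
    $ echo disable > /sys/kernel/debug/ceph/XXX.clientYYY/metrics
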
Would this be better?


[...]
>   /*
>    * delayed work -- periodically trim expired leases, renew caps with mds
>    */
> +#define CEPH_WORK_DELAY_DEF 5
>   static void schedule_delayed(struct ceph_mds_client *mdsc)
>   {
> -	int delay = 5;
> -	unsigned hz = round_jiffies_relative(HZ * delay);
> +	unsigned int hz;
> +	int delay = CEPH_WORK_DELAY_DEF;
> +
> +	mutex_lock(&mdsc->mutex);
> +	if (mdsc->sending_metrics)
> +		delay = 1;
> +	mutex_unlock(&mdsc->mutex);
> +
> The mdsc->mutex is dropped in the callers a little before this is
> called, so this is a little too mutex-thrashy. I think you'd be better
> off changing this function to be called with the mutex still held.

Will fix it.


[...]
>> +/* metric caps header */
>> +struct ceph_metric_cap {
>> +	__le32 type;     /* ceph metric type */
>> +
>> +	__u8  ver;
>> +	__u8  campat;
> I think you meant "compat" here.

Will fix it.


[...]
>> +/* metric metadata latency header */
>> +struct ceph_metric_metadata_latency {
>> +	__le32 type;     /* ceph metric type */
>> +
>> +	__u8  ver;
>> +	__u8  campat;
>> +
>> +	__le32 data_len; /* length of sizeof(sec + nsec) */
>> +	__le32 sec;
>> +	__le32 nsec;
>> +} __attribute__ ((packed));
>> +
>> +struct ceph_metric_head {
>> +	__le32 num;	/* the number of metrics will be sent */
> "the number of metrics that will be sent"

Will fix it.

Thanks,

>> +} __attribute__ ((packed));
>> +
>>   /* This is the global metrics */
>>   struct ceph_client_metric {
>>   	atomic64_t		total_dentries;
>> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
>> index 3f4829222528..a91431e9bdf7 100644
>> --- a/fs/ceph/super.h
>> +++ b/fs/ceph/super.h
>> @@ -128,6 +128,7 @@ struct ceph_fs_client {
>>   	struct dentry *debugfs_congestion_kb;
>>   	struct dentry *debugfs_bdi;
>>   	struct dentry *debugfs_mdsc, *debugfs_mdsmap;
>> +	struct dentry *debugfs_sending_metrics;
>>   	struct dentry *debugfs_metric;
>>   	struct dentry *debugfs_mds_sessions;
>>   #endif
>> diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
>> index a099f60feb7b..6028d3e865e4 100644
>> --- a/include/linux/ceph/ceph_fs.h
>> +++ b/include/linux/ceph/ceph_fs.h
>> @@ -130,6 +130,7 @@ struct ceph_dir_layout {
>>   #define CEPH_MSG_CLIENT_REQUEST         24
>>   #define CEPH_MSG_CLIENT_REQUEST_FORWARD 25
>>   #define CEPH_MSG_CLIENT_REPLY           26
>> +#define CEPH_MSG_CLIENT_METRICS         29
>>   #define CEPH_MSG_CLIENT_CAPS            0x310
>>   #define CEPH_MSG_CLIENT_LEASE           0x311
>>   #define CEPH_MSG_CLIENT_SNAP            0x312


* Re: [PATCH resend v5 08/11] ceph: periodically send perf metrics to MDS
  2020-02-06  2:36     ` Xiubo Li
@ 2020-02-06 11:31       ` Jeff Layton
  2020-02-06 12:26         ` Xiubo Li
  0 siblings, 1 reply; 31+ messages in thread
From: Jeff Layton @ 2020-02-06 11:31 UTC (permalink / raw)
  To: Xiubo Li, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On Thu, 2020-02-06 at 10:36 +0800, Xiubo Li wrote:
> On 2020/2/6 5:43, Jeff Layton wrote:
> > On Wed, 2020-01-29 at 03:27 -0500, xiubli@redhat.com wrote:
> [...]
> > > +
> > > +static int sending_metrics_get(void *data, u64 *val)
> > > +{
> > > +	struct ceph_fs_client *fsc = (struct ceph_fs_client *)data;
> > > +	struct ceph_mds_client *mdsc = fsc->mdsc;
> > > +
> > > +	mutex_lock(&mdsc->mutex);
> > > +	*val = (u64)mdsc->sending_metrics;
> > > +	mutex_unlock(&mdsc->mutex);
> > > +
> > > +	return 0;
> > > +}
> > > +DEFINE_SIMPLE_ATTRIBUTE(sending_metrics_fops, sending_metrics_get,
> > > +			sending_metrics_set, "%llu\n");
> > > +
> > I'd like to hear more about how we expect users to use this facility.
> > This debugfs file doesn't seem consistent with the rest of the UI, and I
> > imagine if the box reboots you'd have to (manually) re-enable it after
> > mount, right? Maybe this should be a mount option instead?
> 
> A mount option would mean we must unmount to disable it.
> 

Technically, no. You could wire it up so that you could enable and
disable it via -o remount. For example:

    # mount -o remount,metrics=disabled

Another option might be a module parameter if this is something that you
really want to be global (and not per-mount or per-session).

> I was thinking that with the debugfs file we could do debugging or
> tuning even in production setups at any time; usually this should be
> disabled, since it sends the metrics every second.
> 

Meh, one frame per second doesn't seem like it'll add much overhead.
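
(Back of the envelope: with the structs in this patch, a full update is
at most 4 + 34 + 34 + 3 * 18 = 126 bytes of payload per second per
session, less when the global part is skipped, plus the message header.)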

Also, why one update per second? Should that interval be tunable?

> Or we could merge the "sending_metrics" switch into the "metrics" UI:
> writing "enable"/"disable" would enable or disable sending the metrics
> to ceph, just as "reset" cleans the metrics.
> 
> Then the "/sys/kernel/debug/ceph/XXX.clientYYY/metrics" could be 
> writable with:
> 
> "reset"  --> to clean and reset the metrics counters
> 
> "enable" --> enable sending metrics to ceph cluster
> 
> "disable" --> disable sending metrics to ceph cluster
> 
> Would this be better?
> 

I guess it's not clear to me how you intend for this to be used.

A debugfs switch means that this is being enabled and disabled on a per-
session basis. Is the user supposed to turn this on for all, or just one
session? How do they know?

Is this something we expect people to just turn on briefly when they are
experiencing a problem, or is this something that we expect to be turned
on and left on for long periods of time?

If it's the latter then setting up a mount in /etc/fstab is not going to
be sufficient for an admin. She'll have to write a script or something
that goes in after the mount and enables this by writing to debugfs
after rebooting. Yuck.
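
Something like this, just to illustrate the burden (a hypothetical
boot-time script against the current debugfs interface):

    #!/bin/sh
    # re-enable metrics transmission for every cephfs superblock
    for f in /sys/kernel/debug/ceph/*.client*/sending_metrics; do
        echo 1 > "$f"
    done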

-- 
Jeff Layton <jlayton@kernel.org>


* Re: [PATCH resend v5 08/11] ceph: periodically send perf metrics to MDS
  2020-02-06 11:31       ` Jeff Layton
@ 2020-02-06 12:26         ` Xiubo Li
  2020-02-06 15:21           ` Jeff Layton
  0 siblings, 1 reply; 31+ messages in thread
From: Xiubo Li @ 2020-02-06 12:26 UTC (permalink / raw)
  To: Jeff Layton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On 2020/2/6 19:31, Jeff Layton wrote:
> On Thu, 2020-02-06 at 10:36 +0800, Xiubo Li wrote:
>> On 2020/2/6 5:43, Jeff Layton wrote:
>>> On Wed, 2020-01-29 at 03:27 -0500, xiubli@redhat.com wrote:
>> [...]
>>>> +
>>>> +static int sending_metrics_get(void *data, u64 *val)
>>>> +{
>>>> +	struct ceph_fs_client *fsc = (struct ceph_fs_client *)data;
>>>> +	struct ceph_mds_client *mdsc = fsc->mdsc;
>>>> +
>>>> +	mutex_lock(&mdsc->mutex);
>>>> +	*val = (u64)mdsc->sending_metrics;
>>>> +	mutex_unlock(&mdsc->mutex);
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +DEFINE_SIMPLE_ATTRIBUTE(sending_metrics_fops, sending_metrics_get,
>>>> +			sending_metrics_set, "%llu\n");
>>>> +
>>> I'd like to hear more about how we expect users to use this facility.
>>> This debugfs file doesn't seem consistent with the rest of the UI, and I
>>> imagine if the box reboots you'd have to (manually) re-enable it after
>>> mount, right? Maybe this should be a mount option instead?
>> A mount option would mean we must unmount to disable it.
>>
> Technically, no. You could wire it up so that you could enable and
> disable it via -o remount. For example:
>
>      # mount -o remount,metrics=disabled

Yeah, this is cool.

>
> Another option might be a module parameter if this is something that you
> really want to be global (and not per-mount or per-session).
>
>> I was thinking that with the debugfs file we could do debugging or
>> tuning even in production setups at any time; usually this should be
>> disabled, since it sends the metrics every second.
>>
> Meh, one frame per second doesn't seem like it'll add much overhead.
Okay.
>
> Also, why one update per second? Should that interval be tunable?

Per second just keeps it the same as the fuse client.


>> Or we could merge the "sending_metrics" switch into the "metrics" UI:
>> writing "enable"/"disable" would enable or disable sending the metrics
>> to ceph, just as "reset" cleans the metrics.
>>
>> Then the "/sys/kernel/debug/ceph/XXX.clientYYY/metrics" could be
>> writable with:
>>
>> "reset"  --> to clean and reset the metrics counters
>>
>> "enable" --> enable sending metrics to ceph cluster
>>
>> "disable" --> disable sending metrics to ceph cluster
>>
>> Would this be better?
>>
> I guess it's not clear to me how you intend for this to be used.
>
> A debugfs switch means that this is being enabled and disabled on a per-
> session basis. Is the user supposed to turn this on for all, or just one
> session? How do they know?

Not for all, just per-superblock.

>
> Is this something we expect people to just turn on briefly when they are
> experiencing a problem, or is this something that we expect to be turned
> on and left on for long periods of time?

If this won't add much overhead even at once per second, let's always
send the metrics to ceph; then the mount option for this switch is no
longer needed.

And there is already a switch on the ceph side to enable/disable showing
the metrics; adding another per-client switch would also be yucky for
admins.

Let's make the update interval tunable, with one second as the default.
Maybe we should make this a global UI for all clients?

Is this okay?

Thanks.


> If it's the latter then setting up a mount in /etc/fstab is not going to
> be sufficient for an admin. She'll have to write a script or something
> that goes in after the mount and enables this by writing to debugfs
> after rebooting. Yuck.
>


* Re: [PATCH resend v5 08/11] ceph: periodically send perf metrics to MDS
  2020-02-06 12:26         ` Xiubo Li
@ 2020-02-06 15:21           ` Jeff Layton
  2020-02-07  0:37             ` Xiubo Li
  0 siblings, 1 reply; 31+ messages in thread
From: Jeff Layton @ 2020-02-06 15:21 UTC (permalink / raw)
  To: Xiubo Li, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On Thu, 2020-02-06 at 20:26 +0800, Xiubo Li wrote:
> On 2020/2/6 19:31, Jeff Layton wrote:
> > On Thu, 2020-02-06 at 10:36 +0800, Xiubo Li wrote:
> > > On 2020/2/6 5:43, Jeff Layton wrote:
> > > > On Wed, 2020-01-29 at 03:27 -0500, xiubli@redhat.com wrote:
> > > [...]
> > > > > +
> > > > > +static int sending_metrics_get(void *data, u64 *val)
> > > > > +{
> > > > > +	struct ceph_fs_client *fsc = (struct ceph_fs_client *)data;
> > > > > +	struct ceph_mds_client *mdsc = fsc->mdsc;
> > > > > +
> > > > > +	mutex_lock(&mdsc->mutex);
> > > > > +	*val = (u64)mdsc->sending_metrics;
> > > > > +	mutex_unlock(&mdsc->mutex);
> > > > > +
> > > > > +	return 0;
> > > > > +}
> > > > > +DEFINE_SIMPLE_ATTRIBUTE(sending_metrics_fops, sending_metrics_get,
> > > > > +			sending_metrics_set, "%llu\n");
> > > > > +
> > > > I'd like to hear more about how we expect users to use this facility.
> > > > This debugfs file doesn't seem consistent with the rest of the UI, and I
> > > > imagine if the box reboots you'd have to (manually) re-enable it after
> > > > mount, right? Maybe this should be a mount option instead?
> > > A mount option would mean we must unmount to disable it.
> > > 
> > Technically, no. You could wire it up so that you could enable and
> > disable it via -o remount. For example:
> > 
> >      # mount -o remount,metrics=disabled
> 
> Yeah, this is cool.
> 
> > Another option might be a module parameter if this is something that you
> > really want to be global (and not per-mount or per-session).
> > 
> > > I was thinking that with the debugfs file we could do debugging or
> > > tuning even in production setups at any time; usually this should be
> > > disabled, since it sends the metrics every second.
> > > 
> > Meh, one frame per second doesn't seem like it'll add much overhead.
> Okay.
> > Also, why one update per second? Should that interval be tunable?
> 
> Per second just keeps it the same as the fuse client.
> 

Ok.

> 
> > > Or we could merge the "sending_metrics" switch into the "metrics" UI:
> > > writing "enable"/"disable" would enable or disable sending the metrics
> > > to ceph, just as "reset" cleans the metrics.
> > > 
> > > Then the "/sys/kernel/debug/ceph/XXX.clientYYY/metrics" could be
> > > writable with:
> > > 
> > > "reset"  --> to clean and reset the metrics counters
> > > 
> > > "enable" --> enable sending metrics to ceph cluster
> > > 
> > > "disable" --> disable sending metrics to ceph cluster
> > > 
> > > Would this be better?
> > > 
> > I guess it's not clear to me how you intend for this to be used.
> > 
> > A debugfs switch means that this is being enabled and disabled on a per-
> > session basis. Is the user supposed to turn this on for all, or just one
> > session? How do they know?
> 
> Not for all, just per-superblock.
> 

If it's per-superblock, then a debugfs-based switch seems particularly
ill-suited for this, as that's really a per-session interface.

> > Is this something we expect people to just turn on briefly when they are
> > experiencing a problem, or is this something that we expect to be turned
> > on and left on for long periods of time?
> 
> If this won't add much overhead even at once per second, let's always
> send the metrics to ceph; then the mount option for this switch is no
> longer needed.
> 

Note that I don't really _know_ that it won't be a problem, just that it
doesn't sound too bad. I think we probably will want some mechanism to
enable/disable this until we have some experience with it in the field.

> And there is already a switch on the ceph side to enable/disable showing
> the metrics; adding another per-client switch would also be yucky for
> admins.
> 
> Let's make the update interval tunable, with one second as the default.
> Maybe we should make this a global UI for all clients?
> 

If you want a global setting for the interval that would take effect on
all ceph mounts, then maybe a "metric_send_interval" module parameter
would be best. Make it an unsigned int, and allow the admin to set it to
0 to turn off stats transmission in the client.

We have a well-defined interface for setting module parameters on most
distros (via /etc/modprobe.d/), so that would be better than monkeying
around with debugfs here, IMO.
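
For example (assuming the parameter ends up in ceph.ko):

    # /etc/modprobe.d/ceph.conf
    options ceph metric_send_interval=1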

As to the default, it might be best to have this default to 0 initially.
Once we have more experience with it we could make it default to 1 in a
later release.

-- 
Jeff Layton <jlayton@kernel.org>


* Re: [PATCH resend v5 08/11] ceph: periodically send perf metrics to MDS
  2020-02-06 15:21           ` Jeff Layton
@ 2020-02-07  0:37             ` Xiubo Li
  0 siblings, 0 replies; 31+ messages in thread
From: Xiubo Li @ 2020-02-07  0:37 UTC (permalink / raw)
  To: Jeff Layton, idryomov, zyan; +Cc: sage, pdonnell, ceph-devel

On 2020/2/6 23:21, Jeff Layton wrote:
> On Thu, 2020-02-06 at 20:26 +0800, Xiubo Li wrote:
>> On 2020/2/6 19:31, Jeff Layton wrote:
>>> On Thu, 2020-02-06 at 10:36 +0800, Xiubo Li wrote:
>>>> On 2020/2/6 5:43, Jeff Layton wrote:
>>>>> On Wed, 2020-01-29 at 03:27 -0500, xiubli@redhat.com wrote:
>>>> [...]
>>>>>> +
>>>>>> +static int sending_metrics_get(void *data, u64 *val)
>>>>>> +{
>>>>>> +	struct ceph_fs_client *fsc = (struct ceph_fs_client *)data;
>>>>>> +	struct ceph_mds_client *mdsc = fsc->mdsc;
>>>>>> +
>>>>>> +	mutex_lock(&mdsc->mutex);
>>>>>> +	*val = (u64)mdsc->sending_metrics;
>>>>>> +	mutex_unlock(&mdsc->mutex);
>>>>>> +
>>>>>> +	return 0;
>>>>>> +}
>>>>>> +DEFINE_SIMPLE_ATTRIBUTE(sending_metrics_fops, sending_metrics_get,
>>>>>> +			sending_metrics_set, "%llu\n");
>>>>>> +
>>>>> I'd like to hear more about how we expect users to use this facility.
>>>>> This debugfs file doesn't seem consistent with the rest of the UI, and I
>>>>> imagine if the box reboots you'd have to (manually) re-enable it after
>>>>> mount, right? Maybe this should be a mount option instead?
>>>> A mount option would mean we must unmount to disable it.
>>>>
>>> Technically, no. You could wire it up so that you could enable and
>>> disable it via -o remount. For example:
>>>
>>>       # mount -o remount,metrics=disabled
>> Yeah, this is cool.
>>
>>> Another option might be a module parameter if this is something that you
>>> really want to be global (and not per-mount or per-session).
>>>
>>>> I was thinking that with the debugfs file we could do debugging or
>>>> tuning even in production setups at any time; usually this should be
>>>> disabled, since it sends the metrics every second.
>>>>
>>> Meh, one frame per second doesn't seem like it'll add much overhead.
>> Okay.
>>> Also, why one update per second? Should that interval be tunable?
>> Per second just keeps it the same as the fuse client.
>>
> Ok.
>
>>>> Or we could merge the "sending_metrics" switch into the "metrics" UI:
>>>> writing "enable"/"disable" would enable or disable sending the metrics
>>>> to ceph, just as "reset" cleans the metrics.
>>>>
>>>> Then the "/sys/kernel/debug/ceph/XXX.clientYYY/metrics" could be
>>>> writable with:
>>>>
>>>> "reset"  --> to clean and reset the metrics counters
>>>>
>>>> "enable" --> enable sending metrics to ceph cluster
>>>>
>>>> "disable" --> disable sending metrics to ceph cluster
>>>>
>>>> Would this be better?
>>>>
>>> I guess it's not clear to me how you intend for this to be used.
>>>
>>> A debugfs switch means that this is being enabled and disabled on a per-
>>> session basis. Is the user supposed to turn this on for all, or just one
>>> session? How do they know?
>> Not for all, just per-superblock.
>>
> If it's per-superblock, then a debugfs-based switch seems particularly
> ill-suited for this, as that's really a per-session interface.
>
>>> Is this something we expect people to just turn on briefly when they are
>>> experiencing a problem, or is this something that we expect to be turned
>>> on and left on for long periods of time?
>> If this won't add much overhead even at once per second, let's always
>> send the metrics to ceph; then the mount option for this switch is no
>> longer needed.
>>
> Note that I don't really _know_ that it won't be a problem, just that it
> doesn't sound too bad. I think we probably will want some mechanism to
> enable/disable this until we have some experience with it in the field.
>
>> And there is already a switch on the ceph side to enable/disable showing
>> the metrics; adding another per-client switch would also be yucky for
>> admins.
>>
>> Let's make the update interval tunable, with one second as the default.
>> Maybe we should make this a global UI for all clients?
>>
> If you want a global setting for the interval that would take effect on
> all ceph mounts, then maybe a "metric_send_interval" module parameter
> would be best. Make it an unsigned int, and allow the admin to set it to
> 0 to turn off stats transmission in the client.
>
> We have a well-defined interface for setting module parameters on most
> distros (via /etc/modprobe.d/), so that would be better than monkeying
> around with debugfs here, IMO.
>
> As to the default, it might be best to have this default to 0 initially.
> Once we have more experience with it we could make it default to 1 in a
> later release.

Yeah, this makes sense.

Let's switch to the module parameter "metric_send_interval"; at the same
time it will also act as the on/off switch.

0 means off, and any value >0 is the interval in seconds.
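
A rough sketch of the declaration (untested, and defaulting to 0 as you
suggested):

	/* e.g. in fs/ceph/super.c; needs <linux/module.h> */
	static unsigned int metric_send_interval;
	module_param(metric_send_interval, uint, 0644);
	MODULE_PARM_DESC(metric_send_interval,
			 "Interval (in seconds) of sending perf metrics to the MDSes, 0 (default) means disabled");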

Thanks,



Thread overview: 31+ messages
2020-01-29  8:27 [PATCH resend v5 0/11] ceph: add perf metrics support xiubli
2020-01-29  8:27 ` [PATCH resend v5 01/11] ceph: add global dentry lease metric support xiubli
2020-01-29  8:27 ` [PATCH resend v5 02/11] ceph: add caps perf metric for each session xiubli
2020-01-29 14:21   ` Jeff Layton
2020-01-30  2:22     ` Xiubo Li
2020-01-30 19:00       ` Jeffrey Layton
2020-01-31  1:34         ` Xiubo Li
2020-01-31  9:02           ` Xiubo Li
2020-02-04 21:10             ` Jeff Layton
2020-02-05  0:58               ` Xiubo Li
2020-02-05  7:57               ` Xiubo Li
2020-01-29  8:27 ` [PATCH resend v5 03/11] ceph: move ceph_osdc_{read,write}pages to ceph.ko xiubli
2020-02-04 18:38   ` Jeff Layton
2020-01-29  8:27 ` [PATCH resend v5 04/11] ceph: add r_end_stamp for the osdc request xiubli
2020-02-05 19:14   ` Jeff Layton
2020-02-06  0:57     ` Xiubo Li
2020-01-29  8:27 ` [PATCH resend v5 05/11] ceph: add global read latency metric support xiubli
2020-02-05 20:15   ` Jeff Layton
2020-02-06  1:24     ` Xiubo Li
2020-01-29  8:27 ` [PATCH resend v5 06/11] ceph: add global write " xiubli
2020-01-29  8:27 ` [PATCH resend v5 07/11] ceph: add global metadata perf " xiubli
2020-01-29  8:27 ` [PATCH resend v5 08/11] ceph: periodically send perf metrics to MDS xiubli
2020-02-05 21:43   ` Jeff Layton
2020-02-06  2:36     ` Xiubo Li
2020-02-06 11:31       ` Jeff Layton
2020-02-06 12:26         ` Xiubo Li
2020-02-06 15:21           ` Jeff Layton
2020-02-07  0:37             ` Xiubo Li
2020-01-29  8:27 ` [PATCH resend v5 09/11] ceph: add CEPH_DEFINE_RW_FUNC helper support xiubli
2020-01-29  8:27 ` [PATCH resend v5 10/11] ceph: add reset metrics support xiubli
2020-01-29  8:27 ` [PATCH resend v5 11/11] ceph: send client provided metric flags in client metadata xiubli
