linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/3] md: export internal stats through debugfs
@ 2019-07-02 13:29 Hou Tao
  2019-07-02 13:29 ` [RFC PATCH 1/3] md-debugfs: add md_debugfs_create_files() Hou Tao
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Hou Tao @ 2019-07-02 13:29 UTC (permalink / raw)
  To: linux-raid, songliubraving
  Cc: neilb, linux-block, snitzer, agk, dm-devel, linux-kernel, houtao1

Hi,

There are so many io counters, stats and flags in md, so I think
export these info to userspace will be helpful for online-debugging,
especially when the vmlinux file and the crash utility are not
available. And these info can also be utilized during code
understanding.

MD has already exported some stats through sysfs files under
/sys/block/mdX/md, but using sysfs file to export more internal
stats are not a good choice, because we need to create a single
sysfs file for each internal stat according to the use convention
of sysfs and there are too many internal stats. Further, the
newly-created sysfs files would become APIs for userspace tools,
but that is not we wanted, because these files are related with
internal stats and internal stats may change from time to time.

And I think debugfs is a better choice. Because we can show multiple
related stats in a debugfs file, and the debugfs file will never be
used as an userspace API.

Two debugfs files are created to expose these internal stats:
* iostat: io counters and io related stats (e.g., mddev->active_io,
	r1conf->nr_pending, or r1confi->retry_list)
* stat: normal stats/flags (e.g., mddev->recovery, conf->array_frozen)

Because internal stats are spreaded all over md-core and md-personality,
so both md-core and md-personality will create these two debugfs files
under different debugfs directory.

Patch 1 factors out the debugfs files creation routine for md-core and
md-personality, patch 2 creates two debugfs files: iostat & stat under
/sys/kernel/debug/block/mdX for md-core, and patch 3 creates two debugfs
files: iostat & stat under /sys/kernel/debug/block/mdX/raid1 for md-raid1.

The following lines show the hierarchy and the content of these debugfs
files for a RAID1 device:

$ pwd
/sys/kernel/debug/block/md0
$ tree
.
├── iostat
├── raid1
│   ├── iostat
│   └── stat
└── stat

$ cat iostat
active_io 0
sb_wait 0 pending_writes 0
recovery_active 0
bitmap pending_writes 0

$ cat stat
flags 0x20
sb_flags 0x0
recovery 0x0

$ cat raid1/iostat
retry_list active 0
bio_end_io_list active 0
pending_bio_list active 0 cnt 0
sync_pending 0
nr_pending 0
nr_waiting 0
nr_queued 0
barrier 0

$ cat raid1/stat
array_frozen 0

I'm not sure whether the division of internal stats is appropriate and
whether the internal stats in debugfs files are sufficient, so questions
and comments are weclome.

Regards,
Tao

Hou Tao (3):
  md-debugfs: add md_debugfs_create_files()
  md: export inflight io counters and internal stats in debugfs
  raid1: export inflight io counters and internal stats in debugfs

 drivers/md/Makefile     |  2 +-
 drivers/md/md-debugfs.c | 35 ++++++++++++++++++
 drivers/md/md-debugfs.h | 16 +++++++++
 drivers/md/md.c         | 65 ++++++++++++++++++++++++++++++++++
 drivers/md/md.h         |  1 +
 drivers/md/raid1.c      | 78 +++++++++++++++++++++++++++++++++++++++++
 drivers/md/raid1.h      |  1 +
 7 files changed, 197 insertions(+), 1 deletion(-)
 create mode 100644 drivers/md/md-debugfs.c
 create mode 100644 drivers/md/md-debugfs.h

-- 
2.22.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [RFC PATCH 1/3] md-debugfs: add md_debugfs_create_files()
  2019-07-02 13:29 [RFC PATCH 0/3] md: export internal stats through debugfs Hou Tao
@ 2019-07-02 13:29 ` Hou Tao
  2019-07-02 13:29 ` [RFC PATCH 2/3] md: export inflight io counters and internal stats in debugfs Hou Tao
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Hou Tao @ 2019-07-02 13:29 UTC (permalink / raw)
  To: linux-raid, songliubraving
  Cc: neilb, linux-block, snitzer, agk, dm-devel, linux-kernel, houtao1

It will be used by the following patches to create debugfs files
under /sys/kernel/debug/mdX.

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 drivers/md/Makefile     |  2 +-
 drivers/md/md-debugfs.c | 35 +++++++++++++++++++++++++++++++++++
 drivers/md/md-debugfs.h | 16 ++++++++++++++++
 3 files changed, 52 insertions(+), 1 deletion(-)
 create mode 100644 drivers/md/md-debugfs.c
 create mode 100644 drivers/md/md-debugfs.h

diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index be7a6eb92abc..49355e3b4cce 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -19,7 +19,7 @@ dm-cache-y	+= dm-cache-target.o dm-cache-metadata.o dm-cache-policy.o \
 dm-cache-smq-y   += dm-cache-policy-smq.o
 dm-era-y	+= dm-era-target.o
 dm-verity-y	+= dm-verity-target.o
-md-mod-y	+= md.o md-bitmap.o
+md-mod-y	+= md.o md-bitmap.o md-debugfs.o
 raid456-y	+= raid5.o raid5-cache.o raid5-ppl.o
 dm-zoned-y	+= dm-zoned-target.o dm-zoned-metadata.o dm-zoned-reclaim.o
 linear-y	+= md-linear.o
diff --git a/drivers/md/md-debugfs.c b/drivers/md/md-debugfs.c
new file mode 100644
index 000000000000..318c35fed24f
--- /dev/null
+++ b/drivers/md/md-debugfs.c
@@ -0,0 +1,35 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Based on debugfs_create_files() in blk-mq */
+
+#include <linux/fs.h>
+#include <linux/debugfs.h>
+
+#include "md-debugfs.h"
+
+static int md_debugfs_open(struct inode *inode, struct file *file)
+{
+	const struct md_debugfs_file *dbg_file = inode->i_private;
+	void *data = d_inode(file_dentry(file)->d_parent)->i_private;
+
+	return single_open(file, dbg_file->show, data);
+}
+
+static const struct file_operations md_debugfs_fops = {
+	.owner = THIS_MODULE,
+	.open = md_debugfs_open,
+	.llseek = seq_lseek,
+	.read = seq_read,
+	.release = single_release,
+};
+
+void md_debugfs_create_files(struct dentry *parent, void *data,
+		const struct md_debugfs_file *files)
+{
+	const struct md_debugfs_file *file;
+
+	d_inode(parent)->i_private = data;
+
+	for (file = files; file->name; file++)
+		debugfs_create_file(file->name, 0444, parent,
+				(void *)file, &md_debugfs_fops);
+}
diff --git a/drivers/md/md-debugfs.h b/drivers/md/md-debugfs.h
new file mode 100644
index 000000000000..13b581d4ab63
--- /dev/null
+++ b/drivers/md/md-debugfs.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _MD_DEBUGFS_H
+#define _MD_DEBUGFS_H
+
+#include <linux/seq_file.h>
+#include <linux/debugfs.h>
+
+struct md_debugfs_file {
+	const char *name;
+	int (*show)(struct seq_file *m, void *data);
+};
+
+extern void md_debugfs_create_files(struct dentry *parent, void *data,
+		const struct md_debugfs_file *files);
+#endif
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC PATCH 2/3] md: export inflight io counters and internal stats in debugfs
  2019-07-02 13:29 [RFC PATCH 0/3] md: export internal stats through debugfs Hou Tao
  2019-07-02 13:29 ` [RFC PATCH 1/3] md-debugfs: add md_debugfs_create_files() Hou Tao
@ 2019-07-02 13:29 ` Hou Tao
  2019-07-02 13:29 ` [RFC PATCH 3/3] raid1: " Hou Tao
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Hou Tao @ 2019-07-02 13:29 UTC (permalink / raw)
  To: linux-raid, songliubraving
  Cc: neilb, linux-block, snitzer, agk, dm-devel, linux-kernel, houtao1

There are so many io counters and stats/flags in md, so I think
export these info in debugfs will be helpful for online-debugging,
especially when the vmlinux and crash utility are not available.
And these files can also be utilized during code understanding.

Debug info are divided into two debugfs files:
* iostat
  contains io counters and io related stats (e.g., mddev->pending_writes
  and the active status of mddev->sb_wait)
* stat
  contains internal stats or flags (e.g., mddev->sb_flags)

These two debugfs files will be created both by md-core and the used
md-personality for each md device. This patch creates debug files
for md-core under /sys/kernel/debug/block/mdX, and the following patch
will creates these files for RAID1. The following lines show the hierarchy
of debugfs files created for a RAID1 device:

  $ pwd
  /sys/kernel/debug/block/md0
  $ tree
  .
  ├── iostat
  ├── raid1
  │   ├── iostat
  │   └── stat
  └── stat

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 drivers/md/md.c | 65 +++++++++++++++++++++++++++++++++++++++++++++++++
 drivers/md/md.h |  1 +
 2 files changed, 66 insertions(+)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 9801d540fea1..dceb8fd59ba0 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -64,11 +64,14 @@
 #include "md.h"
 #include "md-bitmap.h"
 #include "md-cluster.h"
+#include "md-debugfs.h"
 
 #ifndef MODULE
 static void autostart_arrays(int part);
 #endif
 
+extern struct dentry *blk_debugfs_root;
+
 /* pers_list is a list of registered personalities protected
  * by pers_lock.
  * pers_lock does extra service to protect accesses to
@@ -5191,6 +5194,65 @@ md_attr_store(struct kobject *kobj, struct attribute *attr,
 	return rv;
 }
 
+static int md_dbg_iostat_show(struct seq_file *m, void *data)
+{
+	struct mddev *mddev = m->private;
+	int active_bm_io = 0;
+
+	spin_lock(&mddev->lock);
+	if (mddev->bitmap)
+		active_bm_io = atomic_read(&mddev->bitmap->pending_writes);
+	spin_unlock(&mddev->lock);
+
+	seq_printf(m, "active_io %d\n",
+			atomic_read(&mddev->active_io));
+	seq_printf(m, "sb_wait %d pending_writes %d\n",
+			waitqueue_active(&mddev->sb_wait),
+			atomic_read(&mddev->pending_writes));
+	seq_printf(m, "recovery_active %d\n",
+			atomic_read(&mddev->recovery_active));
+	seq_printf(m, "bitmap pending_writes %d\n", active_bm_io);
+
+	return 0;
+}
+
+static int md_dbg_stat_show(struct seq_file *m, void *data)
+{
+	struct mddev *mddev = m->private;
+
+	seq_printf(m, "flags 0x%lx\n", mddev->flags);
+	seq_printf(m, "sb_flags 0x%lx\n", mddev->sb_flags);
+	seq_printf(m, "recovery 0x%lx\n", mddev->recovery);
+
+	return 0;
+}
+
+static const struct md_debugfs_file md_dbg_files[] = {
+	{.name = "iostat", .show = md_dbg_iostat_show},
+	{.name = "stat", .show = md_dbg_stat_show},
+	{},
+};
+
+static void md_unregister_debugfs(struct mddev *mddev)
+{
+	debugfs_remove_recursive(mddev->debugfs_dir);
+}
+
+static void md_register_debugfs(struct mddev *mddev)
+{
+	const char *name;
+	struct dentry *dir;
+
+	name = kobject_name(&disk_to_dev(mddev->gendisk)->kobj);
+	dir = debugfs_create_dir(name, blk_debugfs_root);
+	if (!IS_ERR_OR_NULL(dir)) {
+		md_debugfs_create_files(dir, mddev, md_dbg_files);
+		mddev->debugfs_dir = dir;
+	} else {
+		mddev->debugfs_dir = NULL;
+	}
+}
+
 static void md_free(struct kobject *ko)
 {
 	struct mddev *mddev = container_of(ko, struct mddev, kobj);
@@ -5227,6 +5289,7 @@ static void mddev_delayed_delete(struct work_struct *ws)
 {
 	struct mddev *mddev = container_of(ws, struct mddev, del_work);
 
+	md_unregister_debugfs(mddev);
 	sysfs_remove_group(&mddev->kobj, &md_bitmap_group);
 	kobject_del(&mddev->kobj);
 	kobject_put(&mddev->kobj);
@@ -5353,6 +5416,8 @@ static int md_alloc(dev_t dev, char *name)
 	if (mddev->kobj.sd &&
 	    sysfs_create_group(&mddev->kobj, &md_bitmap_group))
 		pr_debug("pointless warning\n");
+
+	md_register_debugfs(mddev);
 	mutex_unlock(&mddev->open_mutex);
  abort:
 	mutex_unlock(&disks_mutex);
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 7c930c091193..e79ef2101c45 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -466,6 +466,7 @@ struct mddev {
 	unsigned int			good_device_nr;	/* good device num within cluster raid */
 
 	bool	has_superblocks:1;
+	struct dentry *debugfs_dir;
 };
 
 enum recovery_flags {
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC PATCH 3/3] raid1: export inflight io counters and internal stats in debugfs
  2019-07-02 13:29 [RFC PATCH 0/3] md: export internal stats through debugfs Hou Tao
  2019-07-02 13:29 ` [RFC PATCH 1/3] md-debugfs: add md_debugfs_create_files() Hou Tao
  2019-07-02 13:29 ` [RFC PATCH 2/3] md: export inflight io counters and internal stats in debugfs Hou Tao
@ 2019-07-02 13:29 ` Hou Tao
  2019-07-22 21:31 ` [RFC PATCH 0/3] md: export internal stats through debugfs Song Liu
  2019-07-22 23:30 ` Bob Liu
  4 siblings, 0 replies; 9+ messages in thread
From: Hou Tao @ 2019-07-02 13:29 UTC (permalink / raw)
  To: linux-raid, songliubraving
  Cc: neilb, linux-block, snitzer, agk, dm-devel, linux-kernel, houtao1

Just like the previous patch which exports debugfs files for md-core,
this patch exports debugfs file for md-raid1 under
/sys/kernel/debug/block/mdX/raid1.

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 drivers/md/raid1.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++
 drivers/md/raid1.h |  1 +
 2 files changed, 79 insertions(+)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 2aa36e570e04..da06bb47195b 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -35,6 +35,7 @@
 #include "md.h"
 #include "raid1.h"
 #include "md-bitmap.h"
+#include "md-debugfs.h"
 
 #define UNSUPPORTED_MDDEV_FLAGS		\
 	((1L << MD_HAS_JOURNAL) |	\
@@ -2901,6 +2902,80 @@ static sector_t raid1_size(struct mddev *mddev, sector_t sectors, int raid_disks
 	return mddev->dev_sectors;
 }
 
+enum {
+	IOSTAT_NR_PENDING = 0,
+	IOSTAT_NR_WAITING,
+	IOSTAT_NR_QUEUED,
+	IOSTAT_BARRIER,
+	IOSTAT_CNT,
+};
+
+static int raid1_dbg_iostat_show(struct seq_file *m, void *data)
+{
+	struct r1conf *conf = m->private;
+	int idx;
+	int sum[IOSTAT_CNT] = {};
+
+	seq_printf(m, "retry_list active %d\n",
+			!list_empty(&conf->retry_list));
+	seq_printf(m, "bio_end_io_list active %d\n",
+			!list_empty(&conf->bio_end_io_list));
+	seq_printf(m, "pending_bio_list active %d cnt %d\n",
+			!bio_list_empty(&conf->pending_bio_list),
+			conf->pending_count);
+
+	for (idx = 0; idx < BARRIER_BUCKETS_NR; idx++) {
+		sum[IOSTAT_NR_PENDING] += atomic_read(&conf->nr_pending[idx]);
+		sum[IOSTAT_NR_WAITING] += atomic_read(&conf->nr_waiting[idx]);
+		sum[IOSTAT_NR_QUEUED] += atomic_read(&conf->nr_queued[idx]);
+		sum[IOSTAT_BARRIER] += atomic_read(&conf->barrier[idx]);
+	}
+
+	seq_printf(m, "sync_pending %d\n", atomic_read(&conf->nr_sync_pending));
+	seq_printf(m, "nr_pending %d\n", sum[IOSTAT_NR_PENDING]);
+	seq_printf(m, "nr_waiting %d\n", sum[IOSTAT_NR_WAITING]);
+	seq_printf(m, "nr_queued %d\n", sum[IOSTAT_NR_QUEUED]);
+	seq_printf(m, "barrier %d\n", sum[IOSTAT_BARRIER]);
+
+	return 0;
+}
+
+static int raid1_dbg_stat_show(struct seq_file *m, void *data)
+{
+	struct r1conf *conf = m->private;
+
+	seq_printf(m, "array_frozen %d\n", conf->array_frozen);
+	return 0;
+}
+
+static const struct md_debugfs_file raid1_dbg_files[] = {
+	{.name = "iostat", .show = raid1_dbg_iostat_show},
+	{.name = "stat", .show = raid1_dbg_stat_show},
+	{},
+};
+
+static void raid1_unregister_debugfs(struct r1conf *conf)
+{
+	debugfs_remove_recursive(conf->debugfs_dir);
+}
+
+static void raid1_register_debugfs(struct mddev *mddev, struct r1conf *conf)
+{
+	struct dentry *dir;
+
+	conf->debugfs_dir = NULL;
+
+	if (!mddev->debugfs_dir)
+		return;
+
+	dir = debugfs_create_dir("raid1", mddev->debugfs_dir);
+	if (IS_ERR_OR_NULL(dir))
+		return;
+
+	md_debugfs_create_files(dir, conf, raid1_dbg_files);
+	conf->debugfs_dir = dir;
+}
+
 static struct r1conf *setup_conf(struct mddev *mddev)
 {
 	struct r1conf *conf;
@@ -3022,6 +3097,8 @@ static struct r1conf *setup_conf(struct mddev *mddev)
 	if (!conf->thread)
 		goto abort;
 
+	raid1_register_debugfs(mddev, conf);
+
 	return conf;
 
  abort:
@@ -3136,6 +3213,7 @@ static void raid1_free(struct mddev *mddev, void *priv)
 {
 	struct r1conf *conf = priv;
 
+	raid1_unregister_debugfs(conf);
 	mempool_exit(&conf->r1bio_pool);
 	kfree(conf->mirrors);
 	safe_put_page(conf->tmppage);
diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h
index e7ccad898736..d627020e92d4 100644
--- a/drivers/md/raid1.h
+++ b/drivers/md/raid1.h
@@ -139,6 +139,7 @@ struct r1conf {
 	sector_t		cluster_sync_low;
 	sector_t		cluster_sync_high;
 
+	struct dentry *debugfs_dir;
 };
 
 /*
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 0/3] md: export internal stats through debugfs
  2019-07-02 13:29 [RFC PATCH 0/3] md: export internal stats through debugfs Hou Tao
                   ` (2 preceding siblings ...)
  2019-07-02 13:29 ` [RFC PATCH 3/3] raid1: " Hou Tao
@ 2019-07-22 21:31 ` Song Liu
  2019-07-27  5:47   ` Hou Tao
  2019-07-22 23:30 ` Bob Liu
  4 siblings, 1 reply; 9+ messages in thread
From: Song Liu @ 2019-07-22 21:31 UTC (permalink / raw)
  To: Hou Tao
  Cc: linux-raid, Song Liu, NeilBrown, linux-block, snitzer, agk,
	dm-devel, open list

On Tue, Jul 2, 2019 at 6:25 AM Hou Tao <houtao1@huawei.com> wrote:
>
> Hi,
>
> There are so many io counters, stats and flags in md, so I think
> export these info to userspace will be helpful for online-debugging,
> especially when the vmlinux file and the crash utility are not
> available. And these info can also be utilized during code
> understanding.
>
> MD has already exported some stats through sysfs files under
> /sys/block/mdX/md, but using sysfs file to export more internal
> stats are not a good choice, because we need to create a single
> sysfs file for each internal stat according to the use convention
> of sysfs and there are too many internal stats. Further, the
> newly-created sysfs files would become APIs for userspace tools,
> but that is not we wanted, because these files are related with
> internal stats and internal stats may change from time to time.
>
> And I think debugfs is a better choice. Because we can show multiple
> related stats in a debugfs file, and the debugfs file will never be
> used as an userspace API.
>
> Two debugfs files are created to expose these internal stats:
> * iostat: io counters and io related stats (e.g., mddev->active_io,
>         r1conf->nr_pending, or r1confi->retry_list)
> * stat: normal stats/flags (e.g., mddev->recovery, conf->array_frozen)
>
> Because internal stats are spreaded all over md-core and md-personality,
> so both md-core and md-personality will create these two debugfs files
> under different debugfs directory.
>
> Patch 1 factors out the debugfs files creation routine for md-core and
> md-personality, patch 2 creates two debugfs files: iostat & stat under
> /sys/kernel/debug/block/mdX for md-core, and patch 3 creates two debugfs
> files: iostat & stat under /sys/kernel/debug/block/mdX/raid1 for md-raid1.
>
> The following lines show the hierarchy and the content of these debugfs
> files for a RAID1 device:
>
> $ pwd
> /sys/kernel/debug/block/md0
> $ tree
> .
> ├── iostat
> ├── raid1
> │   ├── iostat
> │   └── stat
> └── stat
>
> $ cat iostat
> active_io 0
> sb_wait 0 pending_writes 0
> recovery_active 0
> bitmap pending_writes 0
>
> $ cat stat
> flags 0x20
> sb_flags 0x0
> recovery 0x0
>
> $ cat raid1/iostat
> retry_list active 0
> bio_end_io_list active 0
> pending_bio_list active 0 cnt 0
> sync_pending 0
> nr_pending 0
> nr_waiting 0
> nr_queued 0
> barrier 0

Hi,

Sorry for the late reply.

I think these information are really debug information that we should not
show in /sys. Once we expose them in /sys, we need to support them
because some use space may start searching data from them.

Thanks,
Song

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 0/3] md: export internal stats through debugfs
  2019-07-02 13:29 [RFC PATCH 0/3] md: export internal stats through debugfs Hou Tao
                   ` (3 preceding siblings ...)
  2019-07-22 21:31 ` [RFC PATCH 0/3] md: export internal stats through debugfs Song Liu
@ 2019-07-22 23:30 ` Bob Liu
  2019-07-27  5:55   ` Hou Tao
  4 siblings, 1 reply; 9+ messages in thread
From: Bob Liu @ 2019-07-22 23:30 UTC (permalink / raw)
  To: Hou Tao, linux-raid, songliubraving
  Cc: neilb, linux-block, snitzer, agk, dm-devel, linux-kernel

On 7/2/19 9:29 PM, Hou Tao wrote:
> Hi,
> 
> There are so many io counters, stats and flags in md, so I think
> export these info to userspace will be helpful for online-debugging,

For online-debugging, I'd suggest you have a try eBPF which can be very helpful.

-Bob

> especially when the vmlinux file and the crash utility are not
> available. And these info can also be utilized during code
> understanding.
> 
> MD has already exported some stats through sysfs files under
> /sys/block/mdX/md, but using sysfs file to export more internal
> stats are not a good choice, because we need to create a single
> sysfs file for each internal stat according to the use convention
> of sysfs and there are too many internal stats. Further, the
> newly-created sysfs files would become APIs for userspace tools,
> but that is not we wanted, because these files are related with
> internal stats and internal stats may change from time to time.
> 
> And I think debugfs is a better choice. Because we can show multiple
> related stats in a debugfs file, and the debugfs file will never be
> used as an userspace API.
> 
> Two debugfs files are created to expose these internal stats:
> * iostat: io counters and io related stats (e.g., mddev->active_io,
> 	r1conf->nr_pending, or r1confi->retry_list)
> * stat: normal stats/flags (e.g., mddev->recovery, conf->array_frozen)
> 
> Because internal stats are spreaded all over md-core and md-personality,
> so both md-core and md-personality will create these two debugfs files
> under different debugfs directory.
> 
> Patch 1 factors out the debugfs files creation routine for md-core and
> md-personality, patch 2 creates two debugfs files: iostat & stat under
> /sys/kernel/debug/block/mdX for md-core, and patch 3 creates two debugfs
> files: iostat & stat under /sys/kernel/debug/block/mdX/raid1 for md-raid1.
> 
> The following lines show the hierarchy and the content of these debugfs
> files for a RAID1 device:
> 
> $ pwd
> /sys/kernel/debug/block/md0
> $ tree
> .
> ├── iostat
> ├── raid1
> │   ├── iostat
> │   └── stat
> └── stat
> 
> $ cat iostat
> active_io 0
> sb_wait 0 pending_writes 0
> recovery_active 0
> bitmap pending_writes 0
> 
> $ cat stat
> flags 0x20
> sb_flags 0x0
> recovery 0x0
> 
> $ cat raid1/iostat
> retry_list active 0
> bio_end_io_list active 0
> pending_bio_list active 0 cnt 0
> sync_pending 0
> nr_pending 0
> nr_waiting 0
> nr_queued 0
> barrier 0
> 
> $ cat raid1/stat
> array_frozen 0
> 
> I'm not sure whether the division of internal stats is appropriate and
> whether the internal stats in debugfs files are sufficient, so questions
> and comments are weclome.
> 
> Regards,
> Tao
> 
> Hou Tao (3):
>   md-debugfs: add md_debugfs_create_files()
>   md: export inflight io counters and internal stats in debugfs
>   raid1: export inflight io counters and internal stats in debugfs
> 
>  drivers/md/Makefile     |  2 +-
>  drivers/md/md-debugfs.c | 35 ++++++++++++++++++
>  drivers/md/md-debugfs.h | 16 +++++++++
>  drivers/md/md.c         | 65 ++++++++++++++++++++++++++++++++++
>  drivers/md/md.h         |  1 +
>  drivers/md/raid1.c      | 78 +++++++++++++++++++++++++++++++++++++++++
>  drivers/md/raid1.h      |  1 +
>  7 files changed, 197 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/md/md-debugfs.c
>  create mode 100644 drivers/md/md-debugfs.h
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 0/3] md: export internal stats through debugfs
  2019-07-22 21:31 ` [RFC PATCH 0/3] md: export internal stats through debugfs Song Liu
@ 2019-07-27  5:47   ` Hou Tao
  2019-07-31 21:07     ` Song Liu
  0 siblings, 1 reply; 9+ messages in thread
From: Hou Tao @ 2019-07-27  5:47 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-raid, Song Liu, NeilBrown, linux-block, snitzer, agk,
	dm-devel, open list

Hi,

On 2019/7/23 5:31, Song Liu wrote:
> On Tue, Jul 2, 2019 at 6:25 AM Hou Tao <houtao1@huawei.com> wrote:
>>
>> Hi,
>>
>> There are so many io counters, stats and flags in md, so I think
>> export these info to userspace will be helpful for online-debugging,
>> especially when the vmlinux file and the crash utility are not
>> available. And these info can also be utilized during code
>> understanding.
>>
>> MD has already exported some stats through sysfs files under
>> /sys/block/mdX/md, but using sysfs file to export more internal
>> stats are not a good choice, because we need to create a single
>> sysfs file for each internal stat according to the use convention
>> of sysfs and there are too many internal stats. Further, the
>> newly-created sysfs files would become APIs for userspace tools,
>> but that is not we wanted, because these files are related with
>> internal stats and internal stats may change from time to time.
>>
>> And I think debugfs is a better choice. Because we can show multiple
>> related stats in a debugfs file, and the debugfs file will never be
>> used as an userspace API.
>>
>> Two debugfs files are created to expose these internal stats:
>> * iostat: io counters and io related stats (e.g., mddev->active_io,
>>         r1conf->nr_pending, or r1confi->retry_list)
>> * stat: normal stats/flags (e.g., mddev->recovery, conf->array_frozen)
>>
>> Because internal stats are spreaded all over md-core and md-personality,
>> so both md-core and md-personality will create these two debugfs files
>> under different debugfs directory.
>>
>> Patch 1 factors out the debugfs files creation routine for md-core and
>> md-personality, patch 2 creates two debugfs files: iostat & stat under
>> /sys/kernel/debug/block/mdX for md-core, and patch 3 creates two debugfs
>> files: iostat & stat under /sys/kernel/debug/block/mdX/raid1 for md-raid1.
>>
>> The following lines show the hierarchy and the content of these debugfs
>> files for a RAID1 device:
>>
>> $ pwd
>> /sys/kernel/debug/block/md0
>> $ tree
>> .
>> ├── iostat
>> ├── raid1
>> │   ├── iostat
>> │   └── stat
>> └── stat
>>
>> $ cat iostat
>> active_io 0
>> sb_wait 0 pending_writes 0
>> recovery_active 0
>> bitmap pending_writes 0
>>
>> $ cat stat
>> flags 0x20
>> sb_flags 0x0
>> recovery 0x0
>>
>> $ cat raid1/iostat
>> retry_list active 0
>> bio_end_io_list active 0
>> pending_bio_list active 0 cnt 0
>> sync_pending 0
>> nr_pending 0
>> nr_waiting 0
>> nr_queued 0
>> barrier 0
> 
> Hi,
> 
> Sorry for the late reply.
> 
> I think these information are really debug information that we should not
> show in /sys. Once we expose them in /sys, we need to support them
> because some use space may start searching data from them.
So debugfs is used to place these debug information instead of sysfs.

It's OK for user-space tools to search data from these files as long as these
tools don't expect these information to be stable. And the most possible user
of these files would be test programs, and if some user-space tools may truly
expect some stable information from the debugfs file, maybe we should move
these information from debugfs to sysfs file.

Regards,
Tao

> 
> Thanks,
> Song
> 
> .
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 0/3] md: export internal stats through debugfs
  2019-07-22 23:30 ` Bob Liu
@ 2019-07-27  5:55   ` Hou Tao
  0 siblings, 0 replies; 9+ messages in thread
From: Hou Tao @ 2019-07-27  5:55 UTC (permalink / raw)
  To: Bob Liu, linux-raid, songliubraving
  Cc: neilb, linux-block, snitzer, agk, dm-devel, linux-kernel

Hi,

On 2019/7/23 7:30, Bob Liu wrote:
> On 7/2/19 9:29 PM, Hou Tao wrote:
>> Hi,
>>
>> There are so many io counters, stats and flags in md, so I think
>> export these info to userspace will be helpful for online-debugging,
> 
> For online-debugging, I'd suggest you have a try eBPF which can be very helpful.
> 

Thanks for your suggestion. Using an eBPF program to read these internal status
from a host which is fully under your control is a good choice, but when
the dependencies of an eBPF program (e.g., the minor version of the kernel
and the kernel configuration which will influence the struct layout) is
out-of-your-control, it's not a good choice.

Thanks,
Tao

> -Bob
> 
>> especially when the vmlinux file and the crash utility are not
>> available. And these info can also be utilized during code
>> understanding.
>>
>> MD has already exported some stats through sysfs files under
>> /sys/block/mdX/md, but using sysfs file to export more internal
>> stats are not a good choice, because we need to create a single
>> sysfs file for each internal stat according to the use convention
>> of sysfs and there are too many internal stats. Further, the
>> newly-created sysfs files would become APIs for userspace tools,
>> but that is not we wanted, because these files are related with
>> internal stats and internal stats may change from time to time.
>>
>> And I think debugfs is a better choice. Because we can show multiple
>> related stats in a debugfs file, and the debugfs file will never be
>> used as an userspace API.
>>
>> Two debugfs files are created to expose these internal stats:
>> * iostat: io counters and io related stats (e.g., mddev->active_io,
>> 	r1conf->nr_pending, or r1confi->retry_list)
>> * stat: normal stats/flags (e.g., mddev->recovery, conf->array_frozen)
>>
>> Because internal stats are spreaded all over md-core and md-personality,
>> so both md-core and md-personality will create these two debugfs files
>> under different debugfs directory.
>>
>> Patch 1 factors out the debugfs files creation routine for md-core and
>> md-personality, patch 2 creates two debugfs files: iostat & stat under
>> /sys/kernel/debug/block/mdX for md-core, and patch 3 creates two debugfs
>> files: iostat & stat under /sys/kernel/debug/block/mdX/raid1 for md-raid1.
>>
>> The following lines show the hierarchy and the content of these debugfs
>> files for a RAID1 device:
>>
>> $ pwd
>> /sys/kernel/debug/block/md0
>> $ tree
>> .
>> ├── iostat
>> ├── raid1
>> │   ├── iostat
>> │   └── stat
>> └── stat
>>
>> $ cat iostat
>> active_io 0
>> sb_wait 0 pending_writes 0
>> recovery_active 0
>> bitmap pending_writes 0
>>
>> $ cat stat
>> flags 0x20
>> sb_flags 0x0
>> recovery 0x0
>>
>> $ cat raid1/iostat
>> retry_list active 0
>> bio_end_io_list active 0
>> pending_bio_list active 0 cnt 0
>> sync_pending 0
>> nr_pending 0
>> nr_waiting 0
>> nr_queued 0
>> barrier 0
>>
>> $ cat raid1/stat
>> array_frozen 0
>>
>> I'm not sure whether the division of internal stats is appropriate and
>> whether the internal stats in debugfs files are sufficient, so questions
>> and comments are weclome.
>>
>> Regards,
>> Tao
>>
>> Hou Tao (3):
>>   md-debugfs: add md_debugfs_create_files()
>>   md: export inflight io counters and internal stats in debugfs
>>   raid1: export inflight io counters and internal stats in debugfs
>>
>>  drivers/md/Makefile     |  2 +-
>>  drivers/md/md-debugfs.c | 35 ++++++++++++++++++
>>  drivers/md/md-debugfs.h | 16 +++++++++
>>  drivers/md/md.c         | 65 ++++++++++++++++++++++++++++++++++
>>  drivers/md/md.h         |  1 +
>>  drivers/md/raid1.c      | 78 +++++++++++++++++++++++++++++++++++++++++
>>  drivers/md/raid1.h      |  1 +
>>  7 files changed, 197 insertions(+), 1 deletion(-)
>>  create mode 100644 drivers/md/md-debugfs.c
>>  create mode 100644 drivers/md/md-debugfs.h
>>
> 
> 
> .
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 0/3] md: export internal stats through debugfs
  2019-07-27  5:47   ` Hou Tao
@ 2019-07-31 21:07     ` Song Liu
  0 siblings, 0 replies; 9+ messages in thread
From: Song Liu @ 2019-07-31 21:07 UTC (permalink / raw)
  To: Hou Tao
  Cc: linux-raid, Song Liu, NeilBrown, linux-block, Mike Snitzer, agk,
	dm-devel, open list

On Fri, Jul 26, 2019 at 10:48 PM Hou Tao <houtao1@huawei.com> wrote:
>
> Hi,
>
[...]
> >
> > Hi,
> >
> > Sorry for the late reply.
> >
> > I think these information are really debug information that we should not
> > show in /sys. Once we expose them in /sys, we need to support them
> > because some use space may start searching data from them.
> So debugfs is used to place these debug information instead of sysfs.

I don't think we should dump random information into debugfs. It is common
for the developers to carry some local patches that dumps information for
debug. We cannot get these patches upstream.

Thanks,
Song

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-07-31 21:07 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-02 13:29 [RFC PATCH 0/3] md: export internal stats through debugfs Hou Tao
2019-07-02 13:29 ` [RFC PATCH 1/3] md-debugfs: add md_debugfs_create_files() Hou Tao
2019-07-02 13:29 ` [RFC PATCH 2/3] md: export inflight io counters and internal stats in debugfs Hou Tao
2019-07-02 13:29 ` [RFC PATCH 3/3] raid1: " Hou Tao
2019-07-22 21:31 ` [RFC PATCH 0/3] md: export internal stats through debugfs Song Liu
2019-07-27  5:47   ` Hou Tao
2019-07-31 21:07     ` Song Liu
2019-07-22 23:30 ` Bob Liu
2019-07-27  5:55   ` Hou Tao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).