Linux-BTRFS Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v4 0/2] readmirror feature (sysfs and in-memory only approach)
@ 2020-01-05 15:14 Anand Jain
  2020-01-05 15:14 ` [PATCH 1/2] btrfs: add read_policy framework Anand Jain
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Anand Jain @ 2020-01-05 15:14 UTC (permalink / raw)
  To: linux-btrfs; +Cc: josef, btrfs-list

v4:
Rename readmirror attribute to read_policy. Drop separate kobj for
readmirror instead create read_policy attribute in fsid kobj.
merge v2:2/3 and v2:3/3 into v4:2/2. Patch titles have changed.
 
v3:
v2:
Mainly fixes the fs_devices::readmirror declaration type from atomic_t
to u8. (Thanks Josef).

v1:
As of now we use only %pid method to read stripped mirrored data. So
application's process id determines the stripe id to be read. This type
of routing typically helps in a system with many small independent
applications tying to read random data. On the other hand the %pid
based read IO distribution policy is inefficient if there is a single
application trying to read large data as because the overall disk
bandwidth would remains under utilized.

One type of readmirror policy isn't good enough and other choices are
routing the IO based on device's waitqueue or manual when we have a
read-preferred device or a readmirror policy based on the target storage
caching. So this patch-set introduces a framework where we could add more
readmirror policies.

This policy is a filesystem wide policy as of now, and though the
readmirror policy at the subvolume level is a novel approach as it
provides maximum flexibility in the data center, but as of now its not
practical to implement such a granularity as you can't really ensure
reflinked extents will be read from the stripe of its desire and so
there will be more limitations and it can be assessed separately.

The approach in this patch-set is sys interface with in-memory policy.
And does not add any new readmirror type in this set, which can be add
once we are ok with the framework. Also the default policy remains %pid.

Previous works:
----------------------------------------------------------------------
There were few RFCs [1] before, mainly to figure out storage
(or in memory only) for the readmirror policy and the interface needed.

[1]
https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg86368.html

https://lore.kernel.org/linux-btrfs/20190826090438.7044-1-anand.jain@oracle.com/

https://lore.kernel.org/linux-btrfs/5fcf9c23-89b5-b167-1f80-a0f4ac107d0b@oracle.com/

https://patchwork.kernel.org/cover/10859213/

Mount -o:
In the first trial it was attempted to use the mount -o option to carry
the readmirror policy, this is good for debugging which can make sure
even the mount thread metadata tree blocks are read from the disk desired.
It was very effective in testing radi1/raid10 write-holes.

Extended attribute:
As extended attribute is associated with the inode, to implement this
there is bit of extended attribute abuse or else makes it mandatory to
mount the rootid 5. Its messy unless readmirror policy is applied at the
subvol level which is not possible as of now. 

An item type:
The proposed patch was to create an item to hold the readmirror policy,
it makes sense when compared to the abusive extended attribute approach
but introduces a new item and so no backward compatibility.
-----------------------------------------------------------------------



Anand Jain (2):
  btrfs: add read_policy framework
  btrfs: sysfs, add read_policy attribute

 fs/btrfs/sysfs.c   | 67 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.c | 16 ++++++++++-
 fs/btrfs/volumes.h | 10 +++++++
 3 files changed, 92 insertions(+), 1 deletion(-)

-- 
2.23.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/2] btrfs: add read_policy framework
  2020-01-05 15:14 [PATCH v4 0/2] readmirror feature (sysfs and in-memory only approach) Anand Jain
@ 2020-01-05 15:14 ` Anand Jain
  2020-01-06 16:22   ` Josef Bacik
  2020-01-29 18:26   ` David Sterba
  2020-01-05 15:14 ` [PATCH 2/2] btrfs: sysfs, add read_policy attribute Anand Jain
  2020-01-29 18:07 ` [PATCH v4 0/2] readmirror feature (sysfs and in-memory only approach) David Sterba
  2 siblings, 2 replies; 14+ messages in thread
From: Anand Jain @ 2020-01-05 15:14 UTC (permalink / raw)
  To: linux-btrfs; +Cc: josef, btrfs-list

As of now we use %pid method to read stripped mirrored data, which means
application's process id determines the stripe id to be read. This type
of read IO routing typically helps in a system with many small
independent applications tying to read random data. On the other hand
the %pid based read IO distribution policy is inefficient because if
there is a single application trying to read large data the overall disk
bandwidth remains under-utilized.

So this patch introduces read policy framework so that we could add more
read policies, such as IO routing based on device's wait-queue or manual
when we have a read-preferred device or a policy based on the target
storage caching.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
[Patch name changed]

v3: Declare fs_devices::readmirror as enum btrfs_readmirror_policy_type
v2: Declare fs_devices::readmirror as u8 instead of atomic_t
    A small change in comment and change log wordings.

 fs/btrfs/volumes.c | 16 +++++++++++++++-
 fs/btrfs/volumes.h |  9 +++++++++
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c95e47aa84f8..2ffffdf1d314 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1162,6 +1162,8 @@ static int open_fs_devices(struct btrfs_fs_devices *fs_devices,
 	fs_devices->opened = 1;
 	fs_devices->latest_bdev = latest_dev->bdev;
 	fs_devices->total_rw_bytes = 0;
+	/* Set the default read policy */
+	fs_devices->read_policy = BTRFS_READ_POLICY_DEFAULT;
 out:
 	return ret;
 }
@@ -5300,7 +5302,19 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info,
 	else
 		num_stripes = map->num_stripes;
 
-	preferred_mirror = first + current->pid % num_stripes;
+	switch (fs_info->fs_devices->read_policy) {
+	case BTRFS_READ_BY_PID:
+		preferred_mirror = first + current->pid % num_stripes;
+		break;
+	default:
+		/*
+		 * Shouln't happen, just warn and use by_pid instead of failing.
+		 */
+		btrfs_warn_rl(fs_info,
+			      "unknown read_policy type %u, fallback to by_pid",
+			      fs_info->fs_devices->read_policy);
+		preferred_mirror = first + current->pid % num_stripes;
+	}
 
 	if (dev_replace_is_ongoing &&
 	    fs_info->dev_replace.cont_reading_from_srcdev_mode ==
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 68021d1ee216..3bbf0e51433f 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -209,6 +209,13 @@ BTRFS_DEVICE_GETSET_FUNCS(total_bytes);
 BTRFS_DEVICE_GETSET_FUNCS(disk_total_bytes);
 BTRFS_DEVICE_GETSET_FUNCS(bytes_used);
 
+/* read_policy types */
+#define BTRFS_READ_POLICY_DEFAULT	BTRFS_READ_BY_PID
+enum btrfs_read_policy_type {
+	BTRFS_READ_BY_PID,
+	BTRFS_NR_READ_POLICY_TYPE,
+};
+
 struct btrfs_fs_devices {
 	u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */
 	u8 metadata_uuid[BTRFS_FSID_SIZE];
@@ -260,6 +267,8 @@ struct btrfs_fs_devices {
 	struct kobject *devices_kobj;
 	struct kobject *devinfo_kobj;
 	struct completion kobj_unregister;
+
+	enum btrfs_read_policy_type read_policy;
 };
 
 #define BTRFS_BIO_INLINE_CSUM_SIZE	64
-- 
2.23.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 2/2] btrfs: sysfs, add read_policy attribute
  2020-01-05 15:14 [PATCH v4 0/2] readmirror feature (sysfs and in-memory only approach) Anand Jain
  2020-01-05 15:14 ` [PATCH 1/2] btrfs: add read_policy framework Anand Jain
@ 2020-01-05 15:14 ` Anand Jain
  2020-01-06 16:21   ` Josef Bacik
  2020-01-29 18:07 ` [PATCH v4 0/2] readmirror feature (sysfs and in-memory only approach) David Sterba
  2 siblings, 1 reply; 14+ messages in thread
From: Anand Jain @ 2020-01-05 15:14 UTC (permalink / raw)
  To: linux-btrfs; +Cc: josef, btrfs-list

Add

 /sys/fs/btrfs/UUID/read_policy

attribute so that the read policy for the raid1 and raid10 chunks can be
tuned.

When this attribute is read, it shall show all available policies, and
the active policy is with in [ ], read_policy attribute can be written
using one of the items showed in the read.

For example:
cat /sys/fs/btrfs/UUID/read_policy
[by_pid]
echo by_pid > /sys/fs/btrfs/UUID/read_policy
echo -n by_pid > /sys/fs/btrfs/UUID/read_policy

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/sysfs.c   | 67 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.h |  1 +
 2 files changed, 68 insertions(+)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index d414b98fb27f..ae2935184d75 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -642,6 +642,72 @@ static ssize_t btrfs_checksum_show(struct kobject *kobj,
 
 BTRFS_ATTR(, checksum, btrfs_checksum_show);
 
+static const inline char *btrfs_read_policy_name(enum btrfs_read_policy_type type)
+{
+	switch (type) {
+	case BTRFS_READ_BY_PID:
+		return "by_pid";
+	default:
+		return "null";
+	}
+}
+
+static ssize_t btrfs_read_policy_show(struct kobject *kobj,
+				      struct kobj_attribute *a, char *buf)
+{
+	int i;
+	ssize_t len = 0;
+	struct btrfs_fs_devices *fs_devices = to_fs_devs(kobj);
+
+	for (i = 0; i < BTRFS_NR_READ_POLICY_TYPE; i++) {
+		if (len)
+			len += snprintf(buf + len, PAGE_SIZE, " ");
+		if (fs_devices->read_policy == i)
+			len += snprintf(buf + len, PAGE_SIZE, "[%s]",
+					btrfs_read_policy_name(i));
+		else
+			len += snprintf(buf + len, PAGE_SIZE, "%s",
+					btrfs_read_policy_name(i));
+	}
+
+	len += snprintf(buf + len, PAGE_SIZE, "\n");
+
+	return len;
+}
+
+static ssize_t btrfs_read_policy_store(struct kobject *kobj,
+				       struct kobj_attribute *a,
+				       const char *buf, size_t len)
+{
+	int i;
+	char *stripped;
+	char *policy_name;
+	struct btrfs_fs_devices *fs_devices = to_fs_devs(kobj);
+
+	policy_name = kstrdup(buf, GFP_KERNEL);
+	if (!policy_name)
+		return -ENOMEM;
+
+	stripped = strstrip(policy_name);
+	if (strlen(stripped) > BTRFS_READ_POLICY_NAME_MAX) {
+		kfree(policy_name);
+		return -EINVAL;
+	}
+
+	for (i = 0; i < BTRFS_NR_READ_POLICY_TYPE; i++) {
+		if (strncmp(stripped, btrfs_read_policy_name(i),
+			    strlen(stripped)) == 0) {
+			fs_devices->read_policy = i;
+			kfree(policy_name);
+			return len;
+		}
+	}
+
+	kfree(policy_name);
+	return -EINVAL;
+}
+BTRFS_ATTR_RW(, read_policy, btrfs_read_policy_show, btrfs_read_policy_store);
+
 static const struct attribute *btrfs_attrs[] = {
 	BTRFS_ATTR_PTR(, label),
 	BTRFS_ATTR_PTR(, nodesize),
@@ -650,6 +716,7 @@ static const struct attribute *btrfs_attrs[] = {
 	BTRFS_ATTR_PTR(, quota_override),
 	BTRFS_ATTR_PTR(, metadata_uuid),
 	BTRFS_ATTR_PTR(, checksum),
+	BTRFS_ATTR_PTR(, read_policy),
 	NULL,
 };
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 3bbf0e51433f..dd8a8d2fbbe1 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -210,6 +210,7 @@ BTRFS_DEVICE_GETSET_FUNCS(disk_total_bytes);
 BTRFS_DEVICE_GETSET_FUNCS(bytes_used);
 
 /* read_policy types */
+#define BTRFS_READ_POLICY_NAME_MAX	12
 #define BTRFS_READ_POLICY_DEFAULT	BTRFS_READ_BY_PID
 enum btrfs_read_policy_type {
 	BTRFS_READ_BY_PID,
-- 
2.23.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] btrfs: sysfs, add read_policy attribute
  2020-01-05 15:14 ` [PATCH 2/2] btrfs: sysfs, add read_policy attribute Anand Jain
@ 2020-01-06 16:21   ` Josef Bacik
  2020-01-07  4:52     ` [PATCH v2 " Anand Jain
  0 siblings, 1 reply; 14+ messages in thread
From: Josef Bacik @ 2020-01-06 16:21 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs; +Cc: btrfs-list

On 1/5/20 10:14 AM, Anand Jain wrote:
> Add
> 
>   /sys/fs/btrfs/UUID/read_policy
> 
> attribute so that the read policy for the raid1 and raid10 chunks can be
> tuned.
> 
> When this attribute is read, it shall show all available policies, and
> the active policy is with in [ ], read_policy attribute can be written
> using one of the items showed in the read.
> 
> For example:
> cat /sys/fs/btrfs/UUID/read_policy
> [by_pid]
> echo by_pid > /sys/fs/btrfs/UUID/read_policy
> echo -n by_pid > /sys/fs/btrfs/UUID/read_policy
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>   fs/btrfs/sysfs.c   | 67 ++++++++++++++++++++++++++++++++++++++++++++++
>   fs/btrfs/volumes.h |  1 +
>   2 files changed, 68 insertions(+)
> 
> diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
> index d414b98fb27f..ae2935184d75 100644
> --- a/fs/btrfs/sysfs.c
> +++ b/fs/btrfs/sysfs.c
> @@ -642,6 +642,72 @@ static ssize_t btrfs_checksum_show(struct kobject *kobj,
>   
>   BTRFS_ATTR(, checksum, btrfs_checksum_show);
>   
> +static const inline char *btrfs_read_policy_name(enum btrfs_read_policy_type type)
> +{
> +	switch (type) {
> +	case BTRFS_READ_BY_PID:
> +		return "by_pid";
> +	default:
> +		return "null";
> +	}
> +}
> +
> +static ssize_t btrfs_read_policy_show(struct kobject *kobj,
> +				      struct kobj_attribute *a, char *buf)
> +{
> +	int i;
> +	ssize_t len = 0;
> +	struct btrfs_fs_devices *fs_devices = to_fs_devs(kobj);
> +
> +	for (i = 0; i < BTRFS_NR_READ_POLICY_TYPE; i++) {
> +		if (len)
> +			len += snprintf(buf + len, PAGE_SIZE, " ");
> +		if (fs_devices->read_policy == i)
> +			len += snprintf(buf + len, PAGE_SIZE, "[%s]",
> +					btrfs_read_policy_name(i));
> +		else
> +			len += snprintf(buf + len, PAGE_SIZE, "%s",
> +					btrfs_read_policy_name(i));
> +	}
> +
> +	len += snprintf(buf + len, PAGE_SIZE, "\n");
> +
> +	return len;
> +}
> +
> +static ssize_t btrfs_read_policy_store(struct kobject *kobj,
> +				       struct kobj_attribute *a,
> +				       const char *buf, size_t len)
> +{
> +	int i;
> +	char *stripped;
> +	char *policy_name;
> +	struct btrfs_fs_devices *fs_devices = to_fs_devs(kobj);
> +
> +	policy_name = kstrdup(buf, GFP_KERNEL);
> +	if (!policy_name)
> +		return -ENOMEM;
> +
> +	stripped = strstrip(policy_name);
> +	if (strlen(stripped) > BTRFS_READ_POLICY_NAME_MAX) {
> +		kfree(policy_name);
> +		return -EINVAL;
> +	}

We have the len passed to us, let's do the length check _before_ we arbitrarily 
kstrdup().  Thanks,

Josef

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] btrfs: add read_policy framework
  2020-01-05 15:14 ` [PATCH 1/2] btrfs: add read_policy framework Anand Jain
@ 2020-01-06 16:22   ` Josef Bacik
  2020-01-29 18:26   ` David Sterba
  1 sibling, 0 replies; 14+ messages in thread
From: Josef Bacik @ 2020-01-06 16:22 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs; +Cc: btrfs-list

On 1/5/20 10:14 AM, Anand Jain wrote:
> As of now we use %pid method to read stripped mirrored data, which means
> application's process id determines the stripe id to be read. This type
> of read IO routing typically helps in a system with many small
> independent applications tying to read random data. On the other hand
> the %pid based read IO distribution policy is inefficient because if
> there is a single application trying to read large data the overall disk
> bandwidth remains under-utilized.
> 
> So this patch introduces read policy framework so that we could add more
> read policies, such as IO routing based on device's wait-queue or manual
> when we have a read-preferred device or a policy based on the target
> storage caching.
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 2/2] btrfs: sysfs, add read_policy attribute
  2020-01-06 16:21   ` Josef Bacik
@ 2020-01-07  4:52     ` " Anand Jain
  2020-01-07 15:03       ` Josef Bacik
  2020-01-07 15:25       ` Holger Hoffstätte
  0 siblings, 2 replies; 14+ messages in thread
From: Anand Jain @ 2020-01-07  4:52 UTC (permalink / raw)
  To: linux-btrfs

Add

 /sys/fs/btrfs/UUID/read_policy

attribute so that the read policy for the raid1 and raid10 chunks can be
tuned.

When this attribute is read, it shall show all available policies, and
the active policy is with in [ ], read_policy attribute can be written
using one of the items showed in the read.

For example:
cat /sys/fs/btrfs/UUID/read_policy
[by_pid]
echo by_pid > /sys/fs/btrfs/UUID/read_policy
echo -n by_pid > /sys/fs/btrfs/UUID/read_policy

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
v2: check input len before strip and kstrdup.

 fs/btrfs/sysfs.c   | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.h |  1 +
 2 files changed, 67 insertions(+)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 18dac99188ce..5092a0a26241 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -642,6 +642,71 @@ static ssize_t btrfs_checksum_show(struct kobject *kobj,
 
 BTRFS_ATTR(, checksum, btrfs_checksum_show);
 
+static const inline char *btrfs_read_policy_name(enum btrfs_read_policy_type type)
+{
+	switch (type) {
+	case BTRFS_READ_BY_PID:
+		return "by_pid";
+	default:
+		return "null";
+	}
+}
+
+static ssize_t btrfs_read_policy_show(struct kobject *kobj,
+				      struct kobj_attribute *a, char *buf)
+{
+	int i;
+	ssize_t len = 0;
+	struct btrfs_fs_devices *fs_devices = to_fs_devs(kobj);
+
+	for (i = 0; i < BTRFS_NR_READ_POLICY_TYPE; i++) {
+		if (len)
+			len += snprintf(buf + len, PAGE_SIZE, " ");
+		if (fs_devices->read_policy == i)
+			len += snprintf(buf + len, PAGE_SIZE, "[%s]",
+					btrfs_read_policy_name(i));
+		else
+			len += snprintf(buf + len, PAGE_SIZE, "%s",
+					btrfs_read_policy_name(i));
+	}
+
+	len += snprintf(buf + len, PAGE_SIZE, "\n");
+
+	return len;
+}
+
+static ssize_t btrfs_read_policy_store(struct kobject *kobj,
+				       struct kobj_attribute *a,
+				       const char *buf, size_t len)
+{
+	int i;
+	char *stripped;
+	char *policy_name;
+	struct btrfs_fs_devices *fs_devices = to_fs_devs(kobj);
+
+	if (len > BTRFS_READ_POLICY_NAME_MAX)
+		return -EINVAL;
+
+	policy_name = kstrdup(buf, GFP_KERNEL);
+	if (!policy_name)
+		return -ENOMEM;
+
+	stripped = strstrip(policy_name);
+
+	for (i = 0; i < BTRFS_NR_READ_POLICY_TYPE; i++) {
+		if (strncmp(stripped, btrfs_read_policy_name(i),
+			    strlen(stripped)) == 0) {
+			fs_devices->read_policy = i;
+			kfree(policy_name);
+			return len;
+		}
+	}
+
+	kfree(policy_name);
+	return -EINVAL;
+}
+BTRFS_ATTR_RW(, read_policy, btrfs_read_policy_show, btrfs_read_policy_store);
+
 static const struct attribute *btrfs_attrs[] = {
 	BTRFS_ATTR_PTR(, label),
 	BTRFS_ATTR_PTR(, nodesize),
@@ -650,6 +715,7 @@ static ssize_t btrfs_checksum_show(struct kobject *kobj,
 	BTRFS_ATTR_PTR(, quota_override),
 	BTRFS_ATTR_PTR(, metadata_uuid),
 	BTRFS_ATTR_PTR(, checksum),
+	BTRFS_ATTR_PTR(, read_policy),
 	NULL,
 };
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 401dc32e0caa..022271898e0f 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -211,6 +211,7 @@ struct btrfs_device {
 BTRFS_DEVICE_GETSET_FUNCS(bytes_used);
 
 /* read_policy types */
+#define BTRFS_READ_POLICY_NAME_MAX	12
 #define BTRFS_READ_POLICY_DEFAULT	BTRFS_READ_BY_PID
 enum btrfs_read_policy_type {
 	BTRFS_READ_BY_PID,
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] btrfs: sysfs, add read_policy attribute
  2020-01-07  4:52     ` [PATCH v2 " Anand Jain
@ 2020-01-07 15:03       ` Josef Bacik
  2020-01-07 15:25       ` Holger Hoffstätte
  1 sibling, 0 replies; 14+ messages in thread
From: Josef Bacik @ 2020-01-07 15:03 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs

On 1/6/20 11:52 PM, Anand Jain wrote:
> Add
> 
>   /sys/fs/btrfs/UUID/read_policy
> 
> attribute so that the read policy for the raid1 and raid10 chunks can be
> tuned.
> 
> When this attribute is read, it shall show all available policies, and
> the active policy is with in [ ], read_policy attribute can be written
> using one of the items showed in the read.
> 
> For example:
> cat /sys/fs/btrfs/UUID/read_policy
> [by_pid]
> echo by_pid > /sys/fs/btrfs/UUID/read_policy
> echo -n by_pid > /sys/fs/btrfs/UUID/read_policy
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] btrfs: sysfs, add read_policy attribute
  2020-01-07  4:52     ` [PATCH v2 " Anand Jain
  2020-01-07 15:03       ` Josef Bacik
@ 2020-01-07 15:25       ` Holger Hoffstätte
  2020-01-08  4:16         ` [PATCH v3 " Anand Jain
  1 sibling, 1 reply; 14+ messages in thread
From: Holger Hoffstätte @ 2020-01-07 15:25 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs

On 1/7/20 5:52 AM, Anand Jain wrote:
> Add
> 
>   /sys/fs/btrfs/UUID/read_policy
> 
> attribute so that the read policy for the raid1 and raid10 chunks can be
> tuned.
> 
> When this attribute is read, it shall show all available policies, and
> the active policy is with in [ ], read_policy attribute can be written
> using one of the items showed in the read.
> 
> For example:
> cat /sys/fs/btrfs/UUID/read_policy
> [by_pid]
> echo by_pid > /sys/fs/btrfs/UUID/read_policy
> echo -n by_pid > /sys/fs/btrfs/UUID/read_policy

This may seem like pointless bikeshedding, but can we please name the policy
without the leading "by_", i.e. only "pid"? By definition what happens is always
"by" the chosen policy, so it's redundant.

Otherwise a great step forward, thank you!

cheers
Holger

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v3 2/2] btrfs: sysfs, add read_policy attribute
  2020-01-07 15:25       ` Holger Hoffstätte
@ 2020-01-08  4:16         ` " Anand Jain
  2020-01-29 18:49           ` David Sterba
  0 siblings, 1 reply; 14+ messages in thread
From: Anand Jain @ 2020-01-08  4:16 UTC (permalink / raw)
  To: linux-btrfs

Add

 /sys/fs/btrfs/UUID/read_policy

attribute so that the read policy for the raid1 and raid10 chunks can be
tuned.

When this attribute is read, it shall show all available policies, and
the active policy is with in [ ], read_policy attribute can be written
using one of the items showed in the read.

For example:
cat /sys/fs/btrfs/UUID/read_policy
[by_pid]
echo by_pid > /sys/fs/btrfs/UUID/read_policy
echo -n by_pid > /sys/fs/btrfs/UUID/read_policy

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
v3: rename [by_pid] to [pid]
v2: v2: check input len before strip and kstrdup

 fs/btrfs/sysfs.c   | 66 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.h |  1 +
 2 files changed, 67 insertions(+)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 104a97586744..cc4a642878a1 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -809,6 +809,71 @@ static ssize_t btrfs_checksum_show(struct kobject *kobj,
 
 BTRFS_ATTR(, checksum, btrfs_checksum_show);
 
+static const inline char *btrfs_read_policy_name(enum btrfs_read_policy_type type)
+{
+	switch (type) {
+	case BTRFS_READ_BY_PID:
+		return "pid";
+	default:
+		return "null";
+	}
+}
+
+static ssize_t btrfs_read_policy_show(struct kobject *kobj,
+				      struct kobj_attribute *a, char *buf)
+{
+	int i;
+	ssize_t len = 0;
+	struct btrfs_fs_devices *fs_devices = to_fs_devs(kobj);
+
+	for (i = 0; i < BTRFS_NR_READ_POLICY_TYPE; i++) {
+		if (len)
+			len += snprintf(buf + len, PAGE_SIZE, " ");
+		if (fs_devices->read_policy == i)
+			len += snprintf(buf + len, PAGE_SIZE, "[%s]",
+					btrfs_read_policy_name(i));
+		else
+			len += snprintf(buf + len, PAGE_SIZE, "%s",
+					btrfs_read_policy_name(i));
+	}
+
+	len += snprintf(buf + len, PAGE_SIZE, "\n");
+
+	return len;
+}
+
+static ssize_t btrfs_read_policy_store(struct kobject *kobj,
+				       struct kobj_attribute *a,
+				       const char *buf, size_t len)
+{
+	int i;
+	char *stripped;
+	char *policy_name;
+	struct btrfs_fs_devices *fs_devices = to_fs_devs(kobj);
+
+	if (len > BTRFS_READ_POLICY_NAME_MAX)
+		return -EINVAL;
+
+	policy_name = kstrdup(buf, GFP_KERNEL);
+	if (!policy_name)
+		return -ENOMEM;
+
+	stripped = strstrip(policy_name);
+
+	for (i = 0; i < BTRFS_NR_READ_POLICY_TYPE; i++) {
+		if (strncmp(stripped, btrfs_read_policy_name(i),
+			    strlen(stripped)) == 0) {
+			fs_devices->read_policy = i;
+			kfree(policy_name);
+			return len;
+		}
+	}
+
+	kfree(policy_name);
+	return -EINVAL;
+}
+BTRFS_ATTR_RW(, read_policy, btrfs_read_policy_show, btrfs_read_policy_store);
+
 static const struct attribute *btrfs_attrs[] = {
 	BTRFS_ATTR_PTR(, label),
 	BTRFS_ATTR_PTR(, nodesize),
@@ -817,6 +882,7 @@ static const struct attribute *btrfs_attrs[] = {
 	BTRFS_ATTR_PTR(, quota_override),
 	BTRFS_ATTR_PTR(, metadata_uuid),
 	BTRFS_ATTR_PTR(, checksum),
+	BTRFS_ATTR_PTR(, read_policy),
 	NULL,
 };
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 46f4e2258203..fe1494d95893 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -209,6 +209,7 @@ BTRFS_DEVICE_GETSET_FUNCS(disk_total_bytes);
 BTRFS_DEVICE_GETSET_FUNCS(bytes_used);
 
 /* read_policy types */
+#define BTRFS_READ_POLICY_NAME_MAX	12
 #define BTRFS_READ_POLICY_DEFAULT	BTRFS_READ_BY_PID
 enum btrfs_read_policy_type {
 	BTRFS_READ_BY_PID,
-- 
2.23.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 0/2] readmirror feature (sysfs and in-memory only approach)
  2020-01-05 15:14 [PATCH v4 0/2] readmirror feature (sysfs and in-memory only approach) Anand Jain
  2020-01-05 15:14 ` [PATCH 1/2] btrfs: add read_policy framework Anand Jain
  2020-01-05 15:14 ` [PATCH 2/2] btrfs: sysfs, add read_policy attribute Anand Jain
@ 2020-01-29 18:07 ` David Sterba
  2 siblings, 0 replies; 14+ messages in thread
From: David Sterba @ 2020-01-29 18:07 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs, josef, btrfs-list

On Sun, Jan 05, 2020 at 11:14:00PM +0800, Anand Jain wrote:
> There were few RFCs [1] before, mainly to figure out storage
> (or in memory only) for the readmirror policy and the interface needed.

The persistent storage of the policy is not settled but adding the sysfs
value that's both read and write is safe enough so this patchset is in
the list of features for 5.7.

With the new raid1c34 profiles we'll need a better mirror selection to
utilize all the copies more effectively. Besides the default PID policy
we should have at least one more so it's not just the bare interface
without any real use.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] btrfs: add read_policy framework
  2020-01-05 15:14 ` [PATCH 1/2] btrfs: add read_policy framework Anand Jain
  2020-01-06 16:22   ` Josef Bacik
@ 2020-01-29 18:26   ` David Sterba
  2020-02-11  8:31     ` Anand Jain
  1 sibling, 1 reply; 14+ messages in thread
From: David Sterba @ 2020-01-29 18:26 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs, josef, btrfs-list

On Sun, Jan 05, 2020 at 11:14:01PM +0800, Anand Jain wrote:
> As of now we use %pid method to read stripped mirrored data, which means
> application's process id determines the stripe id to be read. This type
> of read IO routing typically helps in a system with many small
> independent applications tying to read random data. On the other hand
> the %pid based read IO distribution policy is inefficient because if
> there is a single application trying to read large data the overall disk
> bandwidth remains under-utilized.

AFAIK it's not always the application pid, for compression or metadata
it's the pid of btrfs worker thread. So it depends.

> So this patch introduces read policy framework so that we could add more
> read policies, such as IO routing based on device's wait-queue or manual
> when we have a read-preferred device or a policy based on the target
> storage caching.
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
> [Patch name changed]
> 
> v3: Declare fs_devices::readmirror as enum btrfs_readmirror_policy_type
> v2: Declare fs_devices::readmirror as u8 instead of atomic_t
>     A small change in comment and change log wordings.
> 
>  fs/btrfs/volumes.c | 16 +++++++++++++++-
>  fs/btrfs/volumes.h |  9 +++++++++
>  2 files changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index c95e47aa84f8..2ffffdf1d314 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1162,6 +1162,8 @@ static int open_fs_devices(struct btrfs_fs_devices *fs_devices,
>  	fs_devices->opened = 1;
>  	fs_devices->latest_bdev = latest_dev->bdev;
>  	fs_devices->total_rw_bytes = 0;
> +	/* Set the default read policy */

You can drop this comment

> +	fs_devices->read_policy = BTRFS_READ_POLICY_DEFAULT;
>  out:
>  	return ret;
>  }
> @@ -5300,7 +5302,19 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info,
>  	else
>  		num_stripes = map->num_stripes;
>  
> -	preferred_mirror = first + current->pid % num_stripes;
> +	switch (fs_info->fs_devices->read_policy) {
> +	case BTRFS_READ_BY_PID:
> +		preferred_mirror = first + current->pid % num_stripes;
> +		break;
> +	default:
> +		/*
> +		 * Shouln't happen, just warn and use by_pid instead of failing.
> +		 */
> +		btrfs_warn_rl(fs_info,
> +			      "unknown read_policy type %u, fallback to by_pid",
> +			      fs_info->fs_devices->read_policy);
> +		preferred_mirror = first + current->pid % num_stripes;

This is repeating the BY_PID code, please move the defaut: case above it
so the code is shared.

> +	}
>  
>  	if (dev_replace_is_ongoing &&
>  	    fs_info->dev_replace.cont_reading_from_srcdev_mode ==
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 68021d1ee216..3bbf0e51433f 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -209,6 +209,13 @@ BTRFS_DEVICE_GETSET_FUNCS(total_bytes);
>  BTRFS_DEVICE_GETSET_FUNCS(disk_total_bytes);
>  BTRFS_DEVICE_GETSET_FUNCS(bytes_used);
>  
> +/* read_policy types */

More explanation would be nice

> +#define BTRFS_READ_POLICY_DEFAULT	BTRFS_READ_BY_PID

Add this to the enums so we don't have mixed enums and defines

> +enum btrfs_read_policy_type {

I think you can drop _type here, the 'enum' must be spelled everywhere
is used so the type name can be shorter without losing descriptivity.

> +	BTRFS_READ_BY_PID,

This should share the namespace so, BTRFS_READ_POLICY_PID

> +	BTRFS_NR_READ_POLICY_TYPE,
> +};
> +
>  struct btrfs_fs_devices {
>  	u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */
>  	u8 metadata_uuid[BTRFS_FSID_SIZE];
> @@ -260,6 +267,8 @@ struct btrfs_fs_devices {
>  	struct kobject *devices_kobj;
>  	struct kobject *devinfo_kobj;
>  	struct completion kobj_unregister;
> +
> +	enum btrfs_read_policy_type read_policy;

What's the reason to add this to btrfs_fs_devices rather than to
btrfs_fs_info? The lifetime of the policy and the related sysfs object
is the same as of the mounted filesystem.

>  };
>  
>  #define BTRFS_BIO_INLINE_CSUM_SIZE	64

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 2/2] btrfs: sysfs, add read_policy attribute
  2020-01-08  4:16         ` [PATCH v3 " Anand Jain
@ 2020-01-29 18:49           ` David Sterba
  2020-02-12 14:24             ` Anand Jain
  0 siblings, 1 reply; 14+ messages in thread
From: David Sterba @ 2020-01-29 18:49 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs

On Wed, Jan 08, 2020 at 12:16:47PM +0800, Anand Jain wrote:
> Add
> 
>  /sys/fs/btrfs/UUID/read_policy
> 
> attribute so that the read policy for the raid1 and raid10 chunks can be
> tuned.
> 
> When this attribute is read, it shall show all available policies, and
> the active policy is with in [ ], read_policy attribute can be written
> using one of the items showed in the read.
> 
> For example:
> cat /sys/fs/btrfs/UUID/read_policy
> [by_pid]
> echo by_pid > /sys/fs/btrfs/UUID/read_policy
> echo -n by_pid > /sys/fs/btrfs/UUID/read_policy

Dropping "by_" is a good thing, but it should be removed everywhere.
Also '-n' to echo should not be necessary and the store handler of sysfs
should deal with that.

> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> ---
> v3: rename [by_pid] to [pid]
> v2: v2: check input len before strip and kstrdup
> 
>  fs/btrfs/sysfs.c   | 66 ++++++++++++++++++++++++++++++++++++++++++++++
>  fs/btrfs/volumes.h |  1 +
>  2 files changed, 67 insertions(+)
> 
> diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
> index 104a97586744..cc4a642878a1 100644
> --- a/fs/btrfs/sysfs.c
> +++ b/fs/btrfs/sysfs.c
> @@ -809,6 +809,71 @@ static ssize_t btrfs_checksum_show(struct kobject *kobj,
>  
>  BTRFS_ATTR(, checksum, btrfs_checksum_show);
>  
> +static const inline char *btrfs_read_policy_name(enum btrfs_read_policy_type type)
> +{
> +	switch (type) {
> +	case BTRFS_READ_BY_PID:
> +		return "pid";
> +	default:
> +		return "null";
> +	}
> +}

A simple table should do, similar what we have for the compression
number -> string mapping.

> +
> +static ssize_t btrfs_read_policy_show(struct kobject *kobj,
> +				      struct kobj_attribute *a, char *buf)
> +{
> +	int i;
> +	ssize_t len = 0;

As this is used as return value, plese rename it to 'ret'

> +	struct btrfs_fs_devices *fs_devices = to_fs_devs(kobj);
> +
> +	for (i = 0; i < BTRFS_NR_READ_POLICY_TYPE; i++) {
> +		if (len)
> +			len += snprintf(buf + len, PAGE_SIZE, " ");

You can use the same thning for the separator as is in
supported_checksums_show, ie. (i == 0 ? "" : " ") and add one more %s to
the format.

> +		if (fs_devices->read_policy == i)
> +			len += snprintf(buf + len, PAGE_SIZE, "[%s]",
> +					btrfs_read_policy_name(i));
> +		else
> +			len += snprintf(buf + len, PAGE_SIZE, "%s",
> +					btrfs_read_policy_name(i));

Keeping the default and the rest as separte calls to snprintf is
probably better so with the separator it would be

		if (fs_devices->read_policy == i)
			len += snprintf(buf + len, PAGE_SIZE, "%s[%s]",
					(i == 0 ? "" : " "),
					btrfs_read_policy_name(i));
		else
			len += snprintf(buf + len, PAGE_SIZE, "%s%s",
					(i == 0 ? "" : " "),
					btrfs_read_policy_name(i));

> +	}
> +
> +	len += snprintf(buf + len, PAGE_SIZE, "\n");
> +
> +	return len;
> +}
> +
> +static ssize_t btrfs_read_policy_store(struct kobject *kobj,
> +				       struct kobj_attribute *a,
> +				       const char *buf, size_t len)
> +{
> +	int i;
> +	char *stripped;
> +	char *policy_name;
> +	struct btrfs_fs_devices *fs_devices = to_fs_devs(kobj);
> +
> +	if (len > BTRFS_READ_POLICY_NAME_MAX)
> +		return -EINVAL;
> +
> +	policy_name = kstrdup(buf, GFP_KERNEL);

Can you avoid the allocation? None of the sysfs handlers should need it.

> +	if (!policy_name)
> +		return -ENOMEM;
> +
> +	stripped = strstrip(policy_name);

So the allocation is to make a copy of a string to get rid of leading
and trailing whitespace. There shouldn't be any leading space that we
should care about, but anyway skip_spaces() can be used on a read-only
string just fine.

The trailing whitespace is for the potential '\n' that we want to
handle. But doing an allocation here is an overkill, you can add a
helper that will verify that there's no garbage at the end of the
string, once the policy string matches one of ours.

> +
> +	for (i = 0; i < BTRFS_NR_READ_POLICY_TYPE; i++) {
> +		if (strncmp(stripped, btrfs_read_policy_name(i),
> +			    strlen(stripped)) == 0) {
> +			fs_devices->read_policy = i;
> +			kfree(policy_name);
> +			return len;
> +		}
> +	}
> +
> +	kfree(policy_name);
> +	return -EINVAL;
> +}
> +BTRFS_ATTR_RW(, read_policy, btrfs_read_policy_show, btrfs_read_policy_store);
> +
>  static const struct attribute *btrfs_attrs[] = {
>  	BTRFS_ATTR_PTR(, label),
>  	BTRFS_ATTR_PTR(, nodesize),
> @@ -817,6 +882,7 @@ static const struct attribute *btrfs_attrs[] = {
>  	BTRFS_ATTR_PTR(, quota_override),
>  	BTRFS_ATTR_PTR(, metadata_uuid),
>  	BTRFS_ATTR_PTR(, checksum),
> +	BTRFS_ATTR_PTR(, read_policy),
>  	NULL,
>  };
>  
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 46f4e2258203..fe1494d95893 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -209,6 +209,7 @@ BTRFS_DEVICE_GETSET_FUNCS(disk_total_bytes);
>  BTRFS_DEVICE_GETSET_FUNCS(bytes_used);
>  
>  /* read_policy types */
> +#define BTRFS_READ_POLICY_NAME_MAX	12

And this can be dropped afterwards

>  #define BTRFS_READ_POLICY_DEFAULT	BTRFS_READ_BY_PID
>  enum btrfs_read_policy_type {
>  	BTRFS_READ_BY_PID,
> -- 
> 2.23.0

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] btrfs: add read_policy framework
  2020-01-29 18:26   ` David Sterba
@ 2020-02-11  8:31     ` Anand Jain
  0 siblings, 0 replies; 14+ messages in thread
From: Anand Jain @ 2020-02-11  8:31 UTC (permalink / raw)
  To: dsterba; +Cc: linux-btrfs, josef, btrfs-list

On 1/30/20 2:26 AM, David Sterba wrote:
> On Sun, Jan 05, 2020 at 11:14:01PM +0800, Anand Jain wrote:
>> As of now we use %pid method to read stripped mirrored data, which means
>> application's process id determines the stripe id to be read. This type
>> of read IO routing typically helps in a system with many small
>> independent applications tying to read random data. On the other hand
>> the %pid based read IO distribution policy is inefficient because if
>> there is a single application trying to read large data the overall disk
>> bandwidth remains under-utilized.
> 
> AFAIK it's not always the application pid, for compression or metadata
> it's the pid of btrfs worker thread. So it depends.
> 
>> So this patch introduces read policy framework so that we could add more
>> read policies, such as IO routing based on device's wait-queue or manual
>> when we have a read-preferred device or a policy based on the target
>> storage caching.
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>> [Patch name changed]
>>
>> v3: Declare fs_devices::readmirror as enum btrfs_readmirror_policy_type
>> v2: Declare fs_devices::readmirror as u8 instead of atomic_t
>>      A small change in comment and change log wordings.
>>
>>   fs/btrfs/volumes.c | 16 +++++++++++++++-
>>   fs/btrfs/volumes.h |  9 +++++++++
>>   2 files changed, 24 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index c95e47aa84f8..2ffffdf1d314 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -1162,6 +1162,8 @@ static int open_fs_devices(struct btrfs_fs_devices *fs_devices,
>>   	fs_devices->opened = 1;
>>   	fs_devices->latest_bdev = latest_dev->bdev;
>>   	fs_devices->total_rw_bytes = 0;
>> +	/* Set the default read policy */
> 
> You can drop this comment
> 
>> +	fs_devices->read_policy = BTRFS_READ_POLICY_DEFAULT;
>>   out:
>>   	return ret;
>>   }
>> @@ -5300,7 +5302,19 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info,
>>   	else
>>   		num_stripes = map->num_stripes;
>>   
>> -	preferred_mirror = first + current->pid % num_stripes;
>> +	switch (fs_info->fs_devices->read_policy) {
>> +	case BTRFS_READ_BY_PID:
>> +		preferred_mirror = first + current->pid % num_stripes;
>> +		break;
>> +	default:
>> +		/*
>> +		 * Shouln't happen, just warn and use by_pid instead of failing.
>> +		 */
>> +		btrfs_warn_rl(fs_info,
>> +			      "unknown read_policy type %u, fallback to by_pid",
>> +			      fs_info->fs_devices->read_policy);
>> +		preferred_mirror = first + current->pid % num_stripes;
> 
> This is repeating the BY_PID code, please move the defaut: case above it
> so the code is shared.
> 
>> +	}
>>   
>>   	if (dev_replace_is_ongoing &&
>>   	    fs_info->dev_replace.cont_reading_from_srcdev_mode ==
>> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
>> index 68021d1ee216..3bbf0e51433f 100644
>> --- a/fs/btrfs/volumes.h
>> +++ b/fs/btrfs/volumes.h
>> @@ -209,6 +209,13 @@ BTRFS_DEVICE_GETSET_FUNCS(total_bytes);
>>   BTRFS_DEVICE_GETSET_FUNCS(disk_total_bytes);
>>   BTRFS_DEVICE_GETSET_FUNCS(bytes_used);
>>   
>> +/* read_policy types */
> 
> More explanation would be nice
> 
>> +#define BTRFS_READ_POLICY_DEFAULT	BTRFS_READ_BY_PID
> 
> Add this to the enums so we don't have mixed enums and defines
> 
>> +enum btrfs_read_policy_type {
> 
> I think you can drop _type here, the 'enum' must be spelled everywhere
> is used so the type name can be shorter without losing descriptivity.
> 
>> +	BTRFS_READ_BY_PID,
> 
> This should share the namespace so, BTRFS_READ_POLICY_PID
> 
>> +	BTRFS_NR_READ_POLICY_TYPE,
>> +};
>> +
>>   struct btrfs_fs_devices {
>>   	u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */
>>   	u8 metadata_uuid[BTRFS_FSID_SIZE];
>> @@ -260,6 +267,8 @@ struct btrfs_fs_devices {
>>   	struct kobject *devices_kobj;
>>   	struct kobject *devinfo_kobj;
>>   	struct completion kobj_unregister;
>> +
>> +	enum btrfs_read_policy_type read_policy;
> 

  All the comments above are accepted. Thanks.

> What's the reason to add this to btrfs_fs_devices rather than to
> btrfs_fs_info? The lifetime of the policy and the related sysfs object
> is the same as of the mounted filesystem.

   Reasons to choose struct btrfs_fs_devices instead of struct
   btrfs_fs_info:
   Theoretically:
     - Its about the device and most of our device related stuffs
       including dev_state is in struct btrfs_devices in volumes.h
       (read_policy::by_devid (new patch not in ML yet) uses dev_state).
     - The core of the read_policy which is in find_live_mirror() is also
       in volumes.h
     - enum btrfs_read_policy is also in volumes.h

   If at some point we decide to make read_policy persistent, which then
   open_ctree() would read the relevant item and update the
   btrfs_fs_devices::read_policy. If read_policy is persistent (at some
   point), then we should apply the stored policy even if its a seed
   device and the only place where we can apply it is fs_device.
   Being theoretically correct pays off.

Thanks, Anand


> 
>>   };
>>   
>>   #define BTRFS_BIO_INLINE_CSUM_SIZE	64


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 2/2] btrfs: sysfs, add read_policy attribute
  2020-01-29 18:49           ` David Sterba
@ 2020-02-12 14:24             ` Anand Jain
  0 siblings, 0 replies; 14+ messages in thread
From: Anand Jain @ 2020-02-12 14:24 UTC (permalink / raw)
  To: dsterba, linux-btrfs

On 1/30/20 2:49 AM, David Sterba wrote:
> On Wed, Jan 08, 2020 at 12:16:47PM +0800, Anand Jain wrote:
>> Add
>>
>>   /sys/fs/btrfs/UUID/read_policy
>>
>> attribute so that the read policy for the raid1 and raid10 chunks can be
>> tuned.
>>
>> When this attribute is read, it shall show all available policies, and
>> the active policy is with in [ ], read_policy attribute can be written
>> using one of the items showed in the read.
>>
>> For example:
>> cat /sys/fs/btrfs/UUID/read_policy
>> [by_pid]
>> echo by_pid > /sys/fs/btrfs/UUID/read_policy
>> echo -n by_pid > /sys/fs/btrfs/UUID/read_policy
> 
> Dropping "by_" is a good thing, but it should be removed everywhere.
> Also '-n' to echo should not be necessary and the store handler of sysfs
> should deal with that.

My reference was Block device's scheduler [1],

[1]
cat /sys/block/sda/queue/scheduler
[mq-deadline] kyber none

/sys/block/sda/queue$ echo mq-deadline > ./scheduler
/sys/block/sda/queue$ echo "mq-deadline " > ./scheduler
/sys/block/sda/queue$ echo " mq-deadline " > ./scheduler
/sys/block/sda/queue$ echo -n mq-deadline > ./scheduler
/sys/block/sda/queue$ echo -n " mq-deadline " > ./scheduler
/sys/block/sda/queue$ echo -n " mq-deadline test" > ./scheduler
echo: write error: Invalid argument
/sys/block/sda/queue$ echo -n "mq-deadline kyber" > ./scheduler
echo: write error: Invalid argument

We could allow both echo and echo -n.

>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
>> ---
>> v3: rename [by_pid] to [pid]
>> v2: v2: check input len before strip and kstrdup
>>
>>   fs/btrfs/sysfs.c   | 66 ++++++++++++++++++++++++++++++++++++++++++++++
>>   fs/btrfs/volumes.h |  1 +
>>   2 files changed, 67 insertions(+)
>>
>> diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
>> index 104a97586744..cc4a642878a1 100644
>> --- a/fs/btrfs/sysfs.c
>> +++ b/fs/btrfs/sysfs.c
>> @@ -809,6 +809,71 @@ static ssize_t btrfs_checksum_show(struct kobject *kobj,
>>   
>>   BTRFS_ATTR(, checksum, btrfs_checksum_show);
>>   
>> +static const inline char *btrfs_read_policy_name(enum btrfs_read_policy_type type)
>> +{
>> +	switch (type) {
>> +	case BTRFS_READ_BY_PID:
>> +		return "pid";
>> +	default:
>> +		return "null";
>> +	}
>> +}
> 
> A simple table should do, similar what we have for the compression
> number -> string mapping.
> 

  Yes. Much better thanks.

>> +
>> +static ssize_t btrfs_read_policy_show(struct kobject *kobj,
>> +				      struct kobj_attribute *a, char *buf)
>> +{
>> +	int i;
>> +	ssize_t len = 0;
> 
> As this is used as return value, plese rename it to 'ret'
> 

  Ok.

>> +	struct btrfs_fs_devices *fs_devices = to_fs_devs(kobj);
>> +
>> +	for (i = 0; i < BTRFS_NR_READ_POLICY_TYPE; i++) {
>> +		if (len)
>> +			len += snprintf(buf + len, PAGE_SIZE, " ");
> 
> You can use the same thning for the separator as is in
> supported_checksums_show, ie. (i == 0 ? "" : " ") and add one more %s to
> the format.
> 
  Ok.

>> +		if (fs_devices->read_policy == i)
>> +			len += snprintf(buf + len, PAGE_SIZE, "[%s]",
>> +					btrfs_read_policy_name(i));
>> +		else
>> +			len += snprintf(buf + len, PAGE_SIZE, "%s",
>> +					btrfs_read_policy_name(i));
> 
> Keeping the default and the rest as separte calls to snprintf is
> probably better so with the separator it would be
> 
> 		if (fs_devices->read_policy == i)
> 			len += snprintf(buf + len, PAGE_SIZE, "%s[%s]",
> 					(i == 0 ? "" : " "),
> 					btrfs_read_policy_name(i));
> 		else
> 			len += snprintf(buf + len, PAGE_SIZE, "%s%s",
> 					(i == 0 ? "" : " "),
> 					btrfs_read_policy_name(i));
> 

  Yes. fixed.

>> +	}
>> +
>> +	len += snprintf(buf + len, PAGE_SIZE, "\n");
>> +
>> +	return len;
>> +}
>> +
>> +static ssize_t btrfs_read_policy_store(struct kobject *kobj,
>> +				       struct kobj_attribute *a,
>> +				       const char *buf, size_t len)
>> +{
>> +	int i;
>> +	char *stripped;
>> +	char *policy_name;
>> +	struct btrfs_fs_devices *fs_devices = to_fs_devs(kobj);
>> +
>> +	if (len > BTRFS_READ_POLICY_NAME_MAX)
>> +		return -EINVAL;
>> +
>> +	policy_name = kstrdup(buf, GFP_KERNEL);
> 
> Can you avoid the allocation? None of the sysfs handlers should need it.
> 
>> +	if (!policy_name)
>> +		return -ENOMEM;
>> +
>> +	stripped = strstrip(policy_name);
> 
> So the allocation is to make a copy of a string to get rid of leading
> and trailing whitespace. There shouldn't be any leading space that we
> should care about, but anyway skip_spaces() can be used on a read-only
> string just fine.

  Ah. Yes.

> The trailing whitespace is for the potential '\n' that we want to
> handle. But doing an allocation here is an overkill, you can add a
> helper that will verify that there's no garbage at the end of the
> string, once the policy string matches one of ours.

  ok. Good idea.

Thanks, Anand

>> +
>> +	for (i = 0; i < BTRFS_NR_READ_POLICY_TYPE; i++) {
>> +		if (strncmp(stripped, btrfs_read_policy_name(i),
>> +			    strlen(stripped)) == 0) {
>> +			fs_devices->read_policy = i;
>> +			kfree(policy_name);
>> +			return len;
>> +		}
>> +	}
>> +
>> +	kfree(policy_name);
>> +	return -EINVAL;
>> +}
>> +BTRFS_ATTR_RW(, read_policy, btrfs_read_policy_show, btrfs_read_policy_store);
>> +
>>   static const struct attribute *btrfs_attrs[] = {
>>   	BTRFS_ATTR_PTR(, label),
>>   	BTRFS_ATTR_PTR(, nodesize),
>> @@ -817,6 +882,7 @@ static const struct attribute *btrfs_attrs[] = {
>>   	BTRFS_ATTR_PTR(, quota_override),
>>   	BTRFS_ATTR_PTR(, metadata_uuid),
>>   	BTRFS_ATTR_PTR(, checksum),
>> +	BTRFS_ATTR_PTR(, read_policy),
>>   	NULL,
>>   };
>>   
>> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
>> index 46f4e2258203..fe1494d95893 100644
>> --- a/fs/btrfs/volumes.h
>> +++ b/fs/btrfs/volumes.h
>> @@ -209,6 +209,7 @@ BTRFS_DEVICE_GETSET_FUNCS(disk_total_bytes);
>>   BTRFS_DEVICE_GETSET_FUNCS(bytes_used);
>>   
>>   /* read_policy types */
>> +#define BTRFS_READ_POLICY_NAME_MAX	12
> 
> And this can be dropped afterwards
> 
>>   #define BTRFS_READ_POLICY_DEFAULT	BTRFS_READ_BY_PID
>>   enum btrfs_read_policy_type {
>>   	BTRFS_READ_BY_PID,
>> -- 
>> 2.23.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, back to index

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-05 15:14 [PATCH v4 0/2] readmirror feature (sysfs and in-memory only approach) Anand Jain
2020-01-05 15:14 ` [PATCH 1/2] btrfs: add read_policy framework Anand Jain
2020-01-06 16:22   ` Josef Bacik
2020-01-29 18:26   ` David Sterba
2020-02-11  8:31     ` Anand Jain
2020-01-05 15:14 ` [PATCH 2/2] btrfs: sysfs, add read_policy attribute Anand Jain
2020-01-06 16:21   ` Josef Bacik
2020-01-07  4:52     ` [PATCH v2 " Anand Jain
2020-01-07 15:03       ` Josef Bacik
2020-01-07 15:25       ` Holger Hoffstätte
2020-01-08  4:16         ` [PATCH v3 " Anand Jain
2020-01-29 18:49           ` David Sterba
2020-02-12 14:24             ` Anand Jain
2020-01-29 18:07 ` [PATCH v4 0/2] readmirror feature (sysfs and in-memory only approach) David Sterba

Linux-BTRFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-btrfs/0 linux-btrfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-btrfs linux-btrfs/ https://lore.kernel.org/linux-btrfs \
		linux-btrfs@vger.kernel.org
	public-inbox-index linux-btrfs

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-btrfs


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git