linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] readmirror feature (sysfs and in-memory only approach)
@ 2020-01-02 10:12 Anand Jain
  2020-01-02 10:12 ` [PATCH v2 1/3] btrfs: add readmirror type framework Anand Jain
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Anand Jain @ 2020-01-02 10:12 UTC (permalink / raw)
  To: linux-btrfs

v2:
Mainly fixes the fs_devices::readmirror declaration type from atomic_t
to u8. (Thanks Josef).

v1:
As of now we use only %pid method to read stripped mirrored data. So
application's process id determines the stripe id to be read. This type
of routing typically helps in a system with many small independent
applications tying to read random data. On the other hand the %pid
based read IO distribution policy is inefficient if there is a single
application trying to read large data as because the overall disk
bandwidth would remains under utilized.

One type of readmirror policy isn't good enough and other choices are
routing the IO based on device's waitqueue or manual when we have a
read-preferred device or a readmirror policy based on the target storage
caching. So this patch-set introduces a framework where we could add more
readmirror policies.

This policy is a filesystem wide policy as of now, and though the
readmirror policy at the subvolume level is a novel approach as it
provides maximum flexibility in the data center, but as of now its not
practical to implement such a granularity as you can't really ensure
reflinked extents will be read from the stripe of its desire and so
there will be more limitations and it can be assessed separately.

The approach in this patch-set is sys interface with in-memory policy.
And does not add any new readmirror type in this set, which can be add
once we are ok with the framework. Also the default policy remains %pid.

Previous works:
----------------------------------------------------------------------
There were few RFCs [1] before, mainly to figure out storage
(or in memory only) for the readmirror policy and the interface needed.

[1]
https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg86368.html

https://lore.kernel.org/linux-btrfs/20190826090438.7044-1-anand.jain@oracle.com/

https://lore.kernel.org/linux-btrfs/5fcf9c23-89b5-b167-1f80-a0f4ac107d0b@oracle.com/

https://patchwork.kernel.org/cover/10859213/

Mount -o:
In the first trial it was attempted to use the mount -o option to carry
the readmirror policy, this is good for debugging which can make sure
even the mount thread metadata tree blocks are read from the disk desired.
It was very effective in testing radi1/raid10 write-holes.

Extended attribute:
As extended attribute is associated with the inode, to implement this
there is bit of extended attribute abuse or else makes it mandatory to
mount the rootid 5. Its messy unless readmirror policy is applied at the
subvol level which is not possible as of now. 

An item type:
The proposed patch was to create an item to hold the readmirror policy,
it makes sense when compared to the abusive extended attribute approach
but introduces a new item and so no backward compatibility.
-----------------------------------------------------------------------


Anand Jain (3):
  btrfs: add readmirror type framework
  btrfs: sysfs, add readmirror kobject
  btrfs: sysfs, create by_pid readmirror attribute

 fs/btrfs/sysfs.c   | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.c | 16 +++++++++++-
 fs/btrfs/volumes.h |  9 +++++++
 3 files changed, 98 insertions(+), 1 deletion(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 1/3] btrfs: add readmirror type framework
  2020-01-02 10:12 [PATCH v2 0/3] readmirror feature (sysfs and in-memory only approach) Anand Jain
@ 2020-01-02 10:12 ` Anand Jain
  2020-01-02 16:24   ` Josef Bacik
                     ` (2 more replies)
  2020-01-02 10:12 ` [PATCH v2 2/3] btrfs: sysfs, add readmirror kobject Anand Jain
  2020-01-02 10:12 ` [PATCH v2 3/3] btrfs: sysfs, create by_pid readmirror attribute Anand Jain
  2 siblings, 3 replies; 10+ messages in thread
From: Anand Jain @ 2020-01-02 10:12 UTC (permalink / raw)
  To: linux-btrfs

As of now we use %pid method to read stripped mirrored data. So
application's process id determines the stripe id to be read. This type
of read IO routing typically helps in a system with many small
independent applications tying to read random data. On the other hand
the %pid based read IO distribution policy is inefficient if there is a
single application trying to read large data and the overall disk
bandwidth remains under utilized.

So this patch introduces a framework where we could add more readmirror
policies, such as routing the IO based on device's wait-queue or manual
when we have a read-preferred device or a policy based on the target
storage caching.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
v2: Declare fs_devices::readmirror as u8 instead of atomic_t
    A small change in comment and change log wordings.

 fs/btrfs/volumes.c | 16 +++++++++++++++-
 fs/btrfs/volumes.h |  8 ++++++++
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c95e47aa84f8..e26af766f2b9 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1162,6 +1162,8 @@ static int open_fs_devices(struct btrfs_fs_devices *fs_devices,
 	fs_devices->opened = 1;
 	fs_devices->latest_bdev = latest_dev->bdev;
 	fs_devices->total_rw_bytes = 0;
+	/* Set the default readmirror policy */
+	fs_devices->readmirror = BTRFS_READMIRROR_DEFAULT;
 out:
 	return ret;
 }
@@ -5300,7 +5302,19 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info,
 	else
 		num_stripes = map->num_stripes;
 
-	preferred_mirror = first + current->pid % num_stripes;
+	switch (fs_info->fs_devices->readmirror) {
+	case BTRFS_READMIRROR_BY_PID:
+		preferred_mirror = first + current->pid % num_stripes;
+		break;
+	default:
+		/*
+		 * Shouln't happen, just warn and use by_pid instead of failing.
+		 */
+		btrfs_warn_rl(fs_info,
+			      "unknown readmirror type %u, fallback to by_pid",
+			      fs_info->fs_devices->readmirror);
+		preferred_mirror = first + current->pid % num_stripes;
+	}
 
 	if (dev_replace_is_ongoing &&
 	    fs_info->dev_replace.cont_reading_from_srcdev_mode ==
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 68021d1ee216..f5f091f3c72b 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -209,6 +209,12 @@ struct btrfs_device {
 BTRFS_DEVICE_GETSET_FUNCS(disk_total_bytes);
 BTRFS_DEVICE_GETSET_FUNCS(bytes_used);
 
+/* readmirror_policy types */
+#define BTRFS_READMIRROR_DEFAULT	BTRFS_READMIRROR_BY_PID
+enum btrfs_readmirror_policy_type {
+	BTRFS_READMIRROR_BY_PID,
+};
+
 struct btrfs_fs_devices {
 	u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */
 	u8 metadata_uuid[BTRFS_FSID_SIZE];
@@ -260,6 +266,8 @@ struct btrfs_fs_devices {
 	struct kobject *devices_kobj;
 	struct kobject *devinfo_kobj;
 	struct completion kobj_unregister;
+
+	u8 readmirror;
 };
 
 #define BTRFS_BIO_INLINE_CSUM_SIZE	64
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 2/3] btrfs: sysfs, add readmirror kobject
  2020-01-02 10:12 [PATCH v2 0/3] readmirror feature (sysfs and in-memory only approach) Anand Jain
  2020-01-02 10:12 ` [PATCH v2 1/3] btrfs: add readmirror type framework Anand Jain
@ 2020-01-02 10:12 ` Anand Jain
  2020-01-02 10:12 ` [PATCH v2 3/3] btrfs: sysfs, create by_pid readmirror attribute Anand Jain
  2 siblings, 0 replies; 10+ messages in thread
From: Anand Jain @ 2020-01-02 10:12 UTC (permalink / raw)
  To: linux-btrfs

Add

 /sys/fs/btrfs/UUID/readmirror

kobject so that we can add readmirror policies as attributes under it.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
v2: Fix compile error fs_info::readmirror_kobj member not found. As its
    fix in v1 went into patch 3/3 instead it should be here.
    
 fs/btrfs/sysfs.c   | 22 ++++++++++++++++++++++
 fs/btrfs/volumes.h |  1 +
 2 files changed, 23 insertions(+)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index d414b98fb27f..e604f292b42b 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -355,6 +355,10 @@ static ssize_t supported_checksums_show(struct kobject *kobj,
 
 #endif
 
+static const struct attribute *btrfs_readmirror_attrs[] = {
+	NULL,
+};
+
 static ssize_t btrfs_show_u64(u64 *value_ptr, spinlock_t *lock, char *buf)
 {
 	u64 val;
@@ -772,6 +776,13 @@ void btrfs_sysfs_remove_mounted(struct btrfs_fs_info *fs_info)
 {
 	btrfs_reset_fs_info_ptr(fs_info);
 
+	if (fs_info->fs_devices->readmirror_kobj) {
+		sysfs_remove_files(fs_info->fs_devices->readmirror_kobj,
+				   btrfs_readmirror_attrs);
+		kobject_del(fs_info->fs_devices->readmirror_kobj);
+		kobject_put(fs_info->fs_devices->readmirror_kobj);
+	}
+
 	if (fs_info->space_info_kobj) {
 		sysfs_remove_files(fs_info->space_info_kobj, allocation_attrs);
 		kobject_del(fs_info->space_info_kobj);
@@ -1224,6 +1235,17 @@ int btrfs_sysfs_add_mounted(struct btrfs_fs_info *fs_info)
 	if (error)
 		goto failure;
 
+	fs_devs->readmirror_kobj = kobject_create_and_add("readmirror",
+							  &fs_devs->fsid_kobj);
+	if (!fs_devs->readmirror_kobj) {
+		error = -ENOMEM;
+		goto failure;
+	}
+	error = sysfs_create_files(fs_devs->readmirror_kobj,
+				   btrfs_readmirror_attrs);
+	if (error)
+		goto failure;
+
 	return 0;
 failure:
 	btrfs_sysfs_remove_mounted(fs_info);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index f5f091f3c72b..b49afa1cfdd7 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -268,6 +268,7 @@ struct btrfs_fs_devices {
 	struct completion kobj_unregister;
 
 	u8 readmirror;
+	struct kobject *readmirror_kobj;
 };
 
 #define BTRFS_BIO_INLINE_CSUM_SIZE	64
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 3/3] btrfs: sysfs, create by_pid readmirror attribute
  2020-01-02 10:12 [PATCH v2 0/3] readmirror feature (sysfs and in-memory only approach) Anand Jain
  2020-01-02 10:12 ` [PATCH v2 1/3] btrfs: add readmirror type framework Anand Jain
  2020-01-02 10:12 ` [PATCH v2 2/3] btrfs: sysfs, add readmirror kobject Anand Jain
@ 2020-01-02 10:12 ` Anand Jain
  2 siblings, 0 replies; 10+ messages in thread
From: Anand Jain @ 2020-01-02 10:12 UTC (permalink / raw)
  To: linux-btrfs

Add existing %pid based readmirror policy as an attribute

 /sys/fs/btrfs/UUID/readmirror/by_pid

When read, this returns 0 or 1. 1 indicates currently active policy.
When written with 1, it sets by_pid as the current active policy and
when written 0 it resets to the default readmirror policy, which as
of now is pid as well. For any other value returns EINVAL.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
v2: fs_devs::readmirror is no more atomic_t so update accordingly.

 fs/btrfs/sysfs.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index e604f292b42b..123d1ef72059 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -355,7 +355,59 @@ static ssize_t supported_checksums_show(struct kobject *kobj,
 
 #endif
 
+#define readmirror_to_fs_devices(_kobj)	to_fs_devs((_kobj)->parent)
+/*
+ * Set the readmirror type to BTRFS_READMIRROR_BY_PID
+ */
+static ssize_t btrfs_sysfs_readmirror_by_pid_store(struct kobject *kobj,
+						   struct kobj_attribute *a,
+						   const char *buf, size_t count)
+{
+	int ret;
+	unsigned long val;
+	struct btrfs_fs_devices *fs_devices;
+
+	fs_devices = readmirror_to_fs_devices(kobj);
+
+	ret = kstrtoul(skip_spaces(buf), 0, &val);
+	if (ret)
+		return ret;
+
+	switch (val) {
+	case 0:
+		fs_devices->readmirror = BTRFS_READMIRROR_DEFAULT;
+		break;
+	case 1:
+		fs_devices->readmirror = BTRFS_READMIRROR_BY_PID;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return count;
+}
+
+static ssize_t btrfs_sysfs_readmirror_by_pid_show(struct kobject *kobj,
+						  struct kobj_attribute *a,
+						  char *buf)
+{
+	int val;
+	struct btrfs_fs_devices *fs_devices;
+
+	fs_devices = readmirror_to_fs_devices(kobj);
+
+	if (fs_devices->readmirror == BTRFS_READMIRROR_BY_PID)
+		val = 1;
+	else
+		val = 0;
+
+	return snprintf(buf, PAGE_SIZE, "%d\n", val);
+}
+BTRFS_ATTR_RW(readmirror, by_pid, btrfs_sysfs_readmirror_by_pid_show,
+	      btrfs_sysfs_readmirror_by_pid_store);
+
 static const struct attribute *btrfs_readmirror_attrs[] = {
+	BTRFS_ATTR_PTR(readmirror, by_pid),
 	NULL,
 };
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/3] btrfs: add readmirror type framework
  2020-01-02 10:12 ` [PATCH v2 1/3] btrfs: add readmirror type framework Anand Jain
@ 2020-01-02 16:24   ` Josef Bacik
  2020-01-03  9:57     ` Anand Jain
  2020-01-02 19:32   ` Steven Davies
  2020-01-03 10:31   ` [PATCH v3 " Anand Jain
  2 siblings, 1 reply; 10+ messages in thread
From: Josef Bacik @ 2020-01-02 16:24 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs

On 1/2/20 5:12 AM, Anand Jain wrote:
> As of now we use %pid method to read stripped mirrored data. So
> application's process id determines the stripe id to be read. This type
> of read IO routing typically helps in a system with many small
> independent applications tying to read random data. On the other hand
> the %pid based read IO distribution policy is inefficient if there is a
> single application trying to read large data and the overall disk
> bandwidth remains under utilized.
> 
> So this patch introduces a framework where we could add more readmirror
> policies, such as routing the IO based on device's wait-queue or manual
> when we have a read-preferred device or a policy based on the target
> storage caching.
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
> v2: Declare fs_devices::readmirror as u8 instead of atomic_t
>      A small change in comment and change log wordings.
> 
>   fs/btrfs/volumes.c | 16 +++++++++++++++-
>   fs/btrfs/volumes.h |  8 ++++++++
>   2 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index c95e47aa84f8..e26af766f2b9 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1162,6 +1162,8 @@ static int open_fs_devices(struct btrfs_fs_devices *fs_devices,
>   	fs_devices->opened = 1;
>   	fs_devices->latest_bdev = latest_dev->bdev;
>   	fs_devices->total_rw_bytes = 0;
> +	/* Set the default readmirror policy */
> +	fs_devices->readmirror = BTRFS_READMIRROR_DEFAULT;
>   out:
>   	return ret;
>   }
> @@ -5300,7 +5302,19 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info,
>   	else
>   		num_stripes = map->num_stripes;
>   
> -	preferred_mirror = first + current->pid % num_stripes;
> +	switch (fs_info->fs_devices->readmirror) {
> +	case BTRFS_READMIRROR_BY_PID:
> +		preferred_mirror = first + current->pid % num_stripes;
> +		break;
> +	default:
> +		/*
> +		 * Shouln't happen, just warn and use by_pid instead of failing.
> +		 */
> +		btrfs_warn_rl(fs_info,
> +			      "unknown readmirror type %u, fallback to by_pid",
> +			      fs_info->fs_devices->readmirror);
> +		preferred_mirror = first + current->pid % num_stripes;
> +	}
>   
>   	if (dev_replace_is_ongoing &&
>   	    fs_info->dev_replace.cont_reading_from_srcdev_mode ==
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 68021d1ee216..f5f091f3c72b 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -209,6 +209,12 @@ struct btrfs_device {
>   BTRFS_DEVICE_GETSET_FUNCS(disk_total_bytes);
>   BTRFS_DEVICE_GETSET_FUNCS(bytes_used);
>   
> +/* readmirror_policy types */
> +#define BTRFS_READMIRROR_DEFAULT	BTRFS_READMIRROR_BY_PID
> +enum btrfs_readmirror_policy_type {
> +	BTRFS_READMIRROR_BY_PID,
> +};
> +
>   struct btrfs_fs_devices {
>   	u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */
>   	u8 metadata_uuid[BTRFS_FSID_SIZE];
> @@ -260,6 +266,8 @@ struct btrfs_fs_devices {
>   	struct kobject *devices_kobj;
>   	struct kobject *devinfo_kobj;
>   	struct completion kobj_unregister;
> +
> +	u8 readmirror;

The only valid values for this are the enum, so make this

enum btrfs_readmirror_policy_type readmirror;

Thanks,

Josef

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/3] btrfs: add readmirror type framework
  2020-01-02 10:12 ` [PATCH v2 1/3] btrfs: add readmirror type framework Anand Jain
  2020-01-02 16:24   ` Josef Bacik
@ 2020-01-02 19:32   ` Steven Davies
  2020-01-03 10:28     ` Anand Jain
  2020-01-03 10:31   ` [PATCH v3 " Anand Jain
  2 siblings, 1 reply; 10+ messages in thread
From: Steven Davies @ 2020-01-02 19:32 UTC (permalink / raw)
  To: linux-btrfs

On 02/01/2020 10:12, Anand Jain wrote:
> As of now we use %pid method to read stripped mirrored data. So
> application's process id determines the stripe id to be read. This type
> of read IO routing typically helps in a system with many small
> independent applications tying to read random data. On the other hand
> the %pid based read IO distribution policy is inefficient if there is a
> single application trying to read large data and the overall disk
> bandwidth remains under utilized.
> 
> So this patch introduces a framework where we could add more readmirror
> policies, such as routing the IO based on device's wait-queue or manual
> when we have a read-preferred device or a policy based on the target
> storage caching.

I think the idea is good but that it would be cleaner if the tunable was 
named read_policy rather than readmirror as it's more obvious that it 
contains a policy tunable.

Do you envisage allowing more than one policy to be active for a 
filesystem? If not, what about using the same structure as the CPU 
frequency and block IO schedulers with the format

#cat /sys/block/sda/queue/scheduler
noop [deadline] cfq

Such that btrfs would (eventually) have something like

#cat /sys/fs/btrfs/UUID/read_policy
by_pid [user_defined_device] by_shortest_queue

And the policy would be changed by echo'ing the new policy name to the 
read_policy kobject.

Steve

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/3] btrfs: add readmirror type framework
  2020-01-02 16:24   ` Josef Bacik
@ 2020-01-03  9:57     ` Anand Jain
  0 siblings, 0 replies; 10+ messages in thread
From: Anand Jain @ 2020-01-03  9:57 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs



On 3/1/20 12:24 AM, Josef Bacik wrote:
> On 1/2/20 5:12 AM, Anand Jain wrote:
>> As of now we use %pid method to read stripped mirrored data. So
>> application's process id determines the stripe id to be read. This type
>> of read IO routing typically helps in a system with many small
>> independent applications tying to read random data. On the other hand
>> the %pid based read IO distribution policy is inefficient if there is a
>> single application trying to read large data and the overall disk
>> bandwidth remains under utilized.
>>
>> So this patch introduces a framework where we could add more readmirror
>> policies, such as routing the IO based on device's wait-queue or manual
>> when we have a read-preferred device or a policy based on the target
>> storage caching.
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>> v2: Declare fs_devices::readmirror as u8 instead of atomic_t
>>      A small change in comment and change log wordings.
>>
>>   fs/btrfs/volumes.c | 16 +++++++++++++++-
>>   fs/btrfs/volumes.h |  8 ++++++++
>>   2 files changed, 23 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index c95e47aa84f8..e26af766f2b9 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -1162,6 +1162,8 @@ static int open_fs_devices(struct 
>> btrfs_fs_devices *fs_devices,
>>       fs_devices->opened = 1;
>>       fs_devices->latest_bdev = latest_dev->bdev;
>>       fs_devices->total_rw_bytes = 0;
>> +    /* Set the default readmirror policy */
>> +    fs_devices->readmirror = BTRFS_READMIRROR_DEFAULT;
>>   out:
>>       return ret;
>>   }
>> @@ -5300,7 +5302,19 @@ static int find_live_mirror(struct 
>> btrfs_fs_info *fs_info,
>>       else
>>           num_stripes = map->num_stripes;
>> -    preferred_mirror = first + current->pid % num_stripes;
>> +    switch (fs_info->fs_devices->readmirror) {
>> +    case BTRFS_READMIRROR_BY_PID:
>> +        preferred_mirror = first + current->pid % num_stripes;
>> +        break;
>> +    default:
>> +        /*
>> +         * Shouln't happen, just warn and use by_pid instead of failing.
>> +         */
>> +        btrfs_warn_rl(fs_info,
>> +                  "unknown readmirror type %u, fallback to by_pid",
>> +                  fs_info->fs_devices->readmirror);
>> +        preferred_mirror = first + current->pid % num_stripes;
>> +    }
>>       if (dev_replace_is_ongoing &&
>>           fs_info->dev_replace.cont_reading_from_srcdev_mode ==
>> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
>> index 68021d1ee216..f5f091f3c72b 100644
>> --- a/fs/btrfs/volumes.h
>> +++ b/fs/btrfs/volumes.h
>> @@ -209,6 +209,12 @@ struct btrfs_device {
>>   BTRFS_DEVICE_GETSET_FUNCS(disk_total_bytes);
>>   BTRFS_DEVICE_GETSET_FUNCS(bytes_used);
>> +/* readmirror_policy types */
>> +#define BTRFS_READMIRROR_DEFAULT    BTRFS_READMIRROR_BY_PID
>> +enum btrfs_readmirror_policy_type {
>> +    BTRFS_READMIRROR_BY_PID,
>> +};
>> +
>>   struct btrfs_fs_devices {
>>       u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */
>>       u8 metadata_uuid[BTRFS_FSID_SIZE];
>> @@ -260,6 +266,8 @@ struct btrfs_fs_devices {
>>       struct kobject *devices_kobj;
>>       struct kobject *devinfo_kobj;
>>       struct completion kobj_unregister;
>> +
>> +    u8 readmirror;
> 
> The only valid values for this are the enum, so make this
> 
> enum btrfs_readmirror_policy_type readmirror;

  Oh. Ok.

Thanks, Anand


> Thanks,
> 
> Josef

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/3] btrfs: add readmirror type framework
  2020-01-02 19:32   ` Steven Davies
@ 2020-01-03 10:28     ` Anand Jain
  2020-01-03 14:51       ` Steven Davies
  0 siblings, 1 reply; 10+ messages in thread
From: Anand Jain @ 2020-01-03 10:28 UTC (permalink / raw)
  To: Steven Davies, David Sterba; +Cc: linux-btrfs

On 3/1/20 3:32 AM, Steven Davies wrote:
> On 02/01/2020 10:12, Anand Jain wrote:
>> As of now we use %pid method to read stripped mirrored data. So
>> application's process id determines the stripe id to be read. This type
>> of read IO routing typically helps in a system with many small
>> independent applications tying to read random data. On the other hand
>> the %pid based read IO distribution policy is inefficient if there is a
>> single application trying to read large data and the overall disk
>> bandwidth remains under utilized.
>>
>> So this patch introduces a framework where we could add more readmirror
>> policies, such as routing the IO based on device's wait-queue or manual
>> when we have a read-preferred device or a policy based on the target
>> storage caching.
> 
> I think the idea is good but that it would be cleaner if the tunable was 
> named read_policy rather than readmirror as it's more obvious that it 
> contains a policy tunable.

  Um. 'read_policy' sounds good, but I hope it is clear enough to
  indicate that we are talking about read for only mirrored-chunks.
  Will rename to read_policy.

> Do you envisage allowing more than one policy to be active for a 
> filesystem? If not, what about using the same structure as the CPU 
> frequency and block IO schedulers with the format
> 
> #cat /sys/block/sda/queue/scheduler
> noop [deadline] cfq
> 
> Such that btrfs would (eventually) have something like
> 
> #cat /sys/fs/btrfs/UUID/read_policy
> by_pid [user_defined_device] by_shortest_queue
> 

  And in case of user_defined_device, the device for the read shall be
  specified in

   cat /sys/fs/btrfs/UUID/devinfo/<devid>/read_preferred

   0 = unset, 1 = set.

   (devinfo patches are in the ML [1] open for comment)
   [1]
   [PATCH v3 4/4] btrfs: sysfs, add devid/dev_state kobject and attribute

> And the policy would be changed by echo'ing the new policy name to the 
> read_policy kobject.

  I like this approach, will change it to use this format.

Thanks, Anand

> Steve


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v3 1/3] btrfs: add readmirror type framework
  2020-01-02 10:12 ` [PATCH v2 1/3] btrfs: add readmirror type framework Anand Jain
  2020-01-02 16:24   ` Josef Bacik
  2020-01-02 19:32   ` Steven Davies
@ 2020-01-03 10:31   ` Anand Jain
  2 siblings, 0 replies; 10+ messages in thread
From: Anand Jain @ 2020-01-03 10:31 UTC (permalink / raw)
  To: linux-btrfs

As of now we use %pid method to read stripped mirrored data, so
application's process id determines the stripe id to be read. This type
of read IO routing typically helps in a system with many small
independent applications tying to read random data. On the other hand
the %pid based read IO distribution policy is inefficient if there is a
single application trying to read large data as the overall disk
bandwidth remains under utilized.

So this patch introduces a framework where we could add more readmirror
policies, such as IO routing based on device's wait-queue or manual
when we have a read-preferred device or a policy based on the target
storage caching.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
v3: Declare fs_devices::readmirror as enum btrfs_readmirror_policy_type
v2: Declare fs_devices::readmirror as u8 instead of atomic_t
    A small change in comment and change log wordings.

 fs/btrfs/volumes.c | 16 +++++++++++++++-
 fs/btrfs/volumes.h |  8 ++++++++
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c95e47aa84f8..e26af766f2b9 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1162,6 +1162,8 @@ static int open_fs_devices(struct btrfs_fs_devices *fs_devices,
 	fs_devices->opened = 1;
 	fs_devices->latest_bdev = latest_dev->bdev;
 	fs_devices->total_rw_bytes = 0;
+	/* Set the default readmirror policy */
+	fs_devices->readmirror = BTRFS_READMIRROR_DEFAULT;
 out:
 	return ret;
 }
@@ -5300,7 +5302,19 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info,
 	else
 		num_stripes = map->num_stripes;
 
-	preferred_mirror = first + current->pid % num_stripes;
+	switch (fs_info->fs_devices->readmirror) {
+	case BTRFS_READMIRROR_BY_PID:
+		preferred_mirror = first + current->pid % num_stripes;
+		break;
+	default:
+		/*
+		 * Shouln't happen, just warn and use by_pid instead of failing.
+		 */
+		btrfs_warn_rl(fs_info,
+			      "unknown readmirror type %u, fallback to by_pid",
+			      fs_info->fs_devices->readmirror);
+		preferred_mirror = first + current->pid % num_stripes;
+	}
 
 	if (dev_replace_is_ongoing &&
 	    fs_info->dev_replace.cont_reading_from_srcdev_mode ==
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 68021d1ee216..513610a3e6b8 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -209,6 +209,12 @@ BTRFS_DEVICE_GETSET_FUNCS(total_bytes);
 BTRFS_DEVICE_GETSET_FUNCS(disk_total_bytes);
 BTRFS_DEVICE_GETSET_FUNCS(bytes_used);
 
+/* readmirror_policy types */
+#define BTRFS_READMIRROR_DEFAULT	BTRFS_READMIRROR_BY_PID
+enum btrfs_readmirror_policy_type {
+	BTRFS_READMIRROR_BY_PID,
+};
+
 struct btrfs_fs_devices {
 	u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */
 	u8 metadata_uuid[BTRFS_FSID_SIZE];
@@ -260,6 +266,8 @@ struct btrfs_fs_devices {
 	struct kobject *devices_kobj;
 	struct kobject *devinfo_kobj;
 	struct completion kobj_unregister;
+
+	enum btrfs_readmirror_policy_type readmirror;
 };
 
 #define BTRFS_BIO_INLINE_CSUM_SIZE	64
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/3] btrfs: add readmirror type framework
  2020-01-03 10:28     ` Anand Jain
@ 2020-01-03 14:51       ` Steven Davies
  0 siblings, 0 replies; 10+ messages in thread
From: Steven Davies @ 2020-01-03 14:51 UTC (permalink / raw)
  To: Anand Jain, David Sterba; +Cc: linux-btrfs

On 03/01/2020 10:28, Anand Jain wrote:
> On 3/1/20 3:32 AM, Steven Davies wrote:
>> On 02/01/2020 10:12, Anand Jain wrote:

>>> So this patch introduces a framework where we could add more readmirror
>>> policies, such as routing the IO based on device's wait-queue or manual
>>> when we have a read-preferred device or a policy based on the target
>>> storage caching.
>>
>> I think the idea is good but that it would be cleaner if the tunable 
>> was named read_policy rather than readmirror as it's more obvious that 
>> it contains a policy tunable.
> 
>   Um. 'read_policy' sounds good, but I hope it is clear enough to
>   indicate that we are talking about read for only mirrored-chunks.
>   Will rename to read_policy.

I think it would be obvious that the policy will only apply to mirrored 
chunks; after all, you can only read a chunk from a device that contains it.

>> Do you envisage allowing more than one policy to be active for a 
>> filesystem? If not, what about using the same structure as the CPU 
>> frequency and block IO schedulers with the format
>>
>> #cat /sys/block/sda/queue/scheduler
>> noop [deadline] cfq
>>
>> Such that btrfs would (eventually) have something like
>>
>> #cat /sys/fs/btrfs/UUID/read_policy
>> by_pid [user_defined_device] by_shortest_queue
>>
> 
>   And in case of user_defined_device, the device for the read shall be
>   specified in
> 
>    cat /sys/fs/btrfs/UUID/devinfo/<devid>/read_preferred
> 
>    0 = unset, 1 = set.
> 
>    (devinfo patches are in the ML [1] open for comment)
>    [1]
>    [PATCH v3 4/4] btrfs: sysfs, add devid/dev_state kobject and attribute

I remember seeing that patch and I think the approach is logical.

Steve

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-01-03 14:51 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-02 10:12 [PATCH v2 0/3] readmirror feature (sysfs and in-memory only approach) Anand Jain
2020-01-02 10:12 ` [PATCH v2 1/3] btrfs: add readmirror type framework Anand Jain
2020-01-02 16:24   ` Josef Bacik
2020-01-03  9:57     ` Anand Jain
2020-01-02 19:32   ` Steven Davies
2020-01-03 10:28     ` Anand Jain
2020-01-03 14:51       ` Steven Davies
2020-01-03 10:31   ` [PATCH v3 " Anand Jain
2020-01-02 10:12 ` [PATCH v2 2/3] btrfs: sysfs, add readmirror kobject Anand Jain
2020-01-02 10:12 ` [PATCH v2 3/3] btrfs: sysfs, create by_pid readmirror attribute Anand Jain

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).