All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] readmirror feature (sysfs and in-memory only approach)
@ 2019-12-20  5:06 Anand Jain
  2019-12-20  5:06 ` [PATCH 1/3] btrfs: add readmirror type framework Anand Jain
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Anand Jain @ 2019-12-20  5:06 UTC (permalink / raw)
  To: linux-btrfs

As of now we use only %pid method to read stripped mirrored data. So
application's process id determines the stripe id to be read. This type
of routing typically helps in a system with many small independent
applications tying to read random data. On the other hand the %pid
based read IO distribution policy is inefficient if there is a single
application trying to read large data as because the overall disk
bandwidth would remains under utilized.

One type of readmirror policy isn't good enough and other choices are
routing the IO based on device's waitqueue or manual when we have a
read-preferred device or a readmirror policy based on the target storage
caching. So this patch-set introduces a framework where we could add more
readmirror policies.

This policy is a filesystem wide policy as of now, and though the
readmirror policy at the subvolume level is a novel approach as it
provides maximum flexibility in the data center, but as of now its not
practical to implement such a granularity as you can't really ensure
reflinked extents will be read from the stripe of its desire and so
there will be more limitations and it can be assessed separately.

The approach in this patch-set is sys interface with in-memory policy.
And does not add any new readmirror type in this set, which can be add
once we are ok with the framework. Also the default policy remains %pid.

Previous works:
----------------------------------------------------------------------
There were few RFCs [1] before, mainly to figure out storage
(or in memory only) for the readmirror policy and the interface needed.

[1]
https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg86368.html

https://lore.kernel.org/linux-btrfs/20190826090438.7044-1-anand.jain@oracle.com/

https://lore.kernel.org/linux-btrfs/5fcf9c23-89b5-b167-1f80-a0f4ac107d0b@oracle.com/

https://patchwork.kernel.org/cover/10859213/

Mount -o:
In the first trial it was attempted to use the mount -o option to carry
the readmirror policy, this is good for debugging which can make sure
even the mount thread metadata tree blocks are read from the disk desired.
It was very effective in testing radi1/raid10 write-holes.

Extended attribute:
As extended attribute is associated with the inode, to implement this
there is bit of extended attribute abuse or else makes it mandatory to
mount the rootid 5. Its messy unless readmirror policy is applied at the
subvol level which is not possible as of now. 

An item type:
The proposed patch was to create an item to hold the readmirror policy,
it makes sense when compared to the abusive extended attribute approach
but introduces a new item and so no backward compatibility.
-----------------------------------------------------------------------

Anand Jain (3):
  btrfs: add readmirror type framework
  btrfs: sysfs, add readmirror kobject
  btrfs: sysfs, create by_pid readmirror attribute

 fs/btrfs/sysfs.c   | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.c | 16 +++++++++++-
 fs/btrfs/volumes.h |  9 +++++++
 3 files changed, 98 insertions(+), 1 deletion(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/3] btrfs: add readmirror type framework
  2019-12-20  5:06 [PATCH 0/3] readmirror feature (sysfs and in-memory only approach) Anand Jain
@ 2019-12-20  5:06 ` Anand Jain
  2019-12-20 14:44   ` Josef Bacik
  2019-12-20  5:06 ` [PATCH 2/3] btrfs: sysfs, add readmirror kobject Anand Jain
  2019-12-20  5:06 ` [PATCH 3/3] btrfs: sysfs, create by_pid readmirror attribute Anand Jain
  2 siblings, 1 reply; 6+ messages in thread
From: Anand Jain @ 2019-12-20  5:06 UTC (permalink / raw)
  To: linux-btrfs

As of now we use %pid method to read stripped mirrored data. So
application's process id determines the stripe id to be read. This type
of routing typically helps in a system with many small independent
applications tying to read random data. On the other hand the %pid
based read IO distribution policy is inefficient if there is a single
application trying to read large data and the overall disk bandwidth
remains under utilized.

So this patch introduces a framework where we could add more readmirror
policies, such as routing the IO based on device's waitqueue or manual
when we have a read-preferred device or a policy based on the target
storage caching.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/volumes.c | 16 +++++++++++++++-
 fs/btrfs/volumes.h |  8 ++++++++
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c95e47aa84f8..0c6caae29248 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1162,6 +1162,8 @@ static int open_fs_devices(struct btrfs_fs_devices *fs_devices,
 	fs_devices->opened = 1;
 	fs_devices->latest_bdev = latest_dev->bdev;
 	fs_devices->total_rw_bytes = 0;
+	/* Set the default readmirror policy */
+	atomic_set(&fs_devices->readmirror, BTRFS_READMIRROR_DEFAULT);
 out:
 	return ret;
 }
@@ -5300,7 +5302,19 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info,
 	else
 		num_stripes = map->num_stripes;
 
-	preferred_mirror = first + current->pid % num_stripes;
+	switch (atomic_read(&fs_info->fs_devices->readmirror)) {
+	case BTRFS_READMIRROR_BY_PID:
+		preferred_mirror = first + current->pid % num_stripes;
+		break;
+	default:
+		/*
+		 * Shouln't happen, just warn and use default instead of failing.
+		 */
+		btrfs_warn_rl(fs_info,
+			      "unknown readmirror type %u, fallback to by_pid",
+			      atomic_read(&fs_info->fs_devices->readmirror));
+		preferred_mirror = first + current->pid % num_stripes;
+	}
 
 	if (dev_replace_is_ongoing &&
 	    fs_info->dev_replace.cont_reading_from_srcdev_mode ==
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 68021d1ee216..d9c4c4e1dbc2 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -209,6 +209,12 @@ struct btrfs_device {
 BTRFS_DEVICE_GETSET_FUNCS(disk_total_bytes);
 BTRFS_DEVICE_GETSET_FUNCS(bytes_used);
 
+/* readmirror_policy types */
+#define BTRFS_READMIRROR_DEFAULT	BTRFS_READMIRROR_BY_PID
+enum btrfs_readmirror_policy_type {
+	BTRFS_READMIRROR_BY_PID,
+};
+
 struct btrfs_fs_devices {
 	u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */
 	u8 metadata_uuid[BTRFS_FSID_SIZE];
@@ -260,6 +266,8 @@ struct btrfs_fs_devices {
 	struct kobject *devices_kobj;
 	struct kobject *devinfo_kobj;
 	struct completion kobj_unregister;
+
+	atomic_t readmirror;
 };
 
 #define BTRFS_BIO_INLINE_CSUM_SIZE	64
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/3] btrfs: sysfs, add readmirror kobject
  2019-12-20  5:06 [PATCH 0/3] readmirror feature (sysfs and in-memory only approach) Anand Jain
  2019-12-20  5:06 ` [PATCH 1/3] btrfs: add readmirror type framework Anand Jain
@ 2019-12-20  5:06 ` Anand Jain
  2019-12-20  5:06 ` [PATCH 3/3] btrfs: sysfs, create by_pid readmirror attribute Anand Jain
  2 siblings, 0 replies; 6+ messages in thread
From: Anand Jain @ 2019-12-20  5:06 UTC (permalink / raw)
  To: linux-btrfs

Add

 /sys/fs/btrfs/UUID/readmirror

kobject so that we can be the readmirror policies as attributes under it.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/sysfs.c   | 22 ++++++++++++++++++++++
 fs/btrfs/volumes.h |  1 +
 2 files changed, 23 insertions(+)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index d414b98fb27f..da5e1938e9b9 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -355,6 +355,10 @@ static ssize_t supported_checksums_show(struct kobject *kobj,
 
 #endif
 
+static const struct attribute *btrfs_readmirror_attrs[] = {
+	NULL,
+};
+
 static ssize_t btrfs_show_u64(u64 *value_ptr, spinlock_t *lock, char *buf)
 {
 	u64 val;
@@ -772,6 +776,13 @@ void btrfs_sysfs_remove_mounted(struct btrfs_fs_info *fs_info)
 {
 	btrfs_reset_fs_info_ptr(fs_info);
 
+	if (fs_info->fs_devices->readmirror_kobj) {
+		sysfs_remove_files(fs_info->fs_devices->readmirror_kobj,
+				   btrfs_readmirror_attrs);
+		kobject_del(fs_info->fs_devices->readmirror_kobj);
+		kobject_put(fs_info->fs_devices->readmirror_kobj);
+	}
+
 	if (fs_info->space_info_kobj) {
 		sysfs_remove_files(fs_info->space_info_kobj, allocation_attrs);
 		kobject_del(fs_info->space_info_kobj);
@@ -1224,6 +1235,17 @@ int btrfs_sysfs_add_mounted(struct btrfs_fs_info *fs_info)
 	if (error)
 		goto failure;
 
+	fs_devs->readmirror_kobj = kobject_create_and_add("readmirror",
+							  &fs_devs->fsid_kobj);
+	if (!fs_devs->readmirror_kobj) {
+		error = -ENOMEM;
+		goto failure;
+	}
+	error = sysfs_create_files(fs_info->readmirror_kobj,
+				   btrfs_readmirror_attrs);
+	if (error)
+		goto failure;
+
 	return 0;
 failure:
 	btrfs_sysfs_remove_mounted(fs_info);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index d9c4c4e1dbc2..5a9fca16a8a6 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -268,6 +268,7 @@ struct btrfs_fs_devices {
 	struct completion kobj_unregister;
 
 	atomic_t readmirror;
+	struct kobject *readmirror_kobj;
 };
 
 #define BTRFS_BIO_INLINE_CSUM_SIZE	64
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 3/3] btrfs: sysfs, create by_pid readmirror attribute
  2019-12-20  5:06 [PATCH 0/3] readmirror feature (sysfs and in-memory only approach) Anand Jain
  2019-12-20  5:06 ` [PATCH 1/3] btrfs: add readmirror type framework Anand Jain
  2019-12-20  5:06 ` [PATCH 2/3] btrfs: sysfs, add readmirror kobject Anand Jain
@ 2019-12-20  5:06 ` Anand Jain
  2 siblings, 0 replies; 6+ messages in thread
From: Anand Jain @ 2019-12-20  5:06 UTC (permalink / raw)
  To: linux-btrfs

Add the exisiting %pid based readmirror policy as an attribute

 /sys/fs/btrfs/UUID/readmirror/by_pid

When read, this returns 0 or 1. 1 indicates the currently active policy.
When written with 1, it sets pid as the current active policy and when
written 0 it resets to the default readmirror policy which as of now is
pid as well. For any other value it returns EINVAL.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/sysfs.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 53 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index da5e1938e9b9..5a09cf868328 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -355,7 +355,59 @@ static ssize_t supported_checksums_show(struct kobject *kobj,
 
 #endif
 
+#define readmirror_to_fs_devices(_kobj)	to_fs_devs((_kobj)->parent)
+/*
+ * Set the readmirror type to BTRFS_READMIRROR_BY_PID
+ */
+static ssize_t btrfs_sysfs_readmirror_by_pid_store(struct kobject *kobj,
+						   struct kobj_attribute *a,
+						   const char *buf, size_t count)
+{
+	int ret;
+	unsigned long val;
+	struct btrfs_fs_devices *fs_devices;
+
+	fs_devices = readmirror_to_fs_devices(kobj);
+
+	ret = kstrtoul(skip_spaces(buf), 0, &val);
+	if (ret)
+		return ret;
+
+	switch (val) {
+	case 0:
+		atomic_set(&fs_devices->readmirror, BTRFS_READMIRROR_DEFAULT);
+		break;
+	case 1:
+		atomic_set(&fs_devices->readmirror, BTRFS_READMIRROR_BY_PID);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return count;
+}
+
+static ssize_t btrfs_sysfs_readmirror_by_pid_show(struct kobject *kobj,
+						  struct kobj_attribute *a,
+						  char *buf)
+{
+	int val;
+	struct btrfs_fs_devices *fs_devices;
+
+	fs_devices = readmirror_to_fs_devices(kobj);
+
+	if (atomic_read(&fs_devices->readmirror) == BTRFS_READMIRROR_BY_PID)
+		val = 1;
+	else
+		val = 0;
+
+	return snprintf(buf, PAGE_SIZE, "%d\n", val);
+}
+BTRFS_ATTR_RW(readmirror, by_pid, btrfs_sysfs_readmirror_by_pid_show,
+	      btrfs_sysfs_readmirror_by_pid_store);
+
 static const struct attribute *btrfs_readmirror_attrs[] = {
+	BTRFS_ATTR_PTR(readmirror, by_pid),
 	NULL,
 };
 
@@ -1241,7 +1293,7 @@ int btrfs_sysfs_add_mounted(struct btrfs_fs_info *fs_info)
 		error = -ENOMEM;
 		goto failure;
 	}
-	error = sysfs_create_files(fs_info->readmirror_kobj,
+	error = sysfs_create_files(fs_devs->readmirror_kobj,
 				   btrfs_readmirror_attrs);
 	if (error)
 		goto failure;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/3] btrfs: add readmirror type framework
  2019-12-20  5:06 ` [PATCH 1/3] btrfs: add readmirror type framework Anand Jain
@ 2019-12-20 14:44   ` Josef Bacik
  2020-01-02 10:12     ` Anand Jain
  0 siblings, 1 reply; 6+ messages in thread
From: Josef Bacik @ 2019-12-20 14:44 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs

On 12/20/19 12:06 AM, Anand Jain wrote:
> As of now we use %pid method to read stripped mirrored data. So
> application's process id determines the stripe id to be read. This type
> of routing typically helps in a system with many small independent
> applications tying to read random data. On the other hand the %pid
> based read IO distribution policy is inefficient if there is a single
> application trying to read large data and the overall disk bandwidth
> remains under utilized.
> 
> So this patch introduces a framework where we could add more readmirror
> policies, such as routing the IO based on device's waitqueue or manual
> when we have a read-preferred device or a policy based on the target
> storage caching.
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>   fs/btrfs/volumes.c | 16 +++++++++++++++-
>   fs/btrfs/volumes.h |  8 ++++++++
>   2 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index c95e47aa84f8..0c6caae29248 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1162,6 +1162,8 @@ static int open_fs_devices(struct btrfs_fs_devices *fs_devices,
>   	fs_devices->opened = 1;
>   	fs_devices->latest_bdev = latest_dev->bdev;
>   	fs_devices->total_rw_bytes = 0;
> +	/* Set the default readmirror policy */
> +	atomic_set(&fs_devices->readmirror, BTRFS_READMIRROR_DEFAULT);
There's no reason for this to be atomic, it's just a behavior change, if you 
really want to be super safe use READ_ONCE/WRITE_ONCE and have readmirror be 
your enum.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/3] btrfs: add readmirror type framework
  2019-12-20 14:44   ` Josef Bacik
@ 2020-01-02 10:12     ` Anand Jain
  0 siblings, 0 replies; 6+ messages in thread
From: Anand Jain @ 2020-01-02 10:12 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs

On 12/20/19 10:44 PM, Josef Bacik wrote:
> On 12/20/19 12:06 AM, Anand Jain wrote:
>> As of now we use %pid method to read stripped mirrored data. So
>> application's process id determines the stripe id to be read. This type
>> of routing typically helps in a system with many small independent
>> applications tying to read random data. On the other hand the %pid
>> based read IO distribution policy is inefficient if there is a single
>> application trying to read large data and the overall disk bandwidth
>> remains under utilized.
>>
>> So this patch introduces a framework where we could add more readmirror
>> policies, such as routing the IO based on device's waitqueue or manual
>> when we have a read-preferred device or a policy based on the target
>> storage caching.
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>>   fs/btrfs/volumes.c | 16 +++++++++++++++-
>>   fs/btrfs/volumes.h |  8 ++++++++
>>   2 files changed, 23 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index c95e47aa84f8..0c6caae29248 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -1162,6 +1162,8 @@ static int open_fs_devices(struct 
>> btrfs_fs_devices *fs_devices,
>>       fs_devices->opened = 1;
>>       fs_devices->latest_bdev = latest_dev->bdev;
>>       fs_devices->total_rw_bytes = 0;
>> +    /* Set the default readmirror policy */
>> +    atomic_set(&fs_devices->readmirror, BTRFS_READMIRROR_DEFAULT);
> There's no reason for this to be atomic, it's just a behavior change, if 
> you really want to be super safe use READ_ONCE/WRITE_ONCE and have 
> readmirror be your enum.  Thanks,

  Agreed fs_devices::readmirror doesn't have to be atmoic_t. Fixed this
  to declare it as u8 in v2.

Thanks, Anand


> Josef


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-01-02 10:12 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-20  5:06 [PATCH 0/3] readmirror feature (sysfs and in-memory only approach) Anand Jain
2019-12-20  5:06 ` [PATCH 1/3] btrfs: add readmirror type framework Anand Jain
2019-12-20 14:44   ` Josef Bacik
2020-01-02 10:12     ` Anand Jain
2019-12-20  5:06 ` [PATCH 2/3] btrfs: sysfs, add readmirror kobject Anand Jain
2019-12-20  5:06 ` [PATCH 3/3] btrfs: sysfs, create by_pid readmirror attribute Anand Jain

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.