All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v7 rebased 0/5] readmirror feature (sysfs and in-memory only approach; with new read_policy device)
@ 2020-04-06 11:51 Anand Jain
  2020-04-06 11:51 ` [PATCH v7 1/5] btrfs: add btrfs_strmatch helper Anand Jain
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Anand Jain @ 2020-04-06 11:51 UTC (permalink / raw)
  To: linux-btrfs

v7:
Fix switch's fall through warning. Changle logs updates where necessary.

v6:
Patch 4/5 - If there is no change in device's read prefer then don't log
Patch 4/5 - Add pid to the logs
Patch 5/5 - If there isn't read preferred device in the chunk don't reset
read policy to default, instead just use stripe 0. As this is in
the read path it avoids going through the device list to find
read preferred device. So inline to this drop to check if there
is read preferred device before setting read policy to device.

__ Original email: __

v5:
Worked on review comments as received in its previous version.
Please refer to individual patches for the specific changes.
Introduces the new read_policy 'device'.

v4:
Rename readmirror attribute to read_policy. Drop separate kobj for
readmirror instead create read_policy attribute in fsid kobj.
merge v2:2/3 and v2:3/3 into v4:2/2. Patch titles have changed.
 
v3:
v2:
Mainly fixes the fs_devices::readmirror declaration type from atomic_t
to u8. (Thanks Josef).

v1:
As of now we use only %pid method to read stripped mirrored data. So
application's process id determines the stripe id to be read. This type
of routing typically helps in a system with many small independent
applications tying to read random data. On the other hand the %pid
based read IO distribution policy is inefficient if there is a single
application trying to read large data as because the overall disk
bandwidth would remains under utilized.

One type of readmirror policy isn't good enough and other choices are
routing the IO based on device's waitqueue or manual when we have a
read-preferred device or a readmirror policy based on the target storage
caching. So this patch-set introduces a framework where we could add more
readmirror policies.

This policy is a filesystem wide policy as of now, and though the
readmirror policy at the subvolume level is a novel approach as it
provides maximum flexibility in the data center, but as of now its not
practical to implement such a granularity as you can't really ensure
reflinked extents will be read from the stripe of its desire and so
there will be more limitations and it can be assessed separately.

The approach in this patch-set is sys interface with in-memory policy.
And does not add any new readmirror type in this set, which can be add
once we are ok with the framework. Also the default policy remains %pid.

Previous works:
----------------------------------------------------------------------
There were few RFCs [1] before, mainly to figure out storage
(or in memory only) for the readmirror policy and the interface needed.

[1]
https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg86368.html

https://lore.kernel.org/linux-btrfs/20190826090438.7044-1-anand.jain@oracle.com/

https://lore.kernel.org/linux-btrfs/5fcf9c23-89b5-b167-1f80-a0f4ac107d0b@oracle.com/

https://patchwork.kernel.org/cover/10859213/

Mount -o:
In the first trial it was attempted to use the mount -o option to carry
the readmirror policy, this is good for debugging which can make sure
even the mount thread metadata tree blocks are read from the disk desired.
It was very effective in testing radi1/raid10 write-holes.

Extended attribute:
As extended attribute is associated with the inode, to implement this
there is bit of extended attribute abuse or else makes it mandatory to
mount the rootid 5. Its messy unless readmirror policy is applied at the
subvol level which is not possible as of now. 

An item type:
The proposed patch was to create an item to hold the readmirror policy,
it makes sense when compared to the abusive extended attribute approach
but introduces a new item and so no backward compatibility.
-----------------------------------------------------------------------

Anand Jain (5):
  btrfs: add btrfs_strmatch helper
  btrfs: create read policy framework
  btrfs: create read policy sysfs attribute, pid
  btrfs: introduce new device-state read_preferred
  btrfs: introduce new read_policy device

 fs/btrfs/sysfs.c   | 128 +++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.c |  39 +++++++++++++-
 fs/btrfs/volumes.h |  16 ++++++
 3 files changed, 182 insertions(+), 1 deletion(-)

-- 
2.23.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v7 1/5] btrfs: add btrfs_strmatch helper
  2020-04-06 11:51 [PATCH v7 rebased 0/5] readmirror feature (sysfs and in-memory only approach; with new read_policy device) Anand Jain
@ 2020-04-06 11:51 ` Anand Jain
  2020-04-06 11:51 ` [PATCH v7 2/5] btrfs: create read policy framework Anand Jain
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Anand Jain @ 2020-04-06 11:51 UTC (permalink / raw)
  To: linux-btrfs

Add a generic helper to match the golden-string in the given-string,
and ignore the leading and trailing whitespaces if any.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Suggested-by: David Sterba <dsterba@suse.com>
---
v5: born

 fs/btrfs/sysfs.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 93cf76118a04..7bb68cef98ab 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -809,6 +809,29 @@ static ssize_t btrfs_checksum_show(struct kobject *kobj,
 
 BTRFS_ATTR(, checksum, btrfs_checksum_show);
 
+/*
+ * Match the %golden in the %given. Ignore the leading and trailing whitespaces
+ * if any.
+ */
+static int btrfs_strmatch(const char *given, const char *golden)
+{
+	size_t len = strlen(golden);
+	char *stripped;
+
+	/* strip leading whitespace */
+	stripped = skip_spaces(given);
+
+	if (strncmp(stripped, golden, len) == 0) {
+		/* strip trailing whitespace */
+		if (strlen(skip_spaces(stripped + len)))
+			return -EINVAL;
+
+		return 0;
+	}
+
+	return -EINVAL;
+}
+
 static const struct attribute *btrfs_attrs[] = {
 	BTRFS_ATTR_PTR(, label),
 	BTRFS_ATTR_PTR(, nodesize),
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v7 2/5] btrfs: create read policy framework
  2020-04-06 11:51 [PATCH v7 rebased 0/5] readmirror feature (sysfs and in-memory only approach; with new read_policy device) Anand Jain
  2020-04-06 11:51 ` [PATCH v7 1/5] btrfs: add btrfs_strmatch helper Anand Jain
@ 2020-04-06 11:51 ` Anand Jain
  2020-04-06 11:51 ` [PATCH v7 3/5] btrfs: create read policy sysfs attribute, pid Anand Jain
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Anand Jain @ 2020-04-06 11:51 UTC (permalink / raw)
  To: linux-btrfs

As of now we use %pid method to read stripped mirrored data, which means
process id determines the stripe id to be read. This type of routing
typically helps in a system with many small independent processes tying
to read random data. On the other hand the %pid based read IO policy is
inefficient because if there is a single process trying to read large
data the overall disk bandwidth remains under-utilized.

So this patch introduces read policy framework so that we could add more
read policies, such as IO routing based on device's wait-queue or manual
when we have a read-preferred device or a policy based on the target
storage caching.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
(rebased on 5.6)
v7: Fix missing /* fall through */ in the switch
    Removed Reviewed-by: Josef Bacik <josef@toxicpanda.com>
v6:-
v5: Title renamed from:- btrfs: add read_policy framework
    Change log updated.
    Unnecessary comment dropped, added more where necessary.
    Optimize code in the switch remove duplicate code.
    Define BTRFS_READ_POLICY_DEFAULT dropped.
    Rename enum btrfs_read_policy_type to enum btrfs_read_policy.
    Rename BTRFS_READ_BY_PID to BTRFS_READ_POLICY_PID.
    (As its mainly renames. Reviewed-by retained).
v4: -
v3: Declare fs_devices::readmirror as enum btrfs_readmirror_policy_type
v2: Declare fs_devices::readmirror as u8 instead of atomic_t 
    A small change in comment and change log wordings.

 fs/btrfs/volumes.c | 15 ++++++++++++++-
 fs/btrfs/volumes.h | 14 ++++++++++++++
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c1909e5f4506..bafcf10f72ea 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1206,6 +1206,7 @@ static int open_fs_devices(struct btrfs_fs_devices *fs_devices,
 	fs_devices->latest_bdev = latest_dev->bdev;
 	fs_devices->total_rw_bytes = 0;
 	fs_devices->chunk_alloc_policy = BTRFS_CHUNK_ALLOC_REGULAR;
+	fs_devices->read_policy = BTRFS_READ_POLICY_PID;
 out:
 	return ret;
 }
@@ -5445,7 +5446,19 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info,
 	else
 		num_stripes = map->num_stripes;
 
-	preferred_mirror = first + current->pid % num_stripes;
+	switch (fs_info->fs_devices->read_policy) {
+	default:
+		/*
+		 * Shouldn't happen, just warn and use pid instead of failing.
+		 */
+		btrfs_warn_rl(fs_info,
+			      "unknown read_policy type %u, fallback to pid",
+			      fs_info->fs_devices->read_policy);
+		/* fall through */
+	case BTRFS_READ_POLICY_PID:
+		preferred_mirror = first + current->pid % num_stripes;
+		break;
+	}
 
 	if (dev_replace_is_ongoing &&
 	    fs_info->dev_replace.cont_reading_from_srcdev_mode ==
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index f067b5934c46..f5ed864e4c5d 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -211,6 +211,15 @@ enum btrfs_chunk_allocation_policy {
 	BTRFS_CHUNK_ALLOC_REGULAR,
 };
 
+/*
+ * Read policies for the mirrored block groups, read picks the stripe based
+ * on these policies.
+ */
+enum btrfs_read_policy {
+	BTRFS_READ_POLICY_PID,
+	BTRFS_NR_READ_POLICY,
+};
+
 struct btrfs_fs_devices {
 	u8 fsid[BTRFS_FSID_SIZE]; /* FS specific uuid */
 	u8 metadata_uuid[BTRFS_FSID_SIZE];
@@ -264,6 +273,11 @@ struct btrfs_fs_devices {
 	struct completion kobj_unregister;
 
 	enum btrfs_chunk_allocation_policy chunk_alloc_policy;
+
+	/*
+	 * policy used to read the mirrored stripes
+	 */
+	enum btrfs_read_policy read_policy;
 };
 
 #define BTRFS_BIO_INLINE_CSUM_SIZE	64
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v7 3/5] btrfs: create read policy sysfs attribute, pid
  2020-04-06 11:51 [PATCH v7 rebased 0/5] readmirror feature (sysfs and in-memory only approach; with new read_policy device) Anand Jain
  2020-04-06 11:51 ` [PATCH v7 1/5] btrfs: add btrfs_strmatch helper Anand Jain
  2020-04-06 11:51 ` [PATCH v7 2/5] btrfs: create read policy framework Anand Jain
@ 2020-04-06 11:51 ` Anand Jain
  2020-05-19 10:07   ` Johannes Thumshirn
  2020-04-06 11:51 ` [PATCH v7 4/5] btrfs: introduce new device-state read_preferred Anand Jain
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Anand Jain @ 2020-04-06 11:51 UTC (permalink / raw)
  To: linux-btrfs

Add

 /sys/fs/btrfs/UUID/read_policy

attribute so that the read policy for the raid1 and raid10 chunks can be
tuned.

When this attribute is read, it shall show all available policies, with
active policy being with in [ ]. The read_policy attribute can be written
using one of the items listed in there.

For example:
  $cat /sys/fs/btrfs/UUID/read_policy
  [pid]
  $echo pid > /sys/fs/btrfs/UUID/read_policy

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
v5: 
  Title rename: old: btrfs: sysfs, add read_policy attribute
  Uses the btrfs_strmatch() helper (BTRFS_READ_POLICY_NAME_MAX dropped).
  Use the table for the policy names.
  Rename len to ret.
  Use a simple logic to prefix space in btrfs_read_policy_show()
  Reviewed-by: Josef Bacik <josef@toxicpanda.com> dropped.

v4:-
v3: rename [by_pid] to [pid]
v2: v2: check input len before strip and kstrdup

 fs/btrfs/sysfs.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 7bb68cef98ab..c9a8850b186a 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -832,6 +832,54 @@ static int btrfs_strmatch(const char *given, const char *golden)
 	return -EINVAL;
 }
 
+static const char* const btrfs_read_policy_name[] = { "pid" };
+
+static ssize_t btrfs_read_policy_show(struct kobject *kobj,
+				      struct kobj_attribute *a, char *buf)
+{
+	int i;
+	ssize_t ret = 0;
+	struct btrfs_fs_devices *fs_devices = to_fs_devs(kobj);
+
+	for (i = 0; i < BTRFS_NR_READ_POLICY; i++) {
+		if (fs_devices->read_policy == i)
+			ret += snprintf(buf + ret, PAGE_SIZE - ret, "%s[%s]",
+					(ret == 0 ? "" : " "),
+					btrfs_read_policy_name[i]);
+		else
+			ret += snprintf(buf + ret, PAGE_SIZE - ret, "%s%s",
+					(ret == 0 ? "" : " "),
+					btrfs_read_policy_name[i]);
+	}
+
+	ret += snprintf(buf + ret, PAGE_SIZE - ret, "\n");
+
+	return ret;
+}
+
+static ssize_t btrfs_read_policy_store(struct kobject *kobj,
+				       struct kobj_attribute *a,
+				       const char *buf, size_t len)
+{
+	int i;
+	struct btrfs_fs_devices *fs_devices = to_fs_devs(kobj);
+
+	for (i = 0; i < BTRFS_NR_READ_POLICY; i++) {
+		if (btrfs_strmatch(buf, btrfs_read_policy_name[i]) == 0) {
+			if (i != fs_devices->read_policy) {
+				fs_devices->read_policy = i;
+				btrfs_info(fs_devices->fs_info,
+					   "read policy set to '%s'",
+					   btrfs_read_policy_name[i]);
+			}
+			return len;
+		}
+	}
+
+	return -EINVAL;
+}
+BTRFS_ATTR_RW(, read_policy, btrfs_read_policy_show, btrfs_read_policy_store);
+
 static const struct attribute *btrfs_attrs[] = {
 	BTRFS_ATTR_PTR(, label),
 	BTRFS_ATTR_PTR(, nodesize),
@@ -840,6 +888,7 @@ static const struct attribute *btrfs_attrs[] = {
 	BTRFS_ATTR_PTR(, quota_override),
 	BTRFS_ATTR_PTR(, metadata_uuid),
 	BTRFS_ATTR_PTR(, checksum),
+	BTRFS_ATTR_PTR(, read_policy),
 	NULL,
 };
 
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v7 4/5] btrfs: introduce new device-state read_preferred
  2020-04-06 11:51 [PATCH v7 rebased 0/5] readmirror feature (sysfs and in-memory only approach; with new read_policy device) Anand Jain
                   ` (2 preceding siblings ...)
  2020-04-06 11:51 ` [PATCH v7 3/5] btrfs: create read policy sysfs attribute, pid Anand Jain
@ 2020-04-06 11:51 ` Anand Jain
  2020-04-06 11:51 ` [PATCH v7 5/5] btrfs: introduce new read_policy device Anand Jain
  2020-04-30  9:02 ` [PATCH v7 rebased 0/5] readmirror feature (sysfs and in-memory only approach; with new read_policy device) Anand Jain
  5 siblings, 0 replies; 16+ messages in thread
From: Anand Jain @ 2020-04-06 11:51 UTC (permalink / raw)
  To: linux-btrfs

This is a preparatory patch and introduces a new device flag
'read_preferred', and is a generic flag which along with the read_policy
'device' in the following patch the user can route the read IO to the
device of choice.

This also provides a sysfs interface to set the device state as
read_preferred.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
v7: Change log updated.
v6: If there is no change in device's read prefer then don't log. 
    Add pid to the logs.
v5: born

 fs/btrfs/sysfs.c   | 55 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.h |  1 +
 2 files changed, 56 insertions(+)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index c9a8850b186a..72daaedb7b04 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -1317,11 +1317,66 @@ static ssize_t btrfs_devinfo_writeable_show(struct kobject *kobj,
 }
 BTRFS_ATTR(devid, writeable, btrfs_devinfo_writeable_show);
 
+static ssize_t btrfs_devinfo_read_pref_show(struct kobject *kobj,
+					    struct kobj_attribute *a, char *buf)
+{
+	int val;
+	struct btrfs_device *device = container_of(kobj, struct btrfs_device,
+						   devid_kobj);
+
+	val = !!test_bit(BTRFS_DEV_STATE_READ_PREFERRED, &device->dev_state);
+
+	return snprintf(buf, PAGE_SIZE, "%d\n", val);
+}
+
+static ssize_t btrfs_devinfo_read_pref_store(struct kobject *kobj,
+					     struct kobj_attribute *a,
+					     const char *buf, size_t len)
+{
+	int ret;
+	unsigned long val;
+	struct btrfs_device *device;
+
+	ret = kstrtoul(skip_spaces(buf), 0, &val);
+	if (ret)
+		return ret;
+
+	if (val != 0 && val != 1)
+		return -EINVAL;
+
+	/*
+	 * lock is not required, the btrfs_device struct can't be freed while
+	 * its kobject btrfs_device::devid_kobj is still open.
+	 */
+	device = container_of(kobj, struct btrfs_device, devid_kobj);
+
+	if (val &&
+	    ! test_bit(BTRFS_DEV_STATE_READ_PREFERRED, &device->dev_state)) {
+
+		set_bit(BTRFS_DEV_STATE_READ_PREFERRED, &device->dev_state);
+		btrfs_info(device->fs_devices->fs_info,
+			   "set read preferred on devid %llu (%d)",
+			   device->devid, task_pid_nr(current));
+	} else if (!val &&
+		   test_bit(BTRFS_DEV_STATE_READ_PREFERRED, &device->dev_state)) {
+
+		clear_bit(BTRFS_DEV_STATE_READ_PREFERRED, &device->dev_state);
+		btrfs_info(device->fs_devices->fs_info,
+			   "reset read preferred on devid %llu (%d)",
+			   device->devid, task_pid_nr(current));
+	}
+
+	return len;
+}
+BTRFS_ATTR_RW(devid, read_preferred, btrfs_devinfo_read_pref_show,
+	      btrfs_devinfo_read_pref_store);
+
 static struct attribute *devid_attrs[] = {
 	BTRFS_ATTR_PTR(devid, in_fs_metadata),
 	BTRFS_ATTR_PTR(devid, missing),
 	BTRFS_ATTR_PTR(devid, replace_target),
 	BTRFS_ATTR_PTR(devid, writeable),
+	BTRFS_ATTR_PTR(devid, read_preferred),
 	NULL
 };
 ATTRIBUTE_GROUPS(devid);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 1775d35706ab..487a54c3140e 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -50,6 +50,7 @@ struct btrfs_io_geometry {
 #define BTRFS_DEV_STATE_MISSING		(2)
 #define BTRFS_DEV_STATE_REPLACE_TGT	(3)
 #define BTRFS_DEV_STATE_FLUSH_SENT	(4)
+#define BTRFS_DEV_STATE_READ_PREFERRED	(5)
 
 struct btrfs_device {
 	struct list_head dev_list; /* device_list_mutex */
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v7 5/5] btrfs: introduce new read_policy device
  2020-04-06 11:51 [PATCH v7 rebased 0/5] readmirror feature (sysfs and in-memory only approach; with new read_policy device) Anand Jain
                   ` (3 preceding siblings ...)
  2020-04-06 11:51 ` [PATCH v7 4/5] btrfs: introduce new device-state read_preferred Anand Jain
@ 2020-04-06 11:51 ` Anand Jain
  2020-04-30  9:02 ` [PATCH v7 rebased 0/5] readmirror feature (sysfs and in-memory only approach; with new read_policy device) Anand Jain
  5 siblings, 0 replies; 16+ messages in thread
From: Anand Jain @ 2020-04-06 11:51 UTC (permalink / raw)
  To: linux-btrfs

Read-policy type 'device' and device flag 'read-preferred':

The read-policy type device picks the device(s) flagged as
read-preferred for reading chunks of type raid1, raid10,
raid1c3 and raid1c4.

As system might contain ssd, nvme, iscsi or san lun, and which are all
a non-rotational device its not a good idea to set the read-preferred
automatically. Instead device read-policy along with the read-preferred
flag provides an ability to do it manually. This advance tuning is
useful in more than one situation, like for example,
 - In heterogeneous-disk volume it provides an ability to choose the
   low latency disks for reading.
 - Useful for more accurate testing.
 - Avoid known problematic device from reading the chunk until it is
   replaced (by mark the good devices as read-preferred).

Note:

If the read-policy type is set to 'device', but there isn't any device
which is flagged as read-preferred, then stripe 0 is used for reading.

The device replace won't migrate the read-preferred flag to the new
replace target device.

As of now this is in-memory only feature.

Its point less to set the read-preferred flag on the missing device,
as IOs aren't submitted to the missing device.

If there are more than one read-preferred device in a chunk, the read IO
shall go to the stripe 0 (as of now, when qdepth patches are integrated
we will use the least busy device among the read-preferred devices).

Usage example:

Consider a typical two disks raid1.

Configure devid1 for reading.

$ echo 1 > devinfo/1/read_preferred
$ cat devinfo/1/read_preferred; cat devinfo/2/read_preferred
1
0

$ pwd
/sys/fs/btrfs/12345678-1234-1234-1234-123456789abc

$ cat read_policy; echo device > ./read_policy; cat read_policy
[pid] device
pid [device]

Now read IOs are sent to devid 1 (sdb).

$ echo 3 > /proc/sys/vm/drop_caches; md5sum /btrfs/YkZI

$ iostat -zy 1 | egrep 'sdb|sdc' (from another terminal)
sdb              50.00     40048.00         0.00      40048          0

Change the read-preferred device from devid 1 to devid 2 (sdc).

$ echo 0 > ./devinfo/1/read_preferred; echo 1 > ./devinfo/2/read_preferred;

[ 3343.918658] BTRFS info (device sdb): reset read preferred on devid 1 (1334)
[ 3343.919876] BTRFS info (device sdb): set read preferred on devid 2 (1334)

Further read ios are sent to devid 2 (sdc).

$ echo 3 > /proc/sys/vm/drop_caches; md5sum /btrfs/YkZI

$ iostat -zy 1 | egrep 'sdb|sdc' (from another terminal)
sdc              49.00     40048.00         0.00      40048          0

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
v7: Change log updated.
v6:
. If there isn't read preferred device in the chunk don't reset
read policy to default, instead just use stripe 0. As this is in
the read path it avoids going through the device list to find
read preferred device. So inline to this drop to check if there
is read preferred device before setting read policy to device.
. Commit log updated. Adds more info about this new feature.
v5: born

 fs/btrfs/sysfs.c   |  3 ++-
 fs/btrfs/volumes.c | 24 ++++++++++++++++++++++++
 fs/btrfs/volumes.h |  1 +
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 72daaedb7b04..af53ed879dd6 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -832,7 +832,8 @@ static int btrfs_strmatch(const char *given, const char *golden)
 	return -EINVAL;
 }
 
-static const char* const btrfs_read_policy_name[] = { "pid" };
+/* Must follow the order as in enum btrfs_read_policy */
+static const char* const btrfs_read_policy_name[] = { "pid", "device" };
 
 static ssize_t btrfs_read_policy_show(struct kobject *kobj,
 				      struct kobj_attribute *a, char *buf)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9dd7e3687463..5e53380e1d8d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5380,6 +5380,26 @@ int btrfs_is_parity_mirror(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
 	return ret;
 }
 
+static int btrfs_find_read_preferred(struct map_lookup *map, int num_stripe)
+{
+	int i;
+
+	/*
+	 * If there are more than one read preferred devices, then just pick the
+	 * first found read preferred device as of now. Once we have the Qdepth
+	 * based device selection, we could pick the least busy device among the
+	 * read preferred devices.
+	 */
+	for (i = 0; i < num_stripe; i++) {
+		if (test_bit(BTRFS_DEV_STATE_READ_PREFERRED,
+			     &map->stripes[i].dev->dev_state))
+			return i;
+        }
+
+	/* If there is no read preferred device then just use stripe 0 */
+	return 0;
+}
+
 static int find_live_mirror(struct btrfs_fs_info *fs_info,
 			    struct map_lookup *map, int first,
 			    int dev_replace_is_ongoing)
@@ -5399,6 +5419,10 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info,
 		num_stripes = map->num_stripes;
 
 	switch (fs_info->fs_devices->read_policy) {
+	case BTRFS_READ_POLICY_DEVICE:
+		preferred_mirror = btrfs_find_read_preferred(map, num_stripes);
+		preferred_mirror = first + preferred_mirror;
+		break;
 	default:
 		/*
 		 * Shouldn't happen, just warn and use pid instead of failing.
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 487a54c3140e..efa9635a4748 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -214,6 +214,7 @@ BTRFS_DEVICE_GETSET_FUNCS(bytes_used);
  */
 enum btrfs_read_policy {
 	BTRFS_READ_POLICY_PID,
+	BTRFS_READ_POLICY_DEVICE,
 	BTRFS_NR_READ_POLICY,
 };
 
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v7 rebased 0/5] readmirror feature (sysfs and in-memory only approach; with new read_policy device)
  2020-04-06 11:51 [PATCH v7 rebased 0/5] readmirror feature (sysfs and in-memory only approach; with new read_policy device) Anand Jain
                   ` (4 preceding siblings ...)
  2020-04-06 11:51 ` [PATCH v7 5/5] btrfs: introduce new read_policy device Anand Jain
@ 2020-04-30  9:02 ` Anand Jain
  2020-05-15 19:58   ` David Sterba
  5 siblings, 1 reply; 16+ messages in thread
From: Anand Jain @ 2020-04-30  9:02 UTC (permalink / raw)
  To: dsterba; +Cc: linux-btrfs


David,

  I am not sure if this will be integrated in 5.8 and worth the time to
  rebase. Kindly suggest.

-Anand

On 6/4/20 7:51 pm, Anand Jain wrote:
> v7:
> Fix switch's fall through warning. Changle logs updates where necessary.
> 
> v6:
> Patch 4/5 - If there is no change in device's read prefer then don't log
> Patch 4/5 - Add pid to the logs
> Patch 5/5 - If there isn't read preferred device in the chunk don't reset
> read policy to default, instead just use stripe 0. As this is in
> the read path it avoids going through the device list to find
> read preferred device. So inline to this drop to check if there
> is read preferred device before setting read policy to device.
> 
> __ Original email: __
> 
> v5:
> Worked on review comments as received in its previous version.
> Please refer to individual patches for the specific changes.
> Introduces the new read_policy 'device'.
> 
> v4:
> Rename readmirror attribute to read_policy. Drop separate kobj for
> readmirror instead create read_policy attribute in fsid kobj.
> merge v2:2/3 and v2:3/3 into v4:2/2. Patch titles have changed.
>   
> v3:
> v2:
> Mainly fixes the fs_devices::readmirror declaration type from atomic_t
> to u8. (Thanks Josef).
> 
> v1:
> As of now we use only %pid method to read stripped mirrored data. So
> application's process id determines the stripe id to be read. This type
> of routing typically helps in a system with many small independent
> applications tying to read random data. On the other hand the %pid
> based read IO distribution policy is inefficient if there is a single
> application trying to read large data as because the overall disk
> bandwidth would remains under utilized.
> 
> One type of readmirror policy isn't good enough and other choices are
> routing the IO based on device's waitqueue or manual when we have a
> read-preferred device or a readmirror policy based on the target storage
> caching. So this patch-set introduces a framework where we could add more
> readmirror policies.
> 
> This policy is a filesystem wide policy as of now, and though the
> readmirror policy at the subvolume level is a novel approach as it
> provides maximum flexibility in the data center, but as of now its not
> practical to implement such a granularity as you can't really ensure
> reflinked extents will be read from the stripe of its desire and so
> there will be more limitations and it can be assessed separately.
> 
> The approach in this patch-set is sys interface with in-memory policy.
> And does not add any new readmirror type in this set, which can be add
> once we are ok with the framework. Also the default policy remains %pid.
> 
> Previous works:
> ----------------------------------------------------------------------
> There were few RFCs [1] before, mainly to figure out storage
> (or in memory only) for the readmirror policy and the interface needed.
> 
> [1]
> https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg86368.html
> 
> https://lore.kernel.org/linux-btrfs/20190826090438.7044-1-anand.jain@oracle.com/
> 
> https://lore.kernel.org/linux-btrfs/5fcf9c23-89b5-b167-1f80-a0f4ac107d0b@oracle.com/
> 
> https://patchwork.kernel.org/cover/10859213/
> 
> Mount -o:
> In the first trial it was attempted to use the mount -o option to carry
> the readmirror policy, this is good for debugging which can make sure
> even the mount thread metadata tree blocks are read from the disk desired.
> It was very effective in testing radi1/raid10 write-holes.
> 
> Extended attribute:
> As extended attribute is associated with the inode, to implement this
> there is bit of extended attribute abuse or else makes it mandatory to
> mount the rootid 5. Its messy unless readmirror policy is applied at the
> subvol level which is not possible as of now.
> 
> An item type:
> The proposed patch was to create an item to hold the readmirror policy,
> it makes sense when compared to the abusive extended attribute approach
> but introduces a new item and so no backward compatibility.
> -----------------------------------------------------------------------
> 
> Anand Jain (5):
>    btrfs: add btrfs_strmatch helper
>    btrfs: create read policy framework
>    btrfs: create read policy sysfs attribute, pid
>    btrfs: introduce new device-state read_preferred
>    btrfs: introduce new read_policy device
> 
>   fs/btrfs/sysfs.c   | 128 +++++++++++++++++++++++++++++++++++++++++++++
>   fs/btrfs/volumes.c |  39 +++++++++++++-
>   fs/btrfs/volumes.h |  16 ++++++
>   3 files changed, 182 insertions(+), 1 deletion(-)
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v7 rebased 0/5] readmirror feature (sysfs and in-memory only approach; with new read_policy device)
  2020-04-30  9:02 ` [PATCH v7 rebased 0/5] readmirror feature (sysfs and in-memory only approach; with new read_policy device) Anand Jain
@ 2020-05-15 19:58   ` David Sterba
  2020-05-19 10:02     ` Anand Jain
  0 siblings, 1 reply; 16+ messages in thread
From: David Sterba @ 2020-05-15 19:58 UTC (permalink / raw)
  To: Anand Jain; +Cc: dsterba, linux-btrfs

On Thu, Apr 30, 2020 at 05:02:27PM +0800, Anand Jain wrote:
>   I am not sure if this will be integrated in 5.8 and worth the time to
>   rebase. Kindly suggest.

The preparatory work is ok, but the actual mirror selection policy
addresses a usecase that I think is not the one most users are
interested in. Devices of vastly different performance capabilities like
rotational disks vs nvme vs ssd vs network block devices in one
filesystem are not something commonly found.

What we really need is a saner balancing mechanism than pid-based, that
is also going to be used any time there are more devices from the same
speed class for the fast devices too.

So, no the patchset is not on track for a merge without the improved
default balancing. The preferred device for reads can be one of the
policies, I understand the usecase and have not problem with that
although wouldn't probably have use for it.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v7 rebased 0/5] readmirror feature (sysfs and in-memory only approach; with new read_policy device)
  2020-05-15 19:58   ` David Sterba
@ 2020-05-19 10:02     ` Anand Jain
  2020-05-22 13:46       ` David Sterba
  2020-05-22 19:15       ` Steven Davies
  0 siblings, 2 replies; 16+ messages in thread
From: Anand Jain @ 2020-05-19 10:02 UTC (permalink / raw)
  To: dsterba, dsterba, linux-btrfs



On 16/5/20 3:58 am, David Sterba wrote:
> On Thu, Apr 30, 2020 at 05:02:27PM +0800, Anand Jain wrote:
>>    I am not sure if this will be integrated in 5.8 and worth the time to
>>    rebase. Kindly suggest.
> 
> The preparatory work is ok, but the actual mirror selection policy
> addresses a usecase that I think is not the one most users are
> interested in. Devices of vastly different performance capabilities like
> rotational disks vs nvme vs ssd vs network block devices in one
> filesystem are not something commonly found.
> 
> What we really need is a saner balancing mechanism than pid-based, that
> is also going to be used any time there are more devices from the same
> speed class for the fast devices too.
> 

There are two things here, the read_policy framework in the preparatory
patches and a new balancing or read_policy, device.

> So, no the patchset is not on track for a merge without the improved
> default balancing.

It can be worked on top of the preparatory read_policy framework?
This patchset does not change any default read_policy (or balancing)
which is pid as of now. Working on a default read_policy/balancing
was out of the scope of this patchset.

> The preferred device for reads can be one of the
> policies, I understand the usecase and have not problem with that
> although wouldn't probably have use for it.

For us, read_policy:device helps to reproduce raid1 data corruption
    https://patchwork.kernel.org/patch/11475417/
And xfstests btrfs/14[0-3] can be improved so that the reads directly
go the device of the choice, instead of waiting for the odd/even pid.

Common configuration won't need this, advance configurations assembled
with heterogeneous devices where read performance is more critical than
write will find read_policy:device useful.

Thanks, Anand

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v7 3/5] btrfs: create read policy sysfs attribute, pid
  2020-04-06 11:51 ` [PATCH v7 3/5] btrfs: create read policy sysfs attribute, pid Anand Jain
@ 2020-05-19 10:07   ` Johannes Thumshirn
  2020-05-20  8:54     ` Anand Jain
  0 siblings, 1 reply; 16+ messages in thread
From: Johannes Thumshirn @ 2020-05-19 10:07 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs

On 06/04/2020 13:51, Anand Jain wrote:
> +static ssize_t btrfs_read_policy_store(struct kobject *kobj,
> +				       struct kobj_attribute *a,
> +				       const char *buf, size_t len)
> +{
> +	int i;
> +	struct btrfs_fs_devices *fs_devices = to_fs_devs(kobj);
> +
> +	for (i = 0; i < BTRFS_NR_READ_POLICY; i++) {
> +		if (btrfs_strmatch(buf, btrfs_read_policy_name[i]) == 0) {
> +			if (i != fs_devices->read_policy) {
> +				fs_devices->read_policy = i;
> +				btrfs_info(fs_devices->fs_info,
> +					   "read policy set to '%s'",
> +					   btrfs_read_policy_name[i]);
> +			}
> +			return len;
> +		}
> +	}

Naive question, what's the advantage over sysfs_match_string()?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v7 3/5] btrfs: create read policy sysfs attribute, pid
  2020-05-19 10:07   ` Johannes Thumshirn
@ 2020-05-20  8:54     ` Anand Jain
  2020-05-20  8:55       ` Johannes Thumshirn
  0 siblings, 1 reply; 16+ messages in thread
From: Anand Jain @ 2020-05-20  8:54 UTC (permalink / raw)
  To: Johannes Thumshirn, linux-btrfs

On 19/5/20 6:07 pm, Johannes Thumshirn wrote:
> On 06/04/2020 13:51, Anand Jain wrote:
>> +static ssize_t btrfs_read_policy_store(struct kobject *kobj,
>> +				       struct kobj_attribute *a,
>> +				       const char *buf, size_t len)
>> +{
>> +	int i;
>> +	struct btrfs_fs_devices *fs_devices = to_fs_devs(kobj);
>> +
>> +	for (i = 0; i < BTRFS_NR_READ_POLICY; i++) {
>> +		if (btrfs_strmatch(buf, btrfs_read_policy_name[i]) == 0) {
>> +			if (i != fs_devices->read_policy) {
>> +				fs_devices->read_policy = i;
>> +				btrfs_info(fs_devices->fs_info,
>> +					   "read policy set to '%s'",
>> +					   btrfs_read_policy_name[i]);
>> +			}
>> +			return len;
>> +		}
>> +	}
> 
> Naive question, what's the advantage over sysfs_match_string()?

We strip both leading and trailing whitespaces, its not
possible to do the same with sysfs_match_string().

For example:
echo ==== These should pass ======
run_nocheck "echo ' pid' > $sysfs_path/read_policy"
run_nocheck "echo ' pid ' > $sysfs_path/read_policy"
run_nocheck "echo -n ' pid ' > $sysfs_path/read_policy"
run_nocheck "echo -n ' pid' > $sysfs_path/read_policy"
run_nocheck "echo 'pid ' > $sysfs_path/read_policy"
run_nocheck "echo -n 'pid ' > $sysfs_path/read_policy"
run_nocheck "echo -n pid > $sysfs_path/read_policy"
run_nocheck "echo pid > $sysfs_path/read_policy"

Thanks, Anand

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v7 3/5] btrfs: create read policy sysfs attribute, pid
  2020-05-20  8:54     ` Anand Jain
@ 2020-05-20  8:55       ` Johannes Thumshirn
  0 siblings, 0 replies; 16+ messages in thread
From: Johannes Thumshirn @ 2020-05-20  8:55 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs

On 20/05/2020 10:54, Anand Jain wrote:
> On 19/5/20 6:07 pm, Johannes Thumshirn wrote:
>> On 06/04/2020 13:51, Anand Jain wrote:
>>> +static ssize_t btrfs_read_policy_store(struct kobject *kobj,
>>> +				       struct kobj_attribute *a,
>>> +				       const char *buf, size_t len)
>>> +{
>>> +	int i;
>>> +	struct btrfs_fs_devices *fs_devices = to_fs_devs(kobj);
>>> +
>>> +	for (i = 0; i < BTRFS_NR_READ_POLICY; i++) {
>>> +		if (btrfs_strmatch(buf, btrfs_read_policy_name[i]) == 0) {
>>> +			if (i != fs_devices->read_policy) {
>>> +				fs_devices->read_policy = i;
>>> +				btrfs_info(fs_devices->fs_info,
>>> +					   "read policy set to '%s'",
>>> +					   btrfs_read_policy_name[i]);
>>> +			}
>>> +			return len;
>>> +		}
>>> +	}
>>
>> Naive question, what's the advantage over sysfs_match_string()?
> 
> We strip both leading and trailing whitespaces, its not
> possible to do the same with sysfs_match_string().
> 
> For example:
> echo ==== These should pass ======
> run_nocheck "echo ' pid' > $sysfs_path/read_policy"
> run_nocheck "echo ' pid ' > $sysfs_path/read_policy"
> run_nocheck "echo -n ' pid ' > $sysfs_path/read_policy"
> run_nocheck "echo -n ' pid' > $sysfs_path/read_policy"
> run_nocheck "echo 'pid ' > $sysfs_path/read_policy"
> run_nocheck "echo -n 'pid ' > $sysfs_path/read_policy"
> run_nocheck "echo -n pid > $sysfs_path/read_policy"
> run_nocheck "echo pid > $sysfs_path/read_policy"

Ah ok

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v7 rebased 0/5] readmirror feature (sysfs and in-memory only approach; with new read_policy device)
  2020-05-19 10:02     ` Anand Jain
@ 2020-05-22 13:46       ` David Sterba
  2020-05-26  7:23         ` Anand Jain
  2020-05-22 19:15       ` Steven Davies
  1 sibling, 1 reply; 16+ messages in thread
From: David Sterba @ 2020-05-22 13:46 UTC (permalink / raw)
  To: Anand Jain; +Cc: dsterba, dsterba, linux-btrfs

On Tue, May 19, 2020 at 06:02:32PM +0800, Anand Jain wrote:
> On 16/5/20 3:58 am, David Sterba wrote:
> > On Thu, Apr 30, 2020 at 05:02:27PM +0800, Anand Jain wrote:
> >>    I am not sure if this will be integrated in 5.8 and worth the time to
> >>    rebase. Kindly suggest.
> > 
> > The preparatory work is ok, but the actual mirror selection policy
> > addresses a usecase that I think is not the one most users are
> > interested in. Devices of vastly different performance capabilities like
> > rotational disks vs nvme vs ssd vs network block devices in one
> > filesystem are not something commonly found.
> > 
> > What we really need is a saner balancing mechanism than pid-based, that
> > is also going to be used any time there are more devices from the same
> > speed class for the fast devices too.
> 
> There are two things here, the read_policy framework in the preparatory
> patches and a new balancing or read_policy, device.
> 
> > So, no the patchset is not on track for a merge without the improved
> > default balancing.
> 
> It can be worked on top of the preparatory read_policy framework?

Yes.

> This patchset does not change any default read_policy (or balancing)
> which is pid as of now. Working on a default read_policy/balancing
> was out of the scope of this patchset.
> 
> > The preferred device for reads can be one of the
> > policies, I understand the usecase and have not problem with that
> > although wouldn't probably have use for it.
> 
> For us, read_policy:device helps to reproduce raid1 data corruption
>     https://patchwork.kernel.org/patch/11475417/
> And xfstests btrfs/14[0-3] can be improved so that the reads directly
> go the device of the choice, instead of waiting for the odd/even pid.
> 
> Common configuration won't need this, advance configurations assembled
> with heterogeneous devices where read performance is more critical than
> write will find read_policy:device useful.

Yes that's the usecase and the possibility to make more targeted tests
is also good, but that still means the feature is half-baked and missing
the main part. If it was out of scope, ok fair, but I don't want to
merge it at that state. It would be embarassing to announce mirror
selection followed by "ah no it's useless for anything than this special
usecase".

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v7 rebased 0/5] readmirror feature (sysfs and in-memory only approach; with new read_policy device)
  2020-05-19 10:02     ` Anand Jain
  2020-05-22 13:46       ` David Sterba
@ 2020-05-22 19:15       ` Steven Davies
  1 sibling, 0 replies; 16+ messages in thread
From: Steven Davies @ 2020-05-22 19:15 UTC (permalink / raw)
  To: Anand Jain, dsterba, dsterba, linux-btrfs

On 19/05/2020 11:02, Anand Jain wrote:
> 
> 
> On 16/5/20 3:58 am, David Sterba wrote:
>> On Thu, Apr 30, 2020 at 05:02:27PM +0800, Anand Jain wrote:
>>>    I am not sure if this will be integrated in 5.8 and worth the time to
>>>    rebase. Kindly suggest.
>>
>> The preparatory work is ok, but the actual mirror selection policy
>> addresses a usecase that I think is not the one most users are
>> interested in. Devices of vastly different performance capabilities like
>> rotational disks vs nvme vs ssd vs network block devices in one
>> filesystem are not something commonly found.
>>
>> What we really need is a saner balancing mechanism than pid-based, that
>> is also going to be used any time there are more devices from the same
>> speed class for the fast devices too.

I can't find documentation about how mdadm chooses which mirror to read 
from (in the absence of --write-mostly) but everything suggests it uses 
different mirrors for different processes which sounds the same as or 
very much like %pid. Are you thinking along the lines of checking I/O 
queue length or average seek/read time when choosing the device?

> There are two things here, the read_policy framework in the preparatory
> patches and a new balancing or read_policy, device.
> 
>> So, no the patchset is not on track for a merge without the improved
>> default balancing.
> 
> It can be worked on top of the preparatory read_policy framework?
> This patchset does not change any default read_policy (or balancing)
> which is pid as of now. Working on a default read_policy/balancing
> was out of the scope of this patchset.
> 
>> The preferred device for reads can be one of the
>> policies, I understand the usecase and have not problem with that
>> although wouldn't probably have use for it.
> 
> For us, read_policy:device helps to reproduce raid1 data corruption
>     https://patchwork.kernel.org/patch/11475417/
> And xfstests btrfs/14[0-3] can be improved so that the reads directly
> go the device of the choice, instead of waiting for the odd/even pid.
> 
> Common configuration won't need this, advance configurations assembled
> with heterogeneous devices where read performance is more critical than
> write will find read_policy:device useful.

+1 for merging this as-is; I'm one of the users with heterogeneous 
devices (SATA SSD and NVMe SSD) and it would bring a feature similar to 
mdadm's --write-mostly. I've seen a few other requests for this feature 
on other discussion fora as well.

-- 
Steven Davies

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v7 rebased 0/5] readmirror feature (sysfs and in-memory only approach; with new read_policy device)
  2020-05-22 13:46       ` David Sterba
@ 2020-05-26  7:23         ` Anand Jain
  2020-06-01 15:07           ` David Sterba
  0 siblings, 1 reply; 16+ messages in thread
From: Anand Jain @ 2020-05-26  7:23 UTC (permalink / raw)
  To: dsterba, dsterba, linux-btrfs



On 22/5/20 9:46 pm, David Sterba wrote:
> On Tue, May 19, 2020 at 06:02:32PM +0800, Anand Jain wrote:
>> On 16/5/20 3:58 am, David Sterba wrote:
>>> On Thu, Apr 30, 2020 at 05:02:27PM +0800, Anand Jain wrote:
>>>>     I am not sure if this will be integrated in 5.8 and worth the time to
>>>>     rebase. Kindly suggest.
>>>
>>> The preparatory work is ok, but the actual mirror selection policy
>>> addresses a usecase that I think is not the one most users are
>>> interested in. Devices of vastly different performance capabilities like
>>> rotational disks vs nvme vs ssd vs network block devices in one
>>> filesystem are not something commonly found.
>>>
>>> What we really need is a saner balancing mechanism than pid-based, that
>>> is also going to be used any time there are more devices from the same
>>> speed class for the fast devices too.
>>
>> There are two things here, the read_policy framework in the preparatory
>> patches and a new balancing or read_policy, device.
>>
>>> So, no the patchset is not on track for a merge without the improved
>>> default balancing.
>>
>> It can be worked on top of the preparatory read_policy framework?
> 
> Yes.
> 
>> This patchset does not change any default read_policy (or balancing)
>> which is pid as of now. Working on a default read_policy/balancing
>> was out of the scope of this patchset.
>>
>>> The preferred device for reads can be one of the
>>> policies, I understand the usecase and have not problem with that
>>> although wouldn't probably have use for it.
>>
>> For us, read_policy:device helps to reproduce raid1 data corruption
>>      https://patchwork.kernel.org/patch/11475417/
>> And xfstests btrfs/14[0-3] can be improved so that the reads directly
>> go the device of the choice, instead of waiting for the odd/even pid.
>>
>> Common configuration won't need this, advance configurations assembled
>> with heterogeneous devices where read performance is more critical than
>> write will find read_policy:device useful.
> 
> Yes that's the usecase and the possibility to make more targeted tests
> is also good, but that still means the feature is half-baked and missing
> the main part. If it was out of scope, ok fair, but I don't want to
> merge it at that state. It would be embarassing to announce mirror
> selection followed by "ah no it's useless for anything than this special
> usecase".

I didn't realize the need for default policy is prioritized before this 
patch set.

Potential default read policy is interesting, looking into it.

Thanks, Anand

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v7 rebased 0/5] readmirror feature (sysfs and in-memory only approach; with new read_policy device)
  2020-05-26  7:23         ` Anand Jain
@ 2020-06-01 15:07           ` David Sterba
  0 siblings, 0 replies; 16+ messages in thread
From: David Sterba @ 2020-06-01 15:07 UTC (permalink / raw)
  To: Anand Jain; +Cc: dsterba, dsterba, linux-btrfs

On Tue, May 26, 2020 at 03:23:08PM +0800, Anand Jain wrote:
> > Yes that's the usecase and the possibility to make more targeted tests
> > is also good, but that still means the feature is half-baked and missing
> > the main part. If it was out of scope, ok fair, but I don't want to
> > merge it at that state. It would be embarassing to announce mirror
> > selection followed by "ah no it's useless for anything than this special
> > usecase".
> 
> I didn't realize the need for default policy is prioritized before this 
> patch set.

The updated default policy has been asked for for a long time so this is
what makes it important.

> Potential default read policy is interesting, looking into it.

Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2020-06-01 15:08 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-06 11:51 [PATCH v7 rebased 0/5] readmirror feature (sysfs and in-memory only approach; with new read_policy device) Anand Jain
2020-04-06 11:51 ` [PATCH v7 1/5] btrfs: add btrfs_strmatch helper Anand Jain
2020-04-06 11:51 ` [PATCH v7 2/5] btrfs: create read policy framework Anand Jain
2020-04-06 11:51 ` [PATCH v7 3/5] btrfs: create read policy sysfs attribute, pid Anand Jain
2020-05-19 10:07   ` Johannes Thumshirn
2020-05-20  8:54     ` Anand Jain
2020-05-20  8:55       ` Johannes Thumshirn
2020-04-06 11:51 ` [PATCH v7 4/5] btrfs: introduce new device-state read_preferred Anand Jain
2020-04-06 11:51 ` [PATCH v7 5/5] btrfs: introduce new read_policy device Anand Jain
2020-04-30  9:02 ` [PATCH v7 rebased 0/5] readmirror feature (sysfs and in-memory only approach; with new read_policy device) Anand Jain
2020-05-15 19:58   ` David Sterba
2020-05-19 10:02     ` Anand Jain
2020-05-22 13:46       ` David Sterba
2020-05-26  7:23         ` Anand Jain
2020-06-01 15:07           ` David Sterba
2020-05-22 19:15       ` Steven Davies

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.