All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1 0/4] btrfs: read_policy types latency, device and round-robin
@ 2020-10-28 13:25 Anand Jain
  2020-10-28 13:26 ` [PATCH 1/4] btrfs: add read_policy latency Anand Jain
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Anand Jain @ 2020-10-28 13:25 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba, josef

Based on misc-next

Depends on the following 3 patches in the mailing list.
  btrfs: add btrfs_strmatch helper
  btrfs: create read policy framework
  btrfs: create read policy sysfs attribute, pid

v1:
Drop tracing patch
Drop factoring inflight command
  Here below is the performance differences, when inflight is used, it pushed
  few commands to the other device, so losing the potential merges.

  with inflight:
   READ: bw=195MiB/s (204MB/s), 195MiB/s-195MiB/s (204MB/s-204MB/s), io=15.6GiB (16.8GB), run=82203-82203msec
sda 256054
sdc 20

  without inflight:
   READ: bw=192MiB/s (202MB/s), 192MiB/s-192MiB/s (202MB/s-202MB/s), io=15.6GiB (16.8GB), run=83231-83231msec
sda 141006
sdc 0

Few C styles fixes.


RFC:
Patch 1-3 adds new read policy: latency.
Patch 4 adds tracing function.
Patch 5-6 were sent before they add read policy: device.
Patch 7 adds new read policy: round-robin.

I am sending this patchset to receive feedback, comments, and more test
results. Few cleanups, optimization, and broader performance test are
may be possible. This patchset does not alter the original default
read_policy that is PID.

This has been tested on VMs only, as my hardware based test host is
currently down.

This patchset adds read policies type latency, device, and round-robin.

These policies are for the mirrored raid profiles such as raid1, raid1c3,
raid1c4, and raid10 as they provide a choice to read from any one of the
given devices or any one pair of devices for raid10 for a given block.

Latency:

Latency read policy routes the read IO based on the historical average
wait time experienced by the read IOs on the individual device factored
by 1/10 of inflight commands in the queue. I need to add a factor of 1/10
because for most of the block devices the queue depth is more than 1, and
there can be commands in the queue even before the previous commands have
completed. However, the factor 1/10 for the inflight commands is yet to be
fine-tuned. Suggestions are welcome.

Device:

With the device read policy along with the read_preferred flag, you can
set the device to be used for reading manually. Useful to test mirrors in
a deterministic way and helps advance system administrations.

Round-robin:

When set picks the next device to read in the round-robin loop. To
achieve this first we put the stripes in an array, sort it by devid and
pick the next device.

The test results from my VM with 2 devices of type sata and 2 devices of 
type virtio, are as below.

Here below I have included a few scripts which were useful for testing.

-------------------8<--------------------------------
$ cat readpolicyset
#!/bin/bash

: ${1?"arg1 <mnt> missing"}
: ${2?"arg2 <pid|latency|device|roundrobin> missing"}

mnt=$1
policy=$2
[ $policy == "device" ] && { : ${3?"arg3 <devid> missing"}; }
devid=$3

uuid=$(btrfs fi show -m /btrfs | grep uuid | awk '{print $4}')
p=/sys/fs/btrfs/$uuid/read_policy
q=/sys/fs/btrfs/$uuid/devinfo

[ $policy == "device" ] && { echo 1 > ${q}/$devid/read_preferred || exit $?; }

echo $policy > $p
exit $?
-------------------8<--------------------------------

$ cat readpolicy
#!/bin/bash

: ${1?"arg1 <mnt> missing"}
mnt=$1

uuid=$(btrfs fi show -m /btrfs | grep uuid | awk '{print $4}')
p=/sys/fs/btrfs/$uuid/read_policy
q=/sys/fs/btrfs/$uuid/devinfo

policy=$(cat $p)
echo -n "$policy ( "

for i in $(find $q -type f -name read_preferred | xargs cat)
do
	echo -n "$i"
done
echo ")"
-------------------8<--------------------------------

$ cat readstat 
#!/bin/bash

: ${1?"arg1 <mnt> is missing"}
: ${2?"arg2 <cmd-to-run> is missing"}

mnt=$1; shift
mountpoint -q $mnt || { echo "ERROR: $mnt is not mounted"; exit 1; }

declare -A devread

for dev in $(btrfs filesystem show -m $mnt | grep devid |awk '{print $8}')
do
	prefix=$(echo $dev | rev | cut -d"/" -f1 | rev)
	sysfs_path=$(find /sys | grep $prefix/stat$)

	devread[$sysfs_path]=$(cat $sysfs_path | awk '{print $1}')
done

"$@" | grep "READ: bw"

echo
echo
for sysfs_path in ${!devread[@]}
do
	dev=$(echo $sysfs_path | rev | cut -d"/" -f2 | rev)
	new=$(cat $sysfs_path | awk '{print $1}')
	old=${devread[$sysfs_path]}
	echo "$dev $((new - old))"
done
-------------------8<--------------------------------

$ cat fioread 
#!/bin/bash

: ${1?"arg1 </mnt/file> is missing"}
: ${2?"arg2 <1Gi|50Gi> is missing"}

tf=$1
sz=$2
mnt=$(stat -c '%m' $tf)

fio \
--filename=$tf \
--directory=$mnt \
--filesize=$sz \
--size=$sz \
--rw=randread \
--bs=64k \
--ioengine=libaio \
--direct=1 \
--numjobs=32 \
--group_reporting \
--thread \
--name iops-test-job
-------------------8<--------------------------------


Here below are the performance results for raid1c4, raid10, and raid1.

The workload is fio read 32 threads, 500m random reads.

Fio is passed to the script readstat, which returns the number of read IOs
per device sent during the fio.

Few inferences:

1.
In some cases as mine, PID is also associated with the stripe 0 being
circulating among the devices at the chunk level before moving to the next
device. So PID is not a bad option for such configurations. However, it is
not a good choice if the devices are heterogeneous in terms of type, speed,
and size.

2.
Latency provides performance equal to PID on my test setup. But needs iostat
be enabled. And there is a cost of calculating the avg wait time. (If you
factor in the similar cost to PID using the test code [1] then latency's
performance is better than PID. So that proves that read IO distribution
as per latency is really working)

3.
Round robin is worst (unless there is a bug in my patch). As the total
number of new IOs issued was almost double compared to the PID and the
latency read_policy, that's because I think there was a fewer number of
merges due to constant switching of devices.

4.
Device read_policy is quite useful in testing and advance sysadmin
maintenance. When known how to use, this policy could help to avert
too many csum errors in midest of peak production.

Supporting fio logs are below. And readstat shows the number of read IOs
issued to devices (excluding the merges).

raid1c4
-------

pid:

$ readpolicyset /btrfs pid && readpolicy /btrfs && dropcache && readstat /btrfs fioread /btrfs/largefile 500m
[pid] latency device roundrobin ( 0000)
 READ: bw=87.0MiB/s (91.2MB/s), 87.0MiB/s-87.0MiB/s (91.2MB/s-91.2MB/s), io=15.6GiB (16.8GB), run=183884-183884msec

vdb 64060
vdc 64053
sdb 64072
sda 64054


latency:  (All devices are non-rotational, but sda and sdb are of type
sata and vdb and vdc are of type virtio).

$ readpolicyset /btrfs latency && readpolicy /btrfs && dropcache && readstat /btrfs fioread /btrfs/largefile 500m

pid [latency] device roundrobin ( 0000)
 READ: bw=87.1MiB/s (91.3MB/s), 87.1MiB/s-87.1MiB/s (91.3MB/s-91.3MB/s), io=15.6GiB (16.8GB), run=183774-183774msec

vdb 255844
vdc 559
sdb 0
sda 93

roundrobin:

$ readpolicyset /btrfs roundrobin && readpolicy /btrfs && dropcache && readstat /btrfs fioread /btrfs/largefile 500m

pid latency device [roundrobin] ( 0000)
 READ: bw=51.0MiB/s (54.5MB/s), 51.0MiB/s-51.0MiB/s (54.5MB/s-54.5MB/s), io=15.6GiB (16.8GB), run=307755-307755msec

vdb 866859
vdc 866651
sdb 864139
sda 865533


raid10
------

pid:

$ readpolicyset /btrfs pid && readpolicy /btrfs && dropcache && readstat /btrfs fioread /btrfs/largefile 500m
[pid] latency device roundrobin ( 0000)
 READ: bw=85.2MiB/s (89.3MB/s), 85.2MiB/s-85.2MiB/s (89.3MB/s-89.3MB/s), io=15.6GiB (16.8GB), run=187864-187864msec


sdf 64053
sde 64036
sdd 64043
sdc 64038


latency:

$ readpolicyset /btrfs latency && readpolicy /btrfs && dropcache && readstat /btrfs fioread /btrfs/largefile 500m
pid [latency] device roundrobin ( 0000)
 READ: bw=85.4MiB/s (89.5MB/s), 85.4MiB/s-85.4MiB/s (89.5MB/s-89.5MB/s), io=15.6GiB (16.8GB), run=187370-187370msec


sdf 117494
sde 10748
sdd 125247
sdc 2921

roundrobin:

$ readpolicyset /btrfs roundrobin && readpolicy /btrfs && dropcache && readstat /btrfs fioread /btrfs/largefile 500m
pid latency device [roundrobin] ( 0000)
 READ: bw=55.4MiB/s (58.1MB/s), 55.4MiB/s-55.4MiB/s (58.1MB/s-58.1MB/s), io=15.6GiB (16.8GB), run=288701-288701msec

sdf 617593
sde 617381
sdd 618486
sdc 618633


raid1:

$ readpolicyset /btrfs pid && readpolicy /btrfs && dropcache && readstat /btrfs fioread /btrfs/largefile 500m
[pid] latency device roundrobin ( 00)
 READ: bw=78.8MiB/s (82.6MB/s), 78.8MiB/s-78.8MiB/s (82.6MB/s-82.6MB/s), io=15.6GiB (16.8GB), run=203158-203158msec


sdb 128087
sda 128090

$ readpolicyset /btrfs latency && readpolicy /btrfs && dropcache && readstat /btrfs fioread /btrfs/largefile 500m
pid [latency] device roundrobin ( 00)
 READ: bw=86.5MiB/s (90.7MB/s), 86.5MiB/s-86.5MiB/s (90.7MB/s-90.7MB/s), io=15.6GiB (16.8GB), run=185023-185023msec


sdb 567
sda 255942


From the latency test results we know sda is providing low latency read
IO. So set sda as read proffered device.

$ readpolicyset /btrfs device 1 && readpolicy /btrfs && dropcache && readstat /btrfs fioread /btrfs/largefile 500m
pid latency [device] roundrobin ( 10)
 READ: bw=88.2MiB/s (92.5MB/s), 88.2MiB/s-88.2MiB/s (92.5MB/s-92.5MB/s), io=15.6GiB (16.8GB), run=181374-181374msec


sdb 0
sda 256191


$ readpolicyset /btrfs roundrobin && readpolicy /btrfs && dropcache && readstat /btrfs fioread /btrfs/largefile 500m
pid latency device [roundrobin] ( 00)
 READ: bw=54.1MiB/s (56.7MB/s), 54.1MiB/s-54.1MiB/s (56.7MB/s-56.7MB/s), io=15.6GiB (16.8GB), run=295693-295693msec


sdb 1252584
sda 1254258


Thanks, Anand


------------------
[1]
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d3023879bdf6..72ec633e9063 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5665,6 +5665,12 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info,
 fs_info->fs_devices->read_policy = BTRFS_READ_POLICY_PID;
 fallthrough;
 case BTRFS_READ_POLICY_PID:
+ /*
+ * Just to factor in the cost of calculating the avg wait using
+ * iostat for testing.
+ */
+ btrfs_find_best_stripe(fs_info, map, first, num_stripes, log,
+ logsz);
 preferred_mirror = first + current->pid % num_stripes;
 scnprintf(log, logsz,
 "first %d num_stripe %d %s (%d) preferred %d",


Anand Jain (4):
  btrfs: add read_policy latency
  btrfs: introduce new device-state read_preferred
  btrfs: introduce new read_policy device
  btrfs: introduce new read_policy round-robin

 fs/btrfs/sysfs.c   |  59 +++++++++++++++++++++++-
 fs/btrfs/volumes.c | 111 ++++++++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/volumes.h |   5 ++
 3 files changed, 173 insertions(+), 2 deletions(-)

-- 
2.25.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 1/4] btrfs: add read_policy latency
  2020-10-28 13:25 [PATCH v1 0/4] btrfs: read_policy types latency, device and round-robin Anand Jain
@ 2020-10-28 13:26 ` Anand Jain
  2020-10-28 14:30   ` Josef Bacik
  2020-10-28 13:26 ` [PATCH 2/4] btrfs: introduce new device-state read_preferred Anand Jain
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Anand Jain @ 2020-10-28 13:26 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba, josef

The read policy type latency routes the read IO based on the historical
average wait time experienced by the read IOs through the individual
device factored by 1/10 of inflight commands in the queue. The factor
1/10 is because generally the block device queue depth is more than 1,
so there can be commands in the queue even before the previous commands
have been completed. This patch obtains the historical read IO stats from
the kernel block layer.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
v1: Drop part_stat_read_all instead use part_stat_read
    Drop inflight
    
 fs/btrfs/sysfs.c   |  3 ++-
 fs/btrfs/volumes.c | 39 ++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/volumes.h |  1 +
 3 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 4dbf90ff088a..88cbf7b2edf0 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -906,7 +906,8 @@ static bool btrfs_strmatch(const char *given, const char *golden)
 	return false;
 }
 
-static const char * const btrfs_read_policy_name[] = { "pid" };
+/* Must follow the order as in enum btrfs_read_policy */
+static const char * const btrfs_read_policy_name[] = { "pid", "latency" };
 
 static ssize_t btrfs_read_policy_show(struct kobject *kobj,
 				      struct kobj_attribute *a, char *buf)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 6bf487626f23..48587009b656 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -14,6 +14,7 @@
 #include <linux/semaphore.h>
 #include <linux/uuid.h>
 #include <linux/list_sort.h>
+#include <linux/part_stat.h>
 #include "misc.h"
 #include "ctree.h"
 #include "extent_map.h"
@@ -5468,6 +5469,39 @@ int btrfs_is_parity_mirror(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
 	return ret;
 }
 
+static int btrfs_find_best_stripe(struct btrfs_fs_info *fs_info,
+				  struct map_lookup *map, int first,
+				  int num_stripe)
+{
+	u64 est_wait = 0;
+	int best_stripe = 0;
+	int index;
+
+	for (index = first; index < first + num_stripe; index++) {
+		u64 read_wait;
+		u64 avg_wait = 0;
+		unsigned long read_ios;
+		struct btrfs_device *device = map->stripes[index].dev;
+
+		read_wait = part_stat_read(device->bdev->bd_part, nsecs[READ]);
+		read_ios = part_stat_read(device->bdev->bd_part, ios[READ]);
+
+		if (read_wait && read_ios && read_wait >= read_ios)
+			avg_wait = div_u64(read_wait, read_ios);
+		else
+			btrfs_info_rl(device->fs_devices->fs_info,
+			"devid: %llu avg_wait ZERO read_wait %llu read_ios %lu",
+				      device->devid, read_wait, read_ios);
+
+		if (est_wait == 0 || est_wait > avg_wait) {
+			est_wait = avg_wait;
+			best_stripe = index;
+		}
+	}
+
+	return best_stripe;
+}
+
 static int find_live_mirror(struct btrfs_fs_info *fs_info,
 			    struct map_lookup *map, int first,
 			    int dev_replace_is_ongoing)
@@ -5498,6 +5532,10 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info,
 	case BTRFS_READ_POLICY_PID:
 		preferred_mirror = first + current->pid % num_stripes;
 		break;
+	case BTRFS_READ_POLICY_LATENCY:
+		preferred_mirror = btrfs_find_best_stripe(fs_info, map, first,
+							  num_stripes);
+		break;
 	}
 
 	if (dev_replace_is_ongoing &&
@@ -6114,7 +6152,6 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
 
 	} else if (map->type & BTRFS_BLOCK_GROUP_RAID10) {
 		u32 factor = map->num_stripes / map->sub_stripes;
-
 		stripe_nr = div_u64_rem(stripe_nr, factor, &stripe_index);
 		stripe_index *= map->sub_stripes;
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 97f075516696..24db586a9837 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -217,6 +217,7 @@ enum btrfs_chunk_allocation_policy {
  */
 enum btrfs_read_policy {
 	BTRFS_READ_POLICY_PID,
+	BTRFS_READ_POLICY_LATENCY,
 	BTRFS_NR_READ_POLICY,
 };
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/4] btrfs: introduce new device-state read_preferred
  2020-10-28 13:25 [PATCH v1 0/4] btrfs: read_policy types latency, device and round-robin Anand Jain
  2020-10-28 13:26 ` [PATCH 1/4] btrfs: add read_policy latency Anand Jain
@ 2020-10-28 13:26 ` Anand Jain
  2020-10-28 14:37   ` Josef Bacik
  2020-10-28 13:26 ` [PATCH 3/4] btrfs: introduce new read_policy device Anand Jain
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Anand Jain @ 2020-10-28 13:26 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba, josef

This is a preparatory patch and introduces a new device flag
'read_preferred', RW-able using sysfs interface.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/sysfs.c   | 55 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.h |  1 +
 2 files changed, 56 insertions(+)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 88cbf7b2edf0..52b4c9bef673 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -1413,11 +1413,66 @@ static ssize_t btrfs_devinfo_writeable_show(struct kobject *kobj,
 }
 BTRFS_ATTR(devid, writeable, btrfs_devinfo_writeable_show);
 
+static ssize_t btrfs_devinfo_read_pref_show(struct kobject *kobj,
+					    struct kobj_attribute *a, char *buf)
+{
+	int val;
+	struct btrfs_device *device = container_of(kobj, struct btrfs_device,
+						   devid_kobj);
+
+	val = !!test_bit(BTRFS_DEV_STATE_READ_PREFERRED, &device->dev_state);
+
+	return snprintf(buf, PAGE_SIZE, "%d\n", val);
+}
+
+static ssize_t btrfs_devinfo_read_pref_store(struct kobject *kobj,
+					     struct kobj_attribute *a,
+					     const char *buf, size_t len)
+{
+	int ret;
+	unsigned long val;
+	struct btrfs_device *device;
+
+	ret = kstrtoul(skip_spaces(buf), 0, &val);
+	if (ret)
+		return ret;
+
+	if (val != 0 && val != 1)
+		return -EINVAL;
+
+	/*
+	 * lock is not required, the btrfs_device struct can't be freed while
+	 * its kobject btrfs_device::devid_kobj is still open.
+	 */
+	device = container_of(kobj, struct btrfs_device, devid_kobj);
+
+	if (val &&
+	    ! test_bit(BTRFS_DEV_STATE_READ_PREFERRED, &device->dev_state)) {
+
+		set_bit(BTRFS_DEV_STATE_READ_PREFERRED, &device->dev_state);
+		btrfs_info(device->fs_devices->fs_info,
+			   "set read preferred on devid %llu (%d)",
+			   device->devid, task_pid_nr(current));
+	} else if (!val &&
+		   test_bit(BTRFS_DEV_STATE_READ_PREFERRED, &device->dev_state)) {
+
+		clear_bit(BTRFS_DEV_STATE_READ_PREFERRED, &device->dev_state);
+		btrfs_info(device->fs_devices->fs_info,
+			   "reset read preferred on devid %llu (%d)",
+			   device->devid, task_pid_nr(current));
+	}
+
+	return len;
+}
+BTRFS_ATTR_RW(devid, read_preferred, btrfs_devinfo_read_pref_show,
+	      btrfs_devinfo_read_pref_store);
+
 static struct attribute *devid_attrs[] = {
 	BTRFS_ATTR_PTR(devid, in_fs_metadata),
 	BTRFS_ATTR_PTR(devid, missing),
 	BTRFS_ATTR_PTR(devid, replace_target),
 	BTRFS_ATTR_PTR(devid, writeable),
+	BTRFS_ATTR_PTR(devid, read_preferred),
 	NULL
 };
 ATTRIBUTE_GROUPS(devid);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 24db586a9837..f1cbbb18f5ef 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -51,6 +51,7 @@ struct btrfs_io_geometry {
 #define BTRFS_DEV_STATE_REPLACE_TGT	(3)
 #define BTRFS_DEV_STATE_FLUSH_SENT	(4)
 #define BTRFS_DEV_STATE_NO_READA	(5)
+#define BTRFS_DEV_STATE_READ_PREFERRED	(6)
 
 struct btrfs_device {
 	struct list_head dev_list; /* device_list_mutex */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/4] btrfs: introduce new read_policy device
  2020-10-28 13:25 [PATCH v1 0/4] btrfs: read_policy types latency, device and round-robin Anand Jain
  2020-10-28 13:26 ` [PATCH 1/4] btrfs: add read_policy latency Anand Jain
  2020-10-28 13:26 ` [PATCH 2/4] btrfs: introduce new device-state read_preferred Anand Jain
@ 2020-10-28 13:26 ` Anand Jain
  2020-10-28 14:40   ` Josef Bacik
  2020-10-28 13:26 ` [PATCH RFC 4/4] btrfs: introduce new read_policy round-robin Anand Jain
  2020-10-28 14:32 ` [PATCH v1 0/4] btrfs: read_policy types latency, device and round-robin Josef Bacik
  4 siblings, 1 reply; 16+ messages in thread
From: Anand Jain @ 2020-10-28 13:26 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba, josef

Read-policy type 'device' and device flag 'read-preferred':

The read-policy type device picks the device(s) flagged as
read-preferred for reading chunks of type raid1, raid10,
raid1c3 and raid1c4.

A system might contain SSD, nvme, iscsi or san lun, and which are all
a non-rotational device, so it is not a good idea to set the read-preferred
automatically. Instead device read-policy along with the read-preferred
flag provides an ability to do it manually. This advance tuning is
useful in more than one situation, for example,
 - In heterogeneous-disk volume, it provides an ability to manually choose
    the low latency disks for reading.
 - Useful for more accurate testing.
 - Avoid known problematic device from reading the chunk until it is
   replaced (by marking the other good devices as read-preferred).

Note:

If the read-policy type is set to 'device', but there isn't any device
which is flagged as read-preferred, then stripe 0 is used for reading.

The device replace won't migrate the read-preferred flag to the new
replace the target device.

As of now, this is an in-memory only feature.

It's pointless to set the read-preferred flag on the missing device,
as IOs aren't submitted to the missing device.

If there is more than one read-preferred device in a chunk, the read IO
shall go to the stripe 0 (as of now, when depth patches are integrated
we will use the least busy device among the read-preferred devices).

Usage example:

Consider a typical two disks raid1.

Configure devid1 for reading.

$ echo 1 > devinfo/1/read_preferred
$ cat devinfo/1/read_preferred; cat devinfo/2/read_preferred
1
0

$ pwd
/sys/fs/btrfs/12345678-1234-1234-1234-123456789abc

$ cat read_policy; echo device > ./read_policy; cat read_policy
[pid] device
pid [device]

Now read IOs are sent to devid 1 (sdb).

$ echo 3 > /proc/sys/vm/drop_caches; md5sum /btrfs/YkZI

$ iostat -zy 1 | egrep 'sdb|sdc' (from another terminal)
sdb              50.00     40048.00         0.00      40048          0

Change the read-preferred device from devid 1 to devid 2 (sdc).

$ echo 0 > ./devinfo/1/read_preferred; echo 1 > ./devinfo/2/read_preferred;

[ 3343.918658] BTRFS info (device sdb): reset read preferred on devid 1 (1334)
[ 3343.919876] BTRFS info (device sdb): set read preferred on devid 2 (1334)

Further read ios are sent to devid 2 (sdc).

$ echo 3 > /proc/sys/vm/drop_caches; md5sum /btrfs/YkZI

$ iostat -zy 1 | egrep 'sdb|sdc' (from another terminal)
sdc              49.00     40048.00         0.00      40048          0

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/sysfs.c   |  3 ++-
 fs/btrfs/volumes.c | 22 ++++++++++++++++++++++
 fs/btrfs/volumes.h |  1 +
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 52b4c9bef673..d2a974e1a1c4 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -907,7 +907,8 @@ static bool btrfs_strmatch(const char *given, const char *golden)
 }
 
 /* Must follow the order as in enum btrfs_read_policy */
-static const char * const btrfs_read_policy_name[] = { "pid", "latency" };
+static const char * const btrfs_read_policy_name[] = { "pid", "latency",
+						       "device" };
 
 static ssize_t btrfs_read_policy_show(struct kobject *kobj,
 				      struct kobj_attribute *a, char *buf)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 48587009b656..7ac675504051 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5502,6 +5502,25 @@ static int btrfs_find_best_stripe(struct btrfs_fs_info *fs_info,
 	return best_stripe;
 }
 
+static int btrfs_find_read_preferred(struct map_lookup *map, int first, int num_stripe)
+{
+	int stripe_index;
+	int last = first + num_stripe;
+
+	/*
+	 * If there are more than one read preferred devices, then just pick the
+	 * first found read preferred device as of now.
+	 */
+	for (stripe_index = first; stripe_index < last; stripe_index++) {
+		if (test_bit(BTRFS_DEV_STATE_READ_PREFERRED,
+			     &map->stripes[stripe_index].dev->dev_state))
+			return stripe_index;
+        }
+
+	/* If there is no read preferred device then just use the first stripe */
+	return first;
+}
+
 static int find_live_mirror(struct btrfs_fs_info *fs_info,
 			    struct map_lookup *map, int first,
 			    int dev_replace_is_ongoing)
@@ -5536,6 +5555,9 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info,
 		preferred_mirror = btrfs_find_best_stripe(fs_info, map, first,
 							  num_stripes);
 		break;
+	case BTRFS_READ_POLICY_DEVICE:
+		preferred_mirror = btrfs_find_read_preferred(map, first, num_stripes);
+		break;
 	}
 
 	if (dev_replace_is_ongoing &&
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index f1cbbb18f5ef..1448adb8993d 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -219,6 +219,7 @@ enum btrfs_chunk_allocation_policy {
 enum btrfs_read_policy {
 	BTRFS_READ_POLICY_PID,
 	BTRFS_READ_POLICY_LATENCY,
+	BTRFS_READ_POLICY_DEVICE,
 	BTRFS_NR_READ_POLICY,
 };
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH RFC 4/4] btrfs: introduce new read_policy round-robin
  2020-10-28 13:25 [PATCH v1 0/4] btrfs: read_policy types latency, device and round-robin Anand Jain
                   ` (2 preceding siblings ...)
  2020-10-28 13:26 ` [PATCH 3/4] btrfs: introduce new read_policy device Anand Jain
@ 2020-10-28 13:26 ` Anand Jain
  2020-10-28 14:44   ` Josef Bacik
  2020-10-28 14:32 ` [PATCH v1 0/4] btrfs: read_policy types latency, device and round-robin Josef Bacik
  4 siblings, 1 reply; 16+ messages in thread
From: Anand Jain @ 2020-10-28 13:26 UTC (permalink / raw)
  To: linux-btrfs; +Cc: dsterba, josef

Add round-robin read policy to route the read IO to the next device in the
round-robin order. The chunk allocation and thus the stripe-index follows
the order of free space available on devices. So to make the round-robin
effective it shall follow the devid order instead of the stripe-index
order.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
RFC: because I am not too sure if any workload or block layer
configurations shall suit round-robin read_policy.

 fs/btrfs/sysfs.c   |  2 +-
 fs/btrfs/volumes.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.h |  2 ++
 3 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index d2a974e1a1c4..293311c79321 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -908,7 +908,7 @@ static bool btrfs_strmatch(const char *given, const char *golden)
 
 /* Must follow the order as in enum btrfs_read_policy */
 static const char * const btrfs_read_policy_name[] = { "pid", "latency",
-						       "device" };
+						       "device", "roundrobin" };
 
 static ssize_t btrfs_read_policy_show(struct kobject *kobj,
 				      struct kobj_attribute *a, char *buf)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 7ac675504051..fa1b1a3ebc87 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5469,6 +5469,52 @@ int btrfs_is_parity_mirror(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
 	return ret;
 }
 
+struct stripe_mirror {
+	u64 devid;
+	int map;
+};
+
+static int btrfs_cmp_devid(const void *a, const void *b)
+{
+	struct stripe_mirror *s1 = (struct stripe_mirror *)a;
+	struct stripe_mirror *s2 = (struct stripe_mirror *)b;
+
+	if (s1->devid < s2->devid)
+		return -1;
+	if (s1->devid > s2->devid)
+		return 1;
+	return 0;
+}
+
+static int btrfs_find_read_round_robin(struct map_lookup *map, int first,
+				       int num_stripe)
+{
+	struct stripe_mirror stripes[4] = {0}; //4: for testing, works for now.
+	struct btrfs_fs_devices *fs_devices;
+	u64 devid;
+	int index, j, cnt;
+	int next_stripe;
+
+	index = 0;
+	for (j = first; j < first + num_stripe; j++) {
+		devid = map->stripes[j].dev->devid;
+
+		stripes[index].devid = devid;
+		stripes[index].map = j;
+
+		index++;
+	}
+
+	sort(stripes, num_stripe, sizeof(struct stripe_mirror),
+	     btrfs_cmp_devid, NULL);
+
+	fs_devices = map->stripes[first].dev->fs_devices;
+	cnt = atomic_inc_return(&fs_devices->total_reads);
+	next_stripe = stripes[cnt % num_stripe].map;
+
+	return next_stripe;
+}
+
 static int btrfs_find_best_stripe(struct btrfs_fs_info *fs_info,
 				  struct map_lookup *map, int first,
 				  int num_stripe)
@@ -5558,6 +5604,10 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info,
 	case BTRFS_READ_POLICY_DEVICE:
 		preferred_mirror = btrfs_find_read_preferred(map, first, num_stripes);
 		break;
+	case BTRFS_READ_POLICY_ROUND_ROBIN:
+		preferred_mirror = btrfs_find_read_round_robin(map, first,
+							       num_stripes);
+		break;
 	}
 
 	if (dev_replace_is_ongoing &&
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 1448adb8993d..fc00f9c7f1ab 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -220,6 +220,7 @@ enum btrfs_read_policy {
 	BTRFS_READ_POLICY_PID,
 	BTRFS_READ_POLICY_LATENCY,
 	BTRFS_READ_POLICY_DEVICE,
+	BTRFS_READ_POLICY_ROUND_ROBIN,
 	BTRFS_NR_READ_POLICY,
 };
 
@@ -281,6 +282,7 @@ struct btrfs_fs_devices {
 	 * policy used to read the mirrored stripes
 	 */
 	enum btrfs_read_policy read_policy;
+	atomic_t total_reads;
 };
 
 #define BTRFS_BIO_INLINE_CSUM_SIZE	64
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] btrfs: add read_policy latency
  2020-10-28 13:26 ` [PATCH 1/4] btrfs: add read_policy latency Anand Jain
@ 2020-10-28 14:30   ` Josef Bacik
  2020-10-29  1:06     ` Anand Jain
  0 siblings, 1 reply; 16+ messages in thread
From: Josef Bacik @ 2020-10-28 14:30 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs; +Cc: dsterba

On 10/28/20 9:26 AM, Anand Jain wrote:
> The read policy type latency routes the read IO based on the historical
> average wait time experienced by the read IOs through the individual
> device factored by 1/10 of inflight commands in the queue. The factor
> 1/10 is because generally the block device queue depth is more than 1,
> so there can be commands in the queue even before the previous commands
> have been completed. This patch obtains the historical read IO stats from
> the kernel block layer.
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
> v1: Drop part_stat_read_all instead use part_stat_read
>      Drop inflight
>      
>   fs/btrfs/sysfs.c   |  3 ++-
>   fs/btrfs/volumes.c | 39 ++++++++++++++++++++++++++++++++++++++-
>   fs/btrfs/volumes.h |  1 +
>   3 files changed, 41 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
> index 4dbf90ff088a..88cbf7b2edf0 100644
> --- a/fs/btrfs/sysfs.c
> +++ b/fs/btrfs/sysfs.c
> @@ -906,7 +906,8 @@ static bool btrfs_strmatch(const char *given, const char *golden)
>   	return false;
>   }
>   
> -static const char * const btrfs_read_policy_name[] = { "pid" };
> +/* Must follow the order as in enum btrfs_read_policy */
> +static const char * const btrfs_read_policy_name[] = { "pid", "latency" };
>   
>   static ssize_t btrfs_read_policy_show(struct kobject *kobj,
>   				      struct kobj_attribute *a, char *buf)
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 6bf487626f23..48587009b656 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -14,6 +14,7 @@
>   #include <linux/semaphore.h>
>   #include <linux/uuid.h>
>   #include <linux/list_sort.h>
> +#include <linux/part_stat.h>
>   #include "misc.h"
>   #include "ctree.h"
>   #include "extent_map.h"
> @@ -5468,6 +5469,39 @@ int btrfs_is_parity_mirror(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
>   	return ret;
>   }
>   
> +static int btrfs_find_best_stripe(struct btrfs_fs_info *fs_info,
> +				  struct map_lookup *map, int first,
> +				  int num_stripe)
> +{
> +	u64 est_wait = 0;
> +	int best_stripe = 0;
> +	int index;
> +
> +	for (index = first; index < first + num_stripe; index++) {
> +		u64 read_wait;
> +		u64 avg_wait = 0;
> +		unsigned long read_ios;
> +		struct btrfs_device *device = map->stripes[index].dev;
> +
> +		read_wait = part_stat_read(device->bdev->bd_part, nsecs[READ]);
> +		read_ios = part_stat_read(device->bdev->bd_part, ios[READ]);
> +
> +		if (read_wait && read_ios && read_wait >= read_ios)
> +			avg_wait = div_u64(read_wait, read_ios);
> +		else
> +			btrfs_info_rl(device->fs_devices->fs_info,
> +			"devid: %llu avg_wait ZERO read_wait %llu read_ios %lu",
> +				      device->devid, read_wait, read_ios);

Do we even care about this?  I feel like if we do at all it can be btrfs_debug 
or something.

> +
> +		if (est_wait == 0 || est_wait > avg_wait) {
> +			est_wait = avg_wait;
> +			best_stripe = index;
> +		}
> +	}
> +
> +	return best_stripe;
> +}
> +
>   static int find_live_mirror(struct btrfs_fs_info *fs_info,
>   			    struct map_lookup *map, int first,
>   			    int dev_replace_is_ongoing)
> @@ -5498,6 +5532,10 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info,
>   	case BTRFS_READ_POLICY_PID:
>   		preferred_mirror = first + current->pid % num_stripes;
>   		break;
> +	case BTRFS_READ_POLICY_LATENCY:
> +		preferred_mirror = btrfs_find_best_stripe(fs_info, map, first,
> +							  num_stripes);
> +		break;
>   	}
>   
>   	if (dev_replace_is_ongoing &&
> @@ -6114,7 +6152,6 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
>   
>   	} else if (map->type & BTRFS_BLOCK_GROUP_RAID10) {
>   		u32 factor = map->num_stripes / map->sub_stripes;
> -

Unrelated change.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v1 0/4] btrfs: read_policy types latency, device and round-robin
  2020-10-28 13:25 [PATCH v1 0/4] btrfs: read_policy types latency, device and round-robin Anand Jain
                   ` (3 preceding siblings ...)
  2020-10-28 13:26 ` [PATCH RFC 4/4] btrfs: introduce new read_policy round-robin Anand Jain
@ 2020-10-28 14:32 ` Josef Bacik
  2020-10-29  1:08   ` Anand Jain
  4 siblings, 1 reply; 16+ messages in thread
From: Josef Bacik @ 2020-10-28 14:32 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs; +Cc: dsterba

On 10/28/20 9:25 AM, Anand Jain wrote:
> Based on misc-next
> 
> Depends on the following 3 patches in the mailing list.
>    btrfs: add btrfs_strmatch helper
>    btrfs: create read policy framework
>    btrfs: create read policy sysfs attribute, pid
> 
> v1:
> Drop tracing patch
> Drop factoring inflight command
>    Here below is the performance differences, when inflight is used, it pushed
>    few commands to the other device, so losing the potential merges.
> 
>    with inflight:
>     READ: bw=195MiB/s (204MB/s), 195MiB/s-195MiB/s (204MB/s-204MB/s), io=15.6GiB (16.8GB), run=82203-82203msec
> sda 256054
> sdc 20
> 
>    without inflight:
>     READ: bw=192MiB/s (202MB/s), 192MiB/s-192MiB/s (202MB/s-202MB/s), io=15.6GiB (16.8GB), run=83231-83231msec
> sda 141006
> sdc 0
> 

What's the baseline?  I think 3mib/s is not that big of a tradeoff for 
complexity, but if baseline is like 190mib/s then maybe its worth it.  If 
baseline is 90mib/s then I say it's not worth the inflight.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] btrfs: introduce new device-state read_preferred
  2020-10-28 13:26 ` [PATCH 2/4] btrfs: introduce new device-state read_preferred Anand Jain
@ 2020-10-28 14:37   ` Josef Bacik
  2020-10-29  1:12     ` Anand Jain
  0 siblings, 1 reply; 16+ messages in thread
From: Josef Bacik @ 2020-10-28 14:37 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs; +Cc: dsterba

On 10/28/20 9:26 AM, Anand Jain wrote:
> This is a preparatory patch and introduces a new device flag
> 'read_preferred', RW-able using sysfs interface.
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>   fs/btrfs/sysfs.c   | 55 ++++++++++++++++++++++++++++++++++++++++++++++
>   fs/btrfs/volumes.h |  1 +
>   2 files changed, 56 insertions(+)
> 
> diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
> index 88cbf7b2edf0..52b4c9bef673 100644
> --- a/fs/btrfs/sysfs.c
> +++ b/fs/btrfs/sysfs.c
> @@ -1413,11 +1413,66 @@ static ssize_t btrfs_devinfo_writeable_show(struct kobject *kobj,
>   }
>   BTRFS_ATTR(devid, writeable, btrfs_devinfo_writeable_show);
>   
> +static ssize_t btrfs_devinfo_read_pref_show(struct kobject *kobj,
> +					    struct kobj_attribute *a, char *buf)
> +{
> +	int val;
> +	struct btrfs_device *device = container_of(kobj, struct btrfs_device,
> +						   devid_kobj);
> +
> +	val = !!test_bit(BTRFS_DEV_STATE_READ_PREFERRED, &device->dev_state);
> +
> +	return snprintf(buf, PAGE_SIZE, "%d\n", val);
> +}
> +
> +static ssize_t btrfs_devinfo_read_pref_store(struct kobject *kobj,
> +					     struct kobj_attribute *a,
> +					     const char *buf, size_t len)
> +{
> +	int ret;
> +	unsigned long val;
> +	struct btrfs_device *device;
> +
> +	ret = kstrtoul(skip_spaces(buf), 0, &val);
> +	if (ret)
> +		return ret;
> +
> +	if (val != 0 && val != 1)
> +		return -EINVAL;
> +
> +	/*
> +	 * lock is not required, the btrfs_device struct can't be freed while
> +	 * its kobject btrfs_device::devid_kobj is still open.
> +	 */
> +	device = container_of(kobj, struct btrfs_device, devid_kobj);
> +
> +	if (val &&
> +	    ! test_bit(BTRFS_DEV_STATE_READ_PREFERRED, &device->dev_state)) {
> +

No extra space between the ! test_bit, and you don't need these extra newlines.

I sort of wonder what happens if we have multiple SSD's with multiple slow 
disks, we may want to load balance between the SSD's and set the two SSD's to 
preferred.  But typing it out made me think its not really related to this 
change, so don't worry too much about it, just thoughts for the future.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/4] btrfs: introduce new read_policy device
  2020-10-28 13:26 ` [PATCH 3/4] btrfs: introduce new read_policy device Anand Jain
@ 2020-10-28 14:40   ` Josef Bacik
  2020-10-29  1:56     ` Anand Jain
  0 siblings, 1 reply; 16+ messages in thread
From: Josef Bacik @ 2020-10-28 14:40 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs; +Cc: dsterba

On 10/28/20 9:26 AM, Anand Jain wrote:
> Read-policy type 'device' and device flag 'read-preferred':
> 
> The read-policy type device picks the device(s) flagged as
> read-preferred for reading chunks of type raid1, raid10,
> raid1c3 and raid1c4.
> 
> A system might contain SSD, nvme, iscsi or san lun, and which are all
> a non-rotational device, so it is not a good idea to set the read-preferred
> automatically. Instead device read-policy along with the read-preferred
> flag provides an ability to do it manually. This advance tuning is
> useful in more than one situation, for example,
>   - In heterogeneous-disk volume, it provides an ability to manually choose
>      the low latency disks for reading.
>   - Useful for more accurate testing.
>   - Avoid known problematic device from reading the chunk until it is
>     replaced (by marking the other good devices as read-preferred).
> 
> Note:
> 
> If the read-policy type is set to 'device', but there isn't any device
> which is flagged as read-preferred, then stripe 0 is used for reading.
> 
> The device replace won't migrate the read-preferred flag to the new
> replace the target device.
> 
> As of now, this is an in-memory only feature.
> 
> It's pointless to set the read-preferred flag on the missing device,
> as IOs aren't submitted to the missing device.
> 
> If there is more than one read-preferred device in a chunk, the read IO
> shall go to the stripe 0 (as of now, when depth patches are integrated
> we will use the least busy device among the read-preferred devices).
> 
> Usage example:
> 
> Consider a typical two disks raid1.
> 
> Configure devid1 for reading.
> 
> $ echo 1 > devinfo/1/read_preferred
> $ cat devinfo/1/read_preferred; cat devinfo/2/read_preferred
> 1
> 0
> 
> $ pwd
> /sys/fs/btrfs/12345678-1234-1234-1234-123456789abc
> 
> $ cat read_policy; echo device > ./read_policy; cat read_policy
> [pid] device
> pid [device]
> 
> Now read IOs are sent to devid 1 (sdb).
> 
> $ echo 3 > /proc/sys/vm/drop_caches; md5sum /btrfs/YkZI
> 
> $ iostat -zy 1 | egrep 'sdb|sdc' (from another terminal)
> sdb              50.00     40048.00         0.00      40048          0
> 
> Change the read-preferred device from devid 1 to devid 2 (sdc).
> 
> $ echo 0 > ./devinfo/1/read_preferred; echo 1 > ./devinfo/2/read_preferred;
> 
> [ 3343.918658] BTRFS info (device sdb): reset read preferred on devid 1 (1334)
> [ 3343.919876] BTRFS info (device sdb): set read preferred on devid 2 (1334)
> 
> Further read ios are sent to devid 2 (sdc).
> 
> $ echo 3 > /proc/sys/vm/drop_caches; md5sum /btrfs/YkZI
> 
> $ iostat -zy 1 | egrep 'sdb|sdc' (from another terminal)
> sdc              49.00     40048.00         0.00      40048          0
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>   fs/btrfs/sysfs.c   |  3 ++-
>   fs/btrfs/volumes.c | 22 ++++++++++++++++++++++
>   fs/btrfs/volumes.h |  1 +
>   3 files changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
> index 52b4c9bef673..d2a974e1a1c4 100644
> --- a/fs/btrfs/sysfs.c
> +++ b/fs/btrfs/sysfs.c
> @@ -907,7 +907,8 @@ static bool btrfs_strmatch(const char *given, const char *golden)
>   }
>   
>   /* Must follow the order as in enum btrfs_read_policy */
> -static const char * const btrfs_read_policy_name[] = { "pid", "latency" };
> +static const char * const btrfs_read_policy_name[] = { "pid", "latency",
> +						       "device" };
>   
>   static ssize_t btrfs_read_policy_show(struct kobject *kobj,
>   				      struct kobj_attribute *a, char *buf)
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 48587009b656..7ac675504051 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -5502,6 +5502,25 @@ static int btrfs_find_best_stripe(struct btrfs_fs_info *fs_info,
>   	return best_stripe;
>   }
>   
> +static int btrfs_find_read_preferred(struct map_lookup *map, int first, int num_stripe)
> +{
> +	int stripe_index;
> +	int last = first + num_stripe;
> +
> +	/*
> +	 * If there are more than one read preferred devices, then just pick the
> +	 * first found read preferred device as of now.
> +	 */
> +	for (stripe_index = first; stripe_index < last; stripe_index++) {
> +		if (test_bit(BTRFS_DEV_STATE_READ_PREFERRED,
> +			     &map->stripes[stripe_index].dev->dev_state))
> +			return stripe_index;
> +        }

b4 isn't working for me because these patches didn't make it to linux-btrfs 
proper for some reason, so I could be wrong here, but it looks like this } is 
off?  There's spaces here instead of a tab maybe?  If not then just ignore me. 
Thanks,

Josef

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC 4/4] btrfs: introduce new read_policy round-robin
  2020-10-28 13:26 ` [PATCH RFC 4/4] btrfs: introduce new read_policy round-robin Anand Jain
@ 2020-10-28 14:44   ` Josef Bacik
  2020-10-29  2:06     ` Anand Jain
  0 siblings, 1 reply; 16+ messages in thread
From: Josef Bacik @ 2020-10-28 14:44 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs; +Cc: dsterba

On 10/28/20 9:26 AM, Anand Jain wrote:
> Add round-robin read policy to route the read IO to the next device in the
> round-robin order. The chunk allocation and thus the stripe-index follows
> the order of free space available on devices. So to make the round-robin
> effective it shall follow the devid order instead of the stripe-index
> order.
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
> RFC: because I am not too sure if any workload or block layer
> configurations shall suit round-robin read_policy.
> 
>   fs/btrfs/sysfs.c   |  2 +-
>   fs/btrfs/volumes.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++
>   fs/btrfs/volumes.h |  2 ++
>   3 files changed, 53 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
> index d2a974e1a1c4..293311c79321 100644
> --- a/fs/btrfs/sysfs.c
> +++ b/fs/btrfs/sysfs.c
> @@ -908,7 +908,7 @@ static bool btrfs_strmatch(const char *given, const char *golden)
>   
>   /* Must follow the order as in enum btrfs_read_policy */
>   static const char * const btrfs_read_policy_name[] = { "pid", "latency",
> -						       "device" };
> +						       "device", "roundrobin" };
>   
>   static ssize_t btrfs_read_policy_show(struct kobject *kobj,
>   				      struct kobj_attribute *a, char *buf)
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 7ac675504051..fa1b1a3ebc87 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -5469,6 +5469,52 @@ int btrfs_is_parity_mirror(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
>   	return ret;
>   }
>   
> +struct stripe_mirror {
> +	u64 devid;
> +	int map;
> +};
> +
> +static int btrfs_cmp_devid(const void *a, const void *b)
> +{
> +	struct stripe_mirror *s1 = (struct stripe_mirror *)a;
> +	struct stripe_mirror *s2 = (struct stripe_mirror *)b;
> +
> +	if (s1->devid < s2->devid)
> +		return -1;
> +	if (s1->devid > s2->devid)
> +		return 1;
> +	return 0;
> +}
> +
> +static int btrfs_find_read_round_robin(struct map_lookup *map, int first,
> +				       int num_stripe)
> +{
> +	struct stripe_mirror stripes[4] = {0}; //4: for testing, works for now.
> +	struct btrfs_fs_devices *fs_devices;
> +	u64 devid;
> +	int index, j, cnt;
> +	int next_stripe;
> +
> +	index = 0;
> +	for (j = first; j < first + num_stripe; j++) {
> +		devid = map->stripes[j].dev->devid;
> +
> +		stripes[index].devid = devid;
> +		stripes[index].map = j;
> +
> +		index++;
> +	}
> +
> +	sort(stripes, num_stripe, sizeof(struct stripe_mirror),
> +	     btrfs_cmp_devid, NULL);
> +
> +	fs_devices = map->stripes[first].dev->fs_devices;
> +	cnt = atomic_inc_return(&fs_devices->total_reads);
> +	next_stripe = stripes[cnt % num_stripe].map;

This is heavy handed for policy decisions, just do something like

stripe_nr = atomic_inc_return(&fs_devices->total_reds) % num_stripes;
return stripes[stripe_nr].map;

There's no reason to sort the stripes by devid every time we call this.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] btrfs: add read_policy latency
  2020-10-28 14:30   ` Josef Bacik
@ 2020-10-29  1:06     ` Anand Jain
  0 siblings, 0 replies; 16+ messages in thread
From: Anand Jain @ 2020-10-29  1:06 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs; +Cc: dsterba



On 28/10/20 10:30 pm, Josef Bacik wrote:
> On 10/28/20 9:26 AM, Anand Jain wrote:
>> The read policy type latency routes the read IO based on the historical
>> average wait time experienced by the read IOs through the individual
>> device factored by 1/10 of inflight commands in the queue. The factor
>> 1/10 is because generally the block device queue depth is more than 1,
>> so there can be commands in the queue even before the previous commands
>> have been completed. This patch obtains the historical read IO stats from
>> the kernel block layer.

Oops I need to fix the change log.

>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>> v1: Drop part_stat_read_all instead use part_stat_read
>>      Drop inflight
>>   fs/btrfs/sysfs.c   |  3 ++-
>>   fs/btrfs/volumes.c | 39 ++++++++++++++++++++++++++++++++++++++-
>>   fs/btrfs/volumes.h |  1 +
>>   3 files changed, 41 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
>> index 4dbf90ff088a..88cbf7b2edf0 100644
>> --- a/fs/btrfs/sysfs.c
>> +++ b/fs/btrfs/sysfs.c
>> @@ -906,7 +906,8 @@ static bool btrfs_strmatch(const char *given, 
>> const char *golden)
>>       return false;
>>   }
>> -static const char * const btrfs_read_policy_name[] = { "pid" };
>> +/* Must follow the order as in enum btrfs_read_policy */
>> +static const char * const btrfs_read_policy_name[] = { "pid", 
>> "latency" };
>>   static ssize_t btrfs_read_policy_show(struct kobject *kobj,
>>                         struct kobj_attribute *a, char *buf)
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 6bf487626f23..48587009b656 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -14,6 +14,7 @@
>>   #include <linux/semaphore.h>
>>   #include <linux/uuid.h>
>>   #include <linux/list_sort.h>
>> +#include <linux/part_stat.h>
>>   #include "misc.h"
>>   #include "ctree.h"
>>   #include "extent_map.h"
>> @@ -5468,6 +5469,39 @@ int btrfs_is_parity_mirror(struct btrfs_fs_info 
>> *fs_info, u64 logical, u64 len)
>>       return ret;
>>   }
>> +static int btrfs_find_best_stripe(struct btrfs_fs_info *fs_info,
>> +                  struct map_lookup *map, int first,
>> +                  int num_stripe)
>> +{
>> +    u64 est_wait = 0;
>> +    int best_stripe = 0;
>> +    int index;
>> +
>> +    for (index = first; index < first + num_stripe; index++) {
>> +        u64 read_wait;
>> +        u64 avg_wait = 0;
>> +        unsigned long read_ios;
>> +        struct btrfs_device *device = map->stripes[index].dev;
>> +
>> +        read_wait = part_stat_read(device->bdev->bd_part, nsecs[READ]);
>> +        read_ios = part_stat_read(device->bdev->bd_part, ios[READ]);
>> +
>> +        if (read_wait && read_ios && read_wait >= read_ios)
>> +            avg_wait = div_u64(read_wait, read_ios);
>> +        else
>> +            btrfs_info_rl(device->fs_devices->fs_info,
>> +            "devid: %llu avg_wait ZERO read_wait %llu read_ios %lu",
>> +                      device->devid, read_wait, read_ios);
> 
> Do we even care about this?  I feel like if we do at all it can be 
> btrfs_debug or something.

  Yeah I think btrfs_debug is better until we cover most the hardware?

> 
>> +
>> +        if (est_wait == 0 || est_wait > avg_wait) {
>> +            est_wait = avg_wait;
>> +            best_stripe = index;
>> +        }
>> +    }
>> +
>> +    return best_stripe;
>> +}
>> +
>>   static int find_live_mirror(struct btrfs_fs_info *fs_info,
>>                   struct map_lookup *map, int first,
>>                   int dev_replace_is_ongoing)
>> @@ -5498,6 +5532,10 @@ static int find_live_mirror(struct 
>> btrfs_fs_info *fs_info,
>>       case BTRFS_READ_POLICY_PID:
>>           preferred_mirror = first + current->pid % num_stripes;
>>           break;
>> +    case BTRFS_READ_POLICY_LATENCY:
>> +        preferred_mirror = btrfs_find_best_stripe(fs_info, map, first,
>> +                              num_stripes);
>> +        break;
>>       }
>>       if (dev_replace_is_ongoing &&
>> @@ -6114,7 +6152,6 @@ static int __btrfs_map_block(struct 
>> btrfs_fs_info *fs_info,
>>       } else if (map->type & BTRFS_BLOCK_GROUP_RAID10) {
>>           u32 factor = map->num_stripes / map->sub_stripes;
>> -
> 
> Unrelated change.  Thanks,

Oops. Thanks You I will fix.

Anand

> 
> Josef

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v1 0/4] btrfs: read_policy types latency, device and round-robin
  2020-10-28 14:32 ` [PATCH v1 0/4] btrfs: read_policy types latency, device and round-robin Josef Bacik
@ 2020-10-29  1:08   ` Anand Jain
  2020-10-29  7:44     ` Anand Jain
  0 siblings, 1 reply; 16+ messages in thread
From: Anand Jain @ 2020-10-29  1:08 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs; +Cc: dsterba



On 28/10/20 10:32 pm, Josef Bacik wrote:
> On 10/28/20 9:25 AM, Anand Jain wrote:
>> Based on misc-next
>>
>> Depends on the following 3 patches in the mailing list.
>>    btrfs: add btrfs_strmatch helper
>>    btrfs: create read policy framework
>>    btrfs: create read policy sysfs attribute, pid
>>
>> v1:
>> Drop tracing patch
>> Drop factoring inflight command
>>    Here below is the performance differences, when inflight is used, 
>> it pushed
>>    few commands to the other device, so losing the potential merges.
>>
>>    with inflight:
>>     READ: bw=195MiB/s (204MB/s), 195MiB/s-195MiB/s (204MB/s-204MB/s), 
>> io=15.6GiB (16.8GB), run=82203-82203msec
>> sda 256054
>> sdc 20
>>
>>    without inflight:
>>     READ: bw=192MiB/s (202MB/s), 192MiB/s-192MiB/s (202MB/s-202MB/s), 
>> io=15.6GiB (16.8GB), run=83231-83231msec
>> sda 141006
>> sdc 0
>>
> 
> What's the baseline?  I think 3mib/s is not that big of a tradeoff for 
> complexity, but if baseline is like 190mib/s then maybe its worth it.  
> If baseline is 90mib/s then I say it's not worth the inflight.  Thanks,

  Oh no I have to rerun the test cases here. As far as I remember
  without inflight was better than with inflight. Because with
  inflight there were fewer merges leading to more read IOs.

  Will rerun and send the data.

Thanks, Anand

> 
> Josef

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4] btrfs: introduce new device-state read_preferred
  2020-10-28 14:37   ` Josef Bacik
@ 2020-10-29  1:12     ` Anand Jain
  0 siblings, 0 replies; 16+ messages in thread
From: Anand Jain @ 2020-10-29  1:12 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs; +Cc: dsterba



On 28/10/20 10:37 pm, Josef Bacik wrote:
> On 10/28/20 9:26 AM, Anand Jain wrote:
>> This is a preparatory patch and introduces a new device flag
>> 'read_preferred', RW-able using sysfs interface.
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>>   fs/btrfs/sysfs.c   | 55 ++++++++++++++++++++++++++++++++++++++++++++++
>>   fs/btrfs/volumes.h |  1 +
>>   2 files changed, 56 insertions(+)
>>
>> diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
>> index 88cbf7b2edf0..52b4c9bef673 100644
>> --- a/fs/btrfs/sysfs.c
>> +++ b/fs/btrfs/sysfs.c
>> @@ -1413,11 +1413,66 @@ static ssize_t 
>> btrfs_devinfo_writeable_show(struct kobject *kobj,
>>   }
>>   BTRFS_ATTR(devid, writeable, btrfs_devinfo_writeable_show);
>> +static ssize_t btrfs_devinfo_read_pref_show(struct kobject *kobj,
>> +                        struct kobj_attribute *a, char *buf)
>> +{
>> +    int val;
>> +    struct btrfs_device *device = container_of(kobj, struct 
>> btrfs_device,
>> +                           devid_kobj);
>> +
>> +    val = !!test_bit(BTRFS_DEV_STATE_READ_PREFERRED, 
>> &device->dev_state);
>> +
>> +    return snprintf(buf, PAGE_SIZE, "%d\n", val);
>> +}
>> +
>> +static ssize_t btrfs_devinfo_read_pref_store(struct kobject *kobj,
>> +                         struct kobj_attribute *a,
>> +                         const char *buf, size_t len)
>> +{
>> +    int ret;
>> +    unsigned long val;
>> +    struct btrfs_device *device;
>> +
>> +    ret = kstrtoul(skip_spaces(buf), 0, &val);
>> +    if (ret)
>> +        return ret;
>> +
>> +    if (val != 0 && val != 1)
>> +        return -EINVAL;
>> +
>> +    /*
>> +     * lock is not required, the btrfs_device struct can't be freed 
>> while
>> +     * its kobject btrfs_device::devid_kobj is still open.
>> +     */
>> +    device = container_of(kobj, struct btrfs_device, devid_kobj);
>> +
>> +    if (val &&
>> +        ! test_bit(BTRFS_DEV_STATE_READ_PREFERRED, 
>> &device->dev_state)) {
>> +
> 
> No extra space between the ! test_bit, and you don't need these extra 
> newlines.
> 
  Will fix.

> I sort of wonder what happens if we have multiple SSD's with multiple 
> slow disks, we may want to load balance between the SSD's and set the 
> two SSD's to preferred.  But typing it out made me think its not really 
> related to this change, so don't worry too much about it, just thoughts 
> for the future.  Thanks,
> 

  That's true. I had the same challenges when I wrote this. Then later
  it lead to have device groups, which is for next developments.

Thanks, Anand

> Josef

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/4] btrfs: introduce new read_policy device
  2020-10-28 14:40   ` Josef Bacik
@ 2020-10-29  1:56     ` Anand Jain
  0 siblings, 0 replies; 16+ messages in thread
From: Anand Jain @ 2020-10-29  1:56 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs; +Cc: dsterba




>> +static int btrfs_find_read_preferred(struct map_lookup *map, int 
>> first, int num_stripe)
>> +{
>> +    int stripe_index;
>> +    int last = first + num_stripe;
>> +
>> +    /*
>> +     * If there are more than one read preferred devices, then just 
>> pick the
>> +     * first found read preferred device as of now.
>> +     */
>> +    for (stripe_index = first; stripe_index < last; stripe_index++) {
>> +        if (test_bit(BTRFS_DEV_STATE_READ_PREFERRED,
>> +                 &map->stripes[stripe_index].dev->dev_state))
>> +            return stripe_index;
>> +        }
> 
> b4 isn't working for me because these patches didn't make it to 
> linux-btrfs proper for some reason, 

  Yep. Since yday I am getting email delivery failure.

-----
for recipient address <linux-block@vger.kernel.org> the policy analysis 
reported: zpostgrey: connect: Connection refused
-----

> so I could be wrong here, but it 
> looks like this } is off?  There's spaces here instead of a tab maybe?

> If not then just ignore me. Thanks,
Yeah tab is ok. Its just in the email.

Thanks, Anand

> Josef

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC 4/4] btrfs: introduce new read_policy round-robin
  2020-10-28 14:44   ` Josef Bacik
@ 2020-10-29  2:06     ` Anand Jain
  0 siblings, 0 replies; 16+ messages in thread
From: Anand Jain @ 2020-10-29  2:06 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs; +Cc: dsterba



On 28/10/20 10:44 pm, Josef Bacik wrote:
> On 10/28/20 9:26 AM, Anand Jain wrote:
>> Add round-robin read policy to route the read IO to the next device in 
>> the
>> round-robin order. The chunk allocation and thus the stripe-index follows
>> the order of free space available on devices. So to make the round-robin
>> effective it shall follow the devid order instead of the stripe-index
>> order.
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>> RFC: because I am not too sure if any workload or block layer
>> configurations shall suit round-robin read_policy.
>>
>>   fs/btrfs/sysfs.c   |  2 +-
>>   fs/btrfs/volumes.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++
>>   fs/btrfs/volumes.h |  2 ++
>>   3 files changed, 53 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
>> index d2a974e1a1c4..293311c79321 100644
>> --- a/fs/btrfs/sysfs.c
>> +++ b/fs/btrfs/sysfs.c
>> @@ -908,7 +908,7 @@ static bool btrfs_strmatch(const char *given, 
>> const char *golden)
>>   /* Must follow the order as in enum btrfs_read_policy */
>>   static const char * const btrfs_read_policy_name[] = { "pid", 
>> "latency",
>> -                               "device" };
>> +                               "device", "roundrobin" };
>>   static ssize_t btrfs_read_policy_show(struct kobject *kobj,
>>                         struct kobj_attribute *a, char *buf)
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 7ac675504051..fa1b1a3ebc87 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -5469,6 +5469,52 @@ int btrfs_is_parity_mirror(struct btrfs_fs_info 
>> *fs_info, u64 logical, u64 len)
>>       return ret;
>>   }
>> +struct stripe_mirror {
>> +    u64 devid;
>> +    int map;
>> +};
>> +
>> +static int btrfs_cmp_devid(const void *a, const void *b)
>> +{
>> +    struct stripe_mirror *s1 = (struct stripe_mirror *)a;
>> +    struct stripe_mirror *s2 = (struct stripe_mirror *)b;
>> +
>> +    if (s1->devid < s2->devid)
>> +        return -1;
>> +    if (s1->devid > s2->devid)
>> +        return 1;
>> +    return 0;
>> +}
>> +
>> +static int btrfs_find_read_round_robin(struct map_lookup *map, int 
>> first,
>> +                       int num_stripe)
>> +{
>> +    struct stripe_mirror stripes[4] = {0}; //4: for testing, works 
>> for now.
>> +    struct btrfs_fs_devices *fs_devices;
>> +    u64 devid;
>> +    int index, j, cnt;
>> +    int next_stripe;
>> +
>> +    index = 0;
>> +    for (j = first; j < first + num_stripe; j++) {
>> +        devid = map->stripes[j].dev->devid;
>> +
>> +        stripes[index].devid = devid;
>> +        stripes[index].map = j;
>> +
>> +        index++;
>> +    }
>> +
>> +    sort(stripes, num_stripe, sizeof(struct stripe_mirror),
>> +         btrfs_cmp_devid, NULL);
>> +
>> +    fs_devices = map->stripes[first].dev->fs_devices;
>> +    cnt = atomic_inc_return(&fs_devices->total_reads);
>> +    next_stripe = stripes[cnt % num_stripe].map;
> 
> This is heavy handed for policy decisions, just do something like
> 
> stripe_nr = atomic_inc_return(&fs_devices->total_reds) % num_stripes;
> return stripes[stripe_nr].map;
> 
> There's no reason to sort the stripes by devid every time we call this.  
> Thanks,


But that won't be round-robin at the device level, which I think we are
trying to achieve. It shall be round-robin at the strip level. As
stripe id isn't guaranteed to be on the same device across chunks.
Because chunk allocation follows the device free space size order.

But I agree the proposal in this patch is heavy. So I was thinking if we 
could fix the chunk allocation to sort by devid instead of size.? Any idea?

Thanks, Anand

> 
> Josef

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v1 0/4] btrfs: read_policy types latency, device and round-robin
  2020-10-29  1:08   ` Anand Jain
@ 2020-10-29  7:44     ` Anand Jain
  0 siblings, 0 replies; 16+ messages in thread
From: Anand Jain @ 2020-10-29  7:44 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs; +Cc: dsterba



On 29/10/20 9:08 am, Anand Jain wrote:
> 
> 
> On 28/10/20 10:32 pm, Josef Bacik wrote:
>> On 10/28/20 9:25 AM, Anand Jain wrote:
>>> Based on misc-next
>>>
>>> Depends on the following 3 patches in the mailing list.
>>>    btrfs: add btrfs_strmatch helper
>>>    btrfs: create read policy framework
>>>    btrfs: create read policy sysfs attribute, pid
>>>
>>> v1:
>>> Drop tracing patch
>>> Drop factoring inflight command
>>>    Here below is the performance differences, when inflight is used, 
>>> it pushed
>>>    few commands to the other device, so losing the potential merges.
>>>
>>>    with inflight:
>>>     READ: bw=195MiB/s (204MB/s), 195MiB/s-195MiB/s (204MB/s-204MB/s), 
>>> io=15.6GiB (16.8GB), run=82203-82203msec
>>> sda 256054
>>> sdc 20
>>>
>>>    without inflight:
>>>     READ: bw=192MiB/s (202MB/s), 192MiB/s-192MiB/s (202MB/s-202MB/s), 
>>> io=15.6GiB (16.8GB), run=83231-83231msec
>>> sda 141006
>>> sdc 0
>>>
>>
>> What's the baseline?  I think 3mib/s is not that big of a tradeoff for 
>> complexity, but if baseline is like 190mib/s then maybe its worth it. 
>> If baseline is 90mib/s then I say it's not worth the inflight.  Thanks,
> 
>   Oh no I have to rerun the test cases here. As far as I remember
>   without inflight was better than with inflight. Because with
>   inflight there were fewer merges leading to more read IOs.
> 
>   Will rerun and send the data.
> 

<raid1>

With inflight: (the inflight patches were in the RFC patchset):
pid [latency] device roundrobin ( 00)
    READ: bw=40.3MiB/s (42.3MB/s), 40.3MiB/s-40.3MiB/s 
(42.3MB/s-42.3MB/s), io=15.6GiB (16.8GB), run=396546-396546msec
vdb 363575
sda 200771


Without inflight:
pid [latency] device roundrobin ( 00)
    READ: bw=41.7MiB/s (43.8MB/s), 41.7MiB/s-41.7MiB/s 
(43.8MB/s-43.8MB/s), io=15.6GiB (16.8GB), run=383274-383274msec
vdb 256238
sda 0

Without inflight is better due to lesser IO.

Thanks, Anand

> Thanks, Anand
> 
>>
>> Josef

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2020-10-29  7:45 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-28 13:25 [PATCH v1 0/4] btrfs: read_policy types latency, device and round-robin Anand Jain
2020-10-28 13:26 ` [PATCH 1/4] btrfs: add read_policy latency Anand Jain
2020-10-28 14:30   ` Josef Bacik
2020-10-29  1:06     ` Anand Jain
2020-10-28 13:26 ` [PATCH 2/4] btrfs: introduce new device-state read_preferred Anand Jain
2020-10-28 14:37   ` Josef Bacik
2020-10-29  1:12     ` Anand Jain
2020-10-28 13:26 ` [PATCH 3/4] btrfs: introduce new read_policy device Anand Jain
2020-10-28 14:40   ` Josef Bacik
2020-10-29  1:56     ` Anand Jain
2020-10-28 13:26 ` [PATCH RFC 4/4] btrfs: introduce new read_policy round-robin Anand Jain
2020-10-28 14:44   ` Josef Bacik
2020-10-29  2:06     ` Anand Jain
2020-10-28 14:32 ` [PATCH v1 0/4] btrfs: read_policy types latency, device and round-robin Josef Bacik
2020-10-29  1:08   ` Anand Jain
2020-10-29  7:44     ` Anand Jain

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.