linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] btrfs: add read mirror policy
@ 2018-05-16 10:03 Anand Jain
  2018-05-16 10:03 ` [PATCH v2 1/3] btrfs: add mount option read_mirror_policy Anand Jain
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Anand Jain @ 2018-05-16 10:03 UTC (permalink / raw)
  To: linux-btrfs

Not yet ready for the integration. As I need to introduce
-o no_read_mirror_policy instead of -o read_mirror_policy=-<devid>
to reset the policy as in 3/3. But I am sending this early so that
we can use it for btrfs/161 in the ML, and this patch-set is stable
enough for the testing.

Anand Jain (3):
  btrfs: add mount option read_mirror_policy
  btrfs: add read_mirror_policy parameter devid
  btrfs: read_mirror_policy ability to reset

 fs/btrfs/ctree.h   |  2 ++
 fs/btrfs/super.c   | 44 ++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.c | 18 +++++++++++++++++-
 fs/btrfs/volumes.h |  7 +++++++
 4 files changed, 70 insertions(+), 1 deletion(-)

-- 
2.7.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 1/3] btrfs: add mount option read_mirror_policy
  2018-05-16 10:03 [PATCH v2 0/3] btrfs: add read mirror policy Anand Jain
@ 2018-05-16 10:03 ` Anand Jain
  2018-05-16 10:03 ` [PATCH v2 2/3] btrfs: add read_mirror_policy parameter devid Anand Jain
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Anand Jain @ 2018-05-16 10:03 UTC (permalink / raw)
  To: linux-btrfs

In case of RAID1 and RAID10 devices are mirror-ed, a read IO can
pick any device for reading. This choice of picking a device for
reading should be configurable. In short not one policy would
satisfy all types of workload and configs.

So before we add more policies, this patch-set makes existing
$pid policy configurable from the mount option.

For example..
  mount -o read_mirror_policy=pid (which is also default)

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/ctree.h   |  2 ++
 fs/btrfs/super.c   | 10 ++++++++++
 fs/btrfs/volumes.c |  8 +++++++-
 fs/btrfs/volumes.h |  5 +++++
 4 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index bfa96697209a..0d997939cdae 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1121,6 +1121,8 @@ struct btrfs_fs_info {
 	spinlock_t ref_verify_lock;
 	struct rb_root block_tree;
 #endif
+	/* Policy to balance read across mirrored devices */
+	int read_mirror_policy;
 };
 
 static inline struct btrfs_fs_info *btrfs_sb(struct super_block *sb)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index c67fafaa2fe7..21eff0ac1e4f 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -345,6 +345,7 @@ enum {
 #ifdef CONFIG_BTRFS_FS_REF_VERIFY
 	Opt_ref_verify,
 #endif
+	Opt_read_mirror_policy,
 	Opt_err,
 };
 
@@ -414,6 +415,7 @@ static const match_table_t tokens = {
 #ifdef CONFIG_BTRFS_FS_REF_VERIFY
 	{Opt_ref_verify, "ref_verify"},
 #endif
+	{Opt_read_mirror_policy, "read_mirror_policy=%s"},
 	{Opt_err, NULL},
 };
 
@@ -844,6 +846,14 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
 			btrfs_set_opt(info->mount_opt, REF_VERIFY);
 			break;
 #endif
+		case Opt_read_mirror_policy:
+			if (strcmp(args[0].from, "pid") == 0) {
+				info->read_mirror_policy =
+					BTRFS_READ_MIRROR_BY_PID;
+				break;
+			}
+			ret = -EINVAL;
+			goto out;
 		case Opt_err:
 			btrfs_info(info, "unrecognized mount option '%s'", p);
 			ret = -EINVAL;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 81fb38884cac..64dba5c4cf33 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5219,7 +5219,13 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info,
 	else
 		num_stripes = map->num_stripes;
 
-	preferred_mirror = first + current->pid % num_stripes;
+	switch(fs_info->read_mirror_policy) {
+	case BTRFS_READ_MIRROR_DEFAULT:
+	case BTRFS_READ_MIRROR_BY_PID:
+	default:
+		preferred_mirror = first + current->pid % num_stripes;
+		break;
+	}
 
 	if (dev_replace_is_ongoing &&
 	    fs_info->dev_replace.cont_reading_from_srcdev_mode ==
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 5139ec8daf4c..953df9925832 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -34,6 +34,11 @@ struct btrfs_pending_bios {
 #define btrfs_device_data_ordered_init(device) do { } while (0)
 #endif
 
+enum btrfs_read_mirror_type {
+	BTRFS_READ_MIRROR_DEFAULT,
+	BTRFS_READ_MIRROR_BY_PID,
+};
+
 #define BTRFS_DEV_STATE_WRITEABLE	(0)
 #define BTRFS_DEV_STATE_IN_FS_METADATA	(1)
 #define BTRFS_DEV_STATE_MISSING		(2)
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 2/3] btrfs: add read_mirror_policy parameter devid
  2018-05-16 10:03 [PATCH v2 0/3] btrfs: add read mirror policy Anand Jain
  2018-05-16 10:03 ` [PATCH v2 1/3] btrfs: add mount option read_mirror_policy Anand Jain
@ 2018-05-16 10:03 ` Anand Jain
  2019-01-21 11:56   ` Steven Davies
  2018-05-16 10:03 ` [PATCH v2 3/3] btrfs: read_mirror_policy ability to reset Anand Jain
  2018-05-16 22:35 ` [PATCH v2 0/3] btrfs: add read mirror policy David Sterba
  3 siblings, 1 reply; 15+ messages in thread
From: Anand Jain @ 2018-05-16 10:03 UTC (permalink / raw)
  To: linux-btrfs

Adds the mount option:
  mount -o read_mirror_policy=<devid>

To set the devid of the device which should be used for read. That
means all the normal reads will go to that particular device only.

This also helps testing and gives a better control for the test
scripts including mount context reads.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/super.c   | 21 +++++++++++++++++++++
 fs/btrfs/volumes.c | 10 ++++++++++
 fs/btrfs/volumes.h |  2 ++
 3 files changed, 33 insertions(+)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 21eff0ac1e4f..7ddecf4178a6 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -852,6 +852,27 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
 					BTRFS_READ_MIRROR_BY_PID;
 				break;
 			}
+
+			intarg = 0;
+			if (match_int(&args[0], &intarg) == 0) {
+				struct btrfs_device *device;
+
+				device = btrfs_find_device(info, intarg,
+							   NULL, NULL);
+				if (!device) {
+					btrfs_err(info,
+					  "read_mirror_policy: invalid devid %d",
+					  intarg);
+					ret = -EINVAL;
+					goto out;
+				}
+				info->read_mirror_policy =
+						BTRFS_READ_MIRROR_BY_DEV;
+				set_bit(BTRFS_DEV_STATE_READ_MIRROR,
+					&device->dev_state);
+				break;
+			}
+
 			ret = -EINVAL;
 			goto out;
 		case Opt_err:
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 64dba5c4cf33..3a9c3804ff17 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5220,6 +5220,16 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info,
 		num_stripes = map->num_stripes;
 
 	switch(fs_info->read_mirror_policy) {
+	case BTRFS_READ_MIRROR_BY_DEV:
+		preferred_mirror = first;
+		if (test_bit(BTRFS_DEV_STATE_READ_MIRROR,
+			     &map->stripes[preferred_mirror].dev->dev_state))
+			break;
+		if (test_bit(BTRFS_DEV_STATE_READ_MIRROR,
+			     &map->stripes[++preferred_mirror].dev->dev_state))
+			break;
+		preferred_mirror = first;
+		break;
 	case BTRFS_READ_MIRROR_DEFAULT:
 	case BTRFS_READ_MIRROR_BY_PID:
 	default:
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 953df9925832..55b2eebbf183 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -37,6 +37,7 @@ struct btrfs_pending_bios {
 enum btrfs_read_mirror_type {
 	BTRFS_READ_MIRROR_DEFAULT,
 	BTRFS_READ_MIRROR_BY_PID,
+	BTRFS_READ_MIRROR_BY_DEV,
 };
 
 #define BTRFS_DEV_STATE_WRITEABLE	(0)
@@ -44,6 +45,7 @@ enum btrfs_read_mirror_type {
 #define BTRFS_DEV_STATE_MISSING		(2)
 #define BTRFS_DEV_STATE_REPLACE_TGT	(3)
 #define BTRFS_DEV_STATE_FLUSH_SENT	(4)
+#define BTRFS_DEV_STATE_READ_MIRROR	(5)
 
 struct btrfs_device {
 	struct list_head dev_list;
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 3/3] btrfs: read_mirror_policy ability to reset
  2018-05-16 10:03 [PATCH v2 0/3] btrfs: add read mirror policy Anand Jain
  2018-05-16 10:03 ` [PATCH v2 1/3] btrfs: add mount option read_mirror_policy Anand Jain
  2018-05-16 10:03 ` [PATCH v2 2/3] btrfs: add read_mirror_policy parameter devid Anand Jain
@ 2018-05-16 10:03 ` Anand Jain
  2018-05-16 22:35 ` [PATCH v2 0/3] btrfs: add read mirror policy David Sterba
  3 siblings, 0 replies; 15+ messages in thread
From: Anand Jain @ 2018-05-16 10:03 UTC (permalink / raw)
  To: linux-btrfs

Adds an ability to change the read_mirror_policy at remount.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/super.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 7ddecf4178a6..e70592036b95 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -856,6 +856,12 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
 			intarg = 0;
 			if (match_int(&args[0], &intarg) == 0) {
 				struct btrfs_device *device;
+				int clear = 0;
+
+				if (intarg < 0) {
+					clear = 1;
+					intarg = -intarg;
+				}
 
 				device = btrfs_find_device(info, intarg,
 							   NULL, NULL);
@@ -866,10 +872,17 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
 					ret = -EINVAL;
 					goto out;
 				}
-				info->read_mirror_policy =
+
+				if (clear) {
+					info->read_mirror_policy = 0;
+					clear_bit(BTRFS_DEV_STATE_READ_MIRROR,
+						&device->dev_state);
+				} else {
+					info->read_mirror_policy =
 						BTRFS_READ_MIRROR_BY_DEV;
-				set_bit(BTRFS_DEV_STATE_READ_MIRROR,
-					&device->dev_state);
+					set_bit(BTRFS_DEV_STATE_READ_MIRROR,
+						  &device->dev_state);
+				}
 				break;
 			}
 
-- 
2.7.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/3] btrfs: add read mirror policy
  2018-05-16 10:03 [PATCH v2 0/3] btrfs: add read mirror policy Anand Jain
                   ` (2 preceding siblings ...)
  2018-05-16 10:03 ` [PATCH v2 3/3] btrfs: read_mirror_policy ability to reset Anand Jain
@ 2018-05-16 22:35 ` David Sterba
  2018-05-17  2:32   ` Anand Jain
  2018-05-17 14:46   ` Jeff Mahoney
  3 siblings, 2 replies; 15+ messages in thread
From: David Sterba @ 2018-05-16 22:35 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs

On Wed, May 16, 2018 at 06:03:56PM +0800, Anand Jain wrote:
> Not yet ready for the integration. As I need to introduce
> -o no_read_mirror_policy instead of -o read_mirror_policy=-<devid>

Mount option is mostly likely not the right interface for setting such
options, as usual.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/3] btrfs: add read mirror policy
  2018-05-16 22:35 ` [PATCH v2 0/3] btrfs: add read mirror policy David Sterba
@ 2018-05-17  2:32   ` Anand Jain
  2018-05-17 12:25     ` Austin S. Hemmelgarn
  2018-05-17 14:46   ` Jeff Mahoney
  1 sibling, 1 reply; 15+ messages in thread
From: Anand Jain @ 2018-05-17  2:32 UTC (permalink / raw)
  To: dsterba, linux-btrfs



On 05/17/2018 06:35 AM, David Sterba wrote:
> On Wed, May 16, 2018 at 06:03:56PM +0800, Anand Jain wrote:
>> Not yet ready for the integration. As I need to introduce
>> -o no_read_mirror_policy instead of -o read_mirror_policy=-<devid>
> 
> Mount option is mostly likely not the right interface for setting such
> options, as usual.

  I am ok to make it ioctl for the final. What do you think?


  But to reproduce the bug posted in
    Btrfs: fix the corruption by reading stale btree blocks
  It needs to be a mount option, as randomly the pid can
  still pick the disk specified in the mount option.

-Anand

> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/3] btrfs: add read mirror policy
  2018-05-17  2:32   ` Anand Jain
@ 2018-05-17 12:25     ` Austin S. Hemmelgarn
  2018-05-17 14:46       ` Jeff Mahoney
  0 siblings, 1 reply; 15+ messages in thread
From: Austin S. Hemmelgarn @ 2018-05-17 12:25 UTC (permalink / raw)
  To: Anand Jain, dsterba, linux-btrfs

On 2018-05-16 22:32, Anand Jain wrote:
> 
> 
> On 05/17/2018 06:35 AM, David Sterba wrote:
>> On Wed, May 16, 2018 at 06:03:56PM +0800, Anand Jain wrote:
>>> Not yet ready for the integration. As I need to introduce
>>> -o no_read_mirror_policy instead of -o read_mirror_policy=-<devid>
>>
>> Mount option is mostly likely not the right interface for setting such
>> options, as usual.
> 
>   I am ok to make it ioctl for the final. What do you think?
> 
> 
>   But to reproduce the bug posted in
>     Btrfs: fix the corruption by reading stale btree blocks
>   It needs to be a mount option, as randomly the pid can
>   still pick the disk specified in the mount option.
> 
Personally, I'd vote for filesystem property (thus handled through the 
standard `btrfs property` command) that can be overridden by a mount 
option.  With that approach, no new tool (or change to an existing tool) 
would be needed, existing volumes could be converted to use it in a 
backwards compatible manner (old kernels would just ignore the 
property), and you could still have the behavior you want in tests (and 
in theory it could easily be adapted to be a per-subvolume setting if we 
ever get per-subvolume chunk profile support).

Of course, I'd actually like to see most of the mount options available 
as filesystem level properties with the option to override through mount 
options, but that's a lot more ambitious of an undertaking.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/3] btrfs: add read mirror policy
  2018-05-16 22:35 ` [PATCH v2 0/3] btrfs: add read mirror policy David Sterba
  2018-05-17  2:32   ` Anand Jain
@ 2018-05-17 14:46   ` Jeff Mahoney
  2018-05-17 15:22     ` Austin S. Hemmelgarn
  1 sibling, 1 reply; 15+ messages in thread
From: Jeff Mahoney @ 2018-05-17 14:46 UTC (permalink / raw)
  To: dsterba, Anand Jain, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 810 bytes --]

On 5/16/18 6:35 PM, David Sterba wrote:
> On Wed, May 16, 2018 at 06:03:56PM +0800, Anand Jain wrote:
>> Not yet ready for the integration. As I need to introduce
>> -o no_read_mirror_policy instead of -o read_mirror_policy=-<devid>
> 
> Mount option is mostly likely not the right interface for setting such
> options, as usual.


I've seen a few alternate suggestions in the thread.  I suppose the real
question is: what and where is the intended persistence for this choice?

A mount option gets it via fstab.  How would a user be expected to set
it consistently via ioctl on each mount?  Properties could work, but
there's more discussion needed there.  Personally, I like the property
idea since it could conceivably be used on a per-file basis.

-Jeff

-- 
Jeff Mahoney
SUSE Labs


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/3] btrfs: add read mirror policy
  2018-05-17 12:25     ` Austin S. Hemmelgarn
@ 2018-05-17 14:46       ` Jeff Mahoney
  2018-05-18  8:06         ` Anand Jain
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff Mahoney @ 2018-05-17 14:46 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Anand Jain, dsterba, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 2194 bytes --]

On 5/17/18 8:25 AM, Austin S. Hemmelgarn wrote:
> On 2018-05-16 22:32, Anand Jain wrote:
>>
>>
>> On 05/17/2018 06:35 AM, David Sterba wrote:
>>> On Wed, May 16, 2018 at 06:03:56PM +0800, Anand Jain wrote:
>>>> Not yet ready for the integration. As I need to introduce
>>>> -o no_read_mirror_policy instead of -o read_mirror_policy=-<devid>
>>>
>>> Mount option is mostly likely not the right interface for setting such
>>> options, as usual.
>>
>>   I am ok to make it ioctl for the final. What do you think?
>>
>>
>>   But to reproduce the bug posted in
>>     Btrfs: fix the corruption by reading stale btree blocks
>>   It needs to be a mount option, as randomly the pid can
>>   still pick the disk specified in the mount option.
>>
> Personally, I'd vote for filesystem property (thus handled through the
> standard `btrfs property` command) that can be overridden by a mount
> option.  With that approach, no new tool (or change to an existing tool)
> would be needed, existing volumes could be converted to use it in a
> backwards compatible manner (old kernels would just ignore the
> property), and you could still have the behavior you want in tests (and
> in theory it could easily be adapted to be a per-subvolume setting if we
> ever get per-subvolume chunk profile support).

Properties are a combination of interfaces presented through a single
command.  Although the kernel API would allow a direct-to-property
interface via the btrfs.* extended attributes, those are currently
limited to a single inode.  The label property is set via ioctl and
stored in the superblock.  The read-only subvolume property is also set
by ioctl but stored in the root flags.

As it stands, every property is explicitly defined in the tools, so any
addition would require tools changes.  This is a bigger discussion,
though.  We *could* use the xattr interface to access per-root or
fs-global properties, but we'd need to define that interface.
btrfs_listxattr could get interesting, though I suppose we could
simplify it by only allowing the per-subvolume and fs-global operations
on root inodes.

-Jeff

-- 
Jeff Mahoney
SUSE Labs


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/3] btrfs: add read mirror policy
  2018-05-17 14:46   ` Jeff Mahoney
@ 2018-05-17 15:22     ` Austin S. Hemmelgarn
  0 siblings, 0 replies; 15+ messages in thread
From: Austin S. Hemmelgarn @ 2018-05-17 15:22 UTC (permalink / raw)
  To: Jeff Mahoney, dsterba, Anand Jain, linux-btrfs

On 2018-05-17 10:46, Jeff Mahoney wrote:
> On 5/16/18 6:35 PM, David Sterba wrote:
>> On Wed, May 16, 2018 at 06:03:56PM +0800, Anand Jain wrote:
>>> Not yet ready for the integration. As I need to introduce
>>> -o no_read_mirror_policy instead of -o read_mirror_policy=-<devid>
>>
>> Mount option is mostly likely not the right interface for setting such
>> options, as usual.
> 
> 
> I've seen a few alternate suggestions in the thread.  I suppose the real
> question is: what and where is the intended persistence for this choice?
> 
> A mount option gets it via fstab.  How would a user be expected to set
> it consistently via ioctl on each mount?  Properties could work, but
> there's more discussion needed there.  Personally, I like the property
> idea since it could conceivably be used on a per-file basis.

For the specific proposed use case (the tests), it probably doesn't need 
to be persistent beyond mount options.

However, this also allows for a trivial configuration using a slow 
storage device to provide redundancy for a fast storage device of the 
same size, which is potentially very useful for some people.  In that 
case, I can see most people who would be using it wanting it to follow 
the filesystem regardless of what context it's being mounted in (for 
example, it shouldn't need an extra option if mounted from a recovery 
environment or if it's moved to another system).

Most of my reason for recommending properties is that filesystem level 
properties appear to be the best thing BTRFS has to store per-volume 
configuration that's supposed to stay with the volume, despite not 
really being used for that even though there are quite a few mount 
options that are logical candidates for this type of thing (for example, 
the `ssd` options, `metadata_ratio`, and `max_inline` all make more 
logical sense as a property of the volume, not the mount).

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/3] btrfs: add read mirror policy
  2018-05-17 14:46       ` Jeff Mahoney
@ 2018-05-18  8:06         ` Anand Jain
  2018-05-18 12:36           ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 15+ messages in thread
From: Anand Jain @ 2018-05-18  8:06 UTC (permalink / raw)
  To: Jeff Mahoney, Austin S. Hemmelgarn, dsterba, linux-btrfs



Thanks Austin and Jeff for the suggestion.

I am not particularly a fan of mount option either mainly because
those options aren't persistent and host independent luns will
have tough time to have them synchronize manually.

Properties are better as it is persistent. And we can apply this
read_mirror_policy property on the fsid object.

But if we are talking about the properties then it can be stored
as extended attributes or ondisk key value pair, and I am doubt
if ondisk key value pair will get a nod.
I can explore the extended attribute approach but appreciate more
comments.

-Anand


On 05/17/2018 10:46 PM, Jeff Mahoney wrote:
> On 5/17/18 8:25 AM, Austin S. Hemmelgarn wrote:
>> On 2018-05-16 22:32, Anand Jain wrote:
>>>
>>>
>>> On 05/17/2018 06:35 AM, David Sterba wrote:
>>>> On Wed, May 16, 2018 at 06:03:56PM +0800, Anand Jain wrote:
>>>>> Not yet ready for the integration. As I need to introduce
>>>>> -o no_read_mirror_policy instead of -o read_mirror_policy=-<devid>
>>>>
>>>> Mount option is mostly likely not the right interface for setting such
>>>> options, as usual.
>>>
>>>    I am ok to make it ioctl for the final. What do you think?
>>>
>>>
>>>    But to reproduce the bug posted in
>>>      Btrfs: fix the corruption by reading stale btree blocks
>>>    It needs to be a mount option, as randomly the pid can
>>>    still pick the disk specified in the mount option.
>>>
>> Personally, I'd vote for filesystem property (thus handled through the
>> standard `btrfs property` command) that can be overridden by a mount
>> option.  With that approach, no new tool (or change to an existing tool)
>> would be needed, existing volumes could be converted to use it in a
>> backwards compatible manner (old kernels would just ignore the
>> property), and you could still have the behavior you want in tests (and
>> in theory it could easily be adapted to be a per-subvolume setting if we
>> ever get per-subvolume chunk profile support).
> 
> Properties are a combination of interfaces presented through a single
> command.  Although the kernel API would allow a direct-to-property
> interface via the btrfs.* extended attributes, those are currently
> limited to a single inode.  The label property is set via ioctl and
> stored in the superblock.  The read-only subvolume property is also set
> by ioctl but stored in the root flags.
> 
> As it stands, every property is explicitly defined in the tools, so any
> addition would require tools changes.  This is a bigger discussion,
> though.  We *could* use the xattr interface to access per-root or
> fs-global properties, but we'd need to define that interface.
> btrfs_listxattr could get interesting, though I suppose we could
> simplify it by only allowing the per-subvolume and fs-global operations
> on root inodes.
> 
> -Jeff
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/3] btrfs: add read mirror policy
  2018-05-18  8:06         ` Anand Jain
@ 2018-05-18 12:36           ` Austin S. Hemmelgarn
  0 siblings, 0 replies; 15+ messages in thread
From: Austin S. Hemmelgarn @ 2018-05-18 12:36 UTC (permalink / raw)
  To: Anand Jain, Jeff Mahoney, dsterba, linux-btrfs

On 2018-05-18 04:06, Anand Jain wrote:
> 
> 
> Thanks Austin and Jeff for the suggestion.
> 
> I am not particularly a fan of mount option either mainly because
> those options aren't persistent and host independent luns will
> have tough time to have them synchronize manually.
> 
> Properties are better as it is persistent. And we can apply this
> read_mirror_policy property on the fsid object.
> 
> But if we are talking about the properties then it can be stored
> as extended attributes or ondisk key value pair, and I am doubt
> if ondisk key value pair will get a nod.
> I can explore the extended attribute approach but appreciate more
> comments.

Hmm, thinking a bit further, might it be easier to just keep this as a 
mount option, and add something that lets you embed default mount 
options in the volume in a free-form manner?  Then, you could set this 
persistently there, and could specify any others you want too.  Doing 
that would also give very well defined behavior for exactly when changes 
would apply (the next time you mount or remount the volume), though 
handling of whether or not an option came from there or was specified on 
the command-line might be a bit complicated.
> 
> 
> On 05/17/2018 10:46 PM, Jeff Mahoney wrote:
>> On 5/17/18 8:25 AM, Austin S. Hemmelgarn wrote:
>>> On 2018-05-16 22:32, Anand Jain wrote:
>>>>
>>>>
>>>> On 05/17/2018 06:35 AM, David Sterba wrote:
>>>>> On Wed, May 16, 2018 at 06:03:56PM +0800, Anand Jain wrote:
>>>>>> Not yet ready for the integration. As I need to introduce
>>>>>> -o no_read_mirror_policy instead of -o read_mirror_policy=-<devid>
>>>>>
>>>>> Mount option is mostly likely not the right interface for setting such
>>>>> options, as usual.
>>>>
>>>>    I am ok to make it ioctl for the final. What do you think?
>>>>
>>>>
>>>>    But to reproduce the bug posted in
>>>>      Btrfs: fix the corruption by reading stale btree blocks
>>>>    It needs to be a mount option, as randomly the pid can
>>>>    still pick the disk specified in the mount option.
>>>>
>>> Personally, I'd vote for filesystem property (thus handled through the
>>> standard `btrfs property` command) that can be overridden by a mount
>>> option.  With that approach, no new tool (or change to an existing tool)
>>> would be needed, existing volumes could be converted to use it in a
>>> backwards compatible manner (old kernels would just ignore the
>>> property), and you could still have the behavior you want in tests (and
>>> in theory it could easily be adapted to be a per-subvolume setting if we
>>> ever get per-subvolume chunk profile support).
>>
>> Properties are a combination of interfaces presented through a single
>> command.  Although the kernel API would allow a direct-to-property
>> interface via the btrfs.* extended attributes, those are currently
>> limited to a single inode.  The label property is set via ioctl and
>> stored in the superblock.  The read-only subvolume property is also set
>> by ioctl but stored in the root flags.
>>
>> As it stands, every property is explicitly defined in the tools, so any
>> addition would require tools changes.  This is a bigger discussion,
>> though.  We *could* use the xattr interface to access per-root or
>> fs-global properties, but we'd need to define that interface.
>> btrfs_listxattr could get interesting, though I suppose we could
>> simplify it by only allowing the per-subvolume and fs-global operations
>> on root inodes.
>>
>> -Jeff
>>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/3] btrfs: add read_mirror_policy parameter devid
  2018-05-16 10:03 ` [PATCH v2 2/3] btrfs: add read_mirror_policy parameter devid Anand Jain
@ 2019-01-21 11:56   ` Steven Davies
  2019-01-22 13:43     ` Anand Jain
  0 siblings, 1 reply; 15+ messages in thread
From: Steven Davies @ 2019-01-21 11:56 UTC (permalink / raw)
  To: linux-btrfs

On 2018-05-16 11:03, Anand Jain wrote:

Going back to an old patchset I was testing this weekend:

> Adds the mount option:
>   mount -o read_mirror_policy=<devid>
> 
> To set the devid of the device which should be used for read. That
> means all the normal reads will go to that particular device only.
> 
> This also helps testing and gives a better control for the test
> scripts including mount context reads.
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
Tested-By: Steven Davies <btrfs-list@steev.me.uk>

> ---
>  fs/btrfs/super.c   | 21 +++++++++++++++++++++
>  fs/btrfs/volumes.c | 10 ++++++++++
>  fs/btrfs/volumes.h |  2 ++
>  3 files changed, 33 insertions(+)
> 
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 21eff0ac1e4f..7ddecf4178a6 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -852,6 +852,27 @@ int btrfs_parse_options(struct btrfs_fs_info
> *info, char *options,
>  					BTRFS_READ_MIRROR_BY_PID;
>  				break;
>  			}
> +
> +			intarg = 0;
> +			if (match_int(&args[0], &intarg) == 0) {
> +				struct btrfs_device *device;
> +
> +				device = btrfs_find_device(info, intarg,
> +							   NULL, NULL);
> +				if (!device) {
> +					btrfs_err(info,
> +					  "read_mirror_policy: invalid devid %d",
> +					  intarg);
> +					ret = -EINVAL;
> +					goto out;
> +				}
> +				info->read_mirror_policy =
> +						BTRFS_READ_MIRROR_BY_DEV;
> +				set_bit(BTRFS_DEV_STATE_READ_MIRROR,
> +					&device->dev_state);

Not an expert, but might this be overwritten with another state? e.g. 
BTRFS_DEV_STATE_REPLACE_TGT. The device would then lose its READ_MIRROR 
flag and the fs would always use the first device for reading.

> +				break;
> +			}

I noticed that it's possible to pass this option multiple times at 
mount, which sets multiple devices as read mirrors. While that doesn't 
do anything harmful, the code below will only use the first device. It 
may be worth at least mentioning this in documentation.

> +
>  			ret = -EINVAL;
>  			goto out;
>  		case Opt_err:
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 64dba5c4cf33..3a9c3804ff17 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -5220,6 +5220,16 @@ static int find_live_mirror(struct
> btrfs_fs_info *fs_info,
>  		num_stripes = map->num_stripes;
> 
>  	switch(fs_info->read_mirror_policy) {
> +	case BTRFS_READ_MIRROR_BY_DEV:
> +		preferred_mirror = first;
> +		if (test_bit(BTRFS_DEV_STATE_READ_MIRROR,
> +			     &map->stripes[preferred_mirror].dev->dev_state))
> +			break;
> +		if (test_bit(BTRFS_DEV_STATE_READ_MIRROR,
> +			     &map->stripes[++preferred_mirror].dev->dev_state))
> +			break;
> +		preferred_mirror = first;

Why set preferred_mirror again? The only effect of re-setting it will be 
to use the lowest devid (1) rather than the highest (2). Is there any 
difference? Either way it should never happen, because the above code 
traps it (except when the BTRFS_DEV_STATE_READ_MIRROR flag has been 
changed as mentioned above).

 From Goffredo's comment[1] on Timofey's similar effect patch, if it 
becomes possible to have more mirrors in a RAID1/RAID10 fs then this 
code will need to be updated to test dev_state for more than two 
devices. Would it be sensible to implement this as a for loop straight 
away?

> +		break;
>  	case BTRFS_READ_MIRROR_DEFAULT:
>  	case BTRFS_READ_MIRROR_BY_PID:
>  	default:
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 953df9925832..55b2eebbf183 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -37,6 +37,7 @@ struct btrfs_pending_bios {
>  enum btrfs_read_mirror_type {
>  	BTRFS_READ_MIRROR_DEFAULT,
>  	BTRFS_READ_MIRROR_BY_PID,
> +	BTRFS_READ_MIRROR_BY_DEV,
>  };
> 
>  #define BTRFS_DEV_STATE_WRITEABLE	(0)
> @@ -44,6 +45,7 @@ enum btrfs_read_mirror_type {
>  #define BTRFS_DEV_STATE_MISSING		(2)
>  #define BTRFS_DEV_STATE_REPLACE_TGT	(3)
>  #define BTRFS_DEV_STATE_FLUSH_SENT	(4)
> +#define BTRFS_DEV_STATE_READ_MIRROR	(5)
> 
>  struct btrfs_device {
>  	struct list_head dev_list;

Happy to add Tested-By on patch 1/3 too, but I haven't looked at 3/3 
yet.

[1]: https://patchwork.kernel.org/patch/10678531/#22315779

-- 
Steven Davies

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/3] btrfs: add read_mirror_policy parameter devid
  2019-01-21 11:56   ` Steven Davies
@ 2019-01-22 13:43     ` Anand Jain
  2019-01-22 14:28       ` Steven Davies
  0 siblings, 1 reply; 15+ messages in thread
From: Anand Jain @ 2019-01-22 13:43 UTC (permalink / raw)
  To: Steven Davies, linux-btrfs



On 01/21/2019 07:56 PM, Steven Davies wrote:
> On 2018-05-16 11:03, Anand Jain wrote:
> 
> Going back to an old patchset I was testing this weekend:
> 
>> Adds the mount option:
>>   mount -o read_mirror_policy=<devid>
>>
>> To set the devid of the device which should be used for read. That
>> means all the normal reads will go to that particular device only.
>>
>> This also helps testing and gives a better control for the test
>> scripts including mount context reads.
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> Tested-By: Steven Davies <btrfs-list@steev.me.uk>

Thanks for testing.

> 
>> ---
>>  fs/btrfs/super.c   | 21 +++++++++++++++++++++
>>  fs/btrfs/volumes.c | 10 ++++++++++
>>  fs/btrfs/volumes.h |  2 ++
>>  3 files changed, 33 insertions(+)
>>
>> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
>> index 21eff0ac1e4f..7ddecf4178a6 100644
>> --- a/fs/btrfs/super.c
>> +++ b/fs/btrfs/super.c
>> @@ -852,6 +852,27 @@ int btrfs_parse_options(struct btrfs_fs_info
>> *info, char *options,
>>                      BTRFS_READ_MIRROR_BY_PID;
>>                  break;
>>              }
>> +
>> +            intarg = 0;
>> +            if (match_int(&args[0], &intarg) == 0) {
>> +                struct btrfs_device *device;
>> +
>> +                device = btrfs_find_device(info, intarg,
>> +                               NULL, NULL);
>> +                if (!device) {
>> +                    btrfs_err(info,
>> +                      "read_mirror_policy: invalid devid %d",
>> +                      intarg);
>> +                    ret = -EINVAL;
>> +                    goto out;
>> +                }
>> +                info->read_mirror_policy =
>> +                        BTRFS_READ_MIRROR_BY_DEV;
>> +                set_bit(BTRFS_DEV_STATE_READ_MIRROR,
>> +                    &device->dev_state);
> 
> Not an expert, but might this be overwritten with another state? e.g. 
> BTRFS_DEV_STATE_REPLACE_TGT. The device would then lose its READ_MIRROR 
> flag and the fs would always use the first device for reading.

  It won't it is defined as bitmap.

>> +                break;
>> +            }
> 
> I noticed that it's possible to pass this option multiple times at 
> mount, which sets multiple devices as read mirrors. While that doesn't 
> do anything harmful, the code below will only use the first device. It 
> may be worth at least mentioning this in documentation.

  There were few feedback if read_mirror_policy should be a mount
  option or a sysfs or a property. IMO property is better as it would
  be persistent. In sysfs and mount-option, the user or a config file
  has to remember. Will fix.

>> +
>>              ret = -EINVAL;
>>              goto out;
>>          case Opt_err:
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 64dba5c4cf33..3a9c3804ff17 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -5220,6 +5220,16 @@ static int find_live_mirror(struct
>> btrfs_fs_info *fs_info,
>>          num_stripes = map->num_stripes;
>>
>>      switch(fs_info->read_mirror_policy) {
>> +    case BTRFS_READ_MIRROR_BY_DEV:
>> +        preferred_mirror = first;
>> +        if (test_bit(BTRFS_DEV_STATE_READ_MIRROR,
>> +                 &map->stripes[preferred_mirror].dev->dev_state))
>> +            break;
>> +        if (test_bit(BTRFS_DEV_STATE_READ_MIRROR,
>> +                 &map->stripes[++preferred_mirror].dev->dev_state)) <--[*]
>> +            break;
>> +        preferred_mirror = first;
> 
> Why set preferred_mirror again? The only effect of re-setting it will be 
> to use the lowest devid (1) rather than the highest (2). Is there any 
> difference? Either way it should never happen, because the above code 
> traps it (except when the BTRFS_DEV_STATE_READ_MIRROR flag has been 
> changed as mentioned above).

  Code at [*] above does ++preferred_mirror. So the following
    preferred_mirror = first;
  is not redundant.

>  From Goffredo's comment[1] on Timofey's similar effect patch, if it 
> becomes possible to have more mirrors in a RAID1/RAID10 fs then this 
> code will need to be updated to test dev_state for more than two 
> devices. Would it be sensible to implement this as a for loop straight 
> away?

  Right. A loop is better, will add.

>> +        break;
>>      case BTRFS_READ_MIRROR_DEFAULT:
>>      case BTRFS_READ_MIRROR_BY_PID:
>>      default:
>> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
>> index 953df9925832..55b2eebbf183 100644
>> --- a/fs/btrfs/volumes.h
>> +++ b/fs/btrfs/volumes.h
>> @@ -37,6 +37,7 @@ struct btrfs_pending_bios {
>>  enum btrfs_read_mirror_type {
>>      BTRFS_READ_MIRROR_DEFAULT,
>>      BTRFS_READ_MIRROR_BY_PID,
>> +    BTRFS_READ_MIRROR_BY_DEV,
>>  };
>>
>>  #define BTRFS_DEV_STATE_WRITEABLE    (0)
>> @@ -44,6 +45,7 @@ enum btrfs_read_mirror_type {
>>  #define BTRFS_DEV_STATE_MISSING        (2)
>>  #define BTRFS_DEV_STATE_REPLACE_TGT    (3)
>>  #define BTRFS_DEV_STATE_FLUSH_SENT    (4)
>> +#define BTRFS_DEV_STATE_READ_MIRROR    (5)
>>
>>  struct btrfs_device {
>>      struct list_head dev_list;
> 
> Happy to add Tested-By on patch 1/3 too, but I haven't looked at 3/3 yet.
> 
> [1]: https://patchwork.kernel.org/patch/10678531/#22315779


  Thanks,
Anand.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/3] btrfs: add read_mirror_policy parameter devid
  2019-01-22 13:43     ` Anand Jain
@ 2019-01-22 14:28       ` Steven Davies
  0 siblings, 0 replies; 15+ messages in thread
From: Steven Davies @ 2019-01-22 14:28 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs

On 2019-01-22 13:43, Anand Jain wrote:
> On 01/21/2019 07:56 PM, Steven Davies wrote:
>> On 2018-05-16 11:03, Anand Jain wrote:

>>> +                break;
>>> +            }
>> 
>> I noticed that it's possible to pass this option multiple times at 
>> mount, which sets multiple devices as read mirrors. While that doesn't 
>> do anything harmful, the code below will only use the first device. It 
>> may be worth at least mentioning this in documentation.
> 
>  There were few feedback if read_mirror_policy should be a mount
>  option or a sysfs or a property. IMO property is better as it would
>  be persistent. In sysfs and mount-option, the user or a config file
>  has to remember. Will fix.

I agree. Would/could this be set at mkfs time? It would then be similar 
to how mdadm's --write-mostly is set up.

>>> +
>>>              ret = -EINVAL;
>>>              goto out;
>>>          case Opt_err:
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index 64dba5c4cf33..3a9c3804ff17 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -5220,6 +5220,16 @@ static int find_live_mirror(struct
>>> btrfs_fs_info *fs_info,
>>>          num_stripes = map->num_stripes;
>>> 
>>>      switch(fs_info->read_mirror_policy) {
>>> +    case BTRFS_READ_MIRROR_BY_DEV:
>>> +        preferred_mirror = first;
>>> +        if (test_bit(BTRFS_DEV_STATE_READ_MIRROR,
>>> +                 &map->stripes[preferred_mirror].dev->dev_state))
>>> +            break;
>>> +        if (test_bit(BTRFS_DEV_STATE_READ_MIRROR,
>>> +                 &map->stripes[++preferred_mirror].dev->dev_state)) 
>>> <--[*]
>>> +            break;
>>> +        preferred_mirror = first;
>> 
>> Why set preferred_mirror again? The only effect of re-setting it will 
>> be to use the lowest devid (1) rather than the highest (2). Is there 
>> any difference? Either way it should never happen, because the above 
>> code traps it (except when the BTRFS_DEV_STATE_READ_MIRROR flag has 
>> been changed as mentioned above).
> 
>  Code at [*] above does ++preferred_mirror. So the following
>    preferred_mirror = first;
>  is not redundant.

Yes, but preferred_mirror will be first + num_stripes; meaning that 
without that line the last stripe is preferred and with it the first 
stripe is preferred. It only changes the fallback case from reading from 
devid 2 to devid 1, which barely matters. Perhaps it would be even 
better to fall back to pid % num_stripes in this case anyway?

>>  From Goffredo's comment[1] on Timofey's similar effect patch, if it 
>> becomes possible to have more mirrors in a RAID1/RAID10 fs then this 
>> code will need to be updated to test dev_state for more than two 
>> devices. Would it be sensible to implement this as a for loop straight 
>> away?
> 
>  Right. A loop is better, will add.

Thinking more about this, if it becomes possible to have more than two 
devices in a RAID1 or part-RAID10 then do we also need to consider that 
there may be more than one READ_MIRROR drive? e.g. slow+fast+fast drives 
in a RAID1 with this approach would result in all chunks being read from 
the first drive with READ_MIRROR set if both chunks are on the fast 
drives. This is where Timofey's queue length patch in [1] would become 
useful - in any case, that is an existing problem and probably deserving 
of an entirely new patch.

>> [1]: https://patchwork.kernel.org/patch/10678531/#22315779
> 
> 
>  Thanks,
> Anand.

Steve

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-01-22 14:28 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-16 10:03 [PATCH v2 0/3] btrfs: add read mirror policy Anand Jain
2018-05-16 10:03 ` [PATCH v2 1/3] btrfs: add mount option read_mirror_policy Anand Jain
2018-05-16 10:03 ` [PATCH v2 2/3] btrfs: add read_mirror_policy parameter devid Anand Jain
2019-01-21 11:56   ` Steven Davies
2019-01-22 13:43     ` Anand Jain
2019-01-22 14:28       ` Steven Davies
2018-05-16 10:03 ` [PATCH v2 3/3] btrfs: read_mirror_policy ability to reset Anand Jain
2018-05-16 22:35 ` [PATCH v2 0/3] btrfs: add read mirror policy David Sterba
2018-05-17  2:32   ` Anand Jain
2018-05-17 12:25     ` Austin S. Hemmelgarn
2018-05-17 14:46       ` Jeff Mahoney
2018-05-18  8:06         ` Anand Jain
2018-05-18 12:36           ` Austin S. Hemmelgarn
2018-05-17 14:46   ` Jeff Mahoney
2018-05-17 15:22     ` Austin S. Hemmelgarn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).