All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC][PATCH V6] btrfs: allocation_hint mode
@ 2021-02-01 21:28 Goffredo Baroncelli
  2021-02-01 21:28 ` [PATCH 1/5] btrfs: add ioctl BTRFS_IOC_DEV_PROPERTIES Goffredo Baroncelli
                   ` (5 more replies)
  0 siblings, 6 replies; 21+ messages in thread
From: Goffredo Baroncelli @ 2021-02-01 21:28 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Zygo Blaxell, Josef Bacik


Hi all,

the previous V5 serie was called "btrfs: preferred_metadata: preferred device
for metadata".

This patches set was born after some discussion between me, Zygo and Josef.
Some details can be found in https://github.com/btrfs/btrfs-todo/issues/19.

Some further information about a real use case can be found in 
https://lore.kernel.org/linux-btrfs/20210116002533.GE31381@hungrycats.org/

The idea behind this patches set, is to dedicate some disks (the fastest one)
to the metadata chunk. My initial idea was a "soft" hint. However Zygo
asked an option for a "strong" hint (== mandatory). The result is that
each disk can be "tagged" by one of the following flags:
- BTRFS_DEV_ALLOCATION_METADATA_ONLY
- BTRFS_DEV_ALLOCATION_PREFERRED_METADATA
- BTRFS_DEV_ALLOCATION_PREFERRED_DATA
- BTRFS_DEV_ALLOCATION_DATA_ONLY

When the chunk allocator search a disks to allocate a chunk, scans the disks
in an order decided by these tags. For metadata, the order is:
*_METADATA_ONLY
*_PREFERRED_METADATA
*_PREFERRED_DATA

The *_DATA_ONLY are not eligible from metadata chunk allocation.

For the data chunk, the order is reversed, and the *_METADATA_ONLY are
excluded.

The exact sort logic is to sort first for the "tag", and then for the space
available. If there is no space available, the next "tag" disks set are
selected.

To set these tags, a new property called "allocation_hint" was created.
There is a dedicated btrfs-prog patches set [[PATCH V5] btrfs-progs:
allocation_hint disk property].

To enable this new allocation mode, the filesystem has to be mounted
each time using the option "allocation_hint=1". This option can be changed
on the fly using the "remount" mount option.

By default this mode is disabled. 

$ sudo mount -o allocation_hint=1 /dev/loop0 /mnt/test-btrfs/
$ for i in /dev/loop[0-9]; do sudo ./btrfs prop get $i allocation_hint; done
devid=1, path=/dev/loop0: allocation_hint=PREFERRED_METADATA
devid=2, path=/dev/loop1: allocation_hint=PREFERRED_METADATA
devid=3, path=/dev/loop2: allocation_hint=PREFERRED_DATA
devid=4, path=/dev/loop3: allocation_hint=PREFERRED_DATA
devid=5, path=/dev/loop4: allocation_hint=PREFERRED_DATA
devid=6, path=/dev/loop5: allocation_hint=DATA_ONLY
devid=7, path=/dev/loop6: allocation_hint=METADATA_ONLY
devid=8, path=/dev/loop7: allocation_hint=METADATA_ONLY

$ sudo ./btrfs fi us /mnt/test-btrfs/
Overall:
    Device size:		   2.75GiB
    Device allocated:		   1.34GiB
    Device unallocated:		   1.41GiB
    Device missing:		     0.00B
    Used:			 400.89MiB
    Free (estimated):		   1.04GiB	(min: 1.04GiB)
    Data ratio:			      2.00
    Metadata ratio:		      1.00
    Global reserve:		   3.25MiB	(used: 0.00B)
    Multiple profiles:		        no

Data,RAID1: Size:542.00MiB, Used:200.25MiB (36.95%)
   /dev/loop0	 288.00MiB
   /dev/loop1	 288.00MiB
   /dev/loop2	 127.00MiB
   /dev/loop3	 127.00MiB
   /dev/loop4	 127.00MiB
   /dev/loop5	 127.00MiB

Metadata,single: Size:256.00MiB, Used:384.00KiB (0.15%)
   /dev/loop1	 256.00MiB

System,single: Size:32.00MiB, Used:16.00KiB (0.05%)
   /dev/loop0	  32.00MiB

Unallocated:
   /dev/loop0	 704.00MiB
   /dev/loop1	 480.00MiB
   /dev/loop2	   1.00MiB
   /dev/loop3	   1.00MiB
   /dev/loop4	   1.00MiB
   /dev/loop5	   1.00MiB
   /dev/loop6	 128.00MiB
   /dev/loop7	 128.00MiB

# change the tag of some disks

$ sudo ./btrfs prop set /dev/loop0 allocation_hint DATA_ONLY
$ sudo ./btrfs prop set /dev/loop1 allocation_hint DATA_ONLY
$ sudo ./btrfs prop set /dev/loop5 allocation_hint METADATA_ONLY

$ for i in /dev/loop[0-9]; do sudo ./btrfs prop get $i allocation_hint; done
devid=1, path=/dev/loop0: allocation_hint=DATA_ONLY
devid=2, path=/dev/loop1: allocation_hint=DATA_ONLY
devid=3, path=/dev/loop2: allocation_hint=PREFERRED_DATA
devid=4, path=/dev/loop3: allocation_hint=PREFERRED_DATA
devid=5, path=/dev/loop4: allocation_hint=PREFERRED_DATA
devid=6, path=/dev/loop5: allocation_hint=METADATA_ONLY
devid=7, path=/dev/loop6: allocation_hint=METADATA_ONLY
devid=8, path=/dev/loop7: allocation_hint=METADATA_ONLY

$ sudo btrfs bal start --full-balance /mnt/test-btrfs/
$ sudo ./btrfs fi us /mnt/test-btrfs/
Overall:
    Device size:		   2.75GiB
    Device allocated:		 735.00MiB
    Device unallocated:		   2.03GiB
    Device missing:		     0.00B
    Used:			 400.72MiB
    Free (estimated):		   1.10GiB	(min: 1.10GiB)
    Data ratio:			      2.00
    Metadata ratio:		      1.00
    Global reserve:		   3.25MiB	(used: 0.00B)
    Multiple profiles:		        no

Data,RAID1: Size:288.00MiB, Used:200.19MiB (69.51%)
   /dev/loop0	 288.00MiB
   /dev/loop1	 288.00MiB

Metadata,single: Size:127.00MiB, Used:336.00KiB (0.26%)
   /dev/loop5	 127.00MiB

System,single: Size:32.00MiB, Used:16.00KiB (0.05%)
   /dev/loop7	  32.00MiB

Unallocated:
   /dev/loop0	 736.00MiB
   /dev/loop1	 736.00MiB
   /dev/loop2	 128.00MiB
   /dev/loop3	 128.00MiB
   /dev/loop4	 128.00MiB
   /dev/loop5	   1.00MiB
   /dev/loop6	 128.00MiB
   /dev/loop7	  96.00MiB


#As you can see all the metadata were placed on the disk loop5/loop7 even if
#the most empty one are loop0 and loop1.



TODO:
- more tests
- the tool which show the space available should consider the tagging (
  the disks tagged by _METADATA_ONLY should be excluded from the data
  availability)


Comments are welcome
BR
G.Baroncelli

--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/5] btrfs: add ioctl BTRFS_IOC_DEV_PROPERTIES.
  2021-02-01 21:28 [RFC][PATCH V6] btrfs: allocation_hint mode Goffredo Baroncelli
@ 2021-02-01 21:28 ` Goffredo Baroncelli
  2021-02-10 16:08   ` Josef Bacik
  2021-02-01 21:28 ` [PATCH 2/5] btrfs: add flags to give an hint to the chunk allocator Goffredo Baroncelli
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Goffredo Baroncelli @ 2021-02-01 21:28 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Zygo Blaxell, Josef Bacik, Goffredo Baroncelli

From: Goffredo Baroncelli <kreijack@inwind.it>

This ioctl is a base for returning / setting information from / to  the
fields of the btrfs_dev_item object.

For now only the "type" field is returned / set.

Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it>
---
 fs/btrfs/ioctl.c           | 67 ++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.c         |  2 +-
 fs/btrfs/volumes.h         |  2 ++
 include/uapi/linux/btrfs.h | 40 +++++++++++++++++++++++
 4 files changed, 110 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 703212ff50a5..9e67741fa966 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -4842,6 +4842,71 @@ static int btrfs_ioctl_set_features(struct file *file, void __user *arg)
 	return ret;
 }
 
+static long btrfs_ioctl_dev_properties(struct file *file,
+						void __user *argp)
+{
+	struct inode *inode = file_inode(file);
+	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+	struct btrfs_ioctl_dev_properties dev_props;
+	struct btrfs_device	*device;
+        struct btrfs_root *root = fs_info->chunk_root;
+        struct btrfs_trans_handle *trans;
+	int ret;
+	u64 prev_type;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (copy_from_user(&dev_props, argp, sizeof(dev_props)))
+		return -EFAULT;
+
+	device = btrfs_find_device(fs_info->fs_devices, dev_props.devid,
+				NULL, NULL);
+	if (!device) {
+		btrfs_info(fs_info, "change_dev_properties: unable to find device %llu",
+			   dev_props.devid);
+		return -ENODEV;
+	}
+
+	if (dev_props.properties & BTRFS_DEV_PROPERTY_READ) {
+		u64 props = dev_props.properties;
+		memset(&dev_props, 0, sizeof(dev_props));
+		if (props & BTRFS_DEV_PROPERTY_TYPE) {
+			dev_props.properties = BTRFS_DEV_PROPERTY_TYPE;
+			dev_props.type = device->type;
+		}
+		if(copy_to_user(argp, &dev_props, sizeof(dev_props)))
+			return -EFAULT;
+		return 0;
+	}
+
+	/* it is possible to set only BTRFS_DEV_PROPERTY_TYPE for now */
+	if (dev_props.properties & ~(BTRFS_DEV_PROPERTY_TYPE))
+		return -EPERM;
+
+	trans = btrfs_start_transaction(root, 0);
+        if (IS_ERR(trans))
+                return PTR_ERR(trans);
+
+	prev_type = device->type;
+	device->type = dev_props.type;
+	ret = btrfs_update_device(trans, device);
+
+        if (ret < 0) {
+                btrfs_abort_transaction(trans, ret);
+                btrfs_end_transaction(trans);
+		device->type = prev_type;
+		return  ret;
+        }
+
+        ret = btrfs_commit_transaction(trans);
+	if (ret < 0)
+		device->type = prev_type;
+
+	return ret;
+
+}
+
 static int _btrfs_ioctl_send(struct file *file, void __user *argp, bool compat)
 {
 	struct btrfs_ioctl_send_args *arg;
@@ -5025,6 +5090,8 @@ long btrfs_ioctl(struct file *file, unsigned int
 		return btrfs_ioctl_get_subvol_rootref(file, argp);
 	case BTRFS_IOC_INO_LOOKUP_USER:
 		return btrfs_ioctl_ino_lookup_user(file, argp);
+	case BTRFS_IOC_DEV_PROPERTIES:
+		return btrfs_ioctl_dev_properties(file, argp);
 	}
 
 	return -ENOTTY;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ee086fc56c30..68b346c5465d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2744,7 +2744,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 	return ret;
 }
 
-static noinline int btrfs_update_device(struct btrfs_trans_handle *trans,
+int btrfs_update_device(struct btrfs_trans_handle *trans,
 					struct btrfs_device *device)
 {
 	int ret;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 1997a4649a66..d776b7f55d56 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -595,5 +595,7 @@ void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info,
 int btrfs_bg_type_to_factor(u64 flags);
 const char *btrfs_bg_type_to_raid_name(u64 flags);
 int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info);
+int btrfs_update_device(struct btrfs_trans_handle *trans,
+                                        struct btrfs_device *device);
 
 #endif
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index 5df73001aad4..e6caef42837a 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -860,6 +860,44 @@ struct btrfs_ioctl_get_subvol_rootref_args {
 		__u8 align[7];
 };
 
+#define BTRFS_DEV_PROPERTY_TYPE		(1ULL << 0)
+#define BTRFS_DEV_PROPERTY_DEV_GROUP	(1ULL << 1)
+#define BTRFS_DEV_PROPERTY_SEEK_SPEED	(1ULL << 2)
+#define BTRFS_DEV_PROPERTY_BANDWIDTH	(1ULL << 3)
+#define BTRFS_DEV_PROPERTY_READ		(1ULL << 60)
+
+/*
+ * The ioctl BTRFS_IOC_DEV_PROPERTIES can read and write the device properties.
+ *
+ * The properties that the user want to write have to be set
+ * in the 'properties' field using the BTRFS_DEV_PROPERTY_xxxx constants.
+ *
+ * If the ioctl is used to read the device properties, the bit
+ * BTRFS_DEV_PROPERTY_READ has to be set in the 'properties' field.
+ * In this case the properties that the user want have to be set in the
+ * 'properties' field. The kernel doesn't return a property that was not
+ * required, however it may return a subset of the requested properties.
+ * The returned properties have the corrispondent BTRFS_DEV_PROPERTY_xxxx
+ * flag set in the 'properties' field.
+ *
+ * Up to 2020/05/11 the only properties that can be read/write is the 'type'
+ * one.
+ */
+struct btrfs_ioctl_dev_properties {
+	__u64	devid;
+	__u64	properties;
+	__u64	type;
+	__u32	dev_group;
+	__u8	seek_speed;
+	__u8	bandwidth;
+
+	/*
+	 * for future expansion
+	 */
+	__u8	unused1[2];
+	__u64	unused2[4];
+};
+
 /* Error codes as returned by the kernel */
 enum btrfs_err_code {
 	BTRFS_ERROR_DEV_RAID1_MIN_NOT_MET = 1,
@@ -988,5 +1026,7 @@ enum btrfs_err_code {
 				struct btrfs_ioctl_ino_lookup_user_args)
 #define BTRFS_IOC_SNAP_DESTROY_V2 _IOW(BTRFS_IOCTL_MAGIC, 63, \
 				struct btrfs_ioctl_vol_args_v2)
+#define BTRFS_IOC_DEV_PROPERTIES _IOW(BTRFS_IOCTL_MAGIC, 64, \
+				struct btrfs_ioctl_dev_properties)
 
 #endif /* _UAPI_LINUX_BTRFS_H */
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/5] btrfs: add flags to give an hint to the chunk allocator
  2021-02-01 21:28 [RFC][PATCH V6] btrfs: allocation_hint mode Goffredo Baroncelli
  2021-02-01 21:28 ` [PATCH 1/5] btrfs: add ioctl BTRFS_IOC_DEV_PROPERTIES Goffredo Baroncelli
@ 2021-02-01 21:28 ` Goffredo Baroncelli
  2021-02-10 16:09   ` Josef Bacik
  2021-02-01 21:28 ` [PATCH 3/5] btrfs: export dev_item.type in /sys/fs/btrfs/<uuid>/devinfo/<devid>/type Goffredo Baroncelli
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Goffredo Baroncelli @ 2021-02-01 21:28 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Zygo Blaxell, Josef Bacik, Goffredo Baroncelli

From: Goffredo Baroncelli <kreijack@inwind.it>

Add the following flags to give an hint about which chunk should be
allocated in which a disk.
The following flags are created:

- BTRFS_DEV_ALLOCATION_PREFERRED_DATA
  preferred data chunk, but metadata chunk allowed
- BTRFS_DEV_ALLOCATION_PREFERRED_METADATA
  preferred metadata chunk, but data chunk allowed
- BTRFS_DEV_ALLOCATION_METADATA_ONLY
  only metadata chunk allowed
- BTRFS_DEV_ALLOCATION_DATA_ONLY
  only data chunk allowed

Signed-off-by: Goffredo Baroncelli <kreijack@inwid.it>
---
 include/uapi/linux/btrfs_tree.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index 58d7cff9afb1..bd3af853df0c 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -361,6 +361,24 @@ struct btrfs_key {
 	__u64 offset;
 } __attribute__ ((__packed__));
 
+/* dev_item.type */
+
+/* btrfs chunk allocation hints */
+#define BTRFS_DEV_ALLOCATION_MASK_BIT_COUNT	3
+#define BTRFS_DEV_ALLOCATION_MASK ((1ULL << \
+		BTRFS_DEV_ALLOCATION_MASK_BIT_COUNT) -1)
+#define BTRFS_DEV_ALLOCATION_MASK_COUNT (1ULL << \
+		BTRFS_DEV_ALLOCATION_MASK_BIT_COUNT)
+/* preferred data chunk, but metadata chunk allowed */
+#define BTRFS_DEV_ALLOCATION_PREFERRED_DATA	(0ULL)
+/* preferred metadata chunk, but data chunk allowed */
+#define BTRFS_DEV_ALLOCATION_PREFERRED_METADATA	(1ULL)
+/* only metadata chunk are allowed */
+#define BTRFS_DEV_ALLOCATION_METADATA_ONLY	(2ULL)
+/* only data chunk allowed */
+#define BTRFS_DEV_ALLOCATION_DATA_ONLY		(3ULL)
+/* 5..7 are unused values */
+
 struct btrfs_dev_item {
 	/* the internal btrfs device id */
 	__le64 devid;
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/5] btrfs: export dev_item.type in /sys/fs/btrfs/<uuid>/devinfo/<devid>/type
  2021-02-01 21:28 [RFC][PATCH V6] btrfs: allocation_hint mode Goffredo Baroncelli
  2021-02-01 21:28 ` [PATCH 1/5] btrfs: add ioctl BTRFS_IOC_DEV_PROPERTIES Goffredo Baroncelli
  2021-02-01 21:28 ` [PATCH 2/5] btrfs: add flags to give an hint to the chunk allocator Goffredo Baroncelli
@ 2021-02-01 21:28 ` Goffredo Baroncelli
  2021-02-01 21:28 ` [PATCH 4/5] btrfs: add allocation_hint option Goffredo Baroncelli
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 21+ messages in thread
From: Goffredo Baroncelli @ 2021-02-01 21:28 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Zygo Blaxell, Josef Bacik, Goffredo Baroncelli

From: Goffredo Baroncelli <kreijack@inwind.it>

Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it>
---
 fs/btrfs/sysfs.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 19b9fffa2c9c..594e8445fe21 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -1416,11 +1416,22 @@ static ssize_t btrfs_devinfo_writeable_show(struct kobject *kobj,
 }
 BTRFS_ATTR(devid, writeable, btrfs_devinfo_writeable_show);
 
+static ssize_t btrfs_devinfo_type_show(struct kobject *kobj,
+					    struct kobj_attribute *a, char *buf)
+{
+	struct btrfs_device *device = container_of(kobj, struct btrfs_device,
+						   devid_kobj);
+
+	return scnprintf(buf, PAGE_SIZE, "0x%08llx\n",device->type);
+}
+BTRFS_ATTR(devid, type, btrfs_devinfo_type_show);
+
 static struct attribute *devid_attrs[] = {
 	BTRFS_ATTR_PTR(devid, in_fs_metadata),
 	BTRFS_ATTR_PTR(devid, missing),
 	BTRFS_ATTR_PTR(devid, replace_target),
 	BTRFS_ATTR_PTR(devid, writeable),
+	BTRFS_ATTR_PTR(devid, type),
 	NULL
 };
 ATTRIBUTE_GROUPS(devid);
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 4/5] btrfs: add allocation_hint option.
  2021-02-01 21:28 [RFC][PATCH V6] btrfs: allocation_hint mode Goffredo Baroncelli
                   ` (2 preceding siblings ...)
  2021-02-01 21:28 ` [PATCH 3/5] btrfs: export dev_item.type in /sys/fs/btrfs/<uuid>/devinfo/<devid>/type Goffredo Baroncelli
@ 2021-02-01 21:28 ` Goffredo Baroncelli
  2021-02-10 16:14   ` Josef Bacik
  2021-02-01 21:28 ` [PATCH 5/5] btrfs: add allocator_hint mode Goffredo Baroncelli
  2021-02-10 16:04 ` [RFC][PATCH V6] btrfs: allocation_hint mode Josef Bacik
  5 siblings, 1 reply; 21+ messages in thread
From: Goffredo Baroncelli @ 2021-02-01 21:28 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Zygo Blaxell, Josef Bacik, Goffredo Baroncelli

From: Goffredo Baroncelli <kreijack@inwind.it>

Add allocation_hint mount option. This option accepts the following values:

- 0 (default):  the chunks allocator ignores the disk hints
- 1:            the chunks allocator considers the disk hints

Signed-off-by: Goffredo Baroncelli <kreijack@winwind.it>
---
 fs/btrfs/ctree.h   | 12 ++++++++++++
 fs/btrfs/disk-io.c |  2 ++
 fs/btrfs/super.c   | 17 +++++++++++++++++
 3 files changed, 31 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 1d3c1e479f3d..5cd6d658f157 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -570,6 +570,15 @@ enum btrfs_exclusive_operation {
 	BTRFS_EXCLOP_SWAP_ACTIVATE,
 };
 
+/*
+ * allocation_hint mode
+ */
+
+enum btrfs_allocation_hint_modes {
+	BTRFS_ALLOCATION_HINT_DISABLED,
+	BTRFS_ALLOCATION_HINT_ENABLED
+};
+
 struct btrfs_fs_info {
 	u8 chunk_tree_uuid[BTRFS_UUID_SIZE];
 	unsigned long flags;
@@ -961,6 +970,9 @@ struct btrfs_fs_info {
 		u64 zoned;
 	};
 
+	/* allocation_hint mode */
+	int allocation_hint_mode;
+
 	/* Max size to emit ZONE_APPEND write command */
 	u64 max_zone_append_size;
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 765deefda92b..1edd219c347c 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2794,6 +2794,8 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 	fs_info->swapfile_pins = RB_ROOT;
 
 	fs_info->send_in_progress = 0;
+
+	fs_info->allocation_hint_mode = BTRFS_ALLOCATION_HINT_DISABLED;
 }
 
 static int init_mount_fs_info(struct btrfs_fs_info *fs_info, struct super_block *sb)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 022f20810089..d0c69c950cd9 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -359,6 +359,7 @@ enum {
 	Opt_thread_pool,
 	Opt_treelog, Opt_notreelog,
 	Opt_user_subvol_rm_allowed,
+	Opt_allocation_hint,
 
 	/* Rescue options */
 	Opt_rescue,
@@ -432,6 +433,7 @@ static const match_table_t tokens = {
 	{Opt_treelog, "treelog"},
 	{Opt_notreelog, "notreelog"},
 	{Opt_user_subvol_rm_allowed, "user_subvol_rm_allowed"},
+	{Opt_allocation_hint, "allocation_hint=%d"},
 
 	/* Rescue options */
 	{Opt_rescue, "rescue=%s"},
@@ -889,6 +891,19 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
 		case Opt_user_subvol_rm_allowed:
 			btrfs_set_opt(info->mount_opt, USER_SUBVOL_RM_ALLOWED);
 			break;
+		case Opt_allocation_hint:
+			ret = match_int(&args[0], &intarg);
+			if (ret || (intarg != 1 && intarg != 0)) {
+				btrfs_err(info, "invalid allocation_hint= parameter\n");
+				ret = -EINVAL;
+				goto out;
+			}
+			if (intarg)
+				btrfs_info(info, "allocation_hint enabled");
+			else
+				btrfs_info(info, "allocation_hint disabled");
+			info->allocation_hint_mode = intarg;
+			break;
 		case Opt_enospc_debug:
 			btrfs_set_opt(info->mount_opt, ENOSPC_DEBUG);
 			break;
@@ -1495,6 +1510,8 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
 		seq_puts(seq, ",clear_cache");
 	if (btrfs_test_opt(info, USER_SUBVOL_RM_ALLOWED))
 		seq_puts(seq, ",user_subvol_rm_allowed");
+	if (info->allocation_hint_mode)
+		seq_puts(seq, ",allocation_hint=1");
 	if (btrfs_test_opt(info, ENOSPC_DEBUG))
 		seq_puts(seq, ",enospc_debug");
 	if (btrfs_test_opt(info, AUTO_DEFRAG))
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 5/5] btrfs: add allocator_hint mode
  2021-02-01 21:28 [RFC][PATCH V6] btrfs: allocation_hint mode Goffredo Baroncelli
                   ` (3 preceding siblings ...)
  2021-02-01 21:28 ` [PATCH 4/5] btrfs: add allocation_hint option Goffredo Baroncelli
@ 2021-02-01 21:28 ` Goffredo Baroncelli
  2021-02-04 23:24   ` Zygo Blaxell
                     ` (2 more replies)
  2021-02-10 16:04 ` [RFC][PATCH V6] btrfs: allocation_hint mode Josef Bacik
  5 siblings, 3 replies; 21+ messages in thread
From: Goffredo Baroncelli @ 2021-02-01 21:28 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Zygo Blaxell, Josef Bacik, Goffredo Baroncelli

From: Goffredo Baroncelli <kreijack@inwind.it>

When this mode is enabled, the chunk allocation policy is modified as follow.

Each disk may have a different tag:
- BTRFS_DEV_ALLOCATION_PREFERRED_METADATA
- BTRFS_DEV_ALLOCATION_METADATA_ONLY
- BTRFS_DEV_ALLOCATION_DATA_ONLY
- BTRFS_DEV_ALLOCATION_PREFERRED_DATA (default)

Where:
- ALLOCATION_PREFERRED_X means that it is preferred to use this disk for the
X chunk type (the other type may be allowed when the space is low)
- ALLOCATION_X_ONLY means that it is used *only* for the X chunk type. This
means also that it is a preferred choice.

Each time the allocator allocates a chunk of type X , first it takes the disks
tagged as ALLOCATION_X_ONLY or ALLOCATION_PREFERRED_X; if the space is not
enough, it uses also the disks tagged as ALLOCATION_METADATA_ONLY; if the space
is not enough, it uses also the other disks, with the exception of the one
marked as ALLOCATION_PREFERRED_Y, where Y the other type of chunk (i.e. not X).

Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it>
---
 fs/btrfs/volumes.c | 81 +++++++++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/volumes.h |  1 +
 2 files changed, 81 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 68b346c5465d..57ee3e2fdac0 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4806,13 +4806,18 @@ static int btrfs_add_system_chunk(struct btrfs_fs_info *fs_info,
 }
 
 /*
- * sort the devices in descending order by max_avail, total_avail
+ * sort the devices in descending order by alloc_hint,
+ * max_avail, total_avail
  */
 static int btrfs_cmp_device_info(const void *a, const void *b)
 {
 	const struct btrfs_device_info *di_a = a;
 	const struct btrfs_device_info *di_b = b;
 
+	if (di_a->alloc_hint > di_b->alloc_hint)
+		return -1;
+	if (di_a->alloc_hint < di_b->alloc_hint)
+		return 1;
 	if (di_a->max_avail > di_b->max_avail)
 		return -1;
 	if (di_a->max_avail < di_b->max_avail)
@@ -4939,6 +4944,15 @@ static int gather_device_info(struct btrfs_fs_devices *fs_devices,
 	int ndevs = 0;
 	u64 max_avail;
 	u64 dev_offset;
+	int hint;
+
+	static const char alloc_hint_map[BTRFS_DEV_ALLOCATION_MASK_COUNT] = {
+		[BTRFS_DEV_ALLOCATION_DATA_ONLY] = -1,
+		[BTRFS_DEV_ALLOCATION_PREFERRED_DATA] = 0,
+		[BTRFS_DEV_ALLOCATION_METADATA_ONLY] = 1,
+		[BTRFS_DEV_ALLOCATION_PREFERRED_METADATA] = 2
+		/* the other values are set to 0 */
+	};
 
 	/*
 	 * in the first pass through the devices list, we gather information
@@ -4991,16 +5005,81 @@ static int gather_device_info(struct btrfs_fs_devices *fs_devices,
 		devices_info[ndevs].max_avail = max_avail;
 		devices_info[ndevs].total_avail = total_avail;
 		devices_info[ndevs].dev = device;
+
+		if (((ctl->type & BTRFS_BLOCK_GROUP_DATA) &&
+		     (ctl->type & BTRFS_BLOCK_GROUP_METADATA)) ||
+		    info->allocation_hint_mode == 
+		     BTRFS_ALLOCATION_HINT_DISABLED) {
+			/*
+			 * if mixed bg or the allocator hint is
+			 * disable, set all the alloc_hint
+			 * fields to the same value, so the sorting
+			 * is not affected
+			 */
+			devices_info[ndevs].alloc_hint = 0;
+		} else if(ctl->type & BTRFS_BLOCK_GROUP_DATA) {
+			hint = device->type & BTRFS_DEV_ALLOCATION_MASK;
+
+			/*
+			 * skip BTRFS_DEV_METADATA_ONLY disks
+			 */
+			if (hint == BTRFS_DEV_ALLOCATION_METADATA_ONLY)
+				continue;
+			/*
+			 * if a data chunk must be allocated,
+			 * sort also by hint (data disk
+			 * higher priority)
+			 */
+			devices_info[ndevs].alloc_hint = -alloc_hint_map[hint];
+		} else { /* BTRFS_BLOCK_GROUP_METADATA */
+			hint = device->type & BTRFS_DEV_ALLOCATION_MASK;
+
+			/*
+			 * skip BTRFS_DEV_DATA_ONLY disks
+			 */
+			if (hint == BTRFS_DEV_ALLOCATION_DATA_ONLY)
+				continue;
+			/*
+			 * if a data chunk must be allocated,
+			 * sort also by hint (metadata hint
+			 * higher priority)
+			 */
+			devices_info[ndevs].alloc_hint = alloc_hint_map[hint];
+		}
+
 		++ndevs;
 	}
 	ctl->ndevs = ndevs;
 
+	/*
+	 * no devices available
+	 */
+	if (!ndevs)
+		return 0;
+
 	/*
 	 * now sort the devices by hole size / available space
 	 */
 	sort(devices_info, ndevs, sizeof(struct btrfs_device_info),
 	     btrfs_cmp_device_info, NULL);
 
+	/*
+	 * select the minimum set of disks grouped by hint that
+	 * can host the chunk
+	 */
+	ndevs = 0;
+	while (ndevs < ctl->ndevs) {
+		hint = devices_info[ndevs++].alloc_hint;
+		while (devices_info[ndevs].alloc_hint == hint &&
+		       ndevs < ctl->ndevs)
+				ndevs++;
+		if (ndevs >= ctl->devs_min)
+			break;
+	}
+
+	BUG_ON(ndevs > ctl->ndevs);
+	ctl->ndevs = ndevs;
+
 	return 0;
 }
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index d776b7f55d56..31a3e4cf93b5 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -364,6 +364,7 @@ struct btrfs_device_info {
 	u64 dev_offset;
 	u64 max_avail;
 	u64 total_avail;
+	int alloc_hint;
 };
 
 struct btrfs_raid_attr {
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/5] btrfs: add allocator_hint mode
  2021-02-01 21:28 ` [PATCH 5/5] btrfs: add allocator_hint mode Goffredo Baroncelli
@ 2021-02-04 23:24   ` Zygo Blaxell
  2021-02-05 18:01     ` Goffredo Baroncelli
  2021-02-10 16:12   ` Josef Bacik
  2021-02-19 18:51   ` Goffredo Baroncelli
  2 siblings, 1 reply; 21+ messages in thread
From: Zygo Blaxell @ 2021-02-04 23:24 UTC (permalink / raw)
  To: Goffredo Baroncelli; +Cc: linux-btrfs, Josef Bacik, Goffredo Baroncelli

On Mon, Feb 01, 2021 at 10:28:20PM +0100, Goffredo Baroncelli wrote:
> From: Goffredo Baroncelli <kreijack@inwind.it>
> 
> When this mode is enabled, the chunk allocation policy is modified as follow.
> 
> Each disk may have a different tag:
> - BTRFS_DEV_ALLOCATION_PREFERRED_METADATA
> - BTRFS_DEV_ALLOCATION_METADATA_ONLY
> - BTRFS_DEV_ALLOCATION_DATA_ONLY
> - BTRFS_DEV_ALLOCATION_PREFERRED_DATA (default)
> 
> Where:
> - ALLOCATION_PREFERRED_X means that it is preferred to use this disk for the
> X chunk type (the other type may be allowed when the space is low)
> - ALLOCATION_X_ONLY means that it is used *only* for the X chunk type. This
> means also that it is a preferred choice.
> 
> Each time the allocator allocates a chunk of type X , first it takes the disks
> tagged as ALLOCATION_X_ONLY or ALLOCATION_PREFERRED_X; if the space is not
> enough, it uses also the disks tagged as ALLOCATION_METADATA_ONLY; if the space
> is not enough, it uses also the other disks, with the exception of the one
> marked as ALLOCATION_PREFERRED_Y, where Y the other type of chunk (i.e. not X).
> 
> Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it>
> ---
>  fs/btrfs/volumes.c | 81 +++++++++++++++++++++++++++++++++++++++++++++-
>  fs/btrfs/volumes.h |  1 +
>  2 files changed, 81 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 68b346c5465d..57ee3e2fdac0 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -4806,13 +4806,18 @@ static int btrfs_add_system_chunk(struct btrfs_fs_info *fs_info,
>  }
>  
>  /*
> - * sort the devices in descending order by max_avail, total_avail
> + * sort the devices in descending order by alloc_hint,
> + * max_avail, total_avail
>   */
>  static int btrfs_cmp_device_info(const void *a, const void *b)
>  {
>  	const struct btrfs_device_info *di_a = a;
>  	const struct btrfs_device_info *di_b = b;
>  
> +	if (di_a->alloc_hint > di_b->alloc_hint)
> +		return -1;
> +	if (di_a->alloc_hint < di_b->alloc_hint)
> +		return 1;
>  	if (di_a->max_avail > di_b->max_avail)
>  		return -1;
>  	if (di_a->max_avail < di_b->max_avail)
> @@ -4939,6 +4944,15 @@ static int gather_device_info(struct btrfs_fs_devices *fs_devices,
>  	int ndevs = 0;
>  	u64 max_avail;
>  	u64 dev_offset;
> +	int hint;
> +
> +	static const char alloc_hint_map[BTRFS_DEV_ALLOCATION_MASK_COUNT] = {
> +		[BTRFS_DEV_ALLOCATION_DATA_ONLY] = -1,
> +		[BTRFS_DEV_ALLOCATION_PREFERRED_DATA] = 0,
> +		[BTRFS_DEV_ALLOCATION_METADATA_ONLY] = 1,
> +		[BTRFS_DEV_ALLOCATION_PREFERRED_METADATA] = 2
> +		/* the other values are set to 0 */
> +	};
>  
>  	/*
>  	 * in the first pass through the devices list, we gather information
> @@ -4991,16 +5005,81 @@ static int gather_device_info(struct btrfs_fs_devices *fs_devices,
>  		devices_info[ndevs].max_avail = max_avail;
>  		devices_info[ndevs].total_avail = total_avail;
>  		devices_info[ndevs].dev = device;
> +
> +		if (((ctl->type & BTRFS_BLOCK_GROUP_DATA) &&
> +		     (ctl->type & BTRFS_BLOCK_GROUP_METADATA)) ||
> +		    info->allocation_hint_mode == 
> +		     BTRFS_ALLOCATION_HINT_DISABLED) {

Well, I guess if you're going to keep putting the mount option in each
new patch version, then I'm going to keep saying "please remove the
mount option" from each new patch version.

The right side of this || can be deleted, and the entire patch 4/5
(which adds the mount option).


> +			/*
> +			 * if mixed bg or the allocator hint is
> +			 * disable, set all the alloc_hint
> +			 * fields to the same value, so the sorting
> +			 * is not affected
> +			 */
> +			devices_info[ndevs].alloc_hint = 0;
> +		} else if(ctl->type & BTRFS_BLOCK_GROUP_DATA) {
> +			hint = device->type & BTRFS_DEV_ALLOCATION_MASK;
> +
> +			/*
> +			 * skip BTRFS_DEV_METADATA_ONLY disks
> +			 */
> +			if (hint == BTRFS_DEV_ALLOCATION_METADATA_ONLY)
> +				continue;
> +			/*
> +			 * if a data chunk must be allocated,
> +			 * sort also by hint (data disk
> +			 * higher priority)
> +			 */
> +			devices_info[ndevs].alloc_hint = -alloc_hint_map[hint];
> +		} else { /* BTRFS_BLOCK_GROUP_METADATA */
> +			hint = device->type & BTRFS_DEV_ALLOCATION_MASK;
> +
> +			/*
> +			 * skip BTRFS_DEV_DATA_ONLY disks
> +			 */
> +			if (hint == BTRFS_DEV_ALLOCATION_DATA_ONLY)
> +				continue;
> +			/*
> +			 * if a data chunk must be allocated,
> +			 * sort also by hint (metadata hint
> +			 * higher priority)
> +			 */
> +			devices_info[ndevs].alloc_hint = alloc_hint_map[hint];
> +		}
> +
>  		++ndevs;
>  	}
>  	ctl->ndevs = ndevs;
>  
> +	/*
> +	 * no devices available
> +	 */
> +	if (!ndevs)
> +		return 0;
> +
>  	/*
>  	 * now sort the devices by hole size / available space
>  	 */
>  	sort(devices_info, ndevs, sizeof(struct btrfs_device_info),
>  	     btrfs_cmp_device_info, NULL);
>  
> +	/*
> +	 * select the minimum set of disks grouped by hint that
> +	 * can host the chunk
> +	 */
> +	ndevs = 0;
> +	while (ndevs < ctl->ndevs) {
> +		hint = devices_info[ndevs++].alloc_hint;
> +		while (devices_info[ndevs].alloc_hint == hint &&
> +		       ndevs < ctl->ndevs)
> +				ndevs++;

So I ripped out the unnecessary line and patch 4 and started running it,
and hit the following pretty quickly:

	[Thu Feb  4 18:13:14 2021] BTRFS info (device dm-0): balance: start -mlimit=1 -slimit=1
	[Thu Feb  4 18:13:14 2021] ==================================================================
	[Thu Feb  4 18:13:14 2021] BUG: KASAN: slab-out-of-bounds in btrfs_alloc_chunk+0x74b/0x1320
	[Thu Feb  4 18:13:14 2021] Read of size 4 at addr ffff88812e1a7100 by task btrfs/7605

	[Thu Feb  4 18:13:14 2021] CPU: 1 PID: 7605 Comm: btrfs Tainted: G        W         5.10.9-acdf531fae43+ #4
	[Thu Feb  4 18:13:14 2021] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
	[Thu Feb  4 18:13:14 2021] Call Trace:
	[Thu Feb  4 18:13:14 2021]  dump_stack+0xbc/0xf9
	[Thu Feb  4 18:13:14 2021]  ? btrfs_alloc_chunk+0x74b/0x1320
	[Thu Feb  4 18:13:14 2021]  print_address_description.constprop.8+0x21/0x210
	[Thu Feb  4 18:13:14 2021]  ? record_print_text.cold.34+0x11/0x11
	[Thu Feb  4 18:13:14 2021]  ? btrfs_alloc_chunk+0x74b/0x1320
	[Thu Feb  4 18:13:14 2021]  ? btrfs_alloc_chunk+0x74b/0x1320
	[Thu Feb  4 18:13:14 2021]  kasan_report.cold.10+0x20/0x37
	[Thu Feb  4 18:13:14 2021]  ? update_dev_time+0x30/0x40
	[Thu Feb  4 18:13:14 2021]  ? btrfs_alloc_chunk+0x74b/0x1320
	[Thu Feb  4 18:13:14 2021]  __asan_load4+0x69/0x90
	[Thu Feb  4 18:13:14 2021]  btrfs_alloc_chunk+0x74b/0x1320
	[Thu Feb  4 18:13:14 2021]  ? btrfs_shrink_device+0x8f0/0x8f0
	[Thu Feb  4 18:13:14 2021]  ? _raw_spin_unlock+0x22/0x30
	[Thu Feb  4 18:13:14 2021]  ? btrfs_block_rsv_add_bytes+0x53/0x80
	[Thu Feb  4 18:13:14 2021]  ? btrfs_block_rsv_add+0x43/0x50
	[Thu Feb  4 18:13:14 2021]  ? check_system_chunk+0x1ac/0x1e0
	[Thu Feb  4 18:13:14 2021]  btrfs_chunk_alloc+0x2aa/0x430
	[Thu Feb  4 18:13:14 2021]  find_free_extent+0x1159/0x18a0
	[Thu Feb  4 18:13:14 2021]  ? unpin_extent_range+0xb60/0xb60
	[Thu Feb  4 18:13:14 2021]  ? do_raw_spin_lock+0x1e0/0x1e0
	[Thu Feb  4 18:13:14 2021]  ? _raw_spin_unlock+0x22/0x30
	[Thu Feb  4 18:13:14 2021]  ? btrfs_get_alloc_profile+0x1d1/0x340
	[Thu Feb  4 18:13:14 2021]  btrfs_reserve_extent+0xe0/0x200
	[Thu Feb  4 18:13:14 2021]  btrfs_alloc_tree_block+0x1ab/0x850
	[Thu Feb  4 18:13:14 2021]  ? is_bpf_text_address+0x86/0xf0
	[Thu Feb  4 18:13:14 2021]  ? btrfs_alloc_logged_file_extent+0x1d0/0x1d0
	[Thu Feb  4 18:13:14 2021]  ? kmem_cache_free+0x13/0x20
	[Thu Feb  4 18:13:14 2021]  ? stack_trace_save+0x87/0xb0
	[Thu Feb  4 18:13:14 2021]  ? stack_trace_consume_entry+0x90/0x90
	[Thu Feb  4 18:13:14 2021]  alloc_tree_block_no_bg_flush+0xca/0xf0
	[Thu Feb  4 18:13:14 2021]  __btrfs_cow_block+0x274/0x950
	[Thu Feb  4 18:13:14 2021]  ? lock_downgrade+0x3d0/0x3f0
	[Thu Feb  4 18:13:14 2021]  ? update_ref_for_cow+0x550/0x550
	[Thu Feb  4 18:13:14 2021]  ? __kasan_check_read+0x11/0x20
	[Thu Feb  4 18:13:14 2021]  btrfs_cow_block+0x1b9/0x370
	[Thu Feb  4 18:13:14 2021]  btrfs_search_slot+0x97a/0xfc0
	[Thu Feb  4 18:13:14 2021]  ? kmem_cache_free.part.56+0x114/0x180
	[Thu Feb  4 18:13:14 2021]  ? split_leaf+0x9a0/0x9a0
	[Thu Feb  4 18:13:14 2021]  ? btrfs_drop_extent_cache+0x201/0x7a0
	[Thu Feb  4 18:13:14 2021]  ? __kasan_kmalloc.constprop.18+0xbe/0xd0
	[Thu Feb  4 18:13:14 2021]  ? btrfs_buffered_write.isra.30+0xaa0/0xaa0
	[Thu Feb  4 18:13:14 2021]  ? kmem_cache_free+0x13/0x20
	[Thu Feb  4 18:13:14 2021]  btrfs_truncate_inode_items+0x339/0x1570
	[Thu Feb  4 18:13:14 2021]  ? ___might_sleep+0x10f/0x1e0
	[Thu Feb  4 18:13:14 2021]  ? __might_sleep+0x71/0xe0
	[Thu Feb  4 18:13:14 2021]  ? btrfs_rmdir+0x290/0x290
	[Thu Feb  4 18:13:14 2021]  ? __kasan_check_write+0x14/0x20
	[Thu Feb  4 18:13:14 2021]  ? unmap_mapping_pages+0xb7/0x1d0
	[Thu Feb  4 18:13:14 2021]  ? do_wp_page+0x4f0/0x4f0
	[Thu Feb  4 18:13:14 2021]  ? trace_hardirqs_on+0x55/0x120
	[Thu Feb  4 18:13:14 2021]  btrfs_truncate_free_space_cache+0x1ef/0x320
	[Thu Feb  4 18:13:14 2021]  delete_block_group_cache+0x80/0xd0
	[Thu Feb  4 18:13:14 2021]  btrfs_relocate_block_group+0x1a3/0x4c0
	[Thu Feb  4 18:13:14 2021]  ? block_group_cache_tree_search+0x156/0x190
	[Thu Feb  4 18:13:14 2021]  btrfs_relocate_chunk+0x52/0x120
	[Thu Feb  4 18:13:14 2021]  btrfs_balance+0xe09/0x18e0
	[Thu Feb  4 18:13:14 2021]  ? __kasan_check_read+0x11/0x20
	[Thu Feb  4 18:13:14 2021]  ? lock_acquire+0xd0/0x550
	[Thu Feb  4 18:13:14 2021]  ? btrfs_relocate_chunk+0x120/0x120
	[Thu Feb  4 18:13:14 2021]  ? kasan_free_pages+0x4f/0x60
	[Thu Feb  4 18:13:14 2021]  ? kmem_cache_alloc_trace+0x74b/0xca0
	[Thu Feb  4 18:13:14 2021]  ? _copy_from_user+0x83/0xc0
	[Thu Feb  4 18:13:14 2021]  btrfs_ioctl_balance+0x3a7/0x460
	[Thu Feb  4 18:13:14 2021]  btrfs_ioctl+0x3197/0x3fa0
	[Thu Feb  4 18:13:14 2021]  ? __kasan_check_read+0x11/0x20
	[Thu Feb  4 18:13:14 2021]  ? btrfs_ioctl_get_supported_features+0x30/0x30
	[Thu Feb  4 18:13:14 2021]  ? __kasan_check_read+0x11/0x20
	[Thu Feb  4 18:13:14 2021]  ? lock_release+0xc8/0x640
	[Thu Feb  4 18:13:14 2021]  ? check_flags+0x30/0x30
	[Thu Feb  4 18:13:14 2021]  ? handle_mm_fault+0x168d/0x2150
	[Thu Feb  4 18:13:14 2021]  ? lock_downgrade+0x3f0/0x3f0
	[Thu Feb  4 18:13:14 2021]  ? handle_mm_fault+0x159e/0x2150
	[Thu Feb  4 18:13:14 2021]  ? __kasan_check_read+0x11/0x20
	[Thu Feb  4 18:13:14 2021]  ? lock_release+0xc8/0x640
	[Thu Feb  4 18:13:14 2021]  ? do_user_addr_fault+0x299/0x5a0
	[Thu Feb  4 18:13:14 2021]  ? do_raw_spin_unlock+0xa8/0x140
	[Thu Feb  4 18:13:14 2021]  ? lock_downgrade+0x3f0/0x3f0
	[Thu Feb  4 18:13:14 2021]  ? _raw_spin_unlock+0x22/0x30
	[Thu Feb  4 18:13:14 2021]  ? handle_mm_fault+0xad6/0x2150
	[Thu Feb  4 18:13:14 2021]  ? do_vfs_ioctl+0xfc/0x9d0
	[Thu Feb  4 18:13:14 2021]  ? ioctl_file_clone+0xe0/0xe0
	[Thu Feb  4 18:13:14 2021]  ? __kasan_check_write+0x14/0x20
	[Thu Feb  4 18:13:14 2021]  ? up_read+0x176/0x4f0
	[Thu Feb  4 18:13:14 2021]  ? down_read_killable_nested+0x4e0/0x4e0
	[Thu Feb  4 18:13:14 2021]  ? vmacache_find+0xc9/0x120
	[Thu Feb  4 18:13:14 2021]  ? __kasan_check_read+0x11/0x20
	[Thu Feb  4 18:13:14 2021]  ? __fget_light+0xae/0x110
	[Thu Feb  4 18:13:14 2021]  __x64_sys_ioctl+0xc3/0x100
	[Thu Feb  4 18:13:14 2021]  do_syscall_64+0x37/0x80
	[Thu Feb  4 18:13:14 2021]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
	[Thu Feb  4 18:13:14 2021] RIP: 0033:0x7fc6f9b97427
	[Thu Feb  4 18:13:14 2021] Code: 00 00 90 48 8b 05 69 aa 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 aa 0c 00 f7 d8 64 89 01 48
	[Thu Feb  4 18:13:14 2021] RSP: 002b:00007ffe8166f088 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
	[Thu Feb  4 18:13:14 2021] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fc6f9b97427
	[Thu Feb  4 18:13:14 2021] RDX: 00007ffe8166f110 RSI: 00000000c4009420 RDI: 0000000000000003
	[Thu Feb  4 18:13:14 2021] RBP: 0000000000000003 R08: 0000000000000003 R09: 0000000000000078
	[Thu Feb  4 18:13:14 2021] R10: fffffffffffff6ed R11: 0000000000000206 R12: 00007ffe81671a3c
	[Thu Feb  4 18:13:14 2021] R13: 00007ffe8166f110 R14: 0000000000000001 R15: 0000000000000000

	[Thu Feb  4 18:13:14 2021] Allocated by task 7605:
	[Thu Feb  4 18:13:14 2021]  kasan_save_stack+0x21/0x50
	[Thu Feb  4 18:13:14 2021]  __kasan_kmalloc.constprop.18+0xbe/0xd0
	[Thu Feb  4 18:13:14 2021]  kasan_kmalloc+0x9/0x10
	[Thu Feb  4 18:13:14 2021]  __kmalloc+0x4af/0xcc0
	[Thu Feb  4 18:13:14 2021]  btrfs_alloc_chunk+0x3b4/0x1320
	[Thu Feb  4 18:13:14 2021]  btrfs_chunk_alloc+0x2aa/0x430
	[Thu Feb  4 18:13:14 2021]  find_free_extent+0x1159/0x18a0
	[Thu Feb  4 18:13:14 2021]  btrfs_reserve_extent+0xe0/0x200
	[Thu Feb  4 18:13:14 2021]  btrfs_alloc_tree_block+0x1ab/0x850
	[Thu Feb  4 18:13:14 2021]  alloc_tree_block_no_bg_flush+0xca/0xf0
	[Thu Feb  4 18:13:14 2021]  __btrfs_cow_block+0x274/0x950
	[Thu Feb  4 18:13:14 2021]  btrfs_cow_block+0x1b9/0x370
	[Thu Feb  4 18:13:14 2021]  btrfs_search_slot+0x97a/0xfc0
	[Thu Feb  4 18:13:14 2021]  btrfs_truncate_inode_items+0x339/0x1570
	[Thu Feb  4 18:13:14 2021]  btrfs_truncate_free_space_cache+0x1ef/0x320
	[Thu Feb  4 18:13:14 2021]  delete_block_group_cache+0x80/0xd0
	[Thu Feb  4 18:13:14 2021]  btrfs_relocate_block_group+0x1a3/0x4c0
	[Thu Feb  4 18:13:14 2021]  btrfs_relocate_chunk+0x52/0x120
	[Thu Feb  4 18:13:14 2021]  btrfs_balance+0xe09/0x18e0
	[Thu Feb  4 18:13:14 2021]  btrfs_ioctl_balance+0x3a7/0x460
	[Thu Feb  4 18:13:14 2021]  btrfs_ioctl+0x3197/0x3fa0
	[Thu Feb  4 18:13:14 2021]  __x64_sys_ioctl+0xc3/0x100
	[Thu Feb  4 18:13:14 2021]  do_syscall_64+0x37/0x80
	[Thu Feb  4 18:13:14 2021]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

	[Thu Feb  4 18:13:14 2021] The buggy address belongs to the object at ffff88812e1a7040
				    which belongs to the cache kmalloc-192 of size 192
	[Thu Feb  4 18:13:14 2021] The buggy address is located 0 bytes to the right of
				    192-byte region [ffff88812e1a7040, ffff88812e1a7100)
	[Thu Feb  4 18:13:14 2021] The buggy address belongs to the page:
	[Thu Feb  4 18:13:14 2021] page:00000000210b7c03 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88812e1a7ff1 pfn:0x12e1a7
	[Thu Feb  4 18:13:14 2021] flags: 0x17ffe0000000200(slab)
	[Thu Feb  4 18:13:14 2021] raw: 017ffe0000000200 ffffea000696d108 ffffea0006ba3ec8 ffff888100040b40
	[Thu Feb  4 18:13:14 2021] raw: ffff88812e1a7ff1 ffff88812e1a7040 000000010000000d 0000000000000000
	[Thu Feb  4 18:13:14 2021] page dumped because: kasan: bad access detected

	[Thu Feb  4 18:13:14 2021] Memory state around the buggy address:
	[Thu Feb  4 18:13:14 2021]  ffff88812e1a7000: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00
	[Thu Feb  4 18:13:14 2021]  ffff88812e1a7080: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
	[Thu Feb  4 18:13:14 2021] >ffff88812e1a7100: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
	[Thu Feb  4 18:13:14 2021]                    ^
	[Thu Feb  4 18:13:14 2021]  ffff88812e1a7180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
	[Thu Feb  4 18:13:14 2021]  ffff88812e1a7200: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
	[Thu Feb  4 18:13:14 2021] ==================================================================

	(gdb) l *(btrfs_alloc_chunk+0x74b)
	0xffffffff8190c3ab is in btrfs_alloc_chunk (fs/btrfs/volumes.c:5047).
	5042            ndevs = 0;
	5043            while (ndevs < ctl->ndevs) {
	5044                    hint = devices_info[ndevs++].alloc_hint;
	5045                    while (devices_info[ndevs].alloc_hint == hint &&
	5046                           ndevs < ctl->ndevs)
	5047                                    ndevs++;
	5048                    if (ndevs >= ctl->devs_min)
	5049                            break;
	5050            }
	5051

> +		if (ndevs >= ctl->devs_min)
> +			break;
> +	}
> +
> +	BUG_ON(ndevs > ctl->ndevs);
> +	ctl->ndevs = ndevs;
> +
>  	return 0;
>  }
>  
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index d776b7f55d56..31a3e4cf93b5 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -364,6 +364,7 @@ struct btrfs_device_info {
>  	u64 dev_offset;
>  	u64 max_avail;
>  	u64 total_avail;
> +	int alloc_hint;
>  };
>  
>  struct btrfs_raid_attr {
> -- 
> 2.30.0
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/5] btrfs: add allocator_hint mode
  2021-02-04 23:24   ` Zygo Blaxell
@ 2021-02-05 18:01     ` Goffredo Baroncelli
  0 siblings, 0 replies; 21+ messages in thread
From: Goffredo Baroncelli @ 2021-02-05 18:01 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: linux-btrfs, Josef Bacik, Goffredo Baroncelli

On 2/5/21 12:24 AM, Zygo Blaxell wrote:
> On Mon, Feb 01, 2021 at 10:28:20PM +0100, Goffredo Baroncelli wrote:
>> From: Goffredo Baroncelli <kreijack@inwind.it>
[...]
Hi Zygo
> 
> Well, I guess if you're going to keep putting the mount option in each
> new patch version, then I'm going to keep saying "please remove the
> mount option" from each new patch version.
> 
> The right side of this || can be deleted, and the entire patch 4/5
> (which adds the mount option).

In the next iteration I will move the "mount option" patch at the end of the chain; this will help you to remove this part of the patch that you don't like.

[...]



> 	(gdb) l *(btrfs_alloc_chunk+0x74b)
> 	0xffffffff8190c3ab is in btrfs_alloc_chunk (fs/btrfs/volumes.c:5047).
> 	5042            ndevs = 0;
> 	5043            while (ndevs < ctl->ndevs) {
> 	5044                    hint = devices_info[ndevs++].alloc_hint;

> 	5045                    while (devices_info[ndevs].alloc_hint == hint &&
> 	5046                           ndevs < ctl->ndevs)

this check is WRONG. The left and right side of && have to be swapped. Otherwise it is possible
an access to the last element+1 of the array before the out of bound check.
My fault.

> 	5047                                    ndevs++;
> 	5048                    if (ndevs >= ctl->devs_min)
> 	5049                            break;
> 	5050            }
> 	5051

BR
G.Baroncelli


> 
>> +		if (ndevs >= ctl->devs_min)
>> +			break;
>> +	}
>> +
>> +	BUG_ON(ndevs > ctl->ndevs);
>> +	ctl->ndevs = ndevs;
>> +
>>   	return 0;
>>   }
>>   
>> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
>> index d776b7f55d56..31a3e4cf93b5 100644
>> --- a/fs/btrfs/volumes.h
>> +++ b/fs/btrfs/volumes.h
>> @@ -364,6 +364,7 @@ struct btrfs_device_info {
>>   	u64 dev_offset;
>>   	u64 max_avail;
>>   	u64 total_avail;
>> +	int alloc_hint;
>>   };
>>   
>>   struct btrfs_raid_attr {
>> -- 
>> 2.30.0
>>


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC][PATCH V6] btrfs: allocation_hint mode
  2021-02-01 21:28 [RFC][PATCH V6] btrfs: allocation_hint mode Goffredo Baroncelli
                   ` (4 preceding siblings ...)
  2021-02-01 21:28 ` [PATCH 5/5] btrfs: add allocator_hint mode Goffredo Baroncelli
@ 2021-02-10 16:04 ` Josef Bacik
  2021-02-11 18:47   ` Goffredo Baroncelli
  5 siblings, 1 reply; 21+ messages in thread
From: Josef Bacik @ 2021-02-10 16:04 UTC (permalink / raw)
  To: Goffredo Baroncelli, linux-btrfs; +Cc: Zygo Blaxell

On 2/1/21 4:28 PM, Goffredo Baroncelli wrote:
> 
> Hi all,
> 
> the previous V5 serie was called "btrfs: preferred_metadata: preferred device
> for metadata".
> 

A few general points up front, first I'd highly recommend reading our patch 
submission guidelines

https://github.com/btrfs/btrfs-workflow/blob/master/patch-submission.md

specifically the 'Git config options' section, as it tells you how to apply our 
git hooks to your local repo.  This will check your patches for all the 
automatic formatting things we'll complain about, that way you don't have to get 
bogged down in those style of comments in the review.  For example as soon as I 
started applying your patches I was getting a ton of whitespace warnings, these 
are better caught before sending them along.

Also try to develop on Dave's misc-next branch.  I realize this is a moving 
target, so I'm fine with massaging patches so I can review, but again everything 
needed massaging.

And finally for a new feature we're going to need an xfstest or two in order to 
merge them.  I realize we're still working out the details, but the further you 
get into this it would be good to go ahead and have a test that validates 
everything.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/5] btrfs: add ioctl BTRFS_IOC_DEV_PROPERTIES.
  2021-02-01 21:28 ` [PATCH 1/5] btrfs: add ioctl BTRFS_IOC_DEV_PROPERTIES Goffredo Baroncelli
@ 2021-02-10 16:08   ` Josef Bacik
  2021-02-11 18:47     ` Goffredo Baroncelli
  0 siblings, 1 reply; 21+ messages in thread
From: Josef Bacik @ 2021-02-10 16:08 UTC (permalink / raw)
  To: Goffredo Baroncelli, linux-btrfs; +Cc: Zygo Blaxell, Goffredo Baroncelli

On 2/1/21 4:28 PM, Goffredo Baroncelli wrote:
> From: Goffredo Baroncelli <kreijack@inwind.it>
> 
> This ioctl is a base for returning / setting information from / to  the
> fields of the btrfs_dev_item object.
> 
> For now only the "type" field is returned / set.
> 
> Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it>
> ---
>   fs/btrfs/ioctl.c           | 67 ++++++++++++++++++++++++++++++++++++++
>   fs/btrfs/volumes.c         |  2 +-
>   fs/btrfs/volumes.h         |  2 ++
>   include/uapi/linux/btrfs.h | 40 +++++++++++++++++++++++
>   4 files changed, 110 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 703212ff50a5..9e67741fa966 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -4842,6 +4842,71 @@ static int btrfs_ioctl_set_features(struct file *file, void __user *arg)
>   	return ret;
>   }
>   
> +static long btrfs_ioctl_dev_properties(struct file *file,
> +						void __user *argp)
> +{
> +	struct inode *inode = file_inode(file);
> +	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> +	struct btrfs_ioctl_dev_properties dev_props;
> +	struct btrfs_device	*device;
> +        struct btrfs_root *root = fs_info->chunk_root;
> +        struct btrfs_trans_handle *trans;
> +	int ret;
> +	u64 prev_type;
> +
> +	if (!capable(CAP_SYS_ADMIN))
> +		return -EPERM;
> +
> +	if (copy_from_user(&dev_props, argp, sizeof(dev_props)))
> +		return -EFAULT;
> +
> +	device = btrfs_find_device(fs_info->fs_devices, dev_props.devid,
> +				NULL, NULL);
> +	if (!device) {
> +		btrfs_info(fs_info, "change_dev_properties: unable to find device %llu",
> +			   dev_props.devid);
> +		return -ENODEV;
> +	}
> +
> +	if (dev_props.properties & BTRFS_DEV_PROPERTY_READ) {
> +		u64 props = dev_props.properties;
> +		memset(&dev_props, 0, sizeof(dev_props));
> +		if (props & BTRFS_DEV_PROPERTY_TYPE) {
> +			dev_props.properties = BTRFS_DEV_PROPERTY_TYPE;
> +			dev_props.type = device->type;
> +		}
> +		if(copy_to_user(argp, &dev_props, sizeof(dev_props)))
> +			return -EFAULT;
> +		return 0;
> +	}
> +
> +	/* it is possible to set only BTRFS_DEV_PROPERTY_TYPE for now */
> +	if (dev_props.properties & ~(BTRFS_DEV_PROPERTY_TYPE))
> +		return -EPERM;
> +
> +	trans = btrfs_start_transaction(root, 0);

This needs to be 1, we're updating an item.

> +        if (IS_ERR(trans))
> +                return PTR_ERR(trans);
> +
> +	prev_type = device->type;
> +	device->type = dev_props.type;
> +	ret = btrfs_update_device(trans, device);
> +
> +        if (ret < 0) {
> +                btrfs_abort_transaction(trans, ret);
> +                btrfs_end_transaction(trans);
> +		device->type = prev_type;
> +		return  ret;
> +        }
> +
> +        ret = btrfs_commit_transaction(trans);
> +	if (ret < 0)
> +		device->type = prev_type;
> +
> +	return ret;
> +
> +}
> +
>   static int _btrfs_ioctl_send(struct file *file, void __user *argp, bool compat)
>   {
>   	struct btrfs_ioctl_send_args *arg;
> @@ -5025,6 +5090,8 @@ long btrfs_ioctl(struct file *file, unsigned int
>   		return btrfs_ioctl_get_subvol_rootref(file, argp);
>   	case BTRFS_IOC_INO_LOOKUP_USER:
>   		return btrfs_ioctl_ino_lookup_user(file, argp);
> +	case BTRFS_IOC_DEV_PROPERTIES:
> +		return btrfs_ioctl_dev_properties(file, argp);
>   	}
>   
>   	return -ENOTTY;
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index ee086fc56c30..68b346c5465d 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2744,7 +2744,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>   	return ret;
>   }
>   
> -static noinline int btrfs_update_device(struct btrfs_trans_handle *trans,
> +int btrfs_update_device(struct btrfs_trans_handle *trans,
>   					struct btrfs_device *device)
>   {
>   	int ret;
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 1997a4649a66..d776b7f55d56 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -595,5 +595,7 @@ void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info,
>   int btrfs_bg_type_to_factor(u64 flags);
>   const char *btrfs_bg_type_to_raid_name(u64 flags);
>   int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info);
> +int btrfs_update_device(struct btrfs_trans_handle *trans,
> +                                        struct btrfs_device *device);
>   
>   #endif
> diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
> index 5df73001aad4..e6caef42837a 100644
> --- a/include/uapi/linux/btrfs.h
> +++ b/include/uapi/linux/btrfs.h
> @@ -860,6 +860,44 @@ struct btrfs_ioctl_get_subvol_rootref_args {
>   		__u8 align[7];
>   };
>   
> +#define BTRFS_DEV_PROPERTY_TYPE		(1ULL << 0)
> +#define BTRFS_DEV_PROPERTY_DEV_GROUP	(1ULL << 1)
> +#define BTRFS_DEV_PROPERTY_SEEK_SPEED	(1ULL << 2)
> +#define BTRFS_DEV_PROPERTY_BANDWIDTH	(1ULL << 3)
> +#define BTRFS_DEV_PROPERTY_READ		(1ULL << 60)
> +
> +/*
> + * The ioctl BTRFS_IOC_DEV_PROPERTIES can read and write the device properties.
> + *
> + * The properties that the user want to write have to be set
> + * in the 'properties' field using the BTRFS_DEV_PROPERTY_xxxx constants.
> + *
> + * If the ioctl is used to read the device properties, the bit
> + * BTRFS_DEV_PROPERTY_READ has to be set in the 'properties' field.
> + * In this case the properties that the user want have to be set in the
> + * 'properties' field. The kernel doesn't return a property that was not
> + * required, however it may return a subset of the requested properties.
> + * The returned properties have the corrispondent BTRFS_DEV_PROPERTY_xxxx
> + * flag set in the 'properties' field.
> + *
> + * Up to 2020/05/11 the only properties that can be read/write is the 'type'
> + * one.
> + */
> +struct btrfs_ioctl_dev_properties {
> +	__u64	devid;
> +	__u64	properties;
> +	__u64	type;
> +	__u32	dev_group;
> +	__u8	seek_speed;
> +	__u8	bandwidth;
> +
> +	/*
> +	 * for future expansion
> +	 */
> +	__u8	unused1[2];
> +	__u64	unused2[4];
> +};
> +

I think we're padding out to 1k for new stuff like this?  We can never have too 
much room for expansion.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] btrfs: add flags to give an hint to the chunk allocator
  2021-02-01 21:28 ` [PATCH 2/5] btrfs: add flags to give an hint to the chunk allocator Goffredo Baroncelli
@ 2021-02-10 16:09   ` Josef Bacik
  2021-02-11 18:47     ` Goffredo Baroncelli
  0 siblings, 1 reply; 21+ messages in thread
From: Josef Bacik @ 2021-02-10 16:09 UTC (permalink / raw)
  To: Goffredo Baroncelli, linux-btrfs; +Cc: Zygo Blaxell, Goffredo Baroncelli

On 2/1/21 4:28 PM, Goffredo Baroncelli wrote:
> From: Goffredo Baroncelli <kreijack@inwind.it>
> 
> Add the following flags to give an hint about which chunk should be
> allocated in which a disk.
> The following flags are created:
> 
> - BTRFS_DEV_ALLOCATION_PREFERRED_DATA
>    preferred data chunk, but metadata chunk allowed
> - BTRFS_DEV_ALLOCATION_PREFERRED_METADATA
>    preferred metadata chunk, but data chunk allowed
> - BTRFS_DEV_ALLOCATION_METADATA_ONLY
>    only metadata chunk allowed
> - BTRFS_DEV_ALLOCATION_DATA_ONLY
>    only data chunk allowed
> 
> Signed-off-by: Goffredo Baroncelli <kreijack@inwid.it>
> ---
>   include/uapi/linux/btrfs_tree.h | 18 ++++++++++++++++++
>   1 file changed, 18 insertions(+)
> 
> diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
> index 58d7cff9afb1..bd3af853df0c 100644
> --- a/include/uapi/linux/btrfs_tree.h
> +++ b/include/uapi/linux/btrfs_tree.h
> @@ -361,6 +361,24 @@ struct btrfs_key {
>   	__u64 offset;
>   } __attribute__ ((__packed__));
>   
> +/* dev_item.type */
> +
> +/* btrfs chunk allocation hints */
> +#define BTRFS_DEV_ALLOCATION_MASK_BIT_COUNT	3
> +#define BTRFS_DEV_ALLOCATION_MASK ((1ULL << \
> +		BTRFS_DEV_ALLOCATION_MASK_BIT_COUNT) -1)
> +#define BTRFS_DEV_ALLOCATION_MASK_COUNT (1ULL << \
> +		BTRFS_DEV_ALLOCATION_MASK_BIT_COUNT)

We just want to define the actual values that are going to disk, helpers can be 
defined elsewhere.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/5] btrfs: add allocator_hint mode
  2021-02-01 21:28 ` [PATCH 5/5] btrfs: add allocator_hint mode Goffredo Baroncelli
  2021-02-04 23:24   ` Zygo Blaxell
@ 2021-02-10 16:12   ` Josef Bacik
  2021-02-11 18:46     ` Goffredo Baroncelli
  2021-02-19 18:51   ` Goffredo Baroncelli
  2 siblings, 1 reply; 21+ messages in thread
From: Josef Bacik @ 2021-02-10 16:12 UTC (permalink / raw)
  To: Goffredo Baroncelli, linux-btrfs; +Cc: Zygo Blaxell, Goffredo Baroncelli

On 2/1/21 4:28 PM, Goffredo Baroncelli wrote:
> From: Goffredo Baroncelli <kreijack@inwind.it>
> 
> When this mode is enabled, the chunk allocation policy is modified as follow.
> 
> Each disk may have a different tag:
> - BTRFS_DEV_ALLOCATION_PREFERRED_METADATA
> - BTRFS_DEV_ALLOCATION_METADATA_ONLY
> - BTRFS_DEV_ALLOCATION_DATA_ONLY
> - BTRFS_DEV_ALLOCATION_PREFERRED_DATA (default)
> 
> Where:
> - ALLOCATION_PREFERRED_X means that it is preferred to use this disk for the
> X chunk type (the other type may be allowed when the space is low)
> - ALLOCATION_X_ONLY means that it is used *only* for the X chunk type. This
> means also that it is a preferred choice.
> 
> Each time the allocator allocates a chunk of type X , first it takes the disks
> tagged as ALLOCATION_X_ONLY or ALLOCATION_PREFERRED_X; if the space is not
> enough, it uses also the disks tagged as ALLOCATION_METADATA_ONLY; if the space
> is not enough, it uses also the other disks, with the exception of the one
> marked as ALLOCATION_PREFERRED_Y, where Y the other type of chunk (i.e. not X).
> 
> Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it>
> ---
>   fs/btrfs/volumes.c | 81 +++++++++++++++++++++++++++++++++++++++++++++-
>   fs/btrfs/volumes.h |  1 +
>   2 files changed, 81 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 68b346c5465d..57ee3e2fdac0 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -4806,13 +4806,18 @@ static int btrfs_add_system_chunk(struct btrfs_fs_info *fs_info,
>   }
>   
>   /*
> - * sort the devices in descending order by max_avail, total_avail
> + * sort the devices in descending order by alloc_hint,
> + * max_avail, total_avail
>    */
>   static int btrfs_cmp_device_info(const void *a, const void *b)
>   {
>   	const struct btrfs_device_info *di_a = a;
>   	const struct btrfs_device_info *di_b = b;
>   
> +	if (di_a->alloc_hint > di_b->alloc_hint)
> +		return -1;
> +	if (di_a->alloc_hint < di_b->alloc_hint)
> +		return 1;
>   	if (di_a->max_avail > di_b->max_avail)
>   		return -1;
>   	if (di_a->max_avail < di_b->max_avail)
> @@ -4939,6 +4944,15 @@ static int gather_device_info(struct btrfs_fs_devices *fs_devices,
>   	int ndevs = 0;
>   	u64 max_avail;
>   	u64 dev_offset;
> +	int hint;
> +
> +	static const char alloc_hint_map[BTRFS_DEV_ALLOCATION_MASK_COUNT] = {
> +		[BTRFS_DEV_ALLOCATION_DATA_ONLY] = -1,
> +		[BTRFS_DEV_ALLOCATION_PREFERRED_DATA] = 0,
> +		[BTRFS_DEV_ALLOCATION_METADATA_ONLY] = 1,
> +		[BTRFS_DEV_ALLOCATION_PREFERRED_METADATA] = 2
> +		/* the other values are set to 0 */
> +	};

This can be made global, up with the btrfs_raid_array definitions.

>   
>   	/*
>   	 * in the first pass through the devices list, we gather information
> @@ -4991,16 +5005,81 @@ static int gather_device_info(struct btrfs_fs_devices *fs_devices,
>   		devices_info[ndevs].max_avail = max_avail;
>   		devices_info[ndevs].total_avail = total_avail;
>   		devices_info[ndevs].dev = device;
> +
> +		if (((ctl->type & BTRFS_BLOCK_GROUP_DATA) &&
> +		     (ctl->type & BTRFS_BLOCK_GROUP_METADATA)) ||
> +		    info->allocation_hint_mode ==
> +		     BTRFS_ALLOCATION_HINT_DISABLED) {
> +			/*
> +			 * if mixed bg or the allocator hint is
> +			 * disable, set all the alloc_hint
> +			 * fields to the same value, so the sorting
> +			 * is not affected
> +			 */
> +			devices_info[ndevs].alloc_hint = 0;
> +		} else if(ctl->type & BTRFS_BLOCK_GROUP_DATA) {
> +			hint = device->type & BTRFS_DEV_ALLOCATION_MASK;
> +
> +			/*
> +			 * skip BTRFS_DEV_METADATA_ONLY disks
> +			 */
> +			if (hint == BTRFS_DEV_ALLOCATION_METADATA_ONLY)
> +				continue;
> +			/*
> +			 * if a data chunk must be allocated,
> +			 * sort also by hint (data disk
> +			 * higher priority)
> +			 */
> +			devices_info[ndevs].alloc_hint = -alloc_hint_map[hint];
> +		} else { /* BTRFS_BLOCK_GROUP_METADATA */
> +			hint = device->type & BTRFS_DEV_ALLOCATION_MASK;
> +
> +			/*
> +			 * skip BTRFS_DEV_DATA_ONLY disks
> +			 */
> +			if (hint == BTRFS_DEV_ALLOCATION_DATA_ONLY)
> +				continue;
> +			/*
> +			 * if a data chunk must be allocated,
> +			 * sort also by hint (metadata hint
> +			 * higher priority)
> +			 */
> +			devices_info[ndevs].alloc_hint = alloc_hint_map[hint];
> +		}
> +
>   		++ndevs;
>   	}
>   	ctl->ndevs = ndevs;
>   
> +	/*
> +	 * no devices available
> +	 */
> +	if (!ndevs)
> +		return 0;
> +
>   	/*
>   	 * now sort the devices by hole size / available space
>   	 */
>   	sort(devices_info, ndevs, sizeof(struct btrfs_device_info),
>   	     btrfs_cmp_device_info, NULL);
>   
> +	/*
> +	 * select the minimum set of disks grouped by hint that
> +	 * can host the chunk
> +	 */
> +	ndevs = 0;
> +	while (ndevs < ctl->ndevs) {
> +		hint = devices_info[ndevs++].alloc_hint;
> +		while (devices_info[ndevs].alloc_hint == hint &&
> +		       ndevs < ctl->ndevs)
> +				ndevs++;
> +		if (ndevs >= ctl->devs_min)
> +			break;
> +	}

Can we just adjust btrfs_cmp_device_info to take the hint info into account? 
Thanks,

Josef

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/5] btrfs: add allocation_hint option.
  2021-02-01 21:28 ` [PATCH 4/5] btrfs: add allocation_hint option Goffredo Baroncelli
@ 2021-02-10 16:14   ` Josef Bacik
  2021-02-11 18:46     ` Goffredo Baroncelli
  0 siblings, 1 reply; 21+ messages in thread
From: Josef Bacik @ 2021-02-10 16:14 UTC (permalink / raw)
  To: Goffredo Baroncelli, linux-btrfs; +Cc: Zygo Blaxell, Goffredo Baroncelli

On 2/1/21 4:28 PM, Goffredo Baroncelli wrote:
> From: Goffredo Baroncelli <kreijack@inwind.it>
> 
> Add allocation_hint mount option. This option accepts the following values:
> 
> - 0 (default):  the chunks allocator ignores the disk hints
> - 1:            the chunks allocator considers the disk hints
> 
> Signed-off-by: Goffredo Baroncelli <kreijack@winwind.it>

We don't want an mount option here.  If a user has tagged a block device as 
METADATA/DATA_ONLY or METADATA_PREFERRED we know they want it.  If we want to 
add a way to disable the hinting then I think a sysctl option to disable is a 
reasonable compromise.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/5] btrfs: add allocator_hint mode
  2021-02-10 16:12   ` Josef Bacik
@ 2021-02-11 18:46     ` Goffredo Baroncelli
  0 siblings, 0 replies; 21+ messages in thread
From: Goffredo Baroncelli @ 2021-02-11 18:46 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs; +Cc: Zygo Blaxell, Goffredo Baroncelli

On 2/10/21 5:12 PM, Josef Bacik wrote:
> On 2/1/21 4:28 PM, Goffredo Baroncelli wrote:
>> From: Goffredo Baroncelli <kreijack@inwind.it>
[...]
>> +    int hint;
>> +
>> +    static const char alloc_hint_map[BTRFS_DEV_ALLOCATION_MASK_COUNT] = {
>> +        [BTRFS_DEV_ALLOCATION_DATA_ONLY] = -1,
>> +        [BTRFS_DEV_ALLOCATION_PREFERRED_DATA] = 0,
>> +        [BTRFS_DEV_ALLOCATION_METADATA_ONLY] = 1,
>> +        [BTRFS_DEV_ALLOCATION_PREFERRED_METADATA] = 2
>> +        /* the other values are set to 0 */
>> +    };
> 
> This can be made global, up with the btrfs_raid_array definitions.

I moved this in the beginning, outside the functions. It still "static", so no global.
> 
[...]

>>       /*
>>        * now sort the devices by hole size / available space
>>        */
>>       sort(devices_info, ndevs, sizeof(struct btrfs_device_info),
>>            btrfs_cmp_device_info, NULL);
>> +    /*
>> +     * select the minimum set of disks grouped by hint that
>> +     * can host the chunk
>> +     */
>> +    ndevs = 0;
>> +    while (ndevs < ctl->ndevs) {
>> +        hint = devices_info[ndevs++].alloc_hint;
>> +        while (devices_info[ndevs].alloc_hint == hint &&
>> +               ndevs < ctl->ndevs)
>> +                ndevs++;
>> +        if (ndevs >= ctl->devs_min)
>> +            break;
>> +    }
> 
> Can we just adjust btrfs_cmp_device_info to take the hint info into account? Thanks,

Unfortunately No. btrfs_cmp_device_info is used to sort the disks list.
Instead this piece of code *limits* the available number of disks to a disks set
sufficient to host the new chunk.

> 
> Josef


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/5] btrfs: add allocation_hint option.
  2021-02-10 16:14   ` Josef Bacik
@ 2021-02-11 18:46     ` Goffredo Baroncelli
  0 siblings, 0 replies; 21+ messages in thread
From: Goffredo Baroncelli @ 2021-02-11 18:46 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs; +Cc: Zygo Blaxell, Goffredo Baroncelli

On 2/10/21 5:14 PM, Josef Bacik wrote:
> On 2/1/21 4:28 PM, Goffredo Baroncelli wrote:
>> From: Goffredo Baroncelli <kreijack@inwind.it>
>>
>> Add allocation_hint mount option. This option accepts the following values:
>>
>> - 0 (default):  the chunks allocator ignores the disk hints
>> - 1:            the chunks allocator considers the disk hints
>>
>> Signed-off-by: Goffredo Baroncelli <kreijack@winwind.it>
> 
> We don't want an mount option here.  If a user has tagged a block device as METADATA/DATA_ONLY or METADATA_PREFERRED we know they want it.  If we want to add a way to disable the hinting then I think a sysctl option to disable is a reasonable compromise.  Thanks,

The idea was to have a knob to switch off all the logic in case of problem.
When the code will be more tested we can drop this part. For the moment I will move this patch at the end of the series to simplify the "manual" drop.

> 
> Josef

Ciao
Goffredo
-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/5] btrfs: add ioctl BTRFS_IOC_DEV_PROPERTIES.
  2021-02-10 16:08   ` Josef Bacik
@ 2021-02-11 18:47     ` Goffredo Baroncelli
  0 siblings, 0 replies; 21+ messages in thread
From: Goffredo Baroncelli @ 2021-02-11 18:47 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs; +Cc: Zygo Blaxell, Goffredo Baroncelli

On 2/10/21 5:08 PM, Josef Bacik wrote:
> On 2/1/21 4:28 PM, Goffredo Baroncelli wrote:
>> From: Goffredo Baroncelli <kreijack@inwind.it>
>>
>> This ioctl is a base for returning / setting information from / to  the
>> fields of the btrfs_dev_item object.
>>
>> For now only the "type" field is returned / set.
>>
>> Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it>
[...]
>> +    /* it is possible to set only BTRFS_DEV_PROPERTY_TYPE for now */
>> +    if (dev_props.properties & ~(BTRFS_DEV_PROPERTY_TYPE))
>> +        return -EPERM;
>> +
>> +    trans = btrfs_start_transaction(root, 0);
> 
> This needs to be 1, we're updating an item.

Ok

[...]
>> + * Up to 2020/05/11 the only properties that can be read/write is the 'type'
>> + * one.
>> + */
>> +struct btrfs_ioctl_dev_properties {
>> +    __u64    devid;
>> +    __u64    properties;
>> +    __u64    type;
>> +    __u32    dev_group;
>> +    __u8    seek_speed;
>> +    __u8    bandwidth;
>> +
>> +    /*
>> +     * for future expansion
>> +     */
>> +    __u8    unused1[2];
>> +    __u64    unused2[4];
>> +};
>> +
> 
> I think we're padding out to 1k for new stuff like this?  We can never have too much room for expansion.  Thanks,
Ok, in the next iteration this will be expanded up to 1kB

> 
> Josef

Ciao
Goffredo

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] btrfs: add flags to give an hint to the chunk allocator
  2021-02-10 16:09   ` Josef Bacik
@ 2021-02-11 18:47     ` Goffredo Baroncelli
  0 siblings, 0 replies; 21+ messages in thread
From: Goffredo Baroncelli @ 2021-02-11 18:47 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs; +Cc: Zygo Blaxell, Goffredo Baroncelli

On 2/10/21 5:09 PM, Josef Bacik wrote:
> On 2/1/21 4:28 PM, Goffredo Baroncelli wrote:
>> From: Goffredo Baroncelli <kreijack@inwind.it>
[...]
>> +
>> +/* btrfs chunk allocation hints */
>> +#define BTRFS_DEV_ALLOCATION_MASK_BIT_COUNT    3
>> +#define BTRFS_DEV_ALLOCATION_MASK ((1ULL << \
>> +        BTRFS_DEV_ALLOCATION_MASK_BIT_COUNT) -1)
>> +#define BTRFS_DEV_ALLOCATION_MASK_COUNT (1ULL << \
>> +        BTRFS_DEV_ALLOCATION_MASK_BIT_COUNT)
> 
> We just want to define the actual values that are going to disk, helpers can be defined elsewhere.  Thanks,

I will move BTRFS_DEV_ALLOCATION_MASK, BTRFS_DEV_ALLOCATION_MASK_COUNT. Instead I like
to left BTRFS_DEV_ALLOCATION_MASK_BIT_COUNT here, in order to show how many bits
are available.

> 
> Josef

Ciao
Goffredo

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC][PATCH V6] btrfs: allocation_hint mode
  2021-02-10 16:04 ` [RFC][PATCH V6] btrfs: allocation_hint mode Josef Bacik
@ 2021-02-11 18:47   ` Goffredo Baroncelli
  2021-02-11 18:58     ` Josef Bacik
  2021-02-16 22:27     ` Josef Bacik
  0 siblings, 2 replies; 21+ messages in thread
From: Goffredo Baroncelli @ 2021-02-11 18:47 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs; +Cc: Zygo Blaxell

On 2/10/21 5:04 PM, Josef Bacik wrote:
> On 2/1/21 4:28 PM, Goffredo Baroncelli wrote:
>>
>> Hi all,
>>
>> the previous V5 serie was called "btrfs: preferred_metadata: preferred device
>> for metadata".
>>
> 
> A few general points up front, first I'd highly recommend reading our patch submission guidelines
> 
> https://github.com/btrfs/btrfs-workflow/blob/master/patch-submission.md
> 
> specifically the 'Git config options' section, as it tells you how to apply our git hooks to your local repo.  This will check your patches for all the automatic formatting things we'll complain about, that way you don't have to get bogged down in those style of comments in the review.  For example as soon as I started applying your patches I was getting a ton of whitespace warnings, these are better caught before sending them along.
> 
ok
> Also try to develop on Dave's misc-next branch.  I realize this is a moving target, so I'm fine with massaging patches so I can review, but again everything needed massaging.
> 
ok
> And finally for a new feature we're going to need an xfstest or two in order to merge them.  I realize we're still working out the details, but the further you get into this it would be good to go ahead and have a test that validates everything.  Thanks,

Definitely I have to do it. Unfortunately the last time I tried I found complex to setup it. Do you have some link for an "xfstest beginner" ?
> 
> Josef


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC][PATCH V6] btrfs: allocation_hint mode
  2021-02-11 18:47   ` Goffredo Baroncelli
@ 2021-02-11 18:58     ` Josef Bacik
  2021-02-16 22:27     ` Josef Bacik
  1 sibling, 0 replies; 21+ messages in thread
From: Josef Bacik @ 2021-02-11 18:58 UTC (permalink / raw)
  To: kreijack, linux-btrfs; +Cc: Zygo Blaxell

On 2/11/21 1:47 PM, Goffredo Baroncelli wrote:
> On 2/10/21 5:04 PM, Josef Bacik wrote:
>> On 2/1/21 4:28 PM, Goffredo Baroncelli wrote:
>>>
>>> Hi all,
>>>
>>> the previous V5 serie was called "btrfs: preferred_metadata: preferred device
>>> for metadata".
>>>
>>
>> A few general points up front, first I'd highly recommend reading our patch 
>> submission guidelines
>>
>> https://github.com/btrfs/btrfs-workflow/blob/master/patch-submission.md
>>
>> specifically the 'Git config options' section, as it tells you how to apply 
>> our git hooks to your local repo.  This will check your patches for all the 
>> automatic formatting things we'll complain about, that way you don't have to 
>> get bogged down in those style of comments in the review.  For example as soon 
>> as I started applying your patches I was getting a ton of whitespace warnings, 
>> these are better caught before sending them along.
>>
> ok
>> Also try to develop on Dave's misc-next branch.  I realize this is a moving 
>> target, so I'm fine with massaging patches so I can review, but again 
>> everything needed massaging.
>>
> ok
>> And finally for a new feature we're going to need an xfstest or two in order 
>> to merge them.  I realize we're still working out the details, but the further 
>> you get into this it would be good to go ahead and have a test that validates 
>> everything.  Thanks,
> 
> Definitely I have to do it. Unfortunately the last time I tried I found complex 
> to setup it. Do you have some link for an "xfstest beginner" ?

I do not, I'll write one and put it in our developer workflow repo.  I'll let 
you know when it's in place.

Josef


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC][PATCH V6] btrfs: allocation_hint mode
  2021-02-11 18:47   ` Goffredo Baroncelli
  2021-02-11 18:58     ` Josef Bacik
@ 2021-02-16 22:27     ` Josef Bacik
  1 sibling, 0 replies; 21+ messages in thread
From: Josef Bacik @ 2021-02-16 22:27 UTC (permalink / raw)
  To: kreijack, linux-btrfs; +Cc: Zygo Blaxell

On 2/11/21 1:47 PM, Goffredo Baroncelli wrote:
> On 2/10/21 5:04 PM, Josef Bacik wrote:
>> On 2/1/21 4:28 PM, Goffredo Baroncelli wrote:
>>>
>>> Hi all,
>>>
>>> the previous V5 serie was called "btrfs: preferred_metadata: preferred device
>>> for metadata".
>>>
>>
>> A few general points up front, first I'd highly recommend reading our patch 
>> submission guidelines
>>
>> https://github.com/btrfs/btrfs-workflow/blob/master/patch-submission.md
>>
>> specifically the 'Git config options' section, as it tells you how to apply 
>> our git hooks to your local repo.  This will check your patches for all the 
>> automatic formatting things we'll complain about, that way you don't have to 
>> get bogged down in those style of comments in the review.  For example as soon 
>> as I started applying your patches I was getting a ton of whitespace warnings, 
>> these are better caught before sending them along.
>>
> ok
>> Also try to develop on Dave's misc-next branch.  I realize this is a moving 
>> target, so I'm fine with massaging patches so I can review, but again 
>> everything needed massaging.
>>
> ok
>> And finally for a new feature we're going to need an xfstest or two in order 
>> to merge them.  I realize we're still working out the details, but the further 
>> you get into this it would be good to go ahead and have a test that validates 
>> everything.  Thanks,
> 
> Definitely I have to do it. Unfortunately the last time I tried I found complex 
> to setup it. Do you have some link for an "xfstest beginner" ?

Sorry this took me longer to get to than I wanted, you can check it out here

https://github.com/btrfs/btrfs-workflow/blob/master/using-fstests.md

let me know if anything doesn't make sense, at this point it's hard for me to 
realize what doesn't make sense in fstests.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/5] btrfs: add allocator_hint mode
  2021-02-01 21:28 ` [PATCH 5/5] btrfs: add allocator_hint mode Goffredo Baroncelli
  2021-02-04 23:24   ` Zygo Blaxell
  2021-02-10 16:12   ` Josef Bacik
@ 2021-02-19 18:51   ` Goffredo Baroncelli
  2 siblings, 0 replies; 21+ messages in thread
From: Goffredo Baroncelli @ 2021-02-19 18:51 UTC (permalink / raw)
  To: Goffredo Baroncelli, linux-btrfs; +Cc: Zygo Blaxell, Josef Bacik

On 2/1/21 10:28 PM, Goffredo Baroncelli wrote:
> From: Goffredo Baroncelli <kreijack@inwind.it>
> 
> When this mode is enabled, the chunk allocation policy is modified as follow.
> 
> Each disk may have a different tag:
> - BTRFS_DEV_ALLOCATION_PREFERRED_METADATA
> - BTRFS_DEV_ALLOCATION_METADATA_ONLY
> - BTRFS_DEV_ALLOCATION_DATA_ONLY
> - BTRFS_DEV_ALLOCATION_PREFERRED_DATA (default)
> 
> Where:
> - ALLOCATION_PREFERRED_X means that it is preferred to use this disk for the
> X chunk type (the other type may be allowed when the space is low)
> - ALLOCATION_X_ONLY means that it is used *only* for the X chunk type. This
> means also that it is a preferred choice.
> 
> Each time the allocator allocates a chunk of type X , first it takes the disks
> tagged as ALLOCATION_X_ONLY or ALLOCATION_PREFERRED_X; if the space is not
> enough, it uses also the disks tagged as ALLOCATION_METADATA_ONLY; if the space
> is not enough, it uses also the other disks, with the exception of the one
> marked as ALLOCATION_PREFERRED_Y, where Y the other type of chunk (i.e. not X).
> 
> Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it>
> ---
>   fs/btrfs/volumes.c | 81 +++++++++++++++++++++++++++++++++++++++++++++-
>   fs/btrfs/volumes.h |  1 +
>   2 files changed, 81 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 68b346c5465d..57ee3e2fdac0 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -4806,13 +4806,18 @@ static int btrfs_add_system_chunk(struct btrfs_fs_info *fs_info,
>   }
>   
>   /*
> - * sort the devices in descending order by max_avail, total_avail
> + * sort the devices in descending order by alloc_hint,
> + * max_avail, total_avail
>    */
>   static int btrfs_cmp_device_info(const void *a, const void *b)
>   {
>   	const struct btrfs_device_info *di_a = a;
>   	const struct btrfs_device_info *di_b = b;
>   
> +	if (di_a->alloc_hint > di_b->alloc_hint)
> +		return -1;
> +	if (di_a->alloc_hint < di_b->alloc_hint)
> +		return 1;
>   	if (di_a->max_avail > di_b->max_avail)
>   		return -1;
>   	if (di_a->max_avail < di_b->max_avail)
> @@ -4939,6 +4944,15 @@ static int gather_device_info(struct btrfs_fs_devices *fs_devices,
>   	int ndevs = 0;
>   	u64 max_avail;
>   	u64 dev_offset;
> +	int hint;
> +
> +	static const char alloc_hint_map[BTRFS_DEV_ALLOCATION_MASK_COUNT] = {
> +		[BTRFS_DEV_ALLOCATION_DATA_ONLY] = -1,
> +		[BTRFS_DEV_ALLOCATION_PREFERRED_DATA] = 0,
> +		[BTRFS_DEV_ALLOCATION_METADATA_ONLY] = 1,
> +		[BTRFS_DEV_ALLOCATION_PREFERRED_METADATA] = 2

Finally I found the reason of the wrong allocation. The last two values
are swapped: the priority starts from BTRFS_DEV_ALLOCATION_DATA_ONLY
and ends to BTRFS_DEV_ALLOCATION_METADATA_ONLY.

Ok, now I have to restart the tests :-)

> +		/* the other values are set to 0 */
> +	};
>   
>   	/*
>   	 * in the first pass through the devices list, we gather information
> @@ -4991,16 +5005,81 @@ static int gather_device_info(struct btrfs_fs_devices *fs_devices,
>   		devices_info[ndevs].max_avail = max_avail;
>   		devices_info[ndevs].total_avail = total_avail;
>   		devices_info[ndevs].dev = device;
> +
> +		if (((ctl->type & BTRFS_BLOCK_GROUP_DATA) &&
> +		     (ctl->type & BTRFS_BLOCK_GROUP_METADATA)) ||
> +		    info->allocation_hint_mode ==
> +		     BTRFS_ALLOCATION_HINT_DISABLED) {
> +			/*
> +			 * if mixed bg or the allocator hint is
> +			 * disable, set all the alloc_hint
> +			 * fields to the same value, so the sorting
> +			 * is not affected
> +			 */
> +			devices_info[ndevs].alloc_hint = 0;
> +		} else if(ctl->type & BTRFS_BLOCK_GROUP_DATA) {
> +			hint = device->type & BTRFS_DEV_ALLOCATION_MASK;
> +
> +			/*
> +			 * skip BTRFS_DEV_METADATA_ONLY disks
> +			 */
> +			if (hint == BTRFS_DEV_ALLOCATION_METADATA_ONLY)
> +				continue;
> +			/*
> +			 * if a data chunk must be allocated,
> +			 * sort also by hint (data disk
> +			 * higher priority)
> +			 */
> +			devices_info[ndevs].alloc_hint = -alloc_hint_map[hint];
> +		} else { /* BTRFS_BLOCK_GROUP_METADATA */
> +			hint = device->type & BTRFS_DEV_ALLOCATION_MASK;
> +
> +			/*
> +			 * skip BTRFS_DEV_DATA_ONLY disks
> +			 */
> +			if (hint == BTRFS_DEV_ALLOCATION_DATA_ONLY)
> +				continue;
> +			/*
> +			 * if a data chunk must be allocated,
> +			 * sort also by hint (metadata hint
> +			 * higher priority)
> +			 */
> +			devices_info[ndevs].alloc_hint = alloc_hint_map[hint];
> +		}
> +
>   		++ndevs;
>   	}
>   	ctl->ndevs = ndevs;
>   
> +	/*
> +	 * no devices available
> +	 */
> +	if (!ndevs)
> +		return 0;
> +
>   	/*
>   	 * now sort the devices by hole size / available space
>   	 */
>   	sort(devices_info, ndevs, sizeof(struct btrfs_device_info),
>   	     btrfs_cmp_device_info, NULL);
>   
> +	/*
> +	 * select the minimum set of disks grouped by hint that
> +	 * can host the chunk
> +	 */
> +	ndevs = 0;
> +	while (ndevs < ctl->ndevs) {
> +		hint = devices_info[ndevs++].alloc_hint;
> +		while (devices_info[ndevs].alloc_hint == hint &&
> +		       ndevs < ctl->ndevs)
> +				ndevs++;
> +		if (ndevs >= ctl->devs_min)
> +			break;
> +	}
> +
> +	BUG_ON(ndevs > ctl->ndevs);
> +	ctl->ndevs = ndevs;
> +
>   	return 0;
>   }
>   
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index d776b7f55d56..31a3e4cf93b5 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -364,6 +364,7 @@ struct btrfs_device_info {
>   	u64 dev_offset;
>   	u64 max_avail;
>   	u64 total_avail;
> +	int alloc_hint;
>   };
>   
>   struct btrfs_raid_attr {
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2021-02-19 18:52 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-01 21:28 [RFC][PATCH V6] btrfs: allocation_hint mode Goffredo Baroncelli
2021-02-01 21:28 ` [PATCH 1/5] btrfs: add ioctl BTRFS_IOC_DEV_PROPERTIES Goffredo Baroncelli
2021-02-10 16:08   ` Josef Bacik
2021-02-11 18:47     ` Goffredo Baroncelli
2021-02-01 21:28 ` [PATCH 2/5] btrfs: add flags to give an hint to the chunk allocator Goffredo Baroncelli
2021-02-10 16:09   ` Josef Bacik
2021-02-11 18:47     ` Goffredo Baroncelli
2021-02-01 21:28 ` [PATCH 3/5] btrfs: export dev_item.type in /sys/fs/btrfs/<uuid>/devinfo/<devid>/type Goffredo Baroncelli
2021-02-01 21:28 ` [PATCH 4/5] btrfs: add allocation_hint option Goffredo Baroncelli
2021-02-10 16:14   ` Josef Bacik
2021-02-11 18:46     ` Goffredo Baroncelli
2021-02-01 21:28 ` [PATCH 5/5] btrfs: add allocator_hint mode Goffredo Baroncelli
2021-02-04 23:24   ` Zygo Blaxell
2021-02-05 18:01     ` Goffredo Baroncelli
2021-02-10 16:12   ` Josef Bacik
2021-02-11 18:46     ` Goffredo Baroncelli
2021-02-19 18:51   ` Goffredo Baroncelli
2021-02-10 16:04 ` [RFC][PATCH V6] btrfs: allocation_hint mode Josef Bacik
2021-02-11 18:47   ` Goffredo Baroncelli
2021-02-11 18:58     ` Josef Bacik
2021-02-16 22:27     ` Josef Bacik

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.