linux-btrfs.vger.kernel.org archive mirror
* [PATCH RFC v2 0/8] btrfs: raid-stripe-tree draft patches
@ 2022-06-29 14:41 Johannes Thumshirn
  2022-06-29 14:41 ` [PATCH RFC v2 1/8] btrfs: add raid stripe tree definitions Johannes Thumshirn
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: Johannes Thumshirn @ 2022-06-29 14:41 UTC (permalink / raw)
  To: linux-btrfs
  Cc: Naohiro Aota, Damien Le Moal, Johannes Thumshirn, Qu Wenruo,
	Christoph Hellwig, Josef Bacik

Here's a second draft of my btrfs zoned RAID1 patches.

Updates of the raid-stripe-tree are done at delayed-ref time to save on
bandwidth, while for reading we do the stripe-tree lookup at bio mapping time,
i.e. at the point where the logical to physical translation happens for
regular btrfs RAID as well.

The stripe tree is keyed by an extent's disk_bytenr and disk_num_bytes and
its contents are the respective physical device id and position.
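
As a rough illustration of that keying (a userspace sketch, not code from this
series), the lookup can be pictured as below. The names stripe_copy,
stripe_item and stripe_item_translate() are made up for this example, and the
assumption that an address inside the extent maps to "physical + offset into
the extent" is mine, not something the patches state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirror of one stripe-tree item (not the on-disk structs). */
struct stripe_copy {
	uint64_t devid;    /* device holding this copy */
	uint64_t physical; /* byte offset of the copy on that device */
};

struct stripe_item {
	uint64_t logical;   /* key objectid: the extent's disk_bytenr */
	uint64_t num_bytes; /* key offset: the extent's disk_num_bytes */
	size_t num_copies;  /* 2 for RAID1 */
	struct stripe_copy copies[2];
};

/*
 * Translate @logical to the physical location of copy @mirror.
 * Returns 0 on success, -1 if @logical lies outside the item or @mirror is
 * out of range. The "offset into the extent" rule is an assumption.
 */
static int stripe_item_translate(const struct stripe_item *item,
				 uint64_t logical, size_t mirror,
				 struct stripe_copy *out)
{
	assert(item && out);

	if (logical < item->logical ||
	    logical - item->logical >= item->num_bytes)
		return -1;
	if (mirror >= item->num_copies)
		return -1;

	out->devid = item->copies[mirror].devid;
	out->physical = item->copies[mirror].physical +
			(logical - item->logical);
	return 0;
}
```

With an item keyed (939651072, 126976) and copies on devid 1 at 939651072 and
devid 2 at 536997888, asking for mirror 1 at logical 939651072 + 4096 would
yield devid 2, physical 536997888 + 4096.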

For an example 1M write (split into 124K, i.e. 126976 byte, segments due to zone-append):
rapido2:/home/johannes/src/fstests# xfs_io -fdc "pwrite -b 1M 0 1M" -c fsync /mnt/test/test
wrote 1048576/1048576 bytes at offset 0
1 MiB, 1 ops; 0.0065 sec (151.538 MiB/sec and 151.5381 ops/sec)

The tree will look as follows:

rapido2:/home/johannes/src/fstests# btrfs inspect-internal dump-tree -t raid_stripe /dev/nullb0
btrfs-progs v5.16.1 
raid stripe tree key (RAID_STRIPE_TREE ROOT_ITEM 0)
leaf 805847040 items 9 free space 15770 generation 9 owner RAID_STRIPE_TREE
leaf 805847040 flags 0x1(WRITTEN) backref revision 1
checksum stored 1b22e13800000000000000000000000000000000000000000000000000000000
checksum calced 1b22e13800000000000000000000000000000000000000000000000000000000
fs uuid e4f523d1-89a1-41f9-ab75-6ba3c42a28fb
chunk uuid 6f2d8aaa-d348-4bf2-9b5e-141a37ba4c77
        item 0 key (939524096 RAID_STRIPE_KEY 126976) itemoff 16251 itemsize 32
                        stripe 0 devid 1 offset 939524096
                        stripe 1 devid 2 offset 536870912
        item 1 key (939651072 RAID_STRIPE_KEY 126976) itemoff 16219 itemsize 32
                        stripe 0 devid 1 offset 939651072
                        stripe 1 devid 2 offset 536997888
        item 2 key (939778048 RAID_STRIPE_KEY 126976) itemoff 16187 itemsize 32
                        stripe 0 devid 1 offset 939778048
                        stripe 1 devid 2 offset 537124864
        item 3 key (939905024 RAID_STRIPE_KEY 126976) itemoff 16155 itemsize 32
                        stripe 0 devid 1 offset 939905024
                        stripe 1 devid 2 offset 537251840
        item 4 key (940032000 RAID_STRIPE_KEY 126976) itemoff 16123 itemsize 32
                        stripe 0 devid 1 offset 940032000
                        stripe 1 devid 2 offset 537378816
        item 5 key (940158976 RAID_STRIPE_KEY 126976) itemoff 16091 itemsize 32
                        stripe 0 devid 1 offset 940158976
                        stripe 1 devid 2 offset 537505792
        item 6 key (940285952 RAID_STRIPE_KEY 126976) itemoff 16059 itemsize 32
                        stripe 0 devid 1 offset 940285952
                        stripe 1 devid 2 offset 537632768
        item 7 key (940412928 RAID_STRIPE_KEY 126976) itemoff 16027 itemsize 32
                        stripe 0 devid 1 offset 940412928
                        stripe 1 devid 2 offset 537759744
        item 8 key (940539904 RAID_STRIPE_KEY 32768) itemoff 15995 itemsize 32
                        stripe 0 devid 1 offset 940539904
                        stripe 1 devid 2 offset 537886720
total bytes 26843545600
bytes used 1245184
uuid e4f523d1-89a1-41f9-ab75-6ba3c42a28fb
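
The nine items in the dump follow directly from the split size: eight
segments of 126976 bytes plus a 32768 byte tail add up to 1048576 bytes. A
small userspace sketch of that arithmetic is below; split_write() and struct
segment are hypothetical helpers for this illustration only, and the real
split is done by the block layer, not by code like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One (start, length) pair, corresponding to one stripe-tree key. */
struct segment {
	uint64_t start;
	uint64_t len;
};

/*
 * Split a write of @len bytes starting at @start into segments of at most
 * @max_seg bytes. Stores up to @cap segments in @segs and returns the total
 * number of segments the write needs.
 */
static size_t split_write(uint64_t start, uint64_t len, uint64_t max_seg,
			  struct segment *segs, size_t cap)
{
	size_t n = 0;

	assert(max_seg > 0);

	while (len > 0) {
		uint64_t cur = len < max_seg ? len : max_seg;

		if (n < cap) {
			segs[n].start = start;
			segs[n].len = cur;
		}
		start += cur;
		len -= cur;
		n++;
	}
	return n;
}
```

Calling split_write(939524096, 1048576, 126976, ...) gives nine segments whose
start addresses and lengths match the RAID_STRIPE_KEY objectids and offsets
shown above.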

The performance deviation is measurable but overall not too bad for a first shot:

RAID1:
READ: bw=81.6MiB/s (85.6MB/s), 81.6MiB/s-81.6MiB/s (85.6MB/s-85.6MB/s), io=496MiB (520MB), run=6075-6075msec
WRITE: bw=86.9MiB/s (91.1MB/s), 86.9MiB/s-86.9MiB/s (91.1MB/s-91.1MB/s), io=528MiB (554MB), run=6075-6075msec

Single:
READ: bw=92.5MiB/s (97.0MB/s), 92.5MiB/s-92.5MiB/s (97.0MB/s-97.0MB/s), io=496MiB (520MB), run=5360-5360msec
WRITE: bw=98.5MiB/s (103MB/s), 98.5MiB/s-98.5MiB/s (103MB/s-103MB/s), io=528MiB (554MB), run=5360-5360msec
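
Taking the numbers above at face value, RAID1 lands roughly 12% below single
for both reads and writes. The helper below is only a sketch of that
calculation; slowdown_percent() is a name invented for this example.

```c
#include <assert.h>

/* Relative slowdown of @raid1 versus @single bandwidth, in percent. */
static double slowdown_percent(double raid1, double single)
{
	assert(single > 0.0);
	return (single - raid1) / single * 100.0;
}
```

slowdown_percent(81.6, 92.5) covers the read case and
slowdown_percent(86.9, 98.5) the write case.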

Changes to v1:
- Write the stripe-tree at delayed-ref time (Qu)
- Add a different write path for preallocation

v1 of the patchset can be found here:
https://lore.kernel.org/linux-btrfs/cover.1652711187.git.johannes.thumshirn@wdc.com/

Johannes Thumshirn (8):
  btrfs: add raid stripe tree definitions
  btrfs: read raid-stripe-tree from disk
  btrfs: add boilerplate code to insert raid extent
  btrfs: add boilerplate code to insert stripe entries for preallocated
    extents
  btrfs: add code to delete raid extent
  btrfs: add code to read raid extent
  btrfs: zoned: allow zoned RAID1
  btrfs: add raid stripe tree pretty printer

 fs/btrfs/Makefile               |   2 +-
 fs/btrfs/block-rsv.c            |   1 +
 fs/btrfs/ctree.h                |  33 ++++
 fs/btrfs/disk-io.c              |  15 ++
 fs/btrfs/extent-tree.c          |  53 ++++++
 fs/btrfs/inode.c                |   6 +
 fs/btrfs/print-tree.c           |  21 +++
 fs/btrfs/raid-stripe-tree.c     | 318 ++++++++++++++++++++++++++++++++
 fs/btrfs/raid-stripe-tree.h     |  72 ++++++++
 fs/btrfs/volumes.c              |  35 +++-
 fs/btrfs/volumes.h              |   4 +
 fs/btrfs/zoned.c                |  39 ++++
 include/uapi/linux/btrfs.h      |   1 +
 include/uapi/linux/btrfs_tree.h |  17 ++
 14 files changed, 614 insertions(+), 3 deletions(-)
 create mode 100644 fs/btrfs/raid-stripe-tree.c
 create mode 100644 fs/btrfs/raid-stripe-tree.h

-- 
2.35.3



* [PATCH RFC v2 1/8] btrfs: add raid stripe tree definitions
  2022-06-29 14:41 [PATCH RFC v2 0/8] btrfs: raid-stripe-tree draft patches Johannes Thumshirn
@ 2022-06-29 14:41 ` Johannes Thumshirn
  2022-06-29 14:41 ` [PATCH RFC v2 2/8] btrfs: read raid-stripe-tree from disk Johannes Thumshirn
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Johannes Thumshirn @ 2022-06-29 14:41 UTC (permalink / raw)
  To: linux-btrfs
  Cc: Naohiro Aota, Damien Le Moal, Johannes Thumshirn, Qu Wenruo,
	Christoph Hellwig, Josef Bacik

Add definitions for the raid-stripe-tree. This tree will hold information
about the on-disk layout of the stripes in a RAID set.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/ctree.h                | 29 +++++++++++++++++++++++++++++
 include/uapi/linux/btrfs_tree.h | 17 +++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4e2569f84aab..18e2f186cb5e 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1906,6 +1906,35 @@ BTRFS_SETGET_FUNCS(timespec_nsec, struct btrfs_timespec, nsec, 32);
 BTRFS_SETGET_STACK_FUNCS(stack_timespec_sec, struct btrfs_timespec, sec, 64);
 BTRFS_SETGET_STACK_FUNCS(stack_timespec_nsec, struct btrfs_timespec, nsec, 32);
 
+BTRFS_SETGET_FUNCS(stripe_extent_devid, struct btrfs_stripe_extent, devid, 64);
+BTRFS_SETGET_FUNCS(stripe_extent_physical, struct btrfs_stripe_extent, physical, 64);
+BTRFS_SETGET_STACK_FUNCS(stack_stripe_extent_devid, struct btrfs_stripe_extent, devid, 64);
+BTRFS_SETGET_STACK_FUNCS(stack_stripe_extent_physical, struct btrfs_stripe_extent, physical, 64);
+
+static inline struct btrfs_stripe_extent *btrfs_stripe_extent_nr(
+					 struct btrfs_dp_stripe *dps, int nr)
+{
+	unsigned long offset = (unsigned long)dps;
+
+	offset += offsetof(struct btrfs_dp_stripe, extents);
+	offset += nr * sizeof(struct btrfs_stripe_extent);
+	return (struct btrfs_stripe_extent *)offset;
+}
+
+static inline u64 btrfs_stripe_extent_devid_nr(const struct extent_buffer *eb,
+					       struct btrfs_dp_stripe *dps,
+					       int nr)
+{
+	return btrfs_stripe_extent_devid(eb, btrfs_stripe_extent_nr(dps, nr));
+}
+
+static inline u64 btrfs_stripe_extent_physical_nr(const struct extent_buffer *eb,
+						  struct btrfs_dp_stripe *dps,
+						  int nr)
+{
+	return btrfs_stripe_extent_physical(eb, btrfs_stripe_extent_nr(dps, nr));
+}
+
 /* struct btrfs_dev_extent */
 BTRFS_SETGET_FUNCS(dev_extent_chunk_tree, struct btrfs_dev_extent,
 		   chunk_tree, 64);
diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index d4117152d907..070fc9266821 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -56,6 +56,9 @@
 /* Holds the block group items for extent tree v2. */
 #define BTRFS_BLOCK_GROUP_TREE_OBJECTID 11ULL
 
+/* tracks RAID stripes in block groups. */
+#define BTRFS_RAID_STRIPE_TREE_OBJECTID 12ULL
+
 /* device stats in the device tree */
 #define BTRFS_DEV_STATS_OBJECTID 0ULL
 
@@ -264,6 +267,8 @@
  */
 #define BTRFS_QGROUP_RELATION_KEY       246
 
+#define BTRFS_RAID_STRIPE_KEY		247
+
 /*
  * Obsolete name, see BTRFS_TEMPORARY_ITEM_KEY.
  */
@@ -488,6 +493,18 @@ struct btrfs_free_space_header {
 	__le64 num_bitmaps;
 } __attribute__ ((__packed__));
 
+struct btrfs_stripe_extent {
+	/* btrfs device-id this raid extent lives on */
+	__le64 devid;
+	/* physical location on disk */
+	__le64 physical;
+} __attribute__ ((__packed__));
+
+struct btrfs_dp_stripe {
+	/* array of stripe extents this stripe is composed of */
+	DECLARE_FLEX_ARRAY(struct btrfs_stripe_extent, extents);
+} __attribute__ ((__packed__));
+
 #define BTRFS_HEADER_FLAG_WRITTEN	(1ULL << 0)
 #define BTRFS_HEADER_FLAG_RELOC		(1ULL << 1)
 
-- 
2.35.3
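
For readers following along, the relation between item size and stripe count
implied by these definitions can be sketched in userspace as below. This is an
illustration under assumptions, not part of the patch: struct stripe_extent is
a stand-in for btrfs_stripe_extent, the helper names are invented, and the
little-endian conversion the kernel accessors perform is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct btrfs_stripe_extent: two packed 64-bit fields. */
struct stripe_extent {
	uint64_t devid;
	uint64_t physical;
} __attribute__((__packed__));

/* Size in bytes of an item describing @num_stripes stripe extents. */
static size_t item_size_for(size_t num_stripes)
{
	return num_stripes * sizeof(struct stripe_extent);
}

/* Number of stripe extents stored in an item of @item_size bytes. */
static size_t num_stripes_for(size_t item_size)
{
	assert(item_size % sizeof(struct stripe_extent) == 0);
	return item_size / sizeof(struct stripe_extent);
}

/* Address of stripe extent @nr inside an item, by plain offset arithmetic. */
static const struct stripe_extent *stripe_extent_nr(const void *item,
						     size_t nr)
{
	const unsigned char *base = item;

	return (const struct stripe_extent *)
		(base + nr * sizeof(struct stripe_extent));
}
```

This lines up with the "itemsize 32" seen in the cover letter's dump: two
stripe extents of 16 bytes each.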



* [PATCH RFC v2 2/8] btrfs: read raid-stripe-tree from disk
  2022-06-29 14:41 [PATCH RFC v2 0/8] btrfs: raid-stripe-tree draft patches Johannes Thumshirn
  2022-06-29 14:41 ` [PATCH RFC v2 1/8] btrfs: add raid stripe tree definitions Johannes Thumshirn
@ 2022-06-29 14:41 ` Johannes Thumshirn
  2022-06-29 14:41 ` [PATCH RFC v2 3/8] btrfs: add boilerplate code to insert raid extent Johannes Thumshirn
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Johannes Thumshirn @ 2022-06-29 14:41 UTC (permalink / raw)
  To: linux-btrfs
  Cc: Naohiro Aota, Damien Le Moal, Johannes Thumshirn, Qu Wenruo,
	Christoph Hellwig, Josef Bacik

If we find a raid-stripe-tree on mount, read it from disk.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/block-rsv.c       |  1 +
 fs/btrfs/ctree.h           |  1 +
 fs/btrfs/disk-io.c         | 12 ++++++++++++
 include/uapi/linux/btrfs.h |  1 +
 4 files changed, 15 insertions(+)

diff --git a/fs/btrfs/block-rsv.c b/fs/btrfs/block-rsv.c
index b3ee49b0b1e8..62c20c9d8c25 100644
--- a/fs/btrfs/block-rsv.c
+++ b/fs/btrfs/block-rsv.c
@@ -427,6 +427,7 @@ void btrfs_init_root_block_rsv(struct btrfs_root *root)
 	case BTRFS_CSUM_TREE_OBJECTID:
 	case BTRFS_EXTENT_TREE_OBJECTID:
 	case BTRFS_FREE_SPACE_TREE_OBJECTID:
+	case BTRFS_RAID_STRIPE_TREE_OBJECTID:
 		root->block_rsv = &fs_info->delayed_refs_rsv;
 		break;
 	case BTRFS_ROOT_TREE_OBJECTID:
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 18e2f186cb5e..376b9b112429 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -690,6 +690,7 @@ struct btrfs_fs_info {
 	struct btrfs_root *uuid_root;
 	struct btrfs_root *data_reloc_root;
 	struct btrfs_root *block_group_root;
+	struct btrfs_root *stripe_root;
 
 	/* the log root tree is a directory of all the other log roots */
 	struct btrfs_root *log_root_tree;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 70b388de4d66..45d1ea23a230 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1607,6 +1607,9 @@ static struct btrfs_root *btrfs_get_global_root(struct btrfs_fs_info *fs_info,
 
 		return btrfs_grab_root(root) ? root : ERR_PTR(-ENOENT);
 	}
+	if (objectid == BTRFS_RAID_STRIPE_TREE_OBJECTID)
+		return btrfs_grab_root(fs_info->stripe_root) ?
+			fs_info->stripe_root : ERR_PTR(-ENOENT);
 	return NULL;
 }
 
@@ -1679,6 +1682,7 @@ void btrfs_free_fs_info(struct btrfs_fs_info *fs_info)
 	btrfs_put_root(fs_info->fs_root);
 	btrfs_put_root(fs_info->data_reloc_root);
 	btrfs_put_root(fs_info->block_group_root);
+	btrfs_put_root(fs_info->stripe_root);
 	btrfs_check_leaked_roots(fs_info);
 	btrfs_extent_buffer_leak_debug_check(fs_info);
 	kfree(fs_info->super_copy);
@@ -2220,6 +2224,7 @@ static void free_root_pointers(struct btrfs_fs_info *info, bool free_chunk_root)
 	free_root_extent_buffers(info->fs_root);
 	free_root_extent_buffers(info->data_reloc_root);
 	free_root_extent_buffers(info->block_group_root);
+	free_root_extent_buffers(info->stripe_root);
 	if (free_chunk_root)
 		free_root_extent_buffers(info->chunk_root);
 }
@@ -2646,6 +2651,13 @@ static int btrfs_read_roots(struct btrfs_fs_info *fs_info)
 		fs_info->uuid_root = root;
 	}
 
+	location.objectid = BTRFS_RAID_STRIPE_TREE_OBJECTID;
+	root = btrfs_read_tree_root(tree_root, &location);
+	if (!IS_ERR(root)) {
+		set_bit(BTRFS_ROOT_TRACK_DIRTY, &root->state);
+		fs_info->stripe_root = root;
+	}
+
 	return 0;
 out:
 	btrfs_warn(fs_info, "failed to read root (objectid=%llu): %d",
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index f54dc91e4025..5ca789af5beb 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -310,6 +310,7 @@ struct btrfs_ioctl_fs_info_args {
 #define BTRFS_FEATURE_INCOMPAT_RAID1C34		(1ULL << 11)
 #define BTRFS_FEATURE_INCOMPAT_ZONED		(1ULL << 12)
 #define BTRFS_FEATURE_INCOMPAT_EXTENT_TREE_V2	(1ULL << 13)
+#define BTRFS_FEATURE_INCOMPAT_STRIPE_TREE	(1ULL << 14)
 
 struct btrfs_ioctl_feature_flags {
 	__u64 compat_flags;
-- 
2.35.3



* [PATCH RFC v2 3/8] btrfs: add boilerplate code to insert raid extent
  2022-06-29 14:41 [PATCH RFC v2 0/8] btrfs: raid-stripe-tree draft patches Johannes Thumshirn
  2022-06-29 14:41 ` [PATCH RFC v2 1/8] btrfs: add raid stripe tree definitions Johannes Thumshirn
  2022-06-29 14:41 ` [PATCH RFC v2 2/8] btrfs: read raid-stripe-tree from disk Johannes Thumshirn
@ 2022-06-29 14:41 ` Johannes Thumshirn
  2022-06-29 14:41 ` [PATCH RFC v2 4/8] btrfs: add boilerplate code to insert stripe entries for preallocated extents Johannes Thumshirn
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Johannes Thumshirn @ 2022-06-29 14:41 UTC (permalink / raw)
  To: linux-btrfs
  Cc: Naohiro Aota, Damien Le Moal, Johannes Thumshirn, Qu Wenruo,
	Christoph Hellwig, Josef Bacik

Add boilerplate code to insert raid extents into the raid-stripe-tree on
each write to a RAID1 block-group.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/Makefile           |   2 +-
 fs/btrfs/ctree.h            |   3 +
 fs/btrfs/disk-io.c          |   3 +
 fs/btrfs/extent-tree.c      |  45 +++++++++
 fs/btrfs/raid-stripe-tree.c | 188 ++++++++++++++++++++++++++++++++++++
 fs/btrfs/raid-stripe-tree.h |  65 +++++++++++++
 fs/btrfs/volumes.c          |  13 +++
 fs/btrfs/volumes.h          |   4 +
 fs/btrfs/zoned.c            |   4 +
 9 files changed, 326 insertions(+), 1 deletion(-)
 create mode 100644 fs/btrfs/raid-stripe-tree.c
 create mode 100644 fs/btrfs/raid-stripe-tree.h

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 99f9995670ea..4484831ac624 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -31,7 +31,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
 	   backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \
 	   uuid-tree.o props.o free-space-tree.o tree-checker.o space-info.o \
 	   block-rsv.o delalloc-space.o block-group.o discard.o reflink.o \
-	   subpage.o tree-mod-log.o
+	   subpage.o tree-mod-log.o raid-stripe-tree.o
 
 btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
 btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 376b9b112429..2eb79afb4d83 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1092,6 +1092,9 @@ struct btrfs_fs_info {
 	/* Updates are not protected by any lock */
 	struct btrfs_commit_stats commit_stats;
 
+	struct mutex stripe_update_lock;
+	struct rb_root stripe_update_tree;
+
 #ifdef CONFIG_BTRFS_FS_REF_VERIFY
 	spinlock_t ref_verify_lock;
 	struct rb_root block_tree;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 45d1ea23a230..3d7c1b8d1cd5 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3155,6 +3155,9 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
 
 	fs_info->bg_reclaim_threshold = BTRFS_DEFAULT_RECLAIM_THRESH;
 	INIT_WORK(&fs_info->reclaim_bgs_work, btrfs_reclaim_bgs_work);
+
+	mutex_init(&fs_info->stripe_update_lock);
+	fs_info->stripe_update_tree = RB_ROOT;
 }
 
 static int init_mount_fs_info(struct btrfs_fs_info *fs_info, struct super_block *sb)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f97a0f28f464..e1738b3dfb21 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -36,6 +36,7 @@
 #include "rcu-string.h"
 #include "zoned.h"
 #include "dev-replace.h"
+#include "raid-stripe-tree.h"
 
 #undef SCRAMBLE_DELAYED_REFS
 
@@ -1491,6 +1492,47 @@ static int __btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
+static int add_stripe_entry_for_delayed_ref(struct btrfs_trans_handle *trans,
+					    struct btrfs_delayed_ref_node *node)
+{
+	struct btrfs_fs_info *fs_info = trans->fs_info;
+	struct extent_map *em;
+	struct map_lookup *map;
+	int ret = 0;
+
+	if (!fs_info->stripe_root)
+		return 0;
+
+	em = btrfs_get_chunk_map(fs_info, node->bytenr, node->num_bytes);
+	if (IS_ERR(em)) {
+		btrfs_err(fs_info,
+			  "cannot get chunk map for address %llu",
+			  node->bytenr);
+		return PTR_ERR(em);
+	}
+
+	map = em->map_lookup;
+
+	if (btrfs_need_stripe_tree_update(fs_info, map->type)) {
+		struct btrfs_ordered_stripe *stripe;
+
+		stripe = btrfs_lookup_ordered_stripe(fs_info, node->bytenr);
+		if (!stripe) {
+			btrfs_err(fs_info,
+				  "cannot get stripe extent for address %llu (%llu)",
+				  node->bytenr, node->num_bytes);
+			free_extent_map(em);
+			return -EINVAL;
+		}
+		ASSERT(stripe->logical == node->bytenr);
+		ret = btrfs_insert_raid_extent(trans, stripe);
+		btrfs_put_ordered_stripe(fs_info, stripe);
+	}
+	free_extent_map(em);
+
+	return ret;
+}
+
 static int run_delayed_data_ref(struct btrfs_trans_handle *trans,
 				struct btrfs_delayed_ref_node *node,
 				struct btrfs_delayed_extent_op *extent_op,
@@ -1521,6 +1563,9 @@ static int run_delayed_data_ref(struct btrfs_trans_handle *trans,
 						 flags, ref->objectid,
 						 ref->offset, &ins,
 						 node->ref_mod);
+		if (ret)
+			return ret;
+		ret = add_stripe_entry_for_delayed_ref(trans, node);
 	} else if (node->action == BTRFS_ADD_DELAYED_REF) {
 		ret = __btrfs_inc_extent_ref(trans, node, parent, ref_root,
 					     ref->objectid, ref->offset,
diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
new file mode 100644
index 000000000000..360046a104c7
--- /dev/null
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -0,0 +1,188 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/btrfs_tree.h>
+
+#include "ctree.h"
+#include "transaction.h"
+#include "disk-io.h"
+#include "raid-stripe-tree.h"
+#include "volumes.h"
+
+static struct rb_node *stripe_tree_insert(struct rb_root *root, u64 logical,
+					  struct rb_node *node)
+{
+	struct rb_node **p = &root->rb_node;
+	struct btrfs_ordered_stripe *stripe;
+	struct rb_node *parent = NULL;
+
+	while (*p) {
+		parent = *p;
+		stripe = rb_entry(*p, struct btrfs_ordered_stripe, rb_node);
+
+		if (logical < stripe->logical)
+			p = &(*p)->rb_left;
+		else if (logical >= stripe->logical + stripe->num_bytes)
+			p = &(*p)->rb_right;
+		else
+			return parent;
+	}
+
+	rb_link_node(node, parent, p);
+	rb_insert_color(node, root);
+	return NULL;
+}
+
+static struct btrfs_ordered_stripe *btrfs_add_ordered_stripe(
+					      struct btrfs_fs_info *fs_info,
+					      u64 logical,
+					      u64 length, int num_stripes,
+					      struct btrfs_io_stripe *stripes)
+{
+	struct btrfs_ordered_stripe *stripe;
+	struct btrfs_io_stripe *tmp;
+	struct rb_node *node;
+	size_t size;
+
+	size = num_stripes * sizeof(struct btrfs_io_stripe);
+	stripe = kzalloc(sizeof(struct btrfs_ordered_stripe), GFP_NOFS);
+	if (!stripe)
+		return ERR_PTR(-ENOMEM);
+
+	spin_lock_init(&stripe->lock);
+	tmp = kmemdup(stripes, size, GFP_NOFS);
+	if (!tmp) {
+		kfree(stripe);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	stripe->logical = logical;
+	stripe->num_bytes = length;
+	stripe->num_stripes = num_stripes;
+	spin_lock(&stripe->lock);
+	stripe->stripes = tmp;
+	spin_unlock(&stripe->lock);
+	refcount_set(&stripe->ref, 1);
+
+	mutex_lock(&fs_info->stripe_update_lock);
+	node = stripe_tree_insert(&fs_info->stripe_update_tree, logical,
+				  &stripe->rb_node);
+	mutex_unlock(&fs_info->stripe_update_lock);
+
+	if (node) {
+		btrfs_panic(fs_info, -EEXIST,
+			  "inconsistency in ordered stripes at offset %llu",
+			  logical);
+		kfree(stripe->stripes);
+		kfree(stripe);
+		return ERR_PTR(-EEXIST);
+	}
+
+	return stripe;
+}
+
+struct btrfs_ordered_stripe *btrfs_lookup_ordered_stripe(struct btrfs_fs_info *fs_info,
+							 u64 logical)
+{
+	struct rb_root *root = &fs_info->stripe_update_tree;
+	struct btrfs_ordered_stripe *stripe = NULL;
+	struct rb_node *n = root->rb_node;
+
+	mutex_lock(&fs_info->stripe_update_lock);
+	while (n) {
+		stripe = rb_entry(n, struct btrfs_ordered_stripe, rb_node);
+
+		if (logical < stripe->logical) {
+			n = n->rb_left;
+			stripe = NULL;
+		} else if (logical >= stripe->logical + stripe->num_bytes) {
+			n = n->rb_right;
+			stripe = NULL;
+		} else {
+			break;
+		}
+	}
+	if (stripe)
+		btrfs_get_ordered_stripe(stripe);
+	mutex_unlock(&fs_info->stripe_update_lock);
+
+	return stripe;
+}
+
+void btrfs_remove_ordered_stripe(struct btrfs_fs_info *fs_info,
+				 struct btrfs_ordered_stripe *stripe)
+{
+	struct rb_node *node = &stripe->rb_node;
+
+	mutex_lock(&fs_info->stripe_update_lock);
+	rb_erase(node, &fs_info->stripe_update_tree);
+	RB_CLEAR_NODE(node);
+	mutex_unlock(&fs_info->stripe_update_lock);
+
+	spin_lock(&stripe->lock);
+	kfree(stripe->stripes);
+	spin_unlock(&stripe->lock);
+	kfree(stripe);
+}
+
+int btrfs_insert_raid_extent(struct btrfs_trans_handle *trans,
+			     struct btrfs_ordered_stripe *stripe)
+{
+	struct btrfs_fs_info *fs_info = trans->fs_info;
+	struct btrfs_key stripe_key;
+	struct btrfs_root *stripe_root = fs_info->stripe_root;
+	struct btrfs_dp_stripe *raid_stripe;
+	size_t item_size;
+	int ret;
+
+	item_size = stripe->num_stripes * sizeof(struct btrfs_stripe_extent);
+
+	raid_stripe = kzalloc(item_size, GFP_NOFS);
+	if (!raid_stripe) {
+		btrfs_abort_transaction(trans, -ENOMEM);
+		btrfs_end_transaction(trans);
+		return -ENOMEM;
+	}
+
+	spin_lock(&stripe->lock);
+	for (int i = 0; i < stripe->num_stripes; i++) {
+		u64 devid = stripe->stripes[i].dev->devid;
+		u64 physical = stripe->stripes[i].physical;
+		struct btrfs_stripe_extent *stripe_extent =
+						&raid_stripe->extents[i];
+
+		btrfs_set_stack_stripe_extent_devid(stripe_extent, devid);
+		btrfs_set_stack_stripe_extent_physical(stripe_extent, physical);
+	}
+	spin_unlock(&stripe->lock);
+
+	stripe_key.objectid = stripe->logical;
+	stripe_key.type = BTRFS_RAID_STRIPE_KEY;
+	stripe_key.offset = stripe->num_bytes;
+
+	ret = btrfs_insert_item(trans, stripe_root, &stripe_key, raid_stripe,
+				item_size);
+	if (ret)
+		btrfs_abort_transaction(trans, ret);
+
+	kfree(raid_stripe);
+
+	return ret;
+}
+
+void btrfs_raid_stripe_update(struct work_struct *work)
+{
+	struct btrfs_io_context *bioc =
+		container_of(work, struct btrfs_io_context, stripe_update_work);
+	struct btrfs_ordered_stripe *stripe;
+	struct bio *bio = bioc->orig_bio;
+	struct btrfs_fs_info *fs_info = bioc->fs_info;
+
+	stripe = btrfs_add_ordered_stripe(fs_info, bioc->logical, bioc->length,
+					  bioc->num_stripes, bioc->stripes);
+	if (IS_ERR(stripe)) {
+		btrfs_bio_counter_dec(fs_info);
+		bio->bi_status = errno_to_blk_status(PTR_ERR(stripe));
+	}
+	btrfs_put_bioc(bioc);
+}
+
diff --git a/fs/btrfs/raid-stripe-tree.h b/fs/btrfs/raid-stripe-tree.h
new file mode 100644
index 000000000000..b9c40ef26dfa
--- /dev/null
+++ b/fs/btrfs/raid-stripe-tree.h
@@ -0,0 +1,65 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef BTRFS_RAID_STRIPE_TREE_H
+#define BTRFS_RAID_STRIPE_TREE_H
+
+#include "volumes.h"
+
+struct btrfs_ordered_stripe {
+	struct rb_node rb_node;
+
+	u64 logical;
+	u64 num_bytes;
+	int num_stripes;
+	struct btrfs_io_stripe *stripes;
+	spinlock_t lock;
+	refcount_t ref;
+};
+
+int btrfs_insert_raid_extent(struct btrfs_trans_handle *trans,
+			     struct btrfs_ordered_stripe *stripe);
+void btrfs_raid_stripe_update(struct work_struct *work);
+struct btrfs_ordered_stripe *btrfs_lookup_ordered_stripe(
+						 struct btrfs_fs_info *fs_info,
+						 u64 logical);
+void btrfs_remove_ordered_stripe(struct btrfs_fs_info *fs_info,
+				 struct btrfs_ordered_stripe *stripe);
+
+static inline void btrfs_get_ordered_stripe(struct btrfs_ordered_stripe *stripe)
+{
+	refcount_inc(&stripe->ref);
+}
+
+static inline void btrfs_put_ordered_stripe(struct btrfs_fs_info *fs_info,
+					    struct btrfs_ordered_stripe *stripe)
+{
+	if (refcount_dec_and_test(&stripe->ref))
+		btrfs_remove_ordered_stripe(fs_info, stripe);
+}
+
+static inline int btrfs_num_raid_stripes(u32 item_size)
+{
+	return (item_size - offsetof(struct btrfs_dp_stripe, extents)) /
+		sizeof(struct btrfs_stripe_extent);
+}
+
+static inline bool btrfs_need_stripe_tree_update(struct btrfs_fs_info *fs_info,
+						 u64 map_type)
+{
+	u64 type = map_type & BTRFS_BLOCK_GROUP_TYPE_MASK;
+	u64 profile = map_type & BTRFS_BLOCK_GROUP_PROFILE_MASK;
+
+	if (!fs_info->stripe_root)
+		return false;
+
+	/* For now, only data block groups use the stripe tree. */
+	if (type != BTRFS_BLOCK_GROUP_DATA)
+		return false;
+
+	if (profile & BTRFS_BLOCK_GROUP_RAID1_MASK)
+		return true;
+
+	return false;
+}
+
+#endif
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 2d788a351c1f..b8d4e92c7196 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -33,6 +33,7 @@
 #include "block-group.h"
 #include "discard.h"
 #include "zoned.h"
+#include "raid-stripe-tree.h"
 
 #define BTRFS_BLOCK_GROUP_STRIPE_MASK	(BTRFS_BLOCK_GROUP_RAID0 | \
 					 BTRFS_BLOCK_GROUP_RAID10 | \
@@ -5897,6 +5898,7 @@ static struct btrfs_io_context *alloc_btrfs_io_context(struct btrfs_fs_info *fs_
 	bioc->fs_info = fs_info;
 	bioc->tgtdev_map = (int *)(bioc->stripes + total_stripes);
 	bioc->raid_map = (u64 *)(bioc->tgtdev_map + real_stripes);
+	INIT_WORK(&bioc->stripe_update_work, btrfs_raid_stripe_update);
 
 	return bioc;
 }
@@ -6623,6 +6625,7 @@ static void btrfs_end_bio_work(struct work_struct *work)
 static void btrfs_end_bioc(struct btrfs_io_context *bioc, bool async)
 {
 	struct bio *orig_bio = bioc->orig_bio;
+	struct btrfs_fs_info *fs_info = bioc->fs_info;
 	struct btrfs_bio *bbio = btrfs_bio(orig_bio);
 
 	bbio->mirror_num = bioc->mirror_num;
@@ -6642,6 +6645,12 @@ static void btrfs_end_bioc(struct btrfs_io_context *bioc, bool async)
 		INIT_WORK(&bbio->end_io_work, btrfs_end_bio_work);
 		queue_work(btrfs_end_io_wq(bioc), &bbio->end_io_work);
 	} else {
+		if (btrfs_op(orig_bio) == BTRFS_MAP_WRITE &&
+		    btrfs_need_stripe_tree_update(fs_info,
+						  bioc->map_type)) {
+			btrfs_get_bioc(bioc);
+			schedule_work(&bioc->stripe_update_work);
+		}
 		bio_endio(orig_bio);
 	}
 
@@ -6667,6 +6676,8 @@ static void btrfs_end_bio(struct bio *bio)
 				btrfs_dev_stat_inc_and_print(stripe->dev,
 						BTRFS_DEV_STAT_FLUSH_ERRS);
 		}
+	} else if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
+		stripe->physical = bio->bi_iter.bi_sector << SECTOR_SHIFT;
 	}
 
 	if (bio != bioc->orig_bio)
@@ -6754,6 +6765,8 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 	bioc->orig_bio = bio;
 	bioc->private = bio->bi_private;
 	bioc->end_io = bio->bi_end_io;
+	bioc->logical = logical;
+	bioc->length = length;
 	atomic_set(&bioc->stripes_pending, total_devs);
 
 	if ((bioc->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) &&
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 9537d82bb7a2..f22ea9c23faa 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -14,6 +14,7 @@
 #define BTRFS_MAX_DATA_CHUNK_SIZE	(10ULL * SZ_1G)
 
 extern struct mutex uuid_mutex;
+struct btrfs_ordered_stripe;
 
 #define BTRFS_STRIPE_LEN	SZ_64K
 
@@ -463,6 +464,9 @@ struct btrfs_io_context {
 	int mirror_num;
 	int num_tgtdevs;
 	int *tgtdev_map;
+	u64 logical;
+	u64 length;
+	struct work_struct stripe_update_work;
 	/*
 	 * logical block numbers for the start of each stripe
 	 * The last one or two are p/q.  These are sorted,
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 7a0f8fa44800..5cf6abeda588 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1641,6 +1641,10 @@ void btrfs_rewrite_logical_zoned(struct btrfs_ordered_extent *ordered)
 	u64 *logical = NULL;
 	int nr, stripe_len;
 
+	/* Filesystems with a stripe tree have their own l2p mapping */
+	if (fs_info->stripe_root)
+		return;
+
 	/* Zoned devices should not have partitions. So, we can assume it is 0 */
 	ASSERT(!bdev_is_partition(ordered->bdev));
 	if (WARN_ON(!ordered->bdev))
-- 
2.35.3
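
The ordered-stripe tree in this patch is keyed by the half-open range
[logical, logical + num_bytes). The comparison rule can be tried out in
userspace with the sketch below. It uses a plain unbalanced binary tree
instead of the kernel rbtree, leaves out locking and refcounting, and the
names range_node, range_insert() and range_lookup() are invented for this
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct range_node {
	uint64_t logical;   /* start of the range */
	uint64_t num_bytes; /* length of the range */
	struct range_node *left, *right;
};

/*
 * Insert @node. Returns NULL on success, or the existing node whose range
 * already contains @node->logical (nothing is inserted in that case).
 */
static struct range_node *range_insert(struct range_node **root,
				       struct range_node *node)
{
	struct range_node **p = root;

	assert(root && node);

	while (*p) {
		struct range_node *cur = *p;

		if (node->logical < cur->logical)
			p = &cur->left;
		else if (node->logical >= cur->logical + cur->num_bytes)
			p = &cur->right;
		else
			return cur;
	}

	node->left = NULL;
	node->right = NULL;
	*p = node;
	return NULL;
}

/* Return the node whose range contains @logical, or NULL if there is none. */
static struct range_node *range_lookup(struct range_node *root,
				       uint64_t logical)
{
	struct range_node *n = root;

	while (n) {
		if (logical < n->logical)
			n = n->left;
		else if (logical >= n->logical + n->num_bytes)
			n = n->right;
		else
			return n;
	}
	return NULL;
}
```

Note that this rule only checks whether the start of the new range falls into
an existing one. A new range that starts below an existing range but extends
into it would not be reported as a collision by this sketch.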



* [PATCH RFC v2 4/8] btrfs: add boilerplate code to insert stripe entries for preallocated extents
  2022-06-29 14:41 [PATCH RFC v2 0/8] btrfs: raid-stripe-tree draft patches Johannes Thumshirn
                   ` (2 preceding siblings ...)
  2022-06-29 14:41 ` [PATCH RFC v2 3/8] btrfs: add boilerplate code to insert raid extent Johannes Thumshirn
@ 2022-06-29 14:41 ` Johannes Thumshirn
  2022-06-29 14:41 ` [PATCH RFC v2 5/8] btrfs: add code to delete raid extent Johannes Thumshirn
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Johannes Thumshirn @ 2022-06-29 14:41 UTC (permalink / raw)
  To: linux-btrfs
  Cc: Naohiro Aota, Damien Le Moal, Johannes Thumshirn, Qu Wenruo,
	Christoph Hellwig, Josef Bacik

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/inode.c            |  6 ++++++
 fs/btrfs/raid-stripe-tree.c | 34 ++++++++++++++++++++++++++++++++++
 fs/btrfs/raid-stripe-tree.h |  2 ++
 3 files changed, 42 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 9890782fe932..97e218a45165 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -55,6 +55,7 @@
 #include "zoned.h"
 #include "subpage.h"
 #include "inode-item.h"
+#include "raid-stripe-tree.h"
 
 struct btrfs_iget_args {
 	u64 ino;
@@ -9901,6 +9902,11 @@ static struct btrfs_trans_handle *insert_prealloc_file_extent(
 	if (qgroup_released < 0)
 		return ERR_PTR(qgroup_released);
 
+	ret = btrfs_insert_preallocated_raid_stripe(inode->root->fs_info,
+						    start, len);
+	if (ret)
+		goto free_qgroup;
+
 	if (trans) {
 		ret = insert_reserved_file_extent(trans, inode,
 						  file_offset, &stack_fi,
diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index 360046a104c7..85d08f052a64 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -124,6 +124,40 @@ void btrfs_remove_ordered_stripe(struct btrfs_fs_info *fs_info,
 	kfree(stripe);
 }
 
+int btrfs_insert_preallocated_raid_stripe(struct btrfs_fs_info *fs_info,
+					  u64 start, u64 len)
+{
+	struct btrfs_io_context *bioc = NULL;
+	struct btrfs_ordered_stripe *stripe;
+	u64 map_length = len;
+	int ret;
+
+	if (!fs_info->stripe_root)
+		return 0;
+
+	ret = btrfs_map_block(fs_info, BTRFS_MAP_WRITE, start, &map_length,
+			      &bioc, 0);
+	if (ret)
+		return ret;
+
+	stripe = btrfs_lookup_ordered_stripe(fs_info, start);
+	if (!stripe) {
+		stripe = btrfs_add_ordered_stripe(fs_info, start, len,
+						  bioc->num_stripes,
+						  bioc->stripes);
+		if (IS_ERR(stripe))
+			return PTR_ERR(stripe);
+	} else {
+		spin_lock(&stripe->lock);
+		memcpy(stripe->stripes, bioc->stripes,
+		       bioc->num_stripes * sizeof(struct btrfs_io_stripe));
+		spin_unlock(&stripe->lock);
+		btrfs_put_ordered_stripe(fs_info, stripe);
+	}
+
+	return 0;
+}
+
 int btrfs_insert_raid_extent(struct btrfs_trans_handle *trans,
 			     struct btrfs_ordered_stripe *stripe)
 {
diff --git a/fs/btrfs/raid-stripe-tree.h b/fs/btrfs/raid-stripe-tree.h
index b9c40ef26dfa..1644515fcecb 100644
--- a/fs/btrfs/raid-stripe-tree.h
+++ b/fs/btrfs/raid-stripe-tree.h
@@ -18,6 +18,8 @@ struct btrfs_ordered_stripe {
 
 int btrfs_insert_raid_extent(struct btrfs_trans_handle *trans,
 			     struct btrfs_ordered_stripe *stripe);
+int btrfs_insert_preallocated_raid_stripe(struct btrfs_fs_info *fs_info,
+					  u64 start, u64 len);
 void btrfs_raid_stripe_update(struct work_struct *work);
 struct btrfs_ordered_stripe *btrfs_lookup_ordered_stripe(
 						 struct btrfs_fs_info *fs_info,
-- 
2.35.3



* [PATCH RFC v2 5/8] btrfs: add code to delete raid extent
  2022-06-29 14:41 [PATCH RFC v2 0/8] btrfs: raid-stripe-tree draft patches Johannes Thumshirn
                   ` (3 preceding siblings ...)
  2022-06-29 14:41 ` [PATCH RFC v2 4/8] btrfs: add boilerplate code to insert stripe entries for preallocated extents Johannes Thumshirn
@ 2022-06-29 14:41 ` Johannes Thumshirn
  2022-06-29 14:41 ` [PATCH RFC v2 6/8] btrfs: add code to read " Johannes Thumshirn
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Johannes Thumshirn @ 2022-06-29 14:41 UTC (permalink / raw)
  To: linux-btrfs
  Cc: Naohiro Aota, Damien Le Moal, Johannes Thumshirn, Qu Wenruo,
	Christoph Hellwig, Josef Bacik

Add boilerplate code to delete entries from the raid-stripe-tree when the
corresponding file extent is deleted.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/extent-tree.c      |  8 ++++++++
 fs/btrfs/raid-stripe-tree.c | 31 +++++++++++++++++++++++++++++++
 fs/btrfs/raid-stripe-tree.h |  2 ++
 3 files changed, 41 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index e1738b3dfb21..f62036790c2f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3225,6 +3225,14 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 			}
 		}
 
+		if (is_data) {
+			ret = btrfs_delete_raid_extent(trans, bytenr, num_bytes);
+			if (ret) {
+				btrfs_abort_transaction(trans, ret);
+				return ret;
+			}
+		}
+
 		ret = btrfs_del_items(trans, extent_root, path, path->slots[0],
 				      num_to_del);
 		if (ret) {
diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index 85d08f052a64..a673aaf8e703 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -124,6 +124,37 @@ void btrfs_remove_ordered_stripe(struct btrfs_fs_info *fs_info,
 	kfree(stripe);
 }
 
+int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start,
+			     u64 length)
+{
+	struct btrfs_fs_info *fs_info = trans->fs_info;
+	struct btrfs_root *stripe_root = fs_info->stripe_root;
+	struct btrfs_path *path;
+	struct btrfs_key stripe_key;
+	int ret;
+
+	if (!stripe_root)
+		return 0;
+
+	stripe_key.objectid = start;
+	stripe_key.type = BTRFS_RAID_STRIPE_KEY;
+	stripe_key.offset = length;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	ret = btrfs_search_slot(trans, stripe_root, &stripe_key, path, -1, 1);
+	if (ret)
+		goto out;
+
+	ret = btrfs_del_item(trans, stripe_root, path);
+out:
+	btrfs_free_path(path);
+	return ret;
+
+}
+
 int btrfs_insert_preallocated_raid_stripe(struct btrfs_fs_info *fs_info,
 					  u64 start, u64 len)
 {
diff --git a/fs/btrfs/raid-stripe-tree.h b/fs/btrfs/raid-stripe-tree.h
index 1644515fcecb..d3cc24e37de1 100644
--- a/fs/btrfs/raid-stripe-tree.h
+++ b/fs/btrfs/raid-stripe-tree.h
@@ -16,6 +16,8 @@ struct btrfs_ordered_stripe {
 	refcount_t ref;
 };
 
+int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start,
+			     u64 length);
 int btrfs_insert_raid_extent(struct btrfs_trans_handle *trans,
 			     struct btrfs_ordered_stripe *stripe);
 int btrfs_insert_preallocated_raid_stripe(struct btrfs_fs_info *fs_info,
-- 
2.35.3
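
The deletion in patch 5/8 looks up the key (start, BTRFS_RAID_STRIPE_KEY,
length) and removes exactly that item, so both the logical start and the
length have to match the inserted entry. Below is a minimal user-space sketch
of that exact-match rule. It is not kernel code: struct stripe_key,
struct stripe_tree and model_delete_raid_extent() are hypothetical names, and
the flat array merely stands in for the b-tree. The sketch reports a missing
key as -ENOENT, which is a choice made here; in the patch, a positive "not
found" return from btrfs_search_slot() is passed through to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MAX_KEYS 16

/* Hypothetical user-space stand-in for a raid-stripe-tree key. */
struct stripe_key {
	uint64_t objectid;	/* logical start of the extent */
	uint64_t offset;	/* length of the extent */
};

/* Hypothetical flat "tree"; the real tree is a btrfs b-tree. */
struct stripe_tree {
	struct stripe_key keys[MODEL_MAX_KEYS];
	size_t nr;
};

/*
 * Delete the entry whose key matches (start, length) exactly.
 * Returns 0 on success and -ENOENT when no such key exists.
 */
int model_delete_raid_extent(struct stripe_tree *tree, uint64_t start,
			     uint64_t length)
{
	size_t i;

	assert(tree->nr <= MODEL_MAX_KEYS);

	for (i = 0; i < tree->nr; i++) {
		if (tree->keys[i].objectid != start ||
		    tree->keys[i].offset != length)
			continue;
		/* Close the gap left by the removed entry. */
		memmove(&tree->keys[i], &tree->keys[i + 1],
			(tree->nr - i - 1) * sizeof(tree->keys[0]));
		tree->nr--;
		return 0;
	}
	return -ENOENT;
}
```

With an entry keyed (939524096, 126976) in the model, deleting
(939524096, 4096) fails with -ENOENT and leaves the entry in place, while
deleting (939524096, 126976) removes it.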



* [PATCH RFC v2 6/8] btrfs: add code to read raid extent
  2022-06-29 14:41 [PATCH RFC v2 0/8] btrfs: raid-stripe-tree draft patches Johannes Thumshirn
                   ` (4 preceding siblings ...)
  2022-06-29 14:41 ` [PATCH RFC v2 5/8] btrfs: add code to delete raid extent Johannes Thumshirn
@ 2022-06-29 14:41 ` Johannes Thumshirn
  2022-06-29 14:41 ` [PATCH RFC v2 7/8] btrfs: zoned: allow zoned RAID1 Johannes Thumshirn
  2022-06-29 14:41 ` [PATCH RFC v2 8/8] btrfs: add raid stripe tree pretty printer Johannes Thumshirn
  7 siblings, 0 replies; 9+ messages in thread
From: Johannes Thumshirn @ 2022-06-29 14:41 UTC (permalink / raw)
  To: linux-btrfs
  Cc: Naohiro Aota, Damien Le Moal, Johannes Thumshirn, Qu Wenruo,
	Christoph Hellwig, Josef Bacik

Add boilerplate code to look up the physical address in the
raid-stripe-tree when a read is attempted on a RAID volume formatted with
the raid-stripe-tree.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/raid-stripe-tree.c | 65 +++++++++++++++++++++++++++++++++++++
 fs/btrfs/raid-stripe-tree.h |  3 ++
 fs/btrfs/volumes.c          | 22 +++++++++++--
 3 files changed, 88 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index a673aaf8e703..5ee630a792fc 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -7,6 +7,7 @@
 #include "disk-io.h"
 #include "raid-stripe-tree.h"
 #include "volumes.h"
+#include "misc.h"
 
 static struct rb_node *stripe_tree_insert(struct rb_root *root, u64 logical,
 					  struct rb_node *node)
@@ -251,3 +252,67 @@ void btrfs_raid_stripe_update(struct work_struct *work)
 	btrfs_put_bioc(bioc);
 }
 
+int btrfs_get_raid_extent_offset(struct btrfs_fs_info *fs_info,
+				 u64 logical, u64 length, u64 map_type,
+				 u64 devid, u64 *physical)
+{
+	struct btrfs_root *stripe_root = fs_info->stripe_root;
+	struct btrfs_dp_stripe *raid_stripe;
+	struct btrfs_key stripe_key;
+	struct btrfs_key found_key;
+	struct btrfs_path *path;
+	struct extent_buffer *leaf;
+	u64 offset;
+	u64 found_logical, found_length;
+	int num_stripes;
+	int slot;
+	int ret;
+	int i;
+
+	stripe_key.objectid = logical;
+	stripe_key.type = BTRFS_RAID_STRIPE_KEY;
+	stripe_key.offset = length;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	num_stripes = btrfs_bg_type_to_factor(map_type);
+
+	ret = btrfs_search_slot_for_read(stripe_root, &stripe_key, path, 0, 0);
+	if (ret < 0) {
+		goto out;
+	}
+
+	if (ret == 1)
+		ret = 0;
+
+	while (1) {
+		leaf = path->nodes[0];
+		slot = path->slots[0];
+
+		btrfs_item_key_to_cpu(leaf, &found_key, slot);
+		found_logical = found_key.objectid;
+		found_length = found_key.offset;
+
+		if (!in_range(logical, found_logical, found_length))
+			goto next;
+		offset = logical - found_logical;
+
+		raid_stripe = btrfs_item_ptr(leaf, slot, struct btrfs_dp_stripe);
+		for (i = 0; i < num_stripes; i++) {
+			if (btrfs_stripe_extent_devid_nr(leaf, raid_stripe, i) != devid)
+				continue;
+			*physical = btrfs_stripe_extent_physical_nr(leaf, raid_stripe, i) + offset;
+			goto out;
+		}
+next:
+		ret = btrfs_next_item(stripe_root, path);
+		if (ret)
+			break;
+	}
+out:
+	btrfs_free_path(path);
+
+	return ret;
+}
diff --git a/fs/btrfs/raid-stripe-tree.h b/fs/btrfs/raid-stripe-tree.h
index d3cc24e37de1..75e17cad283a 100644
--- a/fs/btrfs/raid-stripe-tree.h
+++ b/fs/btrfs/raid-stripe-tree.h
@@ -16,6 +16,9 @@ struct btrfs_ordered_stripe {
 	refcount_t ref;
 };
 
+int btrfs_get_raid_extent_offset(struct btrfs_fs_info *fs_info,
+				 u64 logical, u64 length, u64 map_type,
+				 u64 devid, u64 *physical);
 int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start,
 			     u64 length);
 int btrfs_insert_raid_extent(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index b8d4e92c7196..2569ef564c97 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6526,9 +6526,27 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
 	}
 
 	for (i = 0; i < num_stripes; i++) {
-		bioc->stripes[i].physical = map->stripes[stripe_index].physical +
-			stripe_offset + stripe_nr * map->stripe_len;
+		u64 physical;
+
 		bioc->stripes[i].dev = map->stripes[stripe_index].dev;
+
+		if (fs_info->stripe_root && op == BTRFS_MAP_READ &&
+		   btrfs_need_stripe_tree_update(bioc->fs_info,
+						 map->type)) {
+			ret = btrfs_get_raid_extent_offset(fs_info, logical,
+							   map->stripe_len,
+							   map->type,
+							   bioc->stripes[i].dev->devid,
+							   &physical);
+			if (ret) {
+				btrfs_put_bioc(bioc);
+				goto out;
+			}
+		} else {
+			physical = map->stripes[stripe_index].physical +
+				stripe_offset + stripe_nr * map->stripe_len;
+		}
+		bioc->stripes[i].physical = physical;
 		stripe_index++;
 	}
 
-- 
2.35.3
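
The read path in patch 6/8 boils down to: find the stripe extent whose
[logical, logical + length) range contains the requested address, pick the
stripe with the matching devid, and add the offset into the extent to that
stripe's physical start. A small user-space sketch of this arithmetic follows.
The types and model_get_raid_extent_offset() are hypothetical and a linear
scan replaces the b-tree walk; returning -ENOENT for "no match" is an
assumption of the sketch, not the patch's behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory copy of one stripe of a stripe extent. */
struct model_stripe {
	uint64_t devid;
	uint64_t physical;
};

/* Hypothetical in-memory copy of one raid-stripe-tree item. */
struct model_stripe_extent {
	uint64_t logical;	/* key objectid */
	uint64_t length;	/* key offset */
	int num_stripes;
	struct model_stripe stripes[4];
};

/*
 * Translate @logical to a physical address on device @devid.
 * Returns 0 and fills @physical on success, -ENOENT otherwise.
 */
int model_get_raid_extent_offset(const struct model_stripe_extent *items,
				 size_t nr, uint64_t logical, uint64_t devid,
				 uint64_t *physical)
{
	size_t i;
	int j;

	assert(physical != NULL);

	for (i = 0; i < nr; i++) {
		const struct model_stripe_extent *e = &items[i];
		uint64_t offset;

		/* Same test as in_range(logical, found_logical, found_length). */
		if (logical < e->logical || logical - e->logical >= e->length)
			continue;
		offset = logical - e->logical;

		for (j = 0; j < e->num_stripes; j++) {
			if (e->stripes[j].devid != devid)
				continue;
			*physical = e->stripes[j].physical + offset;
			return 0;
		}
	}
	return -ENOENT;
}
```

Using item 0 from the cover letter's dump, key (939524096 RAID_STRIPE_KEY
126976) with stripe 1 on devid 2 at 536870912, a read of logical 939528192
(4096 bytes into the extent) on devid 2 maps to 536870912 + 4096 = 536875008.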



* [PATCH RFC v2 7/8] btrfs: zoned: allow zoned RAID1
  2022-06-29 14:41 [PATCH RFC v2 0/8] btrfs: raid-stripe-tree draft patches Johannes Thumshirn
                   ` (5 preceding siblings ...)
  2022-06-29 14:41 ` [PATCH RFC v2 6/8] btrfs: add code to read " Johannes Thumshirn
@ 2022-06-29 14:41 ` Johannes Thumshirn
  2022-06-29 14:41 ` [PATCH RFC v2 8/8] btrfs: add raid stripe tree pretty printer Johannes Thumshirn
  7 siblings, 0 replies; 9+ messages in thread
From: Johannes Thumshirn @ 2022-06-29 14:41 UTC (permalink / raw)
  To: linux-btrfs
  Cc: Naohiro Aota, Damien Le Moal, Johannes Thumshirn, Qu Wenruo,
	Christoph Hellwig, Josef Bacik

When we have a raid-stripe-tree, we can do RAID1 on zoned devices for data
block-groups. For metadata block-groups, we don't need anything special,
as all metadata I/O is already protected by btrfs_zoned_meta_io_lock().

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/zoned.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 5cf6abeda588..e51e405342ad 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1463,6 +1463,41 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 		cache->zone_capacity = min(caps[0], caps[1]);
 		break;
 	case BTRFS_BLOCK_GROUP_RAID1:
+	case BTRFS_BLOCK_GROUP_RAID1C3:
+	case BTRFS_BLOCK_GROUP_RAID1C4:
+		if (map->type & BTRFS_BLOCK_GROUP_DATA &&
+		    !fs_info->stripe_root) {
+			btrfs_err(fs_info,
+				  "zoned: data RAID1 needs stripe_root");
+			ret = -EIO;
+			goto out;
+
+		}
+
+		for (i = 0; i < map->num_stripes; i++) {
+			if (alloc_offsets[i] == WP_MISSING_DEV)
+				continue;
+
+			if (i == 0)
+				continue;
+
+			if (alloc_offsets[0] != alloc_offsets[i]) {
+				btrfs_err(fs_info,
+					  "zoned: write pointer offset mismatch of zones in RAID profile");
+				ret = -EIO;
+				goto out;
+			}
+			if (test_bit(0, active) != test_bit(i, active)) {
+				if (!btrfs_zone_activate(cache)) {
+					ret = -EIO;
+					goto out;
+				}
+			}
+			cache->zone_capacity = min(caps[0], caps[i]);
+		}
+		cache->zone_is_active = test_bit(0, active);
+		cache->alloc_offset = alloc_offsets[0];
+		break;
 	case BTRFS_BLOCK_GROUP_RAID0:
 	case BTRFS_BLOCK_GROUP_RAID10:
 	case BTRFS_BLOCK_GROUP_RAID5:
-- 
2.35.3
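
The RAID1 branch in patch 7/8 requires all present zones of a block group to
share the same write pointer offset and derives the usable capacity from the
per-zone capacities. The following user-space sketch models that consistency
check. It deliberately differs from the hunk in two ways, both choices made
for the sketch: it uses the first present stripe as the reference instead of
always stripe 0, and it keeps a running minimum over all present stripes,
whereas the hunk assigns min(caps[0], caps[i]) on each iteration.
MODEL_WP_MISSING_DEV, struct model_zone_info and model_check_raid1_zones()
are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel marking a stripe whose device is missing. */
#define MODEL_WP_MISSING_DEV UINT64_MAX

/* Hypothetical result: what the block group would be set up with. */
struct model_zone_info {
	uint64_t alloc_offset;
	uint64_t capacity;
};

/*
 * Check that every present stripe has the same write pointer offset.
 * Returns 0 and fills @out on success, -EIO on a mismatch or when no
 * stripe is present at all.
 */
int model_check_raid1_zones(const uint64_t *alloc_offsets,
			    const uint64_t *caps, int num_stripes,
			    struct model_zone_info *out)
{
	bool have_ref = false;
	int i;

	assert(out != NULL);

	for (i = 0; i < num_stripes; i++) {
		if (alloc_offsets[i] == MODEL_WP_MISSING_DEV)
			continue;
		if (!have_ref) {
			/* First present stripe becomes the reference. */
			out->alloc_offset = alloc_offsets[i];
			out->capacity = caps[i];
			have_ref = true;
			continue;
		}
		if (alloc_offsets[i] != out->alloc_offset)
			return -EIO;
		if (caps[i] < out->capacity)
			out->capacity = caps[i];
	}
	return have_ref ? 0 : -EIO;
}
```

For three stripes with offsets {4096, missing, 4096} and capacities
{1000, 10, 900}, the model accepts the block group with alloc_offset 4096 and
capacity 900; offsets {4096, 8192} are rejected with -EIO.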



* [PATCH RFC v2 8/8] btrfs: add raid stripe tree pretty printer
  2022-06-29 14:41 [PATCH RFC v2 0/8] btrfs: raid-stripe-tree draft patches Johannes Thumshirn
                   ` (6 preceding siblings ...)
  2022-06-29 14:41 ` [PATCH RFC v2 7/8] btrfs: zoned: allow zoned RAID1 Johannes Thumshirn
@ 2022-06-29 14:41 ` Johannes Thumshirn
  7 siblings, 0 replies; 9+ messages in thread
From: Johannes Thumshirn @ 2022-06-29 14:41 UTC (permalink / raw)
  To: linux-btrfs
  Cc: Naohiro Aota, Damien Le Moal, Johannes Thumshirn, Qu Wenruo,
	Christoph Hellwig, Josef Bacik

Decode raid-stripe-tree entries in btrfs_print_leaf(), so that they show up
in btrfs_print_tree() output.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/print-tree.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/fs/btrfs/print-tree.c b/fs/btrfs/print-tree.c
index dd8777872143..10a39d5a7e40 100644
--- a/fs/btrfs/print-tree.c
+++ b/fs/btrfs/print-tree.c
@@ -6,6 +6,7 @@
 #include "ctree.h"
 #include "disk-io.h"
 #include "print-tree.h"
+#include "raid-stripe-tree.h"
 
 struct root_name_map {
 	u64 id;
@@ -25,6 +26,7 @@ static const struct root_name_map root_map[] = {
 	{ BTRFS_FREE_SPACE_TREE_OBJECTID,	"FREE_SPACE_TREE"	},
 	{ BTRFS_BLOCK_GROUP_TREE_OBJECTID,	"BLOCK_GROUP_TREE"	},
 	{ BTRFS_DATA_RELOC_TREE_OBJECTID,	"DATA_RELOC_TREE"	},
+	{ BTRFS_RAID_STRIPE_TREE_OBJECTID,	"RAID_STRIPE_TREE"	},
 };
 
 const char *btrfs_root_name(const struct btrfs_key *key, char *buf)
@@ -184,6 +186,20 @@ static void print_uuid_item(struct extent_buffer *l, unsigned long offset,
 	}
 }
 
+static void print_raid_stripe_key(struct extent_buffer *eb, u32 item_size,
+				  struct btrfs_dp_stripe *stripe)
+{
+	int num_stripes;
+	int i;
+
+	num_stripes = btrfs_num_raid_stripes(item_size);
+
+	for (i = 0; i < num_stripes; i++)
+		pr_info("\t\t\tstripe %d devid %llu physical %llu\n", i,
+			btrfs_stripe_extent_devid_nr(eb, stripe, i),
+			btrfs_stripe_extent_physical_nr(eb, stripe, i));
+}
+
 /*
  * Helper to output refs and locking status of extent buffer.  Useful to debug
  * race condition related problems.
@@ -348,6 +364,11 @@ void btrfs_print_leaf(struct extent_buffer *l)
 			print_uuid_item(l, btrfs_item_ptr_offset(l, i),
 					btrfs_item_size(l, i));
 			break;
+		case BTRFS_RAID_STRIPE_KEY:
+			print_raid_stripe_key(l, btrfs_item_size(l, i),
+					      btrfs_item_ptr(l, i,
+							     struct btrfs_dp_stripe));
+			break;
 		}
 	}
 }
-- 
2.35.3
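
The printer in patch 8/8 derives the stripe count from the item size and then
prints one line per stripe. btrfs_num_raid_stripes() is defined earlier in the
series and is not shown here; the sketch below assumes each on-disk stripe
entry is a pair of 64-bit fields (devid, physical), which is consistent with
the cover letter's dump showing "itemsize 32" for two stripes. All names in
the sketch are hypothetical user-space stand-ins.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed layout of one stripe entry: two 64-bit fields, 16 bytes. */
struct model_stripe {
	uint64_t devid;
	uint64_t physical;
};

/* Number of stripe entries that fit into an item of @item_size bytes. */
int model_num_raid_stripes(uint32_t item_size)
{
	assert(item_size % sizeof(struct model_stripe) == 0);
	return (int)(item_size / sizeof(struct model_stripe));
}

/* Format one stripe the way the pr_info() call in the patch does. */
int model_format_stripe(char *buf, size_t len, int i,
			const struct model_stripe *stripe)
{
	return snprintf(buf, len,
			"\t\t\tstripe %d devid %" PRIu64 " physical %" PRIu64 "\n",
			i, stripe->devid, stripe->physical);
}
```

Under that assumption an item of 32 bytes holds two stripes. Note that the
kernel printer labels the second field "physical", while the btrfs-progs dump
in the cover letter labels it "offset".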



Thread overview: 9+ messages
2022-06-29 14:41 [PATCH RFC v2 0/8] btrfs: raid-stripe-tree draft patches Johannes Thumshirn
2022-06-29 14:41 ` [PATCH RFC v2 1/8] btrfs: add raid stripe tree definitions Johannes Thumshirn
2022-06-29 14:41 ` [PATCH RFC v2 2/8] btrfs: read raid-stripe-tree from disk Johannes Thumshirn
2022-06-29 14:41 ` [PATCH RFC v2 3/8] btrfs: add boilerplate code to insert raid extent Johannes Thumshirn
2022-06-29 14:41 ` [PATCH RFC v2 4/8] btrfs: add boilerplate code to insert stripe entries for preallocated extents Johannes Thumshirn
2022-06-29 14:41 ` [PATCH RFC v2 5/8] btrfs: add code to delete raid extent Johannes Thumshirn
2022-06-29 14:41 ` [PATCH RFC v2 6/8] btrfs: add code to read " Johannes Thumshirn
2022-06-29 14:41 ` [PATCH RFC v2 7/8] btrfs: zoned: allow zoned RAID1 Johannes Thumshirn
2022-06-29 14:41 ` [PATCH RFC v2 8/8] btrfs: add raid stripe tree pretty printer Johannes Thumshirn
