* [PATCH RFC v2 0/8] btrfs: raid-stripe-tree draft patches
@ 2022-06-29 14:41 Johannes Thumshirn
From: Johannes Thumshirn @ 2022-06-29 14:41 UTC (permalink / raw)
To: linux-btrfs
Cc: Naohiro Aota, Damien Le Moal, Johannes Thumshirn, Qu Wenruo,
Christoph Hellwig, Josef Bacik
Here's a second draft of my btrfs zoned RAID1 patches.
Updates of the raid-stripe-tree are done at delayed-ref time to save on
bandwidth, while for reading we do the stripe-tree lookup at bio mapping time,
i.e. at the point where the logical to physical translation happens for
regular btrfs RAID as well.
The stripe tree is keyed by an extent's disk_bytenr and disk_num_bytes and
its contents are the respective physical device id and position.
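As a rough illustration of this keying scheme, the userspace sketch below models one stripe-tree entry as a (logical, length) key plus one (devid, physical) pair per copy, and translates a logical address by finding the entry whose range contains it. The names rst_entry, rst_stripe and rst_lookup are made up for this example, a linear scan stands in for the b-tree search, and carrying the offset within the extent over to the device address is an assumption of the sketch rather than something taken from the patches.

```c
#include <stddef.h>
#include <stdint.h>

#define RST_MAX_STRIPES 2	/* RAID1: two copies (assumption for this sketch) */

/* One copy of the extent: which device, and where on that device. */
struct rst_stripe {
	uint64_t devid;
	uint64_t physical;
};

/* Hypothetical in-memory model of one RAID_STRIPE_KEY item. */
struct rst_entry {
	uint64_t logical;	/* key objectid: the extent's disk_bytenr */
	uint64_t length;	/* key offset: the extent's disk_num_bytes */
	struct rst_stripe stripes[RST_MAX_STRIPES];
};

/*
 * Translate logical to a device address on copy number mirror.
 * Returns 0 and fills out on success, -1 if no entry covers logical
 * or mirror is out of range.
 */
static int rst_lookup(const struct rst_entry *tbl, size_t n, uint64_t logical,
		      int mirror, struct rst_stripe *out)
{
	if (mirror < 0 || mirror >= RST_MAX_STRIPES)
		return -1;

	for (size_t i = 0; i < n; i++) {
		const struct rst_entry *e = &tbl[i];

		if (logical < e->logical || logical >= e->logical + e->length)
			continue;

		out->devid = e->stripes[mirror].devid;
		/* Assumed: the offset into the extent carries over 1:1. */
		out->physical = e->stripes[mirror].physical +
				(logical - e->logical);
		return 0;
	}
	return -1;
}
```

A caller would fill a table of rst_entry values and ask for mirror 0 or 1; an address outside every recorded range yields -1, which is the case a real reader would have to treat as an error.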
For an example 1M write (split into 126976-byte, i.e. 124 KiB, segments due to zone-append):
rapido2:/home/johannes/src/fstests# xfs_io -fdc "pwrite -b 1M 0 1M" -c fsync /mnt/test/test
wrote 1048576/1048576 bytes at offset 0
1 MiB, 1 ops; 0.0065 sec (151.538 MiB/sec and 151.5381 ops/sec)
The tree will look as follows:
rapido2:/home/johannes/src/fstests# btrfs inspect-internal dump-tree -t raid_stripe /dev/nullb0
btrfs-progs v5.16.1
raid stripe tree key (RAID_STRIPE_TREE ROOT_ITEM 0)
leaf 805847040 items 9 free space 15770 generation 9 owner RAID_STRIPE_TREE
leaf 805847040 flags 0x1(WRITTEN) backref revision 1
checksum stored 1b22e13800000000000000000000000000000000000000000000000000000000
checksum calced 1b22e13800000000000000000000000000000000000000000000000000000000
fs uuid e4f523d1-89a1-41f9-ab75-6ba3c42a28fb
chunk uuid 6f2d8aaa-d348-4bf2-9b5e-141a37ba4c77
item 0 key (939524096 RAID_STRIPE_KEY 126976) itemoff 16251 itemsize 32
stripe 0 devid 1 offset 939524096
stripe 1 devid 2 offset 536870912
item 1 key (939651072 RAID_STRIPE_KEY 126976) itemoff 16219 itemsize 32
stripe 0 devid 1 offset 939651072
stripe 1 devid 2 offset 536997888
item 2 key (939778048 RAID_STRIPE_KEY 126976) itemoff 16187 itemsize 32
stripe 0 devid 1 offset 939778048
stripe 1 devid 2 offset 537124864
item 3 key (939905024 RAID_STRIPE_KEY 126976) itemoff 16155 itemsize 32
stripe 0 devid 1 offset 939905024
stripe 1 devid 2 offset 537251840
item 4 key (940032000 RAID_STRIPE_KEY 126976) itemoff 16123 itemsize 32
stripe 0 devid 1 offset 940032000
stripe 1 devid 2 offset 537378816
item 5 key (940158976 RAID_STRIPE_KEY 126976) itemoff 16091 itemsize 32
stripe 0 devid 1 offset 940158976
stripe 1 devid 2 offset 537505792
item 6 key (940285952 RAID_STRIPE_KEY 126976) itemoff 16059 itemsize 32
stripe 0 devid 1 offset 940285952
stripe 1 devid 2 offset 537632768
item 7 key (940412928 RAID_STRIPE_KEY 126976) itemoff 16027 itemsize 32
stripe 0 devid 1 offset 940412928
stripe 1 devid 2 offset 537759744
item 8 key (940539904 RAID_STRIPE_KEY 32768) itemoff 15995 itemsize 32
stripe 0 devid 1 offset 940539904
stripe 1 devid 2 offset 537886720
total bytes 26843545600
bytes used 1245184
uuid e4f523d1-89a1-41f9-ab75-6ba3c42a28fb
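The item keys in the dump can be cross-checked with a little arithmetic: eight segments of 126976 bytes plus a final one of 32768 bytes add up to the 1048576 bytes written. The sketch below is only an illustration; split_write and struct seg are hypothetical names, and the 126976-byte limit is simply taken from the dump above, not derived the way the kernel derives it. It splits a write into segments of at most max_seg bytes and yields the (start, length) pairs that show up as stripe-tree keys.

```c
#include <stddef.h>
#include <stdint.h>

struct seg {
	uint64_t start;	/* logical start, i.e. the key objectid */
	uint64_t len;	/* segment length, i.e. the key offset */
};

/*
 * Split [start, start + len) into segments of at most max_seg bytes.
 * Stores up to cap segments in out and returns how many are needed.
 */
static size_t split_write(uint64_t start, uint64_t len, uint64_t max_seg,
			  struct seg *out, size_t cap)
{
	size_t n = 0;

	if (max_seg == 0)
		return 0;

	while (len > 0) {
		uint64_t cur = len < max_seg ? len : max_seg;

		if (n < cap) {
			out[n].start = start;
			out[n].len = cur;
		}
		start += cur;
		len -= cur;
		n++;
	}
	return n;
}
```

Calling split_write(939524096, 1048576, 126976, ...) gives nine segments whose start values and lengths line up with items 0 to 8 in the dump.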
The performance deviation is measurable (about 12% lower bandwidth for RAID1 than for single in this run) but overall not too bad for a first shot:
RAID1:
READ: bw=81.6MiB/s (85.6MB/s), 81.6MiB/s-81.6MiB/s (85.6MB/s-85.6MB/s), io=496MiB (520MB), run=6075-6075msec
WRITE: bw=86.9MiB/s (91.1MB/s), 86.9MiB/s-86.9MiB/s (91.1MB/s-91.1MB/s), io=528MiB (554MB), run=6075-6075msec
Single:
READ: bw=92.5MiB/s (97.0MB/s), 92.5MiB/s-92.5MiB/s (97.0MB/s-97.0MB/s), io=496MiB (520MB), run=5360-5360msec
WRITE: bw=98.5MiB/s (103MB/s), 98.5MiB/s-98.5MiB/s (103MB/s-103MB/s), io=528MiB (554MB), run=5360-5360msec
Changes to v1:
- Write the stripe-tree at delayed-ref time (Qu)
- Add a different write path for preallocation
v1 of the patchset can be found here:
https://lore.kernel.org/linux-btrfs/cover.1652711187.git.johannes.thumshirn@wdc.com/
Johannes Thumshirn (8):
btrfs: add raid stripe tree definitions
btrfs: read raid-stripe-tree from disk
btrfs: add boilerplate code to insert raid extent
btrfs: add boilerplate code to insert stripe entries for preallocated
extents
btrfs: add code to delete raid extent
btrfs: add code to read raid extent
btrfs: zoned: allow zoned RAID1
btrfs: add raid stripe tree pretty printer
fs/btrfs/Makefile | 2 +-
fs/btrfs/block-rsv.c | 1 +
fs/btrfs/ctree.h | 33 ++++
fs/btrfs/disk-io.c | 15 ++
fs/btrfs/extent-tree.c | 53 ++++++
fs/btrfs/inode.c | 6 +
fs/btrfs/print-tree.c | 21 +++
fs/btrfs/raid-stripe-tree.c | 318 ++++++++++++++++++++++++++++++++
fs/btrfs/raid-stripe-tree.h | 72 ++++++++
fs/btrfs/volumes.c | 35 +++-
fs/btrfs/volumes.h | 4 +
fs/btrfs/zoned.c | 39 ++++
include/uapi/linux/btrfs.h | 1 +
include/uapi/linux/btrfs_tree.h | 17 ++
14 files changed, 614 insertions(+), 3 deletions(-)
create mode 100644 fs/btrfs/raid-stripe-tree.c
create mode 100644 fs/btrfs/raid-stripe-tree.h
--
2.35.3
* [PATCH RFC v2 1/8] btrfs: add raid stripe tree definitions
From: Johannes Thumshirn @ 2022-06-29 14:41 UTC (permalink / raw)
To: linux-btrfs
Cc: Naohiro Aota, Damien Le Moal, Johannes Thumshirn, Qu Wenruo,
Christoph Hellwig, Josef Bacik
Add definitions for the raid-stripe-tree. This tree will hold information
about the on-disk layout of the stripes in a RAID set.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
fs/btrfs/ctree.h | 29 +++++++++++++++++++++++++++++
include/uapi/linux/btrfs_tree.h | 17 +++++++++++++++++
2 files changed, 46 insertions(+)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4e2569f84aab..18e2f186cb5e 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1906,6 +1906,35 @@ BTRFS_SETGET_FUNCS(timespec_nsec, struct btrfs_timespec, nsec, 32);
BTRFS_SETGET_STACK_FUNCS(stack_timespec_sec, struct btrfs_timespec, sec, 64);
BTRFS_SETGET_STACK_FUNCS(stack_timespec_nsec, struct btrfs_timespec, nsec, 32);
+BTRFS_SETGET_FUNCS(stripe_extent_devid, struct btrfs_stripe_extent, devid, 64);
+BTRFS_SETGET_FUNCS(stripe_extent_physical, struct btrfs_stripe_extent, physical, 64);
+BTRFS_SETGET_STACK_FUNCS(stack_stripe_extent_devid, struct btrfs_stripe_extent, devid, 64);
+BTRFS_SETGET_STACK_FUNCS(stack_stripe_extent_physical, struct btrfs_stripe_extent, physical, 64);
+
+static inline struct btrfs_stripe_extent *btrfs_stripe_extent_nr(
+ struct btrfs_dp_stripe *dps, int nr)
+{
+ unsigned long offset = (unsigned long)dps;
+
+ offset += offsetof(struct btrfs_dp_stripe, extents);
+ offset += nr * sizeof(struct btrfs_stripe_extent);
+ return (struct btrfs_stripe_extent *)offset;
+}
+
+static inline u64 btrfs_stripe_extent_devid_nr(const struct extent_buffer *eb,
+ struct btrfs_dp_stripe *dps,
+ int nr)
+{
+ return btrfs_stripe_extent_devid(eb, btrfs_stripe_extent_nr(dps, nr));
+}
+
+static inline u64 btrfs_stripe_extent_physical_nr(const struct extent_buffer *eb,
+ struct btrfs_dp_stripe *dps,
+ int nr)
+{
+ return btrfs_stripe_extent_physical(eb, btrfs_stripe_extent_nr(dps, nr));
+}
+
/* struct btrfs_dev_extent */
BTRFS_SETGET_FUNCS(dev_extent_chunk_tree, struct btrfs_dev_extent,
chunk_tree, 64);
diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index d4117152d907..070fc9266821 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -56,6 +56,9 @@
/* Holds the block group items for extent tree v2. */
#define BTRFS_BLOCK_GROUP_TREE_OBJECTID 11ULL
+/* tracks RAID stripes in block groups. */
+#define BTRFS_RAID_STRIPE_TREE_OBJECTID 12ULL
+
/* device stats in the device tree */
#define BTRFS_DEV_STATS_OBJECTID 0ULL
@@ -264,6 +267,8 @@
*/
#define BTRFS_QGROUP_RELATION_KEY 246
+#define BTRFS_RAID_STRIPE_KEY 247
+
/*
* Obsolete name, see BTRFS_TEMPORARY_ITEM_KEY.
*/
@@ -488,6 +493,18 @@ struct btrfs_free_space_header {
__le64 num_bitmaps;
} __attribute__ ((__packed__));
+struct btrfs_stripe_extent {
+ /* btrfs device-id this raid extent lives on */
+ __le64 devid;
+ /* physical location on disk */
+ __le64 physical;
+} __attribute__ ((__packed__));
+
+struct btrfs_dp_stripe {
+ /* array of stripe extents this stripe is composed of */
+ DECLARE_FLEX_ARRAY(struct btrfs_stripe_extent, extents);
+} __attribute__ ((__packed__));
+
#define BTRFS_HEADER_FLAG_WRITTEN (1ULL << 0)
#define BTRFS_HEADER_FLAG_RELOC (1ULL << 1)
--
2.35.3
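Since struct btrfs_dp_stripe is nothing but a flexible array of 16-byte struct btrfs_stripe_extent entries, the item size and the number of copies determine each other. The userspace sketch below shows that relation; the names stripe_extent, dp_stripe_item_size and dp_stripe_num_stripes are stand-ins invented for this example, endianness conversion is ignored, and the packed attribute assumes GCC or Clang. With two copies it yields the itemsize of 32 seen in the cover letter's dump.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for struct btrfs_stripe_extent (two __le64 fields). */
struct stripe_extent {
	uint64_t devid;
	uint64_t physical;
} __attribute__((__packed__));

/*
 * struct btrfs_dp_stripe consists only of the flexible array, so the
 * array starts at offset 0 of the item. Modelled here as a constant.
 */
#define DP_STRIPE_EXTENTS_OFFSET 0u

/* Item size needed to store num_stripes copies. */
static size_t dp_stripe_item_size(unsigned int num_stripes)
{
	return DP_STRIPE_EXTENTS_OFFSET +
	       (size_t)num_stripes * sizeof(struct stripe_extent);
}

/* Number of stripe extents stored in an item of item_size bytes. */
static unsigned int dp_stripe_num_stripes(size_t item_size)
{
	/* Subtract the array offset first, then divide. */
	return (unsigned int)((item_size - DP_STRIPE_EXTENTS_OFFSET) /
			      sizeof(struct stripe_extent));
}
```

The two helpers are inverses of each other for any whole number of copies, which is a cheap property to check when reading items back.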
* [PATCH RFC v2 2/8] btrfs: read raid-stripe-tree from disk
From: Johannes Thumshirn @ 2022-06-29 14:41 UTC (permalink / raw)
To: linux-btrfs
Cc: Naohiro Aota, Damien Le Moal, Johannes Thumshirn, Qu Wenruo,
Christoph Hellwig, Josef Bacik
If we find a raid-stripe-tree on mount, read it from disk.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
fs/btrfs/block-rsv.c | 1 +
fs/btrfs/ctree.h | 1 +
fs/btrfs/disk-io.c | 12 ++++++++++++
include/uapi/linux/btrfs.h | 1 +
4 files changed, 15 insertions(+)
diff --git a/fs/btrfs/block-rsv.c b/fs/btrfs/block-rsv.c
index b3ee49b0b1e8..62c20c9d8c25 100644
--- a/fs/btrfs/block-rsv.c
+++ b/fs/btrfs/block-rsv.c
@@ -427,6 +427,7 @@ void btrfs_init_root_block_rsv(struct btrfs_root *root)
case BTRFS_CSUM_TREE_OBJECTID:
case BTRFS_EXTENT_TREE_OBJECTID:
case BTRFS_FREE_SPACE_TREE_OBJECTID:
+ case BTRFS_RAID_STRIPE_TREE_OBJECTID:
root->block_rsv = &fs_info->delayed_refs_rsv;
break;
case BTRFS_ROOT_TREE_OBJECTID:
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 18e2f186cb5e..376b9b112429 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -690,6 +690,7 @@ struct btrfs_fs_info {
struct btrfs_root *uuid_root;
struct btrfs_root *data_reloc_root;
struct btrfs_root *block_group_root;
+ struct btrfs_root *stripe_root;
/* the log root tree is a directory of all the other log roots */
struct btrfs_root *log_root_tree;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 70b388de4d66..45d1ea23a230 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1607,6 +1607,9 @@ static struct btrfs_root *btrfs_get_global_root(struct btrfs_fs_info *fs_info,
return btrfs_grab_root(root) ? root : ERR_PTR(-ENOENT);
}
+ if (objectid == BTRFS_RAID_STRIPE_TREE_OBJECTID)
+ return btrfs_grab_root(fs_info->stripe_root) ?
+ fs_info->stripe_root : ERR_PTR(-ENOENT);
return NULL;
}
@@ -1679,6 +1682,7 @@ void btrfs_free_fs_info(struct btrfs_fs_info *fs_info)
btrfs_put_root(fs_info->fs_root);
btrfs_put_root(fs_info->data_reloc_root);
btrfs_put_root(fs_info->block_group_root);
+ btrfs_put_root(fs_info->stripe_root);
btrfs_check_leaked_roots(fs_info);
btrfs_extent_buffer_leak_debug_check(fs_info);
kfree(fs_info->super_copy);
@@ -2220,6 +2224,7 @@ static void free_root_pointers(struct btrfs_fs_info *info, bool free_chunk_root)
free_root_extent_buffers(info->fs_root);
free_root_extent_buffers(info->data_reloc_root);
free_root_extent_buffers(info->block_group_root);
+ free_root_extent_buffers(info->stripe_root);
if (free_chunk_root)
free_root_extent_buffers(info->chunk_root);
}
@@ -2646,6 +2651,13 @@ static int btrfs_read_roots(struct btrfs_fs_info *fs_info)
fs_info->uuid_root = root;
}
+ location.objectid = BTRFS_RAID_STRIPE_TREE_OBJECTID;
+ root = btrfs_read_tree_root(tree_root, &location);
+ if (!IS_ERR(root)) {
+ set_bit(BTRFS_ROOT_TRACK_DIRTY, &root->state);
+ fs_info->stripe_root = root;
+ }
+
return 0;
out:
btrfs_warn(fs_info, "failed to read root (objectid=%llu): %d",
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index f54dc91e4025..5ca789af5beb 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -310,6 +310,7 @@ struct btrfs_ioctl_fs_info_args {
#define BTRFS_FEATURE_INCOMPAT_RAID1C34 (1ULL << 11)
#define BTRFS_FEATURE_INCOMPAT_ZONED (1ULL << 12)
#define BTRFS_FEATURE_INCOMPAT_EXTENT_TREE_V2 (1ULL << 13)
+#define BTRFS_FEATURE_INCOMPAT_STRIPE_TREE (1ULL << 14)
struct btrfs_ioctl_feature_flags {
__u64 compat_flags;
--
2.35.3
* [PATCH RFC v2 3/8] btrfs: add boilerplate code to insert raid extent
From: Johannes Thumshirn @ 2022-06-29 14:41 UTC (permalink / raw)
To: linux-btrfs
Cc: Naohiro Aota, Damien Le Moal, Johannes Thumshirn, Qu Wenruo,
Christoph Hellwig, Josef Bacik
Add boilerplate code to insert raid extents into the raid-stripe-tree on
each write to a RAID1 block-group.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
fs/btrfs/Makefile | 2 +-
fs/btrfs/ctree.h | 3 +
fs/btrfs/disk-io.c | 3 +
fs/btrfs/extent-tree.c | 45 +++++++++
fs/btrfs/raid-stripe-tree.c | 188 ++++++++++++++++++++++++++++++++++++
fs/btrfs/raid-stripe-tree.h | 65 +++++++++++++
fs/btrfs/volumes.c | 13 +++
fs/btrfs/volumes.h | 4 +
fs/btrfs/zoned.c | 4 +
9 files changed, 326 insertions(+), 1 deletion(-)
create mode 100644 fs/btrfs/raid-stripe-tree.c
create mode 100644 fs/btrfs/raid-stripe-tree.h
diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 99f9995670ea..4484831ac624 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -31,7 +31,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \
uuid-tree.o props.o free-space-tree.o tree-checker.o space-info.o \
block-rsv.o delalloc-space.o block-group.o discard.o reflink.o \
- subpage.o tree-mod-log.o
+ subpage.o tree-mod-log.o raid-stripe-tree.o
btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 376b9b112429..2eb79afb4d83 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1092,6 +1092,9 @@ struct btrfs_fs_info {
/* Updates are not protected by any lock */
struct btrfs_commit_stats commit_stats;
+ struct mutex stripe_update_lock;
+ struct rb_root stripe_update_tree;
+
#ifdef CONFIG_BTRFS_FS_REF_VERIFY
spinlock_t ref_verify_lock;
struct rb_root block_tree;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 45d1ea23a230..3d7c1b8d1cd5 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3155,6 +3155,9 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
fs_info->bg_reclaim_threshold = BTRFS_DEFAULT_RECLAIM_THRESH;
INIT_WORK(&fs_info->reclaim_bgs_work, btrfs_reclaim_bgs_work);
+
+ mutex_init(&fs_info->stripe_update_lock);
+ fs_info->stripe_update_tree = RB_ROOT;
}
static int init_mount_fs_info(struct btrfs_fs_info *fs_info, struct super_block *sb)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f97a0f28f464..e1738b3dfb21 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -36,6 +36,7 @@
#include "rcu-string.h"
#include "zoned.h"
#include "dev-replace.h"
+#include "raid-stripe-tree.h"
#undef SCRAMBLE_DELAYED_REFS
@@ -1491,6 +1492,47 @@ static int __btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
return ret;
}
+static int add_stripe_entry_for_delayed_ref(struct btrfs_trans_handle *trans,
+ struct btrfs_delayed_ref_node *node)
+{
+ struct btrfs_fs_info *fs_info = trans->fs_info;
+ struct extent_map *em;
+ struct map_lookup *map;
+ int ret = 0;
+
+ if (!fs_info->stripe_root)
+ return 0;
+
+ em = btrfs_get_chunk_map(fs_info, node->bytenr, node->num_bytes);
+ if (IS_ERR(em)) {
+ btrfs_err(fs_info,
+ "cannot get chunk map for address %llu",
+ node->bytenr);
+ return PTR_ERR(em);
+ }
+
+ map = em->map_lookup;
+
+ if (btrfs_need_stripe_tree_update(fs_info, map->type)) {
+ struct btrfs_ordered_stripe *stripe;
+
+ stripe = btrfs_lookup_ordered_stripe(fs_info, node->bytenr);
+ if (!stripe) {
+ btrfs_err(fs_info,
+ "cannot get stripe extent for address %llu (%llu)",
+ node->bytenr, node->num_bytes);
+ free_extent_map(em);
+ return -EINVAL;
+ }
+ ASSERT(stripe->logical == node->bytenr);
+ ret = btrfs_insert_raid_extent(trans, stripe);
+ btrfs_put_ordered_stripe(fs_info, stripe);
+ }
+ free_extent_map(em);
+
+ return ret;
+}
+
static int run_delayed_data_ref(struct btrfs_trans_handle *trans,
struct btrfs_delayed_ref_node *node,
struct btrfs_delayed_extent_op *extent_op,
@@ -1521,6 +1563,9 @@ static int run_delayed_data_ref(struct btrfs_trans_handle *trans,
flags, ref->objectid,
ref->offset, &ins,
node->ref_mod);
+ if (ret)
+ return ret;
+ ret = add_stripe_entry_for_delayed_ref(trans, node);
} else if (node->action == BTRFS_ADD_DELAYED_REF) {
ret = __btrfs_inc_extent_ref(trans, node, parent, ref_root,
ref->objectid, ref->offset,
diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
new file mode 100644
index 000000000000..360046a104c7
--- /dev/null
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -0,0 +1,188 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/btrfs_tree.h>
+
+#include "ctree.h"
+#include "transaction.h"
+#include "disk-io.h"
+#include "raid-stripe-tree.h"
+#include "volumes.h"
+
+static struct rb_node *stripe_tree_insert(struct rb_root *root, u64 logical,
+ struct rb_node *node)
+{
+ struct rb_node **p = &root->rb_node;
+ struct btrfs_ordered_stripe *stripe;
+ struct rb_node *parent = NULL;
+
+ while (*p) {
+ parent = *p;
+ stripe = rb_entry(*p, struct btrfs_ordered_stripe, rb_node);
+
+ if (logical < stripe->logical)
+ p = &(*p)->rb_left;
+ else if (logical >= stripe->logical + stripe->num_bytes)
+ p = &(*p)->rb_right;
+ else
+ return parent;
+ }
+
+ rb_link_node(node, parent, p);
+ rb_insert_color(node, root);
+ return NULL;
+}
+
+static struct btrfs_ordered_stripe *btrfs_add_ordered_stripe(
+ struct btrfs_fs_info *fs_info,
+ u64 logical,
+ u64 length, int num_stripes,
+ struct btrfs_io_stripe *stripes)
+{
+ struct btrfs_ordered_stripe *stripe;
+ struct btrfs_io_stripe *tmp;
+ struct rb_node *node;
+ size_t size;
+
+ size = num_stripes * sizeof(struct btrfs_io_stripe);
+ stripe = kzalloc(sizeof(struct btrfs_ordered_stripe), GFP_NOFS);
+ if (!stripe)
+ return ERR_PTR(-ENOMEM);
+
+ spin_lock_init(&stripe->lock);
+ tmp = kmemdup(stripes, size, GFP_NOFS);
+ if (!tmp) {
+ kfree(stripe);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ stripe->logical = logical;
+ stripe->num_bytes = length;
+ stripe->num_stripes = num_stripes;
+ spin_lock(&stripe->lock);
+ stripe->stripes = tmp;
+ spin_unlock(&stripe->lock);
+ refcount_set(&stripe->ref, 1);
+
+ mutex_lock(&fs_info->stripe_update_lock);
+ node = stripe_tree_insert(&fs_info->stripe_update_tree, logical,
+ &stripe->rb_node);
+ mutex_unlock(&fs_info->stripe_update_lock);
+
+ if (node) {
+ btrfs_panic(fs_info, -EEXIST,
+ "inconsistency in ordered stripes at offset %llu",
+ logical);
+ kfree(stripe->stripes);
+ kfree(stripe);
+ return ERR_PTR(-EEXIST);
+ }
+
+ return stripe;
+}
+
+struct btrfs_ordered_stripe *btrfs_lookup_ordered_stripe(struct btrfs_fs_info *fs_info,
+ u64 logical)
+{
+ struct rb_root *root = &fs_info->stripe_update_tree;
+ struct btrfs_ordered_stripe *stripe = NULL;
+ struct rb_node *n = root->rb_node;
+
+ mutex_lock(&fs_info->stripe_update_lock);
+ while (n) {
+ stripe = rb_entry(n, struct btrfs_ordered_stripe, rb_node);
+
+ if (logical < stripe->logical) {
+ n = n->rb_left;
+ stripe = NULL;
+ } else if (logical >= stripe->logical + stripe->num_bytes) {
+ n = n->rb_right;
+ stripe = NULL;
+ } else {
+ break;
+ }
+ }
+ if (stripe)
+ btrfs_get_ordered_stripe(stripe);
+ mutex_unlock(&fs_info->stripe_update_lock);
+
+ return stripe;
+}
+
+void btrfs_remove_ordered_stripe(struct btrfs_fs_info *fs_info,
+ struct btrfs_ordered_stripe *stripe)
+{
+ struct rb_node *node = &stripe->rb_node;
+
+ mutex_lock(&fs_info->stripe_update_lock);
+ rb_erase(node, &fs_info->stripe_update_tree);
+ RB_CLEAR_NODE(node);
+ mutex_unlock(&fs_info->stripe_update_lock);
+
+ spin_lock(&stripe->lock);
+ kfree(stripe->stripes);
+ spin_unlock(&stripe->lock);
+ kfree(stripe);
+}
+
+int btrfs_insert_raid_extent(struct btrfs_trans_handle *trans,
+ struct btrfs_ordered_stripe *stripe)
+{
+ struct btrfs_fs_info *fs_info = trans->fs_info;
+ struct btrfs_key stripe_key;
+ struct btrfs_root *stripe_root = fs_info->stripe_root;
+ struct btrfs_dp_stripe *raid_stripe;
+ size_t item_size;
+ int ret;
+
+ item_size = stripe->num_stripes * sizeof(struct btrfs_stripe_extent);
+
+ raid_stripe = kzalloc(item_size, GFP_NOFS);
+ if (!raid_stripe) {
+ btrfs_abort_transaction(trans, -ENOMEM);
+ btrfs_end_transaction(trans);
+ return -ENOMEM;
+ }
+
+ spin_lock(&stripe->lock);
+ for (int i = 0; i < stripe->num_stripes; i++) {
+ u64 devid = stripe->stripes[i].dev->devid;
+ u64 physical = stripe->stripes[i].physical;
+ struct btrfs_stripe_extent *stripe_extent =
+ &raid_stripe->extents[i];
+
+ btrfs_set_stack_stripe_extent_devid(stripe_extent, devid);
+ btrfs_set_stack_stripe_extent_physical(stripe_extent, physical);
+ }
+ spin_unlock(&stripe->lock);
+
+ stripe_key.objectid = stripe->logical;
+ stripe_key.type = BTRFS_RAID_STRIPE_KEY;
+ stripe_key.offset = stripe->num_bytes;
+
+ ret = btrfs_insert_item(trans, stripe_root, &stripe_key, raid_stripe,
+ item_size);
+ if (ret)
+ btrfs_abort_transaction(trans, ret);
+
+ kfree(raid_stripe);
+
+ return ret;
+}
+
+void btrfs_raid_stripe_update(struct work_struct *work)
+{
+ struct btrfs_io_context *bioc =
+ container_of(work, struct btrfs_io_context, stripe_update_work);
+ struct btrfs_ordered_stripe *stripe;
+ struct bio *bio = bioc->orig_bio;
+ struct btrfs_fs_info *fs_info = bioc->fs_info;
+
+ stripe = btrfs_add_ordered_stripe(fs_info, bioc->logical, bioc->length,
+ bioc->num_stripes, bioc->stripes);
+ if (IS_ERR(stripe)) {
+ btrfs_bio_counter_dec(fs_info);
+ bio->bi_status = errno_to_blk_status(PTR_ERR(stripe));
+ }
+ btrfs_put_bioc(bioc);
+}
+
diff --git a/fs/btrfs/raid-stripe-tree.h b/fs/btrfs/raid-stripe-tree.h
new file mode 100644
index 000000000000..b9c40ef26dfa
--- /dev/null
+++ b/fs/btrfs/raid-stripe-tree.h
@@ -0,0 +1,65 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef BTRFS_RAID_STRIPE_TREE_H
+#define BTRFS_RAID_STRIPE_TREE_H
+
+#include "volumes.h"
+
+struct btrfs_ordered_stripe {
+ struct rb_node rb_node;
+
+ u64 logical;
+ u64 num_bytes;
+ int num_stripes;
+ struct btrfs_io_stripe *stripes;
+ spinlock_t lock;
+ refcount_t ref;
+};
+
+int btrfs_insert_raid_extent(struct btrfs_trans_handle *trans,
+ struct btrfs_ordered_stripe *stripe);
+void btrfs_raid_stripe_update(struct work_struct *work);
+struct btrfs_ordered_stripe *btrfs_lookup_ordered_stripe(
+ struct btrfs_fs_info *fs_info,
+ u64 logical);
+void btrfs_remove_ordered_stripe(struct btrfs_fs_info *fs_info,
+ struct btrfs_ordered_stripe *stripe);
+
+static inline void btrfs_get_ordered_stripe(struct btrfs_ordered_stripe *stripe)
+{
+ refcount_inc(&stripe->ref);
+}
+
+static inline void btrfs_put_ordered_stripe(struct btrfs_fs_info *fs_info,
+ struct btrfs_ordered_stripe *stripe)
+{
+ if (refcount_dec_and_test(&stripe->ref))
+ btrfs_remove_ordered_stripe(fs_info, stripe);
+}
+
+static inline int btrfs_num_raid_stripes(u32 item_size)
+{
+ return (item_size - offsetof(struct btrfs_dp_stripe, extents)) /
+ sizeof(struct btrfs_stripe_extent);
+}
+
+static inline bool btrfs_need_stripe_tree_update(struct btrfs_fs_info *fs_info,
+ u64 map_type)
+{
+ u64 type = map_type & BTRFS_BLOCK_GROUP_TYPE_MASK;
+ u64 profile = map_type & BTRFS_BLOCK_GROUP_PROFILE_MASK;
+
+ if (!fs_info->stripe_root)
+ return false;
+
+ // for now
+ if (type != BTRFS_BLOCK_GROUP_DATA)
+ return false;
+
+ if (profile & BTRFS_BLOCK_GROUP_RAID1_MASK)
+ return true;
+
+ return false;
+}
+
+#endif
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 2d788a351c1f..b8d4e92c7196 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -33,6 +33,7 @@
#include "block-group.h"
#include "discard.h"
#include "zoned.h"
+#include "raid-stripe-tree.h"
#define BTRFS_BLOCK_GROUP_STRIPE_MASK (BTRFS_BLOCK_GROUP_RAID0 | \
BTRFS_BLOCK_GROUP_RAID10 | \
@@ -5897,6 +5898,7 @@ static struct btrfs_io_context *alloc_btrfs_io_context(struct btrfs_fs_info *fs_
bioc->fs_info = fs_info;
bioc->tgtdev_map = (int *)(bioc->stripes + total_stripes);
bioc->raid_map = (u64 *)(bioc->tgtdev_map + real_stripes);
+ INIT_WORK(&bioc->stripe_update_work, btrfs_raid_stripe_update);
return bioc;
}
@@ -6623,6 +6625,7 @@ static void btrfs_end_bio_work(struct work_struct *work)
static void btrfs_end_bioc(struct btrfs_io_context *bioc, bool async)
{
struct bio *orig_bio = bioc->orig_bio;
+ struct btrfs_fs_info *fs_info = bioc->fs_info;
struct btrfs_bio *bbio = btrfs_bio(orig_bio);
bbio->mirror_num = bioc->mirror_num;
@@ -6642,6 +6645,12 @@ static void btrfs_end_bioc(struct btrfs_io_context *bioc, bool async)
INIT_WORK(&bbio->end_io_work, btrfs_end_bio_work);
queue_work(btrfs_end_io_wq(bioc), &bbio->end_io_work);
} else {
+ if (btrfs_op(orig_bio) == BTRFS_MAP_WRITE &&
+ btrfs_need_stripe_tree_update(fs_info,
+ bioc->map_type)) {
+ btrfs_get_bioc(bioc);
+ schedule_work(&bioc->stripe_update_work);
+ }
bio_endio(orig_bio);
}
@@ -6667,6 +6676,8 @@ static void btrfs_end_bio(struct bio *bio)
btrfs_dev_stat_inc_and_print(stripe->dev,
BTRFS_DEV_STAT_FLUSH_ERRS);
}
+ } else if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
+ stripe->physical = bio->bi_iter.bi_sector << SECTOR_SHIFT;
}
if (bio != bioc->orig_bio)
@@ -6754,6 +6765,8 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
bioc->orig_bio = bio;
bioc->private = bio->bi_private;
bioc->end_io = bio->bi_end_io;
+ bioc->logical = logical;
+ bioc->length = length;
atomic_set(&bioc->stripes_pending, total_devs);
if ((bioc->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) &&
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 9537d82bb7a2..f22ea9c23faa 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -14,6 +14,7 @@
#define BTRFS_MAX_DATA_CHUNK_SIZE (10ULL * SZ_1G)
extern struct mutex uuid_mutex;
+struct btrfs_ordered_stripe;
#define BTRFS_STRIPE_LEN SZ_64K
@@ -463,6 +464,9 @@ struct btrfs_io_context {
int mirror_num;
int num_tgtdevs;
int *tgtdev_map;
+ u64 logical;
+ u64 length;
+ struct work_struct stripe_update_work;
/*
* logical block numbers for the start of each stripe
* The last one or two are p/q. These are sorted,
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 7a0f8fa44800..5cf6abeda588 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1641,6 +1641,10 @@ void btrfs_rewrite_logical_zoned(struct btrfs_ordered_extent *ordered)
u64 *logical = NULL;
int nr, stripe_len;
+ /* Filesystems with a stripe tree have their own l2p mapping */
+ if (fs_info->stripe_root)
+ return;
+
/* Zoned devices should not have partitions. So, we can assume it is 0 */
ASSERT(!bdev_is_partition(ordered->bdev));
if (WARN_ON(!ordered->bdev))
--
2.35.3
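Both stripe_tree_insert() and btrfs_lookup_ordered_stripe() in this patch walk the tree with the same three-way comparison: go left if the address is below the node's start, go right if it is at or past the node's end, otherwise the node covers the address. The sketch below shows that comparison in isolation, using a binary search over a sorted array of non-overlapping ranges as a stand-in for the rb-tree; struct range and range_lookup are hypothetical names and nothing here is kernel code.

```c
#include <stddef.h>
#include <stdint.h>

struct range {
	uint64_t logical;	/* start of the range */
	uint64_t num_bytes;	/* length of the range */
};

/*
 * Find the range covering logical in r[0..n), which must be sorted by
 * start address and free of overlaps. Returns NULL if nothing covers it.
 */
static const struct range *range_lookup(const struct range *r, size_t n,
					uint64_t logical)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (logical < r[mid].logical)
			hi = mid;		/* "go left" */
		else if (logical >= r[mid].logical + r[mid].num_bytes)
			lo = mid + 1;		/* "go right" */
		else
			return &r[mid];		/* inside [start, end) */
	}
	return NULL;
}
```

Because the end of a range is exclusive, an address equal to start + num_bytes belongs to the next range (if any), which is also why an insert that lands inside an existing range can be reported as a conflict.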
* [PATCH RFC v2 4/8] btrfs: add boilerplate code to insert stripe entries for preallocated extents
From: Johannes Thumshirn @ 2022-06-29 14:41 UTC (permalink / raw)
To: linux-btrfs
Cc: Naohiro Aota, Damien Le Moal, Johannes Thumshirn, Qu Wenruo,
Christoph Hellwig, Josef Bacik
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
fs/btrfs/inode.c | 6 ++++++
fs/btrfs/raid-stripe-tree.c | 34 ++++++++++++++++++++++++++++++++++
fs/btrfs/raid-stripe-tree.h | 2 ++
3 files changed, 42 insertions(+)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 9890782fe932..97e218a45165 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -55,6 +55,7 @@
#include "zoned.h"
#include "subpage.h"
#include "inode-item.h"
+#include "raid-stripe-tree.h"
struct btrfs_iget_args {
u64 ino;
@@ -9901,6 +9902,11 @@ static struct btrfs_trans_handle *insert_prealloc_file_extent(
if (qgroup_released < 0)
return ERR_PTR(qgroup_released);
+ ret = btrfs_insert_preallocated_raid_stripe(inode->root->fs_info,
+ start, len);
+ if (ret)
+ goto free_qgroup;
+
if (trans) {
ret = insert_reserved_file_extent(trans, inode,
file_offset, &stack_fi,
diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index 360046a104c7..85d08f052a64 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -124,6 +124,40 @@ void btrfs_remove_ordered_stripe(struct btrfs_fs_info *fs_info,
kfree(stripe);
}
+int btrfs_insert_preallocated_raid_stripe(struct btrfs_fs_info *fs_info,
+ u64 start, u64 len)
+{
+ struct btrfs_io_context *bioc = NULL;
+ struct btrfs_ordered_stripe *stripe;
+ u64 map_length = len;
+ int ret;
+
+ if (!fs_info->stripe_root)
+ return 0;
+
+ ret = btrfs_map_block(fs_info, BTRFS_MAP_WRITE, start, &map_length,
+ &bioc, 0);
+ if (ret)
+ return ret;
+
+ stripe = btrfs_lookup_ordered_stripe(fs_info, start);
+ if (!stripe) {
+ stripe = btrfs_add_ordered_stripe(fs_info, start, len,
+ bioc->num_stripes,
+ bioc->stripes);
+ if (IS_ERR(stripe))
+ return PTR_ERR(stripe);
+ } else {
+ spin_lock(&stripe->lock);
+ memcpy(stripe->stripes, bioc->stripes,
+ bioc->num_stripes * sizeof(struct btrfs_io_stripe));
+ spin_unlock(&stripe->lock);
+ btrfs_put_ordered_stripe(fs_info, stripe);
+ }
+
+ return 0;
+}
+
int btrfs_insert_raid_extent(struct btrfs_trans_handle *trans,
struct btrfs_ordered_stripe *stripe)
{
diff --git a/fs/btrfs/raid-stripe-tree.h b/fs/btrfs/raid-stripe-tree.h
index b9c40ef26dfa..1644515fcecb 100644
--- a/fs/btrfs/raid-stripe-tree.h
+++ b/fs/btrfs/raid-stripe-tree.h
@@ -18,6 +18,8 @@ struct btrfs_ordered_stripe {
int btrfs_insert_raid_extent(struct btrfs_trans_handle *trans,
struct btrfs_ordered_stripe *stripe);
+int btrfs_insert_preallocated_raid_stripe(struct btrfs_fs_info *fs_info,
+ u64 start, u64 len);
void btrfs_raid_stripe_update(struct work_struct *work);
struct btrfs_ordered_stripe *btrfs_lookup_ordered_stripe(
struct btrfs_fs_info *fs_info,
--
2.35.3
* [PATCH RFC v2 5/8] btrfs: add code to delete raid extent
From: Johannes Thumshirn @ 2022-06-29 14:41 UTC (permalink / raw)
To: linux-btrfs
Cc: Naohiro Aota, Damien Le Moal, Johannes Thumshirn, Qu Wenruo,
Christoph Hellwig, Josef Bacik
Add boilerplate code to delete entries from the raid-stripe-tree if the
corresponding file extent got deleted.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
fs/btrfs/extent-tree.c | 8 ++++++++
fs/btrfs/raid-stripe-tree.c | 31 +++++++++++++++++++++++++++++++
fs/btrfs/raid-stripe-tree.h | 2 ++
3 files changed, 41 insertions(+)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index e1738b3dfb21..f62036790c2f 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3225,6 +3225,14 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
}
}
+ if (is_data) {
+ ret = btrfs_delete_raid_extent(trans, bytenr, num_bytes);
+ if (ret) {
+ btrfs_abort_transaction(trans, ret);
+ return ret;
+ }
+ }
+
ret = btrfs_del_items(trans, extent_root, path, path->slots[0],
num_to_del);
if (ret) {
diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index 85d08f052a64..a673aaf8e703 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -124,6 +124,37 @@ void btrfs_remove_ordered_stripe(struct btrfs_fs_info *fs_info,
kfree(stripe);
}
+int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start,
+ u64 length)
+{
+ struct btrfs_fs_info *fs_info = trans->fs_info;
+ struct btrfs_root *stripe_root = fs_info->stripe_root;
+ struct btrfs_path *path;
+ struct btrfs_key stripe_key;
+ int ret;
+
+ if (!stripe_root)
+ return 0;
+
+ stripe_key.objectid = start;
+ stripe_key.type = BTRFS_RAID_STRIPE_KEY;
+ stripe_key.offset = length;
+
+ path = btrfs_alloc_path();
+ if (!path)
+ return -ENOMEM;
+
+ ret = btrfs_search_slot(trans, stripe_root, &stripe_key, path, -1, 1);
+ if (ret)
+ goto out;
+
+ ret = btrfs_del_item(trans, stripe_root, path);
+out:
+ btrfs_free_path(path);
+ return ret;
+
+}
+
int btrfs_insert_preallocated_raid_stripe(struct btrfs_fs_info *fs_info,
u64 start, u64 len)
{
diff --git a/fs/btrfs/raid-stripe-tree.h b/fs/btrfs/raid-stripe-tree.h
index 1644515fcecb..d3cc24e37de1 100644
--- a/fs/btrfs/raid-stripe-tree.h
+++ b/fs/btrfs/raid-stripe-tree.h
@@ -16,6 +16,8 @@ struct btrfs_ordered_stripe {
refcount_t ref;
};
+int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start,
+ u64 length);
int btrfs_insert_raid_extent(struct btrfs_trans_handle *trans,
struct btrfs_ordered_stripe *stripe);
int btrfs_insert_preallocated_raid_stripe(struct btrfs_fs_info *fs_info,
--
2.35.3
* [PATCH RFC v2 6/8] btrfs: add code to read raid extent
From: Johannes Thumshirn @ 2022-06-29 14:41 UTC (permalink / raw)
To: linux-btrfs
Cc: Naohiro Aota, Damien Le Moal, Johannes Thumshirn, Qu Wenruo,
Christoph Hellwig, Josef Bacik
Add boilerplate code to look up the physical address in the
raid-stripe-tree when a read is attempted on a RAID volume formatted with
the raid-stripe-tree.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
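The lookup below finds the stripe-tree item whose range covers the logical
address, picks the stripe that belongs to the requested device, and adds
the offset into the extent to that stripe's physical start. A minimal
user-space sketch of this translation, assuming the items are held in a
plain array (struct stripe_extent, MAX_STRIPES and lookup_physical() are
hypothetical names used only for this illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_STRIPES 4	/* illustrative upper bound, not taken from the patch */

/* Hypothetical in-memory copy of one stripe-tree item. */
struct stripe_extent {
	uint64_t logical;	/* key objectid */
	uint64_t length;	/* key offset */
	int num_stripes;
	struct {
		uint64_t devid;
		uint64_t physical;
	} stripes[MAX_STRIPES];
};

/*
 * Translate a logical address to the physical address on device devid.
 * Returns 0 and fills *physical on success, 1 if no entry covers the
 * address on that device.
 */
int lookup_physical(const struct stripe_extent *ext, size_t nr,
		    uint64_t logical, uint64_t devid, uint64_t *physical)
{
	assert(physical);

	for (size_t i = 0; i < nr; i++) {
		const struct stripe_extent *e = &ext[i];

		/* Equivalent to in_range(logical, e->logical, e->length). */
		if (logical < e->logical || logical - e->logical >= e->length)
			continue;

		for (int s = 0; s < e->num_stripes; s++) {
			if (e->stripes[s].devid != devid)
				continue;
			*physical = e->stripes[s].physical +
				    (logical - e->logical);
			return 0;
		}
	}
	return 1;
}
```

With item 0 from the cover letter's dump (key 939524096, length 126976,
devid 2 at offset 536870912), a read of logical 939524096 + 4096 on devid 2
would map to physical 536870912 + 4096 in this sketch.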
fs/btrfs/raid-stripe-tree.c | 65 +++++++++++++++++++++++++++++++++++++
fs/btrfs/raid-stripe-tree.h | 3 ++
fs/btrfs/volumes.c | 22 +++++++++++--
3 files changed, 88 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index a673aaf8e703..5ee630a792fc 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -7,6 +7,7 @@
#include "disk-io.h"
#include "raid-stripe-tree.h"
#include "volumes.h"
+#include "misc.h"
static struct rb_node *stripe_tree_insert(struct rb_root *root, u64 logical,
struct rb_node *node)
@@ -251,3 +252,67 @@ void btrfs_raid_stripe_update(struct work_struct *work)
btrfs_put_bioc(bioc);
}
+int btrfs_get_raid_extent_offset(struct btrfs_fs_info *fs_info,
+ u64 logical, u64 length, u64 map_type,
+ u64 devid, u64 *physical)
+{
+ struct btrfs_root *stripe_root = fs_info->stripe_root;
+ struct btrfs_dp_stripe *raid_stripe;
+ struct btrfs_key stripe_key;
+ struct btrfs_key found_key;
+ struct btrfs_path *path;
+ struct extent_buffer *leaf;
+ u64 offset;
+ u64 found_logical, found_length;
+ int num_stripes;
+ int slot;
+ int ret;
+ int i;
+
+ stripe_key.objectid = logical;
+ stripe_key.type = BTRFS_RAID_STRIPE_KEY;
+ stripe_key.offset = length;
+
+ path = btrfs_alloc_path();
+ if (!path)
+ return -ENOMEM;
+
+ num_stripes = btrfs_bg_type_to_factor(map_type);
+
+ ret = btrfs_search_slot_for_read(stripe_root, &stripe_key, path, 0, 0);
+ if (ret < 0) {
+ goto out;
+ }
+
+ if (ret == 1)
+ ret = 0;
+
+ while (1) {
+ leaf = path->nodes[0];
+ slot = path->slots[0];
+
+ btrfs_item_key_to_cpu(leaf, &found_key, slot);
+ found_logical = found_key.objectid;
+ found_length = found_key.offset;
+
+ if (!in_range(logical, found_logical, found_length))
+ goto next;
+ offset = logical - found_logical;
+
+ raid_stripe = btrfs_item_ptr(leaf, slot, struct btrfs_dp_stripe);
+ for (i = 0; i < num_stripes; i++) {
+ if (btrfs_stripe_extent_devid_nr(leaf, raid_stripe, i) != devid)
+ continue;
+ *physical = btrfs_stripe_extent_physical_nr(leaf, raid_stripe, i) + offset;
+ goto out;
+ }
+next:
+ ret = btrfs_next_item(stripe_root, path);
+ if (ret)
+ break;
+ }
+out:
+ btrfs_free_path(path);
+
+ return ret;
+}
diff --git a/fs/btrfs/raid-stripe-tree.h b/fs/btrfs/raid-stripe-tree.h
index d3cc24e37de1..75e17cad283a 100644
--- a/fs/btrfs/raid-stripe-tree.h
+++ b/fs/btrfs/raid-stripe-tree.h
@@ -16,6 +16,9 @@ struct btrfs_ordered_stripe {
refcount_t ref;
};
+int btrfs_get_raid_extent_offset(struct btrfs_fs_info *fs_info,
+ u64 logical, u64 length, u64 map_type,
+ u64 devid, u64 *physical);
int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start,
u64 length);
int btrfs_insert_raid_extent(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index b8d4e92c7196..2569ef564c97 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6526,9 +6526,27 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
}
for (i = 0; i < num_stripes; i++) {
- bioc->stripes[i].physical = map->stripes[stripe_index].physical +
- stripe_offset + stripe_nr * map->stripe_len;
+ u64 physical;
+
bioc->stripes[i].dev = map->stripes[stripe_index].dev;
+
+ if (fs_info->stripe_root && op == BTRFS_MAP_READ &&
+ btrfs_need_stripe_tree_update(bioc->fs_info,
+ map->type)) {
+ ret = btrfs_get_raid_extent_offset(fs_info, logical,
+ map->stripe_len,
+ map->type,
+ bioc->stripes[i].dev->devid,
+ &physical);
+ if (ret) {
+ btrfs_put_bioc(bioc);
+ goto out;
+ }
+ } else {
+ physical = map->stripes[stripe_index].physical +
+ stripe_offset + stripe_nr * map->stripe_len;
+ }
+ bioc->stripes[i].physical = physical;
stripe_index++;
}
--
2.35.3
* [PATCH RFC v2 7/8] btrfs: zoned: allow zoned RAID1
From: Johannes Thumshirn @ 2022-06-29 14:41 UTC (permalink / raw)
To: linux-btrfs
Cc: Naohiro Aota, Damien Le Moal, Johannes Thumshirn, Qu Wenruo,
Christoph Hellwig, Josef Bacik
When we have a raid-stripe-tree, we can do RAID1 on zoned devices for data
block-groups. For metadata block-groups, we don't actually need
anything special, as all metadata I/O is already protected by
btrfs_zoned_meta_io_lock().
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
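The hunk below requires every present mirror of a RAID1 block group to
have the same write pointer as stripe 0 and skips devices that are
missing. A simplified user-space sketch of that consistency check follows.
It is not the kernel code: check_raid1_zones() and SKETCH_WP_MISSING_DEV
are hypothetical names, zone activation is left out, and the sketch keeps
a running minimum of the capacities, whereas the hunk assigns
min(caps[0], caps[i]) on each iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative sentinel; the value of WP_MISSING_DEV is not shown here. */
#define SKETCH_WP_MISSING_DEV UINT64_MAX

/*
 * Check that every present mirror has the same write pointer as stripe 0.
 * On success returns 0 and stores the smallest zone capacity seen in
 * *capacity; returns -1 on a write pointer mismatch.
 */
int check_raid1_zones(const uint64_t *alloc_offsets, const uint64_t *caps,
		      size_t num_stripes, uint64_t *capacity)
{
	assert(num_stripes > 0 && capacity);

	uint64_t min_cap = caps[0];

	for (size_t i = 1; i < num_stripes; i++) {
		/* A missing device has no write pointer to compare. */
		if (alloc_offsets[i] == SKETCH_WP_MISSING_DEV)
			continue;
		if (alloc_offsets[i] != alloc_offsets[0])
			return -1;
		if (caps[i] < min_cap)
			min_cap = caps[i];
	}
	*capacity = min_cap;
	return 0;
}
```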
fs/btrfs/zoned.c | 35 +++++++++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 5cf6abeda588..e51e405342ad 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1463,6 +1463,41 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
cache->zone_capacity = min(caps[0], caps[1]);
break;
case BTRFS_BLOCK_GROUP_RAID1:
+ case BTRFS_BLOCK_GROUP_RAID1C3:
+ case BTRFS_BLOCK_GROUP_RAID1C4:
+ if (map->type & BTRFS_BLOCK_GROUP_DATA &&
+ !fs_info->stripe_root) {
+ btrfs_err(fs_info,
+ "zoned: data RAID1 needs stripe_root");
+ ret = -EIO;
+ goto out;
+
+ }
+
+ for (i = 0; i < map->num_stripes; i++) {
+ if (alloc_offsets[i] == WP_MISSING_DEV)
+ continue;
+
+ if (i == 0)
+ continue;
+
+ if (alloc_offsets[0] != alloc_offsets[i]) {
+ btrfs_err(fs_info,
+ "zoned: write pointer offset mismatch of zones in RAID profile");
+ ret = -EIO;
+ goto out;
+ }
+ if (test_bit(0, active) != test_bit(i, active)) {
+ if (!btrfs_zone_activate(cache)) {
+ ret = -EIO;
+ goto out;
+ }
+ }
+ cache->zone_capacity = min(caps[0], caps[i]);
+ }
+ cache->zone_is_active = test_bit(0, active);
+ cache->alloc_offset = alloc_offsets[0];
+ break;
case BTRFS_BLOCK_GROUP_RAID0:
case BTRFS_BLOCK_GROUP_RAID10:
case BTRFS_BLOCK_GROUP_RAID5:
--
2.35.3
* [PATCH RFC v2 8/8] btrfs: add raid stripe tree pretty printer
From: Johannes Thumshirn @ 2022-06-29 14:41 UTC (permalink / raw)
To: linux-btrfs
Cc: Naohiro Aota, Damien Le Moal, Johannes Thumshirn, Qu Wenruo,
Christoph Hellwig, Josef Bacik
Decode raid-stripe-tree entries on btrfs_print_tree().
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
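The printer below derives the number of stripes from the item size and
prints one line per stripe. A user-space sketch of that decoding follows,
assuming each stripe entry is two little-endian 64-bit fields (devid, then
physical offset), which would match the 32-byte items holding two stripes
in the cover letter's dump. The layout, STRIPE_ENTRY_SIZE, get_le64(),
num_raid_stripes() and print_raid_stripes() are assumptions made for this
illustration; the real definitions are in patch 1/8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define STRIPE_ENTRY_SIZE 16	/* assumed: u64 devid + u64 physical */

/* Read a little-endian 64-bit value from a byte buffer. */
static uint64_t get_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Number of stripe entries that fit in an item of item_size bytes. */
int num_raid_stripes(uint32_t item_size)
{
	return (int)(item_size / STRIPE_ENTRY_SIZE);
}

/* Print one line per stripe and return the number of stripes printed. */
int print_raid_stripes(const uint8_t *item, uint32_t item_size)
{
	int n = num_raid_stripes(item_size);

	assert(item || n == 0);

	for (int i = 0; i < n; i++) {
		const uint8_t *s = item + (size_t)i * STRIPE_ENTRY_SIZE;

		printf("\t\t\tstripe %d devid %llu physical %llu\n", i,
		       (unsigned long long)get_le64(s),
		       (unsigned long long)get_le64(s + 8));
	}
	return n;
}
```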
fs/btrfs/print-tree.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/fs/btrfs/print-tree.c b/fs/btrfs/print-tree.c
index dd8777872143..10a39d5a7e40 100644
--- a/fs/btrfs/print-tree.c
+++ b/fs/btrfs/print-tree.c
@@ -6,6 +6,7 @@
#include "ctree.h"
#include "disk-io.h"
#include "print-tree.h"
+#include "raid-stripe-tree.h"
struct root_name_map {
u64 id;
@@ -25,6 +26,7 @@ static const struct root_name_map root_map[] = {
{ BTRFS_FREE_SPACE_TREE_OBJECTID, "FREE_SPACE_TREE" },
{ BTRFS_BLOCK_GROUP_TREE_OBJECTID, "BLOCK_GROUP_TREE" },
{ BTRFS_DATA_RELOC_TREE_OBJECTID, "DATA_RELOC_TREE" },
+ { BTRFS_RAID_STRIPE_TREE_OBJECTID, "RAID_STRIPE_TREE" },
};
const char *btrfs_root_name(const struct btrfs_key *key, char *buf)
@@ -184,6 +186,20 @@ static void print_uuid_item(struct extent_buffer *l, unsigned long offset,
}
}
+static void print_raid_stripe_key(struct extent_buffer *eb, u32 item_size,
+ struct btrfs_dp_stripe *stripe)
+{
+ int num_stripes;
+ int i;
+
+ num_stripes = btrfs_num_raid_stripes(item_size);
+
+ for (i = 0; i < num_stripes; i++)
+ pr_info("\t\t\tstripe %d devid %llu physical %llu\n", i,
+ btrfs_stripe_extent_devid_nr(eb, stripe, i),
+ btrfs_stripe_extent_physical_nr(eb, stripe, i));
+}
+
/*
* Helper to output refs and locking status of extent buffer. Useful to debug
* race condition related problems.
@@ -348,6 +364,11 @@ void btrfs_print_leaf(struct extent_buffer *l)
print_uuid_item(l, btrfs_item_ptr_offset(l, i),
btrfs_item_size(l, i));
break;
+ case BTRFS_RAID_STRIPE_KEY:
+ print_raid_stripe_key(l, btrfs_item_size(l, i),
+ btrfs_item_ptr(l, i,
+ struct btrfs_dp_stripe));
+ break;
}
}
}
--
2.35.3