linux-btrfs.vger.kernel.org archive mirror
* [PATCH v5 0/5] btrfs: fix corruption caused by partial dio writes
@ 2023-03-22 19:11 Boris Burkov
  2023-03-22 19:11 ` [PATCH v5 1/5] btrfs: add function to create and return an ordered extent Boris Burkov
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Boris Burkov @ 2023-03-22 19:11 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

The final patch in this series ensures that bios submitted by btrfs dio
match the corresponding ordered_extent and extent_map exactly. As a
result, partial writes can no longer produce a hole or a deadlock, even
if the write buffer is a partly overlapping mapping of the range being
written to.

This required a bit of refactoring and setup. Specifically, the zoned
device code for "extracting" an ordered extent matching a bio could be
reused, with some refactoring, to also return the new ordered extents
after the split.

Patch 1: Generic patch for returning an ordered extent while creating it
Patch 2: Cache the dio ordered extent so that we can efficiently detect
         partial writes during bio submission, without an extra lookup.
Patch 3: Use Patch 1 to track the new ordered_extent(s) that result from
         splitting an ordered_extent.
Patch 4: Fix a bug in ordered extent splitting
Patch 5: Use the new more general split logic of Patch 4 and the stashed
         ordered extent from Patch 2 to split partial dio bios to fix
         the corruption while avoiding the deadlock.

---
Changelog:
v5:
- skip splitting the em on nocow writes; this removes the need to
  refactor split_em, so drop that patch and just rename split_zoned_em
  to split_em.
v4:
- significant changes; redesign the fix to use bio splitting instead of
  extending the ordered_extent lifetime across calls into iomap.
- all the oe/em splitting refactoring and fixes
v3:
- handle BTRFS_ORDERED_IOERR set on the ordered_extent in btrfs_dio_iomap_end.
  If the bio fails and we exit the submission loop early, we never submit
  a second bio covering the rest of the extent range, leaking the
  ordered_extent, which hangs umount.
  We can distinguish this from a short write in btrfs_dio_iomap_end by
  checking the ordered_extent.
v2:
- rename new ordered extent function
- pull the new function into a prep patch
- reorganize how the ordered_extent is stored/passed around to avoid
  many annoying memsets and to avoid exposing it to fs/btrfs/file.c
- lots of small code style improvements
- remove unintentional whitespace changes
- commit message improvements
- various ASSERTs for clarity/debugging


Boris Burkov (5):
  btrfs: add function to create and return an ordered extent
  btrfs: stash ordered extent in dio_data during iomap dio
  btrfs: return ordered_extent splits from bio extraction
  btrfs: fix crash with non-zero pre in btrfs_split_ordered_extent
  btrfs: split partial dio bios before submit

 fs/btrfs/bio.c          |  2 +-
 fs/btrfs/btrfs_inode.h  |  5 ++-
 fs/btrfs/inode.c        | 94 +++++++++++++++++++++++++++++++----------
 fs/btrfs/ordered-data.c | 88 ++++++++++++++++++++++++++++----------
 fs/btrfs/ordered-data.h | 13 ++++--
 5 files changed, 152 insertions(+), 50 deletions(-)

-- 
2.38.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v5 1/5] btrfs: add function to create and return an ordered extent
  2023-03-22 19:11 [PATCH v5 0/5] btrfs: fix corruption caused by partial dio writes Boris Burkov
@ 2023-03-22 19:11 ` Boris Burkov
  2023-03-22 19:11 ` [PATCH v5 2/5] btrfs: stash ordered extent in dio_data during iomap dio Boris Burkov
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Boris Burkov @ 2023-03-22 19:11 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Currently, btrfs_add_ordered_extent allocates a new ordered extent and
adds it to the rb_tree, but doesn't return a referenced pointer to the
caller. There are cases where it is useful for the creator of a new
ordered_extent to hang on to such a pointer, so add a new function
btrfs_alloc_ordered_extent which is the same as
btrfs_add_ordered_extent, except that it takes an additional reference
and returns a pointer to the ordered_extent. Implement
btrfs_add_ordered_extent as btrfs_alloc_ordered_extent followed by
dropping the new reference and handling the IS_ERR case.

The type of flags in btrfs_alloc_ordered_extent and
btrfs_add_ordered_extent is changed from unsigned int to unsigned long
so it's unified with the other ordered extent functions.
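The reference-counting contract described above can be sketched with a toy model (hypothetical `toy_*` names, a bare int instead of the kernel's refcount machinery, and no rb-tree; not the actual implementation):

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Toy stand-in for an ordered extent with a bare integer refcount. */
struct toy_extent {
    int refs;
};

/* Allocate, give the "tree" one reference, and return a referenced pointer. */
static struct toy_extent *toy_alloc_ordered_extent(void)
{
    struct toy_extent *e = calloc(1, sizeof(*e));

    if (!e)
        return NULL;    /* stands in for ERR_PTR(-ENOMEM) */
    e->refs = 1;        /* the tree's reference */
    e->refs++;          /* second reference for the returned pointer */
    return e;
}

static void toy_put_ordered_extent(struct toy_extent *e)
{
    if (--e->refs == 0)
        free(e);
}

/* The add variant: same as alloc, but immediately drop the caller's ref. */
static int toy_add_ordered_extent(void)
{
    struct toy_extent *e = toy_alloc_ordered_extent();

    if (!e)
        return -ENOMEM;
    toy_put_ordered_extent(e);  /* caller keeps no pointer */
    return 0;
}
```

The point is that the alloc variant returns holding two references (one owned by the tree, one handed to the caller), while the add wrapper drops the caller's reference right away; in the toy the tree's reference would be dropped when the extent completes.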

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ordered-data.c | 46 +++++++++++++++++++++++++++++++++--------
 fs/btrfs/ordered-data.h |  7 ++++++-
 2 files changed, 43 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 6c24b69e2d0a..1848d0d1a9c4 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -160,14 +160,16 @@ static inline struct rb_node *tree_search(struct btrfs_ordered_inode_tree *tree,
  * @compress_type:   Compression algorithm used for data.
  *
  * Most of these parameters correspond to &struct btrfs_file_extent_item. The
- * tree is given a single reference on the ordered extent that was inserted.
+ * tree is given a single reference on the ordered extent that was inserted, and
+ * the returned pointer is given a second reference.
  *
- * Return: 0 or -ENOMEM.
+ * Return: the new ordered extent or ERR_PTR(-ENOMEM).
  */
-int btrfs_add_ordered_extent(struct btrfs_inode *inode, u64 file_offset,
-			     u64 num_bytes, u64 ram_bytes, u64 disk_bytenr,
-			     u64 disk_num_bytes, u64 offset, unsigned flags,
-			     int compress_type)
+struct btrfs_ordered_extent *btrfs_alloc_ordered_extent(
+			struct btrfs_inode *inode, u64 file_offset,
+			u64 num_bytes, u64 ram_bytes, u64 disk_bytenr,
+			u64 disk_num_bytes, u64 offset, unsigned long flags,
+			int compress_type)
 {
 	struct btrfs_root *root = inode->root;
 	struct btrfs_fs_info *fs_info = root->fs_info;
@@ -181,7 +183,7 @@ int btrfs_add_ordered_extent(struct btrfs_inode *inode, u64 file_offset,
 		/* For nocow write, we can release the qgroup rsv right now */
 		ret = btrfs_qgroup_free_data(inode, NULL, file_offset, num_bytes);
 		if (ret < 0)
-			return ret;
+			return ERR_PTR(ret);
 		ret = 0;
 	} else {
 		/*
@@ -190,11 +192,11 @@ int btrfs_add_ordered_extent(struct btrfs_inode *inode, u64 file_offset,
 		 */
 		ret = btrfs_qgroup_release_data(inode, file_offset, num_bytes);
 		if (ret < 0)
-			return ret;
+			return ERR_PTR(ret);
 	}
 	entry = kmem_cache_zalloc(btrfs_ordered_extent_cache, GFP_NOFS);
 	if (!entry)
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 
 	entry->file_offset = file_offset;
 	entry->num_bytes = num_bytes;
@@ -256,6 +258,32 @@ int btrfs_add_ordered_extent(struct btrfs_inode *inode, u64 file_offset,
 	btrfs_mod_outstanding_extents(inode, 1);
 	spin_unlock(&inode->lock);
 
+	/* One ref for the returned entry to match semantics of lookup. */
+	refcount_inc(&entry->refs);
+
+	return entry;
+}
+
+/*
+ * Add a new btrfs_ordered_extent for the range, but drop the reference instead
+ * of returning it to the caller.
+ */
+int btrfs_add_ordered_extent(struct btrfs_inode *inode, u64 file_offset,
+			     u64 num_bytes, u64 ram_bytes, u64 disk_bytenr,
+			     u64 disk_num_bytes, u64 offset, unsigned long flags,
+			     int compress_type)
+{
+	struct btrfs_ordered_extent *ordered;
+
+	ordered = btrfs_alloc_ordered_extent(inode, file_offset, num_bytes,
+					     ram_bytes, disk_bytenr,
+					     disk_num_bytes, offset, flags,
+					     compress_type);
+
+	if (IS_ERR(ordered))
+		return PTR_ERR(ordered);
+	btrfs_put_ordered_extent(ordered);
+
 	return 0;
 }
 
diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index eb40cb39f842..18007f9c00ad 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -178,9 +178,14 @@ void btrfs_mark_ordered_io_finished(struct btrfs_inode *inode,
 bool btrfs_dec_test_ordered_pending(struct btrfs_inode *inode,
 				    struct btrfs_ordered_extent **cached,
 				    u64 file_offset, u64 io_size);
+struct btrfs_ordered_extent *btrfs_alloc_ordered_extent(
+			struct btrfs_inode *inode, u64 file_offset,
+			u64 num_bytes, u64 ram_bytes, u64 disk_bytenr,
+			u64 disk_num_bytes, u64 offset, unsigned long flags,
+			int compress_type);
 int btrfs_add_ordered_extent(struct btrfs_inode *inode, u64 file_offset,
 			     u64 num_bytes, u64 ram_bytes, u64 disk_bytenr,
-			     u64 disk_num_bytes, u64 offset, unsigned flags,
+			     u64 disk_num_bytes, u64 offset, unsigned long flags,
 			     int compress_type);
 void btrfs_add_ordered_sum(struct btrfs_ordered_extent *entry,
 			   struct btrfs_ordered_sum *sum);
-- 
2.38.1



* [PATCH v5 2/5] btrfs: stash ordered extent in dio_data during iomap dio
  2023-03-22 19:11 [PATCH v5 0/5] btrfs: fix corruption caused by partial dio writes Boris Burkov
  2023-03-22 19:11 ` [PATCH v5 1/5] btrfs: add function to create and return an ordered extent Boris Burkov
@ 2023-03-22 19:11 ` Boris Burkov
  2023-03-22 19:11 ` [PATCH v5 3/5] btrfs: return ordered_extent splits from bio extraction Boris Burkov
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Boris Burkov @ 2023-03-22 19:11 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

While it is not feasible for an ordered extent to survive across the
calls btrfs_direct_write makes into __iomap_dio_rw, it is still helpful
to stash it on the dio_data in between creating it in iomap_begin and
finishing it in either end_io or iomap_end.

The specific use I have in mind is that we can check if a particular bio
is partial in submit_io without unconditionally looking up the ordered
extent. This is a preparatory patch for a later patch which does just
that.
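The submit-time check this enables can be sketched as follows (a toy model with hypothetical `toy_*` names, not the kernel code): with the ordered extent stashed in dio_data, deciding whether a bio is partial becomes a single comparison rather than an rb-tree lookup.

```c
#include <assert.h>

/* Toy stand-ins: only the fields this check needs. */
struct toy_ordered {
    unsigned long num_bytes;
};

struct toy_dio_data {
    struct toy_ordered *ordered;    /* stashed by iomap_begin */
};

/* Submit-side partial-write check: one comparison, no rb-tree lookup. */
static int toy_bio_is_partial(const struct toy_dio_data *dio_data,
                              unsigned long bio_bytes)
{
    return bio_bytes < dio_data->ordered->num_bytes;
}
```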

Signed-off-by: Boris Burkov <boris@bur.io>
---
 fs/btrfs/inode.c | 37 ++++++++++++++++++++++++-------------
 1 file changed, 24 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 76d93b9e94a9..5ab486f448eb 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -81,6 +81,7 @@ struct btrfs_dio_data {
 	struct extent_changeset *data_reserved;
 	bool data_space_reserved;
 	bool nocow_done;
+	struct btrfs_ordered_extent *ordered;
 };
 
 struct btrfs_dio_private {
@@ -6968,6 +6969,7 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode *inode,
 }
 
 static struct extent_map *btrfs_create_dio_extent(struct btrfs_inode *inode,
+						  struct btrfs_dio_data *dio_data,
 						  const u64 start,
 						  const u64 len,
 						  const u64 orig_start,
@@ -6978,7 +6980,7 @@ static struct extent_map *btrfs_create_dio_extent(struct btrfs_inode *inode,
 						  const int type)
 {
 	struct extent_map *em = NULL;
-	int ret;
+	struct btrfs_ordered_extent *ordered;
 
 	if (type != BTRFS_ORDERED_NOCOW) {
 		em = create_io_em(inode, start, len, orig_start, block_start,
@@ -6988,18 +6990,21 @@ static struct extent_map *btrfs_create_dio_extent(struct btrfs_inode *inode,
 		if (IS_ERR(em))
 			goto out;
 	}
-	ret = btrfs_add_ordered_extent(inode, start, len, len, block_start,
-				       block_len, 0,
-				       (1 << type) |
-				       (1 << BTRFS_ORDERED_DIRECT),
-				       BTRFS_COMPRESS_NONE);
-	if (ret) {
+	ordered = btrfs_alloc_ordered_extent(inode, start, len, len,
+					     block_start, block_len, 0,
+					     (1 << type) |
+					     (1 << BTRFS_ORDERED_DIRECT),
+					     BTRFS_COMPRESS_NONE);
+	if (IS_ERR(ordered)) {
 		if (em) {
 			free_extent_map(em);
 			btrfs_drop_extent_map_range(inode, start,
 						    start + len - 1, false);
 		}
-		em = ERR_PTR(ret);
+		em = ERR_PTR(PTR_ERR(ordered));
+	} else {
+		ASSERT(!dio_data->ordered);
+		dio_data->ordered = ordered;
 	}
  out:
 
@@ -7007,6 +7012,7 @@ static struct extent_map *btrfs_create_dio_extent(struct btrfs_inode *inode,
 }
 
 static struct extent_map *btrfs_new_extent_direct(struct btrfs_inode *inode,
+						  struct btrfs_dio_data *dio_data,
 						  u64 start, u64 len)
 {
 	struct btrfs_root *root = inode->root;
@@ -7022,7 +7028,8 @@ static struct extent_map *btrfs_new_extent_direct(struct btrfs_inode *inode,
 	if (ret)
 		return ERR_PTR(ret);
 
-	em = btrfs_create_dio_extent(inode, start, ins.offset, start,
+	em = btrfs_create_dio_extent(inode, dio_data,
+				     start, ins.offset, start,
 				     ins.objectid, ins.offset, ins.offset,
 				     ins.offset, BTRFS_ORDERED_REGULAR);
 	btrfs_dec_block_group_reservations(fs_info, ins.objectid);
@@ -7367,7 +7374,7 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
 		}
 		space_reserved = true;
 
-		em2 = btrfs_create_dio_extent(BTRFS_I(inode), start, len,
+		em2 = btrfs_create_dio_extent(BTRFS_I(inode), dio_data, start, len,
 					      orig_start, block_start,
 					      len, orig_block_len,
 					      ram_bytes, type);
@@ -7409,7 +7416,7 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
 			goto out;
 		space_reserved = true;
 
-		em = btrfs_new_extent_direct(BTRFS_I(inode), start, len);
+		em = btrfs_new_extent_direct(BTRFS_I(inode), dio_data, start, len);
 		if (IS_ERR(em)) {
 			ret = PTR_ERR(em);
 			goto out;
@@ -7715,6 +7722,10 @@ static int btrfs_dio_iomap_end(struct inode *inode, loff_t pos, loff_t length,
 				      pos + length - 1, NULL);
 		ret = -ENOTBLK;
 	}
+	if (write) {
+		btrfs_put_ordered_extent(dio_data->ordered);
+		dio_data->ordered = NULL;
+	}
 
 	if (write)
 		extent_changeset_free(dio_data->data_reserved);
@@ -7776,7 +7787,7 @@ static const struct iomap_dio_ops btrfs_dio_ops = {
 
 ssize_t btrfs_dio_read(struct kiocb *iocb, struct iov_iter *iter, size_t done_before)
 {
-	struct btrfs_dio_data data;
+	struct btrfs_dio_data data = { 0 };
 
 	return iomap_dio_rw(iocb, iter, &btrfs_dio_iomap_ops, &btrfs_dio_ops,
 			    IOMAP_DIO_PARTIAL, &data, done_before);
@@ -7785,7 +7796,7 @@ ssize_t btrfs_dio_read(struct kiocb *iocb, struct iov_iter *iter, size_t done_be
 struct iomap_dio *btrfs_dio_write(struct kiocb *iocb, struct iov_iter *iter,
 				  size_t done_before)
 {
-	struct btrfs_dio_data data;
+	struct btrfs_dio_data data = { 0 };
 
 	return __iomap_dio_rw(iocb, iter, &btrfs_dio_iomap_ops, &btrfs_dio_ops,
 			    IOMAP_DIO_PARTIAL, &data, done_before);
-- 
2.38.1



* [PATCH v5 3/5] btrfs: return ordered_extent splits from bio extraction
  2023-03-22 19:11 [PATCH v5 0/5] btrfs: fix corruption caused by partial dio writes Boris Burkov
  2023-03-22 19:11 ` [PATCH v5 1/5] btrfs: add function to create and return an ordered extent Boris Burkov
  2023-03-22 19:11 ` [PATCH v5 2/5] btrfs: stash ordered extent in dio_data during iomap dio Boris Burkov
@ 2023-03-22 19:11 ` Boris Burkov
  2023-03-23  8:47   ` Christoph Hellwig
  2023-03-22 19:11 ` [PATCH v5 4/5] btrfs: fix crash with non-zero pre in btrfs_split_ordered_extent Boris Burkov
  2023-03-22 19:11 ` [PATCH v5 5/5] btrfs: split partial dio bios before submit Boris Burkov
  4 siblings, 1 reply; 15+ messages in thread
From: Boris Burkov @ 2023-03-22 19:11 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

When extracting a bio from its ordered extent for dio partial writes, we
need the "remainder" ordered extent. It would be possible to look it up
in that case, but since we can grab the ordered_extent from the new
allocation function, we might as well wire it up to be returned to the
caller via an out parameter and save that lookup.

Refactor the clone ordered extent function to return the new ordered
extent, then refactor the split and extract functions to pass back the
new pre and post split ordered extents via output parameter.
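The out-parameter shape can be sketched with a toy model (hypothetical `toy_*` names; the real functions also manage rb-tree nodes, reference counts, and accounting):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for an ordered extent's range. */
struct toy_oe {
    uint64_t start;
    uint64_t len;
};

/*
 * Trim pre bytes off the front and post bytes off the back of *oe,
 * passing the trimmed pieces back through optional out parameters so a
 * caller that wants them is saved a lookup (callers that don't pass NULL).
 */
static int toy_split(struct toy_oe *oe, uint64_t pre, uint64_t post,
                     struct toy_oe *ret_pre, struct toy_oe *ret_post)
{
    if (pre + post >= oe->len)
        return -1;              /* nothing would remain in the middle */
    if (pre && ret_pre)
        *ret_pre = (struct toy_oe){ oe->start, pre };
    if (post && ret_post)
        *ret_post = (struct toy_oe){ oe->start + oe->len - post, post };
    oe->start += pre;
    oe->len -= pre + post;
    return 0;
}
```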

Signed-off-by: Boris Burkov <boris@bur.io>
---
 fs/btrfs/bio.c          |  2 +-
 fs/btrfs/btrfs_inode.h  |  5 ++++-
 fs/btrfs/inode.c        | 36 +++++++++++++++++++++++++++---------
 fs/btrfs/ordered-data.c | 36 +++++++++++++++++++++++-------------
 fs/btrfs/ordered-data.h |  6 ++++--
 5 files changed, 59 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index cf09c6271edb..b849ced40d37 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -653,7 +653,7 @@ static bool btrfs_submit_chunk(struct btrfs_bio *bbio, int mirror_num)
 		if (use_append) {
 			bio->bi_opf &= ~REQ_OP_WRITE;
 			bio->bi_opf |= REQ_OP_ZONE_APPEND;
-			ret = btrfs_extract_ordered_extent(bbio);
+			ret = btrfs_extract_ordered_extent_bio(bbio, NULL, NULL, NULL);
 			if (ret)
 				goto fail_put_bio;
 		}
diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 9dc21622806e..e92a09559058 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -407,7 +407,10 @@ static inline void btrfs_inode_split_flags(u64 inode_item_flags,
 
 int btrfs_check_sector_csum(struct btrfs_fs_info *fs_info, struct page *page,
 			    u32 pgoff, u8 *csum, const u8 * const csum_expected);
-blk_status_t btrfs_extract_ordered_extent(struct btrfs_bio *bbio);
+blk_status_t btrfs_extract_ordered_extent_bio(struct btrfs_bio *bbio,
+					      struct btrfs_ordered_extent *ordered,
+					      struct btrfs_ordered_extent **ret_pre,
+					      struct btrfs_ordered_extent **ret_post);
 bool btrfs_data_csum_ok(struct btrfs_bio *bbio, struct btrfs_device *dev,
 			u32 bio_offset, struct bio_vec *bv);
 noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5ab486f448eb..e30390051f15 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2514,10 +2514,14 @@ void btrfs_clear_delalloc_extent(struct btrfs_inode *inode,
 /*
  * Split an extent_map at [start, start + len]
  *
- * This function is intended to be used only for extract_ordered_extent().
+ * This function is intended to be used only for
+ * btrfs_extract_ordered_extent_bio().
+ *
+ * It makes assumptions about the extent map that are only valid in the narrow
+ * situations in which we are extracting a bio from a containing ordered extent,
+ * that are specific to zoned filesystems or partial dio writes.
  */
-static int split_zoned_em(struct btrfs_inode *inode, u64 start, u64 len,
-			  u64 pre, u64 post)
+static int split_em(struct btrfs_inode *inode, u64 start, u64 len, u64 pre, u64 post)
 {
 	struct extent_map_tree *em_tree = &inode->extent_tree;
 	struct extent_map *em;
@@ -2626,22 +2630,36 @@ static int split_zoned_em(struct btrfs_inode *inode, u64 start, u64 len,
 	return ret;
 }
 
-blk_status_t btrfs_extract_ordered_extent(struct btrfs_bio *bbio)
+/*
+ * Extract a bio from an ordered extent to enforce an invariant where the bio
+ * fully matches a single ordered extent.
+ *
+ * @bbio: the bio to extract.
+ * @ordered: the ordered extent the bio is in, will be shrunk to fit. If NULL we
+ *	     will look it up.
+ * @ret_pre: out parameter to return the new oe in front of the bio, if needed.
+ * @ret_post: out parameter to return the new oe past the bio, if needed.
+ */
+blk_status_t btrfs_extract_ordered_extent_bio(struct btrfs_bio *bbio,
+					      struct btrfs_ordered_extent *ordered,
+					      struct btrfs_ordered_extent **ret_pre,
+					      struct btrfs_ordered_extent **ret_post)
 {
 	u64 start = (u64)bbio->bio.bi_iter.bi_sector << SECTOR_SHIFT;
 	u64 len = bbio->bio.bi_iter.bi_size;
 	struct btrfs_inode *inode = bbio->inode;
-	struct btrfs_ordered_extent *ordered;
 	u64 file_len;
 	u64 end = start + len;
 	u64 ordered_end;
 	u64 pre, post;
 	int ret = 0;
 
-	ordered = btrfs_lookup_ordered_extent(inode, bbio->file_offset);
+	if (!ordered)
+		ordered = btrfs_lookup_ordered_extent(inode, bbio->file_offset);
 	if (WARN_ON_ONCE(!ordered))
 		return BLK_STS_IOERR;
 
+	ordered_end = ordered->disk_bytenr + ordered->disk_num_bytes;
 	/* No need to split */
 	if (ordered->disk_num_bytes == len)
 		goto out;
@@ -2658,7 +2676,6 @@ blk_status_t btrfs_extract_ordered_extent(struct btrfs_bio *bbio)
 		goto out;
 	}
 
-	ordered_end = ordered->disk_bytenr + ordered->disk_num_bytes;
 	/* bio must be in one ordered extent */
 	if (WARN_ON_ONCE(start < ordered->disk_bytenr || end > ordered_end)) {
 		ret = -EINVAL;
@@ -2675,10 +2692,11 @@ blk_status_t btrfs_extract_ordered_extent(struct btrfs_bio *bbio)
 	pre = start - ordered->disk_bytenr;
 	post = ordered_end - end;
 
-	ret = btrfs_split_ordered_extent(ordered, pre, post);
+	ret = btrfs_split_ordered_extent(ordered, pre, post, ret_pre, ret_post);
 	if (ret)
 		goto out;
-	ret = split_zoned_em(inode, bbio->file_offset, file_len, pre, post);
+	if (!test_bit(BTRFS_ORDERED_NOCOW, &ordered->flags))
+		ret = split_em(inode, bbio->file_offset, file_len, pre, post);
 
 out:
 	btrfs_put_ordered_extent(ordered);
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 1848d0d1a9c4..4bebebb9b434 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -1117,8 +1117,8 @@ bool btrfs_try_lock_ordered_range(struct btrfs_inode *inode, u64 start, u64 end,
 }
 
 
-static int clone_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pos,
-				u64 len)
+static struct btrfs_ordered_extent *clone_ordered_extent(struct btrfs_ordered_extent *ordered,
+						  u64 pos, u64 len)
 {
 	struct inode *inode = ordered->inode;
 	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
@@ -1133,18 +1133,22 @@ static int clone_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pos,
 	percpu_counter_add_batch(&fs_info->ordered_bytes, -len,
 				 fs_info->delalloc_batch);
 	WARN_ON_ONCE(flags & (1 << BTRFS_ORDERED_COMPRESSED));
-	return btrfs_add_ordered_extent(BTRFS_I(inode), file_offset, len, len,
-					disk_bytenr, len, 0, flags,
-					ordered->compress_type);
+	return btrfs_alloc_ordered_extent(BTRFS_I(inode), file_offset, len, len,
+					  disk_bytenr, len, 0, flags,
+					  ordered->compress_type);
 }
 
-int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pre,
-				u64 post)
+int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered,
+			       u64 pre, u64 post,
+			       struct btrfs_ordered_extent **ret_pre,
+			       struct btrfs_ordered_extent **ret_post)
+
 {
 	struct inode *inode = ordered->inode;
 	struct btrfs_ordered_inode_tree *tree = &BTRFS_I(inode)->ordered_tree;
 	struct rb_node *node;
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+	struct btrfs_ordered_extent *oe;
 	int ret = 0;
 
 	trace_btrfs_ordered_extent_split(BTRFS_I(inode), ordered);
@@ -1172,12 +1176,18 @@ int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pre,
 
 	spin_unlock_irq(&tree->lock);
 
-	if (pre)
-		ret = clone_ordered_extent(ordered, 0, pre);
-	if (ret == 0 && post)
-		ret = clone_ordered_extent(ordered, pre + ordered->disk_num_bytes,
-					   post);
-
+	if (pre) {
+		oe = clone_ordered_extent(ordered, 0, pre);
+		ret = IS_ERR(oe) ? PTR_ERR(oe) : 0;
+		if (!ret && ret_pre)
+			*ret_pre = oe;
+	}
+	if (!ret && post) {
+		oe = clone_ordered_extent(ordered, pre + ordered->disk_num_bytes, post);
+		ret = IS_ERR(oe) ? PTR_ERR(oe) : 0;
+		if (!ret && ret_post)
+			*ret_post = oe;
+	}
 	return ret;
 }
 
diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index 18007f9c00ad..933f6f0d8c10 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -212,8 +212,10 @@ void btrfs_lock_and_flush_ordered_range(struct btrfs_inode *inode, u64 start,
 					struct extent_state **cached_state);
 bool btrfs_try_lock_ordered_range(struct btrfs_inode *inode, u64 start, u64 end,
 				  struct extent_state **cached_state);
-int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pre,
-			       u64 post);
+int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered,
+			       u64 pre, u64 post,
+			       struct btrfs_ordered_extent **ret_pre,
+			       struct btrfs_ordered_extent **ret_post);
 int __init ordered_data_init(void);
 void __cold ordered_data_exit(void);
 
-- 
2.38.1



* [PATCH v5 4/5] btrfs: fix crash with non-zero pre in btrfs_split_ordered_extent
  2023-03-22 19:11 [PATCH v5 0/5] btrfs: fix corruption caused by partial dio writes Boris Burkov
                   ` (2 preceding siblings ...)
  2023-03-22 19:11 ` [PATCH v5 3/5] btrfs: return ordered_extent splits from bio extraction Boris Burkov
@ 2023-03-22 19:11 ` Boris Burkov
  2023-03-23  8:36   ` Naohiro Aota
  2023-03-22 19:11 ` [PATCH v5 5/5] btrfs: split partial dio bios before submit Boris Burkov
  4 siblings, 1 reply; 15+ messages in thread
From: Boris Burkov @ 2023-03-22 19:11 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

If pre != 0 in btrfs_split_ordered_extent, then we do the following:
1. remove ordered (at file_offset) from the rb tree
2. modify file_offset+=pre
3. re-insert ordered
4. clone an ordered extent at offset 0 length pre from ordered.
5. clone an ordered extent for the post range, if necessary.

Step 4 is incorrect: at this point, the start of ordered is already the
end of the desired new pre extent. Furthermore, this causes a panic when
btrfs_alloc_ordered_extent sees that the node (from the modified and
re-inserted ordered) is already present at file_offset + 0 = file_offset.

We can fix this by either using a negative offset, or by moving the
clone of the pre extent to after we remove the original one, but before
we modify and re-insert it. The former feels quite kludgy, as we are
"cloning" from outside the range of the ordered extent, so opt for the
latter, which does have some locking annoyances.
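The ordering problem can be illustrated with a toy model (hypothetical `toy_*` names, not kernel code): the pre clone is taken at position 0 relative to the ordered extent's *current* file_offset, so whether it covers the intended range depends on whether file_offset has already been advanced.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for the ordered extent; only the field that matters here. */
struct toy_ordered_extent {
    uint64_t file_offset;
};

/* A clone at position pos starts at the extent's current file_offset + pos. */
static uint64_t toy_clone_start(const struct toy_ordered_extent *oe,
                                uint64_t pos)
{
    return oe->file_offset + pos;
}
```

With file_offset = 4096 and pre = 1024: cloning at position 0 before the update yields start 4096, the intended pre range; after file_offset += pre the same clone starts at 5120, exactly where the modified extent was re-inserted, which is the collision that cloning first avoids.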

Signed-off-by: Boris Burkov <boris@bur.io>
---
 fs/btrfs/ordered-data.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 4bebebb9b434..d14a3fe1a113 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -1161,6 +1161,17 @@ int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered,
 	if (tree->last == node)
 		tree->last = NULL;
 
+	if (pre) {
+		spin_unlock_irq(&tree->lock);
+		oe = clone_ordered_extent(ordered, 0, pre);
+		ret = IS_ERR(oe) ? PTR_ERR(oe) : 0;
+		if (!ret && ret_pre)
+			*ret_pre = oe;
+		if (ret)
+			goto out;
+		spin_lock_irq(&tree->lock);
+	}
+
 	ordered->file_offset += pre;
 	ordered->disk_bytenr += pre;
 	ordered->num_bytes -= (pre + post);
@@ -1176,18 +1187,13 @@ int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered,
 
 	spin_unlock_irq(&tree->lock);
 
-	if (pre) {
-		oe = clone_ordered_extent(ordered, 0, pre);
-		ret = IS_ERR(oe) ? PTR_ERR(oe) : 0;
-		if (!ret && ret_pre)
-			*ret_pre = oe;
-	}
-	if (!ret && post) {
+	if (post) {
 		oe = clone_ordered_extent(ordered, pre + ordered->disk_num_bytes, post);
 		ret = IS_ERR(oe) ? PTR_ERR(oe) : 0;
 		if (!ret && ret_post)
 			*ret_post = oe;
 	}
+out:
 	return ret;
 }
 
-- 
2.38.1



* [PATCH v5 5/5] btrfs: split partial dio bios before submit
  2023-03-22 19:11 [PATCH v5 0/5] btrfs: fix corruption caused by partial dio writes Boris Burkov
                   ` (3 preceding siblings ...)
  2023-03-22 19:11 ` [PATCH v5 4/5] btrfs: fix crash with non-zero pre in btrfs_split_ordered_extent Boris Burkov
@ 2023-03-22 19:11 ` Boris Burkov
  4 siblings, 0 replies; 15+ messages in thread
From: Boris Burkov @ 2023-03-22 19:11 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

If an application is doing direct io to a btrfs file and experiences a
page fault reading from the write buffer, iomap will issue a partial
bio, and allow the fs to keep going. However, there was a subtle bug in
this codepath in the btrfs dio iomap implementation that led to the
partial write ending up as a gap in the file's extents and being read
back as zeros.

The sequence of events in a partial write, lightly summarized and
trimmed down for brevity, is as follows:

====WRITING TASK====
btrfs_direct_write
__iomap_dio_rw
iomap_iter
btrfs_dio_iomap_begin # create full ordered extent
iomap_dio_bio_iter
bio_iov_iter_get_pages # page fault; partial read
submit_bio # partial bio
iomap_iter
btrfs_dio_iomap_end
btrfs_mark_ordered_io_finished # sets BTRFS_ORDERED_IOERR;
			       # submit to finish_ordered_fn wq
fault_in_iov_iter_readable # btrfs_direct_write detects partial write
__iomap_dio_rw
iomap_iter
btrfs_dio_iomap_begin # create second partial ordered extent
iomap_dio_bio_iter
bio_iov_iter_get_pages # read all of remainder
submit_bio # partial bio with all of remainder
iomap_iter
btrfs_dio_iomap_end # nothing exciting to do with ordered io

====DIO ENDIO====
==FIRST PARTIAL BIO==
btrfs_dio_end_io
btrfs_mark_ordered_io_finished # bytes_left > 0
			     # don't submit to finish_ordered_fn wq
==SECOND PARTIAL BIO==
btrfs_dio_end_io
btrfs_mark_ordered_io_finished # bytes_left == 0
			     # submit to finish_ordered_fn wq

====BTRFS FINISH ORDERED WQ====
==FIRST PARTIAL BIO==
btrfs_finish_ordered_io # queued by btrfs_dio_iomap_end, sees
		    # BTRFS_ORDERED_IOERR, just drops the
		    # ordered_extent
==SECOND PARTIAL BIO==
btrfs_finish_ordered_io # called by btrfs_dio_end_io, writes out file
		    # extents, csums, etc...

The essence of the problem is that while btrfs_direct_write and iomap
properly interact to submit all the correct bios, there is insufficient
logic in the btrfs dio functions (btrfs_dio_iomap_begin,
btrfs_dio_submit_io, btrfs_dio_end_io, and btrfs_dio_iomap_end) to
ensure that every bio is at least a part of a completed ordered_extent.
It is the completion of an ordered_extent that triggers crucial
functionality like writing out a file extent for the range.

More specifically, btrfs_dio_end_io treats the ordered extent as
unfinished but btrfs_dio_iomap_end sets BTRFS_ORDERED_IOERR on it.
Thus, the finish io work doesn't result in file extents, csums, etc...
In the aftermath, such a file behaves as though it has a hole in it,
instead of the purportedly written data.

We considered a few options for fixing the bug (apologies for any
incorrect summary of a proposal which I didn't implement and fully
understand):
1. treat the partial bio as if we had truncated the file, which would
result in properly finishing it.
2. split the ordered extent when submitting a partial bio.
3. cache the ordered extent across calls to __iomap_dio_rw in
iter->private, so that we could reuse it and correctly apply several
bios to it.

I had trouble with 1, and it felt the most like a hack, so I tried 2
and 3. Since 3 has the benefit of also not creating an extra file
extent, and avoids an ordered extent lookup during bio submission, it
felt like the best option. However, that turned out to re-introduce the
deadlock that discarding the ordered_extent between faults was meant to
fix in the first place. (Link to an explanation of the deadlock below.)

Therefore, go with fix #2, which requires a bit more setup work but
fixes the corruption without introducing the deadlock. The deadlock is
fundamentally caused by the ordered extent existing when we attempt to
fault in a range that overlaps with it.

Put succinctly, what this patch does is: when we submit a dio bio, check
if it is partial against the ordered extent stored in dio_data, and if it
is, extract the ordered_extent that matches the bio exactly out of the
larger ordered_extent. Keep the remaining ordered_extent around in dio_data
for cancellation in iomap_end.
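That succinct description can be modeled in a few lines (a toy sketch with hypothetical `toy_*` names, ignoring disk bytenrs, refcounts, and locking):

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for an ordered extent's file range. */
struct toy_range {
    uint64_t file_offset;
    uint64_t num_bytes;
};

/*
 * If the bio is shorter than its ordered extent, shrink the extent to
 * match the bio exactly and hand back the unfinishable remainder (which
 * the real code stashes in dio_data and cancels in iomap_end).
 * Returns 1 if a split happened, 0 if the bio already matched.
 */
static int toy_extract_partial(struct toy_range *ordered, uint64_t bio_bytes,
                               struct toy_range *remainder)
{
    if (bio_bytes >= ordered->num_bytes)
        return 0;
    remainder->file_offset = ordered->file_offset + bio_bytes;
    remainder->num_bytes = ordered->num_bytes - bio_bytes;
    ordered->num_bytes = bio_bytes;     /* now matches the bio exactly */
    return 1;
}
```

After the split, the bio's ordered extent can complete normally and write out its file extent, while the remainder no longer blocks faulting in the rest of the buffer.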

Thanks to Josef, Christoph, and Filipe for their help figuring out the
bug and the fix.

Fixes: 51bd9563b678 ("btrfs: fix deadlock due to page faults during direct IO reads and writes")
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2169947
Link: https://lore.kernel.org/linux-btrfs/aa1fb69e-b613-47aa-a99e-a0a2c9ed273f@app.fastmail.com/
Link: https://pastebin.com/3SDaH8C6
Link: https://lore.kernel.org/linux-btrfs/20230315195231.GW10580@twin.jikos.cz/T/#t
Signed-off-by: Boris Burkov <boris@bur.io>
---
 fs/btrfs/inode.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e30390051f15..08d132071bd3 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7782,6 +7782,7 @@ static void btrfs_dio_submit_io(const struct iomap_iter *iter, struct bio *bio,
 	struct btrfs_dio_private *dip =
 		container_of(bbio, struct btrfs_dio_private, bbio);
 	struct btrfs_dio_data *dio_data = iter->private;
+	int err = 0;
 
 	btrfs_bio_init(bbio, BTRFS_I(iter->inode), btrfs_dio_end_io, bio->bi_private);
 	bbio->file_offset = file_offset;
@@ -7790,7 +7791,25 @@ static void btrfs_dio_submit_io(const struct iomap_iter *iter, struct bio *bio,
 	dip->bytes = bio->bi_iter.bi_size;
 
 	dio_data->submitted += bio->bi_iter.bi_size;
-	btrfs_submit_bio(bbio, 0);
+	/*
+	 * Check if we are doing a partial write. If we are, we need to split
+	 * the ordered extent to match the submitted bio. Hang on to the
+	 * remaining unfinishable ordered_extent in dio_data so that it can be
+	 * cancelled in iomap_end to avoid a deadlock wherein faulting the
+	 * remaining pages is blocked on the outstanding ordered extent.
+	 */
+	if (iter->flags & IOMAP_WRITE) {
+		struct btrfs_ordered_extent *ordered = dio_data->ordered;
+
+		ASSERT(ordered);
+		if (bio->bi_iter.bi_size < ordered->num_bytes)
+			err = btrfs_extract_ordered_extent_bio(bbio, ordered, NULL,
+							       &dio_data->ordered);
+	}
+	if (err)
+		btrfs_bio_end_io(bbio, err);
+	else
+		btrfs_submit_bio(bbio, 0);
 }
 
 static const struct iomap_ops btrfs_dio_iomap_ops = {
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v5 4/5] btrfs: fix crash with non-zero pre in btrfs_split_ordered_extent
  2023-03-22 19:11 ` [PATCH v5 4/5] btrfs: fix crash with non-zero pre in btrfs_split_ordered_extent Boris Burkov
@ 2023-03-23  8:36   ` Naohiro Aota
  2023-03-23 16:22     ` Boris Burkov
  0 siblings, 1 reply; 15+ messages in thread
From: Naohiro Aota @ 2023-03-23  8:36 UTC (permalink / raw)
  To: Boris Burkov; +Cc: linux-btrfs, kernel-team

On Wed, Mar 22, 2023 at 12:11:51PM -0700, Boris Burkov wrote:
> if pre != 0 in btrfs_split_ordered_extent, then we do the following:
> 1. remove ordered (at file_offset) from the rb tree
> 2. modify file_offset+=pre
> 3. re-insert ordered
> 4. clone an ordered extent at offset 0 length pre from ordered.
> 5. clone an ordered extent for the post range, if necessary.
> 
> step 4 is not correct, as at this point, the start of ordered is already
> the end of the desired new pre extent. Further this causes a panic when
> btrfs_alloc_ordered_extent sees that the node (from the modified and
> re-inserted ordered) is already present at file_offset + 0 = file_offset.
> 
> We can fix this by either using a negative offset, or by moving the
> clone of the pre extent to after we remove the original one, but before
> we modify and re-insert it. The former feels quite kludgy, as we are
> "cloning" from outside the range of the ordered extent, so opt for the
> latter, which does have some locking annoyances.
> 
> Signed-off-by: Boris Burkov <boris@bur.io>
> ---
>  fs/btrfs/ordered-data.c | 20 +++++++++++++-------
>  1 file changed, 13 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
> index 4bebebb9b434..d14a3fe1a113 100644
> --- a/fs/btrfs/ordered-data.c
> +++ b/fs/btrfs/ordered-data.c
> @@ -1161,6 +1161,17 @@ int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered,
>  	if (tree->last == node)
>  		tree->last = NULL;
>  
> +	if (pre) {
> +		spin_unlock_irq(&tree->lock);
> +		oe = clone_ordered_extent(ordered, 0, pre);
> +		ret = IS_ERR(oe) ? PTR_ERR(oe) : 0;
> +		if (!ret && ret_pre)
> +			*ret_pre = oe;
> +		if (ret)
> +			goto out;

How about just "return ret;"?

> +		spin_lock_irq(&tree->lock);

I'm concerned about the locking too. Before this spin_lock_irq() is taken,
nothing in the ordered extent range is in the tree. So, maybe, someone might
insert or look up that range in the meantime, and fail? Well, this function
is called under the IO for this range, so it might be OK, though...

So, I considered another approach: factoring out some parts of
btrfs_add_ordered_extent() and using them to rewrite
btrfs_split_ordered_extent().

btrfs_add_ordered_extent() is doing three things:

1. Space accounting
   - btrfs_qgroup_free_data() or btrfs_qgroup_release_data()
   - percpu_counter_add_batch(...)
2. Allocating and initializing btrfs_ordered_extent
3. Adding the btrfs_ordered_extent entry to trees, incrementing OE counter
   - tree_insert(&tree->tree, ...)
   - list_add_tail(&entry->root_extent_list, &root->ordered_extents);
   - btrfs_mod_outstanding_extents(inode, 1);

For btrfs_split_ordered_extent(), we don't need to do #1 above as it was
already done for the original ordered extent. So, if we factor #2 out into a
function, e.g., init_ordered_extent(), we can rewrite clone_ordered_extent()
to return a cloned OE (doing #2), and also rewrite
btrfs_split_ordered_extent() to do #3 as follows:

/* clone_ordered_extent() now returns new ordered extent. */
/* It is not inserted into the trees, yet. */
static struct btrfs_ordered_extent *clone_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pos,
				u64 len)
{
	struct inode *inode = ordered->inode;
	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
	u64 file_offset = ordered->file_offset + pos;
	u64 disk_bytenr = ordered->disk_bytenr + pos;
	unsigned long flags = ordered->flags & BTRFS_ORDERED_TYPE_FLAGS;

	WARN_ON_ONCE(flags & (1 << BTRFS_ORDERED_COMPRESSED));
	return init_ordered_extent(BTRFS_I(inode), file_offset, len, len,
				   disk_bytenr, len, 0, flags,
				   ordered->compress_type);
}

int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pre,
				u64 post)
{
...
	/* clone new OEs first */
	if (pre)
		pre_oe = clone_ordered_extent(ordered, 0, pre);
	if (post)
		post_oe = clone_ordered_extent(ordered, ordered->disk_num_bytes - post, post);
	/* check pre_oe and post_oe */

	spin_lock_irq(&tree->lock);
	/* remove original OE from tree */
	...

	/* modify the original OE params */
	ordered->file_offset += pre;
	...

	/* Re-insert the original OE */
	node = tree_insert(&tree->tree, ordered->file_offset, &ordered->rb_node);
	...

	/* Insert new OEs */
	if (pre_oe)
		node = tree_insert(...);
	...
	spin_unlock_irq(&tree->lock);

	/* And, do the root->ordered_extents and outstanding_extents works */
	...
}

With this approach, no one can see an intermediate state in which an OE is
missing for some area of the original OE range.

> +	}
> +
>  	ordered->file_offset += pre;
>  	ordered->disk_bytenr += pre;
>  	ordered->num_bytes -= (pre + post);
> @@ -1176,18 +1187,13 @@ int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered,
>  
>  	spin_unlock_irq(&tree->lock);
>  
> -	if (pre) {
> -		oe = clone_ordered_extent(ordered, 0, pre);
> -		ret = IS_ERR(oe) ? PTR_ERR(oe) : 0;
> -		if (!ret && ret_pre)
> -			*ret_pre = oe;
> -	}
> -	if (!ret && post) {
> +	if (post) {
>  		oe = clone_ordered_extent(ordered, pre + ordered->disk_num_bytes, post);
>  		ret = IS_ERR(oe) ? PTR_ERR(oe) : 0;
>  		if (!ret && ret_post)
>  			*ret_post = oe;
>  	}
> +out:
>  	return ret;
>  }
>  
> -- 
> 2.38.1
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v5 3/5] btrfs: return ordered_extent splits from bio extraction
  2023-03-22 19:11 ` [PATCH v5 3/5] btrfs: return ordered_extent splits from bio extraction Boris Burkov
@ 2023-03-23  8:47   ` Christoph Hellwig
  2023-03-23 16:15     ` Boris Burkov
  0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2023-03-23  8:47 UTC (permalink / raw)
  To: Boris Burkov; +Cc: linux-btrfs, kernel-team

This is a bit of a mess.  And the root cause of that is that
btrfs_extract_ordered_extent the way it is used right now does
the wrong thing in terms of splitting the ordered_extent.  What
we want is to allocate a new one for the beginning of the range,
and leave the rest alone.

I did run into this a while ago during my (not yet submitted) work
to keep an ordered_extent pointer in the btrfs_bio, and I have some
patches to sort it out.

I've rebased your fix on top of those, can you check if this tree
makes sense to you;

   http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/btrfs-dio-fix-hch

it passes basic testing so far.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v5 3/5] btrfs: return ordered_extent splits from bio extraction
  2023-03-23  8:47   ` Christoph Hellwig
@ 2023-03-23 16:15     ` Boris Burkov
  2023-03-23 17:00       ` Boris Burkov
  0 siblings, 1 reply; 15+ messages in thread
From: Boris Burkov @ 2023-03-23 16:15 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-btrfs, kernel-team

On Thu, Mar 23, 2023 at 01:47:28AM -0700, Christoph Hellwig wrote:
> This is a bit of a mess.  And the root cause of that is that
> btrfs_extract_ordered_extent the way it is used right now does
> the wrong thing in terms of splitting the ordered_extent.  What
> we want is to allocate a new one for the beginning of the range,
> and leave the rest alone.
> 
> I did run into this a while ago during my (not yet submitted) work
> to keep an ordered_extent pointer in the btrfs_bio, and I have some
> patches to sort it out.
> 
> I've rebased your fix on top of those, can you check if this tree
> makes sense to you;
> 
>    http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/btrfs-dio-fix-hch
> 
> it passes basic testing so far.

Nice, this is great!

I actually also made the same changes as in your branch while working on
my fix, but didn't know enough about the zoned use case to realize that
the simpler "extract from beginning" constraint also applied to the
zoned case. So what happened in my branch was that I implemented the
three-way split as two "split at start" operations, which ultimately felt
too messy, and I opted for returning the new split objects from the
existing model.

If it's true that we can always do a "split from front" then I'm all
aboard and think this is the way forward. Given that I found what I
think is a serious bug in the case where pre>0, I suspect you are right,
and we aren't hitting that case.

I will check that this passes my testing for the various dio cases (I
have one modified xfstests case I haven't sent yet for the meanest
version of the deadlock I have come up with so far) and the other tests
that I saw races/flakiness on, but from a quick look, your branch looks
correct to me. I believe the most non-obvious property my fix relies on
is dio_data->ordered having the leftovers from the partial after
submission so that it can be cancelled, which your branch looks to
maintain.

Assuming the tests pass, I do want to get this in sooner than later,
since downstream is still waiting on a fix. Would you be willing to send
your stack soon for my fix to land atop? I don't mind if you just send a
patch series with my patches mixed in, either. If, OTOH, your patches
are still a while out, or depend on something else that's underway,
maybe we could land mine, then gut them for your improvements. I'm fine
with it either way.

Thanks,
Boris

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v5 4/5] btrfs: fix crash with non-zero pre in btrfs_split_ordered_extent
  2023-03-23  8:36   ` Naohiro Aota
@ 2023-03-23 16:22     ` Boris Burkov
  0 siblings, 0 replies; 15+ messages in thread
From: Boris Burkov @ 2023-03-23 16:22 UTC (permalink / raw)
  To: Naohiro Aota; +Cc: linux-btrfs, kernel-team

On Thu, Mar 23, 2023 at 08:36:08AM +0000, Naohiro Aota wrote:
> On Wed, Mar 22, 2023 at 12:11:51PM -0700, Boris Burkov wrote:
> > if pre != 0 in btrfs_split_ordered_extent, then we do the following:
> > 1. remove ordered (at file_offset) from the rb tree
> > 2. modify file_offset+=pre
> > 3. re-insert ordered
> > 4. clone an ordered extent at offset 0 length pre from ordered.
> > 5. clone an ordered extent for the post range, if necessary.
> > 
> > step 4 is not correct, as at this point, the start of ordered is already
> > the end of the desired new pre extent. Further this causes a panic when
> > btrfs_alloc_ordered_extent sees that the node (from the modified and
> > re-inserted ordered) is already present at file_offset + 0 = file_offset.
> > 
> > We can fix this by either using a negative offset, or by moving the
> > clone of the pre extent to after we remove the original one, but before
> > we modify and re-insert it. The former feels quite kludgy, as we are
> > "cloning" from outside the range of the ordered extent, so opt for the
> > latter, which does have some locking annoyances.
> > 
> > Signed-off-by: Boris Burkov <boris@bur.io>
> > ---
> >  fs/btrfs/ordered-data.c | 20 +++++++++++++-------
> >  1 file changed, 13 insertions(+), 7 deletions(-)
> > 
> > diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
> > index 4bebebb9b434..d14a3fe1a113 100644
> > --- a/fs/btrfs/ordered-data.c
> > +++ b/fs/btrfs/ordered-data.c
> > @@ -1161,6 +1161,17 @@ int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered,
> >  	if (tree->last == node)
> >  		tree->last = NULL;
> >  
> > +	if (pre) {
> > +		spin_unlock_irq(&tree->lock);
> > +		oe = clone_ordered_extent(ordered, 0, pre);
> > +		ret = IS_ERR(oe) ? PTR_ERR(oe) : 0;
> > +		if (!ret && ret_pre)
> > +			*ret_pre = oe;
> > +		if (ret)
> > +			goto out;
> 
> How about just "return ret;"?
> 
> > +		spin_lock_irq(&tree->lock);
> 
> I'm concerned about the locking too. Before this spin_lock_irq() is taken,
> nothing in the ordered extent range is in the tree. So, maybe, someone might
> insert or look up that range in the meantime, and fail? Well, this function
> is called under the IO for this range, so it might be OK, though...
> 
> So, I considered another approach: factoring out some parts of
> btrfs_add_ordered_extent() and using them to rewrite
> btrfs_split_ordered_extent().
> 
> btrfs_add_ordered_extent() is doing three things:
> 
> 1. Space accounting
>    - btrfs_qgroup_free_data() or btrfs_qgroup_release_data()
>    - percpu_counter_add_batch(...)
> 2. Allocating and initializing btrfs_ordered_extent
> 3. Adding the btrfs_ordered_extent entry to trees, incrementing OE counter
>    - tree_insert(&tree->tree, ...)
>    - list_add_tail(&entry->root_extent_list, &root->ordered_extents);
>    - btrfs_mod_outstanding_extents(inode, 1);
> 
> For btrfs_split_ordered_extent(), we don't need to do #1 above as it was
> already done for the original ordered extent. So, if we factor #2 out into a
> function, e.g., init_ordered_extent(), we can rewrite clone_ordered_extent()
> to return a cloned OE (doing #2), and also rewrite
> btrfs_split_ordered_extent() to do #3 as follows:
> 
> /* clone_ordered_extent() now returns new ordered extent. */
> /* It is not inserted into the trees, yet. */
> static struct btrfs_ordered_extent *clone_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pos,
> 				u64 len)
> {
> 	struct inode *inode = ordered->inode;
> 	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
> 	u64 file_offset = ordered->file_offset + pos;
> 	u64 disk_bytenr = ordered->disk_bytenr + pos;
> 	unsigned long flags = ordered->flags & BTRFS_ORDERED_TYPE_FLAGS;
> 
> 	WARN_ON_ONCE(flags & (1 << BTRFS_ORDERED_COMPRESSED));
> 	return init_ordered_extent(BTRFS_I(inode), file_offset, len, len,
> 				   disk_bytenr, len, 0, flags,
> 				   ordered->compress_type);
> }
> 
> int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pre,
> 				u64 post)
> {
> ...
> 	/* clone new OEs first */
> 	if (pre)
> 		pre_oe = clone_ordered_extent(ordered, 0, pre);
> 	if (post)
> 		post_oe = clone_ordered_extent(ordered, ordered->disk_num_bytes - post, post);
> 	/* check pre_oe and post_oe */
> 
> 	spin_lock_irq(&tree->lock);
> 	/* remove original OE from tree */
> 	...
> 
> 	/* modify the original OE params */
> 	ordered->file_offset += pre;
> 	...
> 
> 	/* Re-insert the original OE */
> 	node = tree_insert(&tree->tree, ordered->file_offset, &ordered->rb_node);
> 	...
> 
> 	/* Insert new OEs */
> 	if (pre_oe)
> 		node = tree_insert(...);
> 	...
> 	spin_unlock_irq(&tree->lock);
> 
> 	/* And, do the root->ordered_extents and outstanding_extents works */
> 	...
> }
> 
> With this approach, no one can see an intermediate state in which an OE is
> missing for some area of the original OE range.

I like this solution; I think it is nice to split it up so that the three
steps are separate, i.e., initialize the two new OEs with the old state,
then modify the middle OE with the new state, and re-insert the new OEs
together. And everything after the initialization can be under the lock.

However, based on Christoph's response, I would lean towards getting rid
of the three way split altogether. I would love to hear your thoughts in
that thread as well, before committing to that, though.

If we do keep the three way split, then I will definitely implement your
idea here, I think it's nicer than the weird lock dropping/re-taking
stuff I was doing.

Thanks,
Boris

> 
> > +	}
> > +
> >  	ordered->file_offset += pre;
> >  	ordered->disk_bytenr += pre;
> >  	ordered->num_bytes -= (pre + post);
> > @@ -1176,18 +1187,13 @@ int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered,
> >  
> >  	spin_unlock_irq(&tree->lock);
> >  
> > -	if (pre) {
> > -		oe = clone_ordered_extent(ordered, 0, pre);
> > -		ret = IS_ERR(oe) ? PTR_ERR(oe) : 0;
> > -		if (!ret && ret_pre)
> > -			*ret_pre = oe;
> > -	}
> > -	if (!ret && post) {
> > +	if (post) {
> >  		oe = clone_ordered_extent(ordered, pre + ordered->disk_num_bytes, post);
> >  		ret = IS_ERR(oe) ? PTR_ERR(oe) : 0;
> >  		if (!ret && ret_post)
> >  			*ret_post = oe;
> >  	}
> > +out:
> >  	return ret;
> >  }
> >  
> > -- 
> > 2.38.1
> > 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v5 3/5] btrfs: return ordered_extent splits from bio extraction
  2023-03-23 16:15     ` Boris Burkov
@ 2023-03-23 17:00       ` Boris Burkov
  2023-03-23 17:45         ` Boris Burkov
  2023-03-23 21:29         ` Christoph Hellwig
  0 siblings, 2 replies; 15+ messages in thread
From: Boris Burkov @ 2023-03-23 17:00 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-btrfs, kernel-team

On Thu, Mar 23, 2023 at 09:15:29AM -0700, Boris Burkov wrote:
> On Thu, Mar 23, 2023 at 01:47:28AM -0700, Christoph Hellwig wrote:
> > This is a bit of a mess.  And the root cause of that is that
> > btrfs_extract_ordered_extent the way it is used right now does
> > the wrong thing in terms of splitting the ordered_extent.  What
> > we want is to allocate a new one for the beginning of the range,
> > and leave the rest alone.
> > 
> > I did run into this a while ago during my (not yet submitted) work
> > to keep an ordered_extent pointer in the btrfs_bio, and I have some
> > patches to sort it out.
> > 
> > I've rebased your fix on top of those, can you check if this tree
> > makes sense to you;
> > 
> >    http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/btrfs-dio-fix-hch
> > 
> > it passes basic testing so far.
> 
> Nice, this is great!
> 
> I actually also made the same changes as in your branch while working on
> my fix, but didn't know enough about the zoned use case to realize that
> the simpler "extract from beginning" constraint also applied to the
> zoned case. So what happened in my branch was that I implemented the
> three-way split as two "split at start" operations, which ultimately felt
> too messy, and I opted for returning the new split objects from the
> existing model.
> 
> If it's true that we can always do a "split from front" then I'm all
> aboard and think this is the way forward. Given that I found what I
> think is a serious bug in the case where pre>0, I suspect you are right,
> and we aren't hitting that case.
> 
> I will check that this passes my testing for the various dio cases (I
> have one modified xfstests case I haven't sent yet for the meanest
> version of the deadlock I have come up with so far) and the other tests
> that I saw races/flakiness on, but from a quick look, your branch looks
> correct to me. I believe the most non-obvious property my fix relies on
> is dio_data->ordered having the leftovers from the partial after
> submission so that it can be cancelled, which your branch looks to
> maintain.

Your branch as-is does not pass the existing tests; it's missing a fix
from my V5. We need to avoid splitting partial OEs when doing NOCOW dio
writes, because iomap_begin() does not create a fresh pinned em in that
case, since it reuses the existing extent.

e.g.,

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8cb61f4daec0..bbc89a0872e7 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7719,7 +7719,7 @@ static void btrfs_dio_submit_io(const struct iomap_iter *iter, struct bio *bio,
         * cancelled in iomap_end to avoid a deadlock wherein faulting the
         * remaining pages is blocked on the outstanding ordered extent.
         */
-       if (iter->flags & IOMAP_WRITE) {
+       if (iter->flags & IOMAP_WRITE && !test_bit(BTRFS_ORDERED_NOCOW, &dio_data->ordered->flags)) {
                int err;

                err = btrfs_extract_ordered_extent(bbio, dio_data->ordered);

With that patch, I pass 10x of btrfs/250, so running the full suite next.

> 
> Assuming the tests pass, I do want to get this in sooner than later,
> since downstream is still waiting on a fix. Would you be willing to send
> your stack soon for my fix to land atop? I don't mind if you just send a
> patch series with my patches mixed in, either. If, OTOH, your patches
> are still a while out, or depend on something else that's underway,
> maybe we could land mine, then gut them for your improvements. I'm fine
> with it either way.
> 
> Thanks,
> Boris

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v5 3/5] btrfs: return ordered_extent splits from bio extraction
  2023-03-23 17:00       ` Boris Burkov
@ 2023-03-23 17:45         ` Boris Burkov
  2023-03-23 21:29         ` Christoph Hellwig
  1 sibling, 0 replies; 15+ messages in thread
From: Boris Burkov @ 2023-03-23 17:45 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-btrfs, kernel-team

On Thu, Mar 23, 2023 at 10:00:06AM -0700, Boris Burkov wrote:
> On Thu, Mar 23, 2023 at 09:15:29AM -0700, Boris Burkov wrote:
> > On Thu, Mar 23, 2023 at 01:47:28AM -0700, Christoph Hellwig wrote:
> > > This is a bit of a mess.  And the root cause of that is that
> > > btrfs_extract_ordered_extent the way it is used right now does
> > > the wrong thing in terms of splitting the ordered_extent.  What
> > > we want is to allocate a new one for the beginning of the range,
> > > and leave the rest alone.
> > > 
> > > I did run into this a while ago during my (not yet submitted) work
> > > to keep an ordered_extent pointer in the btrfs_bio, and I have some
> > > patches to sort it out.
> > > 
> > > I've rebased your fix on top of those, can you check if this tree
> > > makes sense to you;
> > > 
> > >    http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/btrfs-dio-fix-hch
> > > 
> > > it passes basic testing so far.
> > 
> > Nice, this is great!
> > 
> > I actually also made the same changes as in your branch while working on
> > my fix, but didn't know enough about the zoned use case to realize that
> > the simpler "extract from beginning" constraint also applied to the
> > zoned case. So what happened in my branch was that I implemented the
> > three-way split as two "split at start" operations, which ultimately felt
> > too messy, and I opted for returning the new split objects from the
> > existing model.
> > 
> > If it's true that we can always do a "split from front" then I'm all
> > aboard and think this is the way forward. Given that I found what I
> > think is a serious bug in the case where pre>0, I suspect you are right,
> > and we aren't hitting that case.
> > 
> > I will check that this passes my testing for the various dio cases (I
> > have one modified xfstests case I haven't sent yet for the meanest
> > version of the deadlock I have come up with so far) and the other tests
> > that I saw races/flakiness on, but from a quick look, your branch looks
> > correct to me. I believe the most non-obvious property my fix relies on
> > is dio_data->ordered having the leftovers from the partial after
> > submission so that it can be cancelled, which your branch looks to
> > maintain.
> 
> Your branch as-is does not pass the existing tests; it's missing a fix
> from my V5. We need to avoid splitting partial OEs when doing NOCOW dio
> writes, because iomap_begin() does not create a fresh pinned em in that
> case, since it reuses the existing extent.
> 
> e.g.,
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 8cb61f4daec0..bbc89a0872e7 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -7719,7 +7719,7 @@ static void btrfs_dio_submit_io(const struct iomap_iter *iter, struct bio *bio,
>          * cancelled in iomap_end to avoid a deadlock wherein faulting the
>          * remaining pages is blocked on the outstanding ordered extent.
>          */
> -       if (iter->flags & IOMAP_WRITE) {
> +       if (iter->flags & IOMAP_WRITE && !test_bit(BTRFS_ORDERED_NOCOW, &dio_data->ordered->flags)) {
>                 int err;
> 
>                 err = btrfs_extract_ordered_extent(bbio, dio_data->ordered);
> 
> With that patch, I pass 10x of btrfs/250, so running the full suite next.

fstests in general passed on my system, so I am happy with this branch +
my above tweak if Naohiro/Johannes are on board with the simplified
ordered_extent/extent_map splitting model that assumes the bio is at the
start offset.

> 
> > 
> > Assuming the tests pass, I do want to get this in sooner than later,
> > since downstream is still waiting on a fix. Would you be willing to send
> > your stack soon for my fix to land atop? I don't mind if you just send a
> > patch series with my patches mixed in, either. If, OTOH, your patches
> > are still a while out, or depend on something else that's underway,
> > maybe we could land mine, then gut them for your improvements. I'm fine
> > with it either way.
> > 
> > Thanks,
> > Boris

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v5 3/5] btrfs: return ordered_extent splits from bio extraction
  2023-03-23 17:00       ` Boris Burkov
  2023-03-23 17:45         ` Boris Burkov
@ 2023-03-23 21:29         ` Christoph Hellwig
  2023-03-23 22:43           ` Boris Burkov
  1 sibling, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2023-03-23 21:29 UTC (permalink / raw)
  To: Boris Burkov; +Cc: Christoph Hellwig, linux-btrfs, kernel-team

On Thu, Mar 23, 2023 at 10:00:06AM -0700, Boris Burkov wrote:
> Your branch as-is does not pass the existing tests; it's missing a fix
> from my V5. We need to avoid splitting partial OEs when doing NOCOW dio
> writes, because iomap_begin() does not create a fresh pinned em in that
> case, since it reuses the existing extent.

Oops, yes, that got lost.  I can add this as another patch attributed
to you.

That being said, I'm a bit confused about:

 1) if we need this split call at all for the non-zoned case as we don't
    need to record a different physical disk address
 2) how we clean up this on-disk logical to physical mapping at all on
    a write failure

Maybe we should let those dragons sleep for now and just do the minimal
fix, though.

I just woke up on an airplane, so depending on my jetlag I might have
a new series ready with the minimal fix for varying definitions of
"in a few hours".

> 
> e.g.,
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 8cb61f4daec0..bbc89a0872e7 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -7719,7 +7719,7 @@ static void btrfs_dio_submit_io(const struct iomap_iter *iter, struct bio *bio,
>          * cancelled in iomap_end to avoid a deadlock wherein faulting the
>          * remaining pages is blocked on the outstanding ordered extent.
>          */
> -       if (iter->flags & IOMAP_WRITE) {
> +       if (iter->flags & IOMAP_WRITE && !test_bit(BTRFS_ORDERED_NOCOW, &dio_data->ordered->flags)) {
>                 int err;
> 
>                 err = btrfs_extract_ordered_extent(bbio, dio_data->ordered);

I think the BTRFS_ORDERED_NOCOW check should be just around the
split_extent_map call.  That matches your series, and without
that we wouldn't split the ordered_extent for nocow writes and thus
only fix the original problem for non-nocow writes.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v5 3/5] btrfs: return ordered_extent splits from bio extraction
  2023-03-23 21:29         ` Christoph Hellwig
@ 2023-03-23 22:43           ` Boris Burkov
  2023-03-24  0:24             ` Christoph Hellwig
  0 siblings, 1 reply; 15+ messages in thread
From: Boris Burkov @ 2023-03-23 22:43 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-btrfs, kernel-team

On Thu, Mar 23, 2023 at 02:29:39PM -0700, Christoph Hellwig wrote:
> On Thu, Mar 23, 2023 at 10:00:06AM -0700, Boris Burkov wrote:
> > Your branch as-is does not pass the existing tests; it's missing a fix
> > from my V5. We need to avoid splitting partial OEs when doing NOCOW dio
> > writes, because iomap_begin() does not create a fresh pinned em in that
> > case, since it reuses the existing extent.
> 
> Oops, yes, that got lost.  I can add this as another patch attributed
> to you.
> 
> That being said, I'm a bit confused about:
> 
>  1) if we need this split call at all for the non-zoned case as we don't
>     need to record a different physical disk address

I think I understand this, but maybe I'm missing exactly what you're
asking.

In finish_ordered_io, we call unpin_extent_cache, which blows up on
em->start != oe->file_offset. I believe the rationale is that we create
a new em, which is PINNED, when we allocate the extent in
btrfs_new_extent_direct (via the call to btrfs_reserve_extent), so we
need to unpin it and allow it to be merged, etc... For nocow, we don't
allocate that new extent, so we don't need to split/unpin the existing
extent_map which we are just reusing.

>  2) how we clean up this on-disk logical to physical mapping at all on
>     a write failure

This I haven't thought much about, so I will leave it in the "dragons
sleep for now" category.

> 
> Maybe we should let those dragons sleep for now and just do the minimal
> fix, though.
> 
> I just woke up on an airplane, so depending on my jetlag I might have
> a new series ready with the minimal fix for varying definitions of
> "in a few hours".

Great, that works for me. I just didn't want to wait weeks if you were
blocked on other stuff.

> 
> > 
> > e.g.,
> > 
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index 8cb61f4daec0..bbc89a0872e7 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -7719,7 +7719,7 @@ static void btrfs_dio_submit_io(const struct iomap_iter *iter, struct bio *bio,
> >          * cancelled in iomap_end to avoid a deadlock wherein faulting the
> >          * remaining pages is blocked on the outstanding ordered extent.
> >          */
> > -       if (iter->flags & IOMAP_WRITE) {
> > +       if (iter->flags & IOMAP_WRITE && !test_bit(BTRFS_ORDERED_NOCOW, &dio_data->ordered->flags)) {
> >                 int err;
> > 
> >                 err = btrfs_extract_ordered_extent(bbio, dio_data->ordered);
> 
> I think the BTRFS_ORDERED_NOCOW check should be just around the
> split_extent_map call.  That matches your series, and without
> that we wouldn't split the ordered_extent for nocow writes and thus
> only fix the original problem for non-nocow writes.

Oops, my bad. Good catch.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v5 3/5] btrfs: return ordered_extent splits from bio extraction
  2023-03-23 22:43           ` Boris Burkov
@ 2023-03-24  0:24             ` Christoph Hellwig
  0 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2023-03-24  0:24 UTC (permalink / raw)
  To: Boris Burkov; +Cc: Christoph Hellwig, linux-btrfs, kernel-team

On Thu, Mar 23, 2023 at 03:43:36PM -0700, Boris Burkov wrote:
> In finish_ordered_io, we call unpin_extent_cache, which blows up on
> em->start != oe->file_offset. I believe the rationale is that we are
> creating a new em, which is PINNED, when we allocate the extent in
> btrfs_new_extent_direct (via the call to btrfs_reserve_extent), so we
> need to unpin it and allow it to be merged, etc. For nocow, we don't
> allocate a new extent, so we don't need to split/unpin the existing
> extent_map, which we are simply reusing.

Yeah, I actually just ran into that when testing my idea :)

end of thread, other threads:[~2023-03-24  0:24 UTC | newest]

Thread overview: 15+ messages
2023-03-22 19:11 [PATCH v5 0/5] btrfs: fix corruption caused by partial dio writes Boris Burkov
2023-03-22 19:11 ` [PATCH v5 1/5] btrfs: add function to create and return an ordered extent Boris Burkov
2023-03-22 19:11 ` [PATCH v5 2/5] btrfs: stash ordered extent in dio_data during iomap dio Boris Burkov
2023-03-22 19:11 ` [PATCH v5 3/5] btrfs: return ordered_extent splits from bio extraction Boris Burkov
2023-03-23  8:47   ` Christoph Hellwig
2023-03-23 16:15     ` Boris Burkov
2023-03-23 17:00       ` Boris Burkov
2023-03-23 17:45         ` Boris Burkov
2023-03-23 21:29         ` Christoph Hellwig
2023-03-23 22:43           ` Boris Burkov
2023-03-24  0:24             ` Christoph Hellwig
2023-03-22 19:11 ` [PATCH v5 4/5] btrfs: fix crash with non-zero pre in btrfs_split_ordered_extent Boris Burkov
2023-03-23  8:36   ` Naohiro Aota
2023-03-23 16:22     ` Boris Burkov
2023-03-22 19:11 ` [PATCH v5 5/5] btrfs: split partial dio bios before submit Boris Burkov
