linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] btrfs: fix enospc and transaction aborts during fallocate
@ 2020-09-08 10:27 fdmanana
  2020-09-08 10:27 ` [PATCH 1/5] btrfs: fix metadata reservation for fallocate that leads to transaction aborts fdmanana
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: fdmanana @ 2020-09-08 10:27 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Filipe Manana

From: Filipe Manana <fdmanana@suse.com>

When attempting to fallocate on a large file range with many file extent
items, the operation can fail with ENOSPC when it shouldn't and, more
critical, abort the transaction and turn the filesystem to RO mode.

First patch fixes the issue, the remaining just do some cleanups after it.

Filipe Manana (5):
  btrfs: fix metadata reservation for fallocate that leads to
    transaction aborts
  btrfs: remove item_size member of struct btrfs_clone_extent_info
  btrfs: rename struct btrfs_clone_extent_info to a more generic name
  btrfs: rename btrfs_punch_hole_range() to a more generic name
  btrfs: rename btrfs_insert_clone_extent() to a more generic name

 fs/btrfs/ctree.h   |  28 +++++++++--
 fs/btrfs/file.c    | 119 ++++++++++++++++++++++++++-------------------
 fs/btrfs/inode.c   |  67 ++++++++++++++++---------
 fs/btrfs/reflink.c |   8 +--
 4 files changed, 142 insertions(+), 80 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/5] btrfs: fix metadata reservation for fallocate that leads to transaction aborts
  2020-09-08 10:27 [PATCH 0/5] btrfs: fix enospc and transaction aborts during fallocate fdmanana
@ 2020-09-08 10:27 ` fdmanana
  2020-09-10 14:48   ` Josef Bacik
  2020-09-08 10:27 ` [PATCH 2/5] btrfs: remove item_size member of struct btrfs_clone_extent_info fdmanana
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: fdmanana @ 2020-09-08 10:27 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Filipe Manana

From: Filipe Manana <fdmanana@suse.com>

When doing an fallocate(), specially a zero range operation, we assume
that reserving 3 units of metadata space is enough, that at most we touch
one leaf in subvolume/fs tree for removing existing file extent items and
inserting a new file extent item. This assumption is generally true for
most common use cases. However when we end up needing to remove file extent
items from multiple leaves, we can end up failing with -ENOSPC and abort
the current transaction, turning the filesystem to RO mode. When this
happens a stack trace like the following is dumped in dmesg/syslog:

[ 1500.620934] ------------[ cut here ]------------
[ 1500.620938] BTRFS: Transaction aborted (error -28)
[ 1500.620973] WARNING: CPU: 2 PID: 30807 at fs/btrfs/inode.c:9724 __btrfs_prealloc_file_range+0x512/0x570 [btrfs]
[ 1500.620974] Modules linked in: btrfs intel_rapl_msr intel_rapl_common kvm_intel (...)
[ 1500.621010] CPU: 2 PID: 30807 Comm: xfs_io Tainted: G        W         5.9.0-rc3-btrfs-next-67 #1
[ 1500.621012] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 1500.621023] RIP: 0010:__btrfs_prealloc_file_range+0x512/0x570 [btrfs]
[ 1500.621026] Code: 8b 40 50 f0 48 (...)
[ 1500.621028] RSP: 0018:ffffb05fc8803ca0 EFLAGS: 00010286
[ 1500.621030] RAX: 0000000000000000 RBX: ffff9608af276488 RCX: 0000000000000000
[ 1500.621032] RDX: 0000000000000001 RSI: 0000000000000027 RDI: 00000000ffffffff
[ 1500.621033] RBP: ffffb05fc8803d90 R08: 0000000000000001 R09: 0000000000000001
[ 1500.621035] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000003200000
[ 1500.621037] R13: 00000000ffffffe4 R14: ffff9608af275fe8 R15: ffff9608af275f60
[ 1500.621039] FS:  00007fb5b2368ec0(0000) GS:ffff9608b6600000(0000) knlGS:0000000000000000
[ 1500.621041] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1500.621043] CR2: 00007fb5b2366fb8 CR3: 0000000202d38005 CR4: 00000000003706e0
[ 1500.621046] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1500.621047] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1500.621049] Call Trace:
[ 1500.621076]  btrfs_prealloc_file_range+0x10/0x20 [btrfs]
[ 1500.621087]  btrfs_fallocate+0xccd/0x1280 [btrfs]
[ 1500.621108]  vfs_fallocate+0x14d/0x290
[ 1500.621112]  ksys_fallocate+0x3a/0x70
[ 1500.621117]  __x64_sys_fallocate+0x1a/0x20
[ 1500.621120]  do_syscall_64+0x33/0x80
[ 1500.621123]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1500.621126] RIP: 0033:0x7fb5b248c477
[ 1500.621128] Code: 89 7c 24 08 (...)
[ 1500.621130] RSP: 002b:00007ffc7bee9060 EFLAGS: 00000293 ORIG_RAX: 000000000000011d
[ 1500.621132] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fb5b248c477
[ 1500.621134] RDX: 0000000000000000 RSI: 0000000000000010 RDI: 0000000000000003
[ 1500.621136] RBP: 0000557718faafd0 R08: 0000000000000000 R09: 0000000000000000
[ 1500.621137] R10: 0000000003200000 R11: 0000000000000293 R12: 0000000000000010
[ 1500.621139] R13: 0000557718faafb0 R14: 0000557718faa480 R15: 0000000000000003
[ 1500.621151] irq event stamp: 1026217
[ 1500.621154] hardirqs last  enabled at (1026223): [<ffffffffba965570>] console_unlock+0x500/0x5c0
[ 1500.621156] hardirqs last disabled at (1026228): [<ffffffffba9654c7>] console_unlock+0x457/0x5c0
[ 1500.621159] softirqs last  enabled at (1022486): [<ffffffffbb6003dc>] __do_softirq+0x3dc/0x606
[ 1500.621161] softirqs last disabled at (1022477): [<ffffffffbb4010b2>] asm_call_on_stack+0x12/0x20
[ 1500.621162] ---[ end trace 2955b08408d8b9d4 ]---
[ 1500.621167] BTRFS: error (device sdj) in __btrfs_prealloc_file_range:9724: errno=-28 No space left

When we use fallocate() internally, for reserving an extent for a space
cache, inode cache or relocation, we can't hit this problem since either
there aren't any file extent items to remove from the subvolume tree or
there is at most one.

When using plain fallocate() it's very unlikely, since that would require
having many file extent items representing holes for the target range and
crossing multiple leafs - we attempt to increase the range (merge) of such
file extent items when punching holes, so at most we end up with 2 file
extent items for holes at leaf boundaries.

However when using the zero range operation of fallocate() for a large
range (100+ MiB for example) that's fairly easy to trigger. The following
example reproducer triggers the issue:

  $ cat reproducer.sh
  #!/bin/bash

  umount /dev/sdj &> /dev/null
  mkfs.btrfs -f -n 16384 -O ^no-holes /dev/sdj > /dev/null
  mount /dev/sdj /mnt/sdj

  # Create a 100M file with many file extent items. Punch a hole every 8K
  # just to speedup the file creation - we could do 4K sequential writes
  # followed by fsync (or O_SYNC) as well, but that takes a lot of time.
  file_size=$((100 * 1024 * 1024))
  xfs_io -f -c "pwrite -S 0xab -b 10M 0 $file_size" /mnt/sdj/foobar
  for ((i = 0; i < $file_size; i += 8192)); do
      xfs_io -c "fpunch $i 4096" /mnt/sdj/foobar
  done

  # Force a transaction commit, so the zero range operation will be forced
  # to COW all metadata extents it need to touch.
  sync

  xfs_io -c "fzero 0 $file_size" /mnt/sdj/foobar

  umount /mnt/sdj

  $ ./reproducer.sh
  wrote 104857600/104857600 bytes at offset 0
  100 MiB, 10 ops; 0.0669 sec (1.458 GiB/sec and 149.3117 ops/sec)
  fallocate: No space left on device

  $ dmesg
  <shows the same stack trace pasted before>

To fix this use the existing infrastructure that hole punching and
extent cloning use for replacing a file range with another extent. This
deals with doing the removal of file extent items and inserting the new
one using an incremental approach, reserving more space when needed and
always ensuring we don't leave an implicit hole in the range in case
we need to do multiple iterations and a crash happens between iterations.

A test case for fstests will follow up soon.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.h   | 16 +++++++++++
 fs/btrfs/file.c    | 38 +++++++++++++++++++-------
 fs/btrfs/inode.c   | 68 +++++++++++++++++++++++++++++++---------------
 fs/btrfs/reflink.c |  1 +
 4 files changed, 91 insertions(+), 32 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index af2bd059daae..67ca25daf9c8 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1191,6 +1191,22 @@ struct btrfs_clone_extent_info {
 	u64 file_offset;
 	char *extent_buf;
 	u32 item_size;
+	/*
+	 * Set to true when attempting to replace a file range with a new extent
+	 * described by this structure, set to false when attempting to clone an
+	 * existing extent into a file range.
+	 */
+	bool is_new_extent;
+	/* Meaningful only if is_new_extent is true. */
+	int qgroup_reserved;
+	/*
+	 * Meaningful only if is_new_extent is true.
+	 * Used to track how many extent items we have already inserted in a
+	 * subvolume tree that refer to the extent described by this structure,
+	 * so that we know when to create a new delayed ref or update an existing
+	 * one.
+	 */
+	int insertions;
 };
 
 struct btrfs_file_private {
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index af4eab9cbc51..73827c9c7a70 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2583,7 +2583,6 @@ static int btrfs_insert_clone_extent(struct btrfs_trans_handle *trans,
 	struct btrfs_key key;
 	int slot;
 	struct btrfs_ref ref = { 0 };
-	u64 ref_offset;
 	int ret;
 
 	if (clone_len == 0)
@@ -2608,6 +2607,8 @@ static int btrfs_insert_clone_extent(struct btrfs_trans_handle *trans,
 	extent = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
 	btrfs_set_file_extent_offset(leaf, extent, clone_info->data_offset);
 	btrfs_set_file_extent_num_bytes(leaf, extent, clone_len);
+	if (clone_info->is_new_extent)
+		btrfs_set_file_extent_generation(leaf, extent, trans->transid);
 	btrfs_mark_buffer_dirty(leaf);
 	btrfs_release_path(path);
 
@@ -2621,13 +2622,29 @@ static int btrfs_insert_clone_extent(struct btrfs_trans_handle *trans,
 		return 0;
 
 	inode_add_bytes(inode, clone_len);
-	btrfs_init_generic_ref(&ref, BTRFS_ADD_DELAYED_REF,
-			       clone_info->disk_offset,
-			       clone_info->disk_len, 0);
-	ref_offset = clone_info->file_offset - clone_info->data_offset;
-	btrfs_init_data_ref(&ref, root->root_key.objectid,
-			    btrfs_ino(BTRFS_I(inode)), ref_offset);
-	ret = btrfs_inc_extent_ref(trans, &ref);
+
+	if (clone_info->is_new_extent && clone_info->insertions == 0) {
+		key.objectid = clone_info->disk_offset;
+		key.type = BTRFS_EXTENT_ITEM_KEY;
+		key.offset = clone_info->disk_len;
+		ret = btrfs_alloc_reserved_file_extent(trans, root,
+						       btrfs_ino(BTRFS_I(inode)),
+						       clone_info->file_offset,
+						       clone_info->qgroup_reserved,
+						       &key);
+	} else {
+		u64 ref_offset;
+
+		btrfs_init_generic_ref(&ref, BTRFS_ADD_DELAYED_REF,
+				       clone_info->disk_offset,
+				       clone_info->disk_len, 0);
+		ref_offset = clone_info->file_offset - clone_info->data_offset;
+		btrfs_init_data_ref(&ref, root->root_key.objectid,
+				    btrfs_ino(BTRFS_I(inode)), ref_offset);
+		ret = btrfs_inc_extent_ref(trans, &ref);
+	}
+
+	clone_info->insertions++;
 
 	return ret;
 }
@@ -2705,7 +2722,8 @@ int btrfs_punch_hole_range(struct inode *inode, struct btrfs_path *path,
 			 * returned by __btrfs_drop_extents() without having
 			 * changed anything in the file.
 			 */
-			if (clone_info && ret && ret != -EOPNOTSUPP)
+			if (clone_info && !clone_info->is_new_extent &&
+			    ret && ret != -EOPNOTSUPP)
 				btrfs_abort_transaction(trans, ret);
 			break;
 		}
@@ -2800,7 +2818,7 @@ int btrfs_punch_hole_range(struct inode *inode, struct btrfs_path *path,
 	 * than 16Mb would force the full fsync any way (when
 	 * try_release_extent_mapping() is invoked during page cache truncation.
 	 */
-	if (clone_info)
+	if (clone_info && !clone_info->is_new_extent)
 		set_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
 			&BTRFS_I(inode)->runtime_flags);
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 170a929eec81..a9a4c3d5984c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9575,11 +9575,15 @@ static int btrfs_symlink(struct inode *dir, struct dentry *dentry,
 	return err;
 }
 
-static int insert_prealloc_file_extent(struct btrfs_trans_handle *trans,
+static struct btrfs_trans_handle *insert_prealloc_file_extent(
+				       struct btrfs_trans_handle *trans_in,
 				       struct inode *inode, struct btrfs_key *ins,
 				       u64 file_offset)
 {
 	struct btrfs_file_extent_item stack_fi;
+	struct btrfs_clone_extent_info extent_info;
+	struct btrfs_trans_handle *trans = trans_in;
+	struct btrfs_path *path;
 	u64 start = ins->objectid;
 	u64 len = ins->offset;
 	int ret;
@@ -9596,10 +9600,41 @@ static int insert_prealloc_file_extent(struct btrfs_trans_handle *trans,
 
 	ret = btrfs_qgroup_release_data(BTRFS_I(inode), file_offset, len);
 	if (ret < 0)
-		return ret;
-	return insert_reserved_file_extent(trans, BTRFS_I(inode), file_offset,
-					   &stack_fi, ret);
+		return ERR_PTR(ret);
+
+	if (trans) {
+		ret = insert_reserved_file_extent(trans, BTRFS_I(inode),
+						  file_offset, &stack_fi, ret);
+		if (ret)
+			return ERR_PTR(ret);
+		return trans;
+	}
+
+	extent_info.disk_offset = start;
+	extent_info.disk_len = len;
+	extent_info.data_offset = 0;
+	extent_info.data_len = len;
+	extent_info.file_offset = file_offset;
+	extent_info.extent_buf = (char *)&stack_fi;
+	extent_info.item_size = sizeof(stack_fi);
+	extent_info.is_new_extent = true;
+	extent_info.qgroup_reserved = ret;
+	extent_info.insertions = 0;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return ERR_PTR(-ENOMEM);
+
+	ret = btrfs_punch_hole_range(inode, path, file_offset,
+				     file_offset + len - 1, &extent_info,
+				     &trans);
+	btrfs_free_path(path);
+	if (ret)
+		return ERR_PTR(ret);
+
+	return trans;
 }
+
 static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 				       u64 start, u64 num_bytes, u64 min_size,
 				       loff_t actual_len, u64 *alloc_hint,
@@ -9622,14 +9657,6 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 	if (trans)
 		own_trans = false;
 	while (num_bytes > 0) {
-		if (own_trans) {
-			trans = btrfs_start_transaction(root, 3);
-			if (IS_ERR(trans)) {
-				ret = PTR_ERR(trans);
-				break;
-			}
-		}
-
 		cur_bytes = min_t(u64, num_bytes, SZ_256M);
 		cur_bytes = max(cur_bytes, min_size);
 		/*
@@ -9641,11 +9668,8 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 		cur_bytes = min(cur_bytes, last_alloc);
 		ret = btrfs_reserve_extent(root, cur_bytes, cur_bytes,
 				min_size, 0, *alloc_hint, &ins, 1, 0);
-		if (ret) {
-			if (own_trans)
-				btrfs_end_transaction(trans);
+		if (ret)
 			break;
-		}
 
 		/*
 		 * We've reserved this space, and thus converted it from
@@ -9658,13 +9682,11 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 		btrfs_dec_block_group_reservations(fs_info, ins.objectid);
 
 		last_alloc = ins.offset;
-		ret = insert_prealloc_file_extent(trans, inode, &ins, cur_offset);
-		if (ret) {
+		trans = insert_prealloc_file_extent(trans, inode, &ins, cur_offset);
+		if (IS_ERR(trans)) {
+			ret = PTR_ERR(trans);
 			btrfs_free_reserved_extent(fs_info, ins.objectid,
 						   ins.offset, 0);
-			btrfs_abort_transaction(trans, ret);
-			if (own_trans)
-				btrfs_end_transaction(trans);
 			break;
 		}
 
@@ -9727,8 +9749,10 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 			break;
 		}
 
-		if (own_trans)
+		if (own_trans) {
 			btrfs_end_transaction(trans);
+			trans = NULL;
+		}
 	}
 	if (clear_offset < end)
 		btrfs_free_reserved_data_space(BTRFS_I(inode), NULL, clear_offset,
diff --git a/fs/btrfs/reflink.c b/fs/btrfs/reflink.c
index 2461be6ccb6f..3c03126b52b1 100644
--- a/fs/btrfs/reflink.c
+++ b/fs/btrfs/reflink.c
@@ -463,6 +463,7 @@ static int btrfs_clone(struct inode *src, struct inode *inode,
 			clone_info.file_offset = new_key.offset;
 			clone_info.extent_buf = buf;
 			clone_info.item_size = size;
+			clone_info.is_new_extent = false;
 			ret = btrfs_punch_hole_range(inode, path, drop_start,
 					new_key.offset + datal - 1, &clone_info,
 					&trans);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/5] btrfs: remove item_size member of struct btrfs_clone_extent_info
  2020-09-08 10:27 [PATCH 0/5] btrfs: fix enospc and transaction aborts during fallocate fdmanana
  2020-09-08 10:27 ` [PATCH 1/5] btrfs: fix metadata reservation for fallocate that leads to transaction aborts fdmanana
@ 2020-09-08 10:27 ` fdmanana
  2020-09-10 14:48   ` Josef Bacik
  2020-09-08 10:27 ` [PATCH 3/5] btrfs: rename struct btrfs_clone_extent_info to a more generic name fdmanana
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: fdmanana @ 2020-09-08 10:27 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Filipe Manana

From: Filipe Manana <fdmanana@suse.com>

The value of item_size of struct btrfs_clone_extent_info is always set to
the size of a non-inline file extent item, and in fact the infratructure
that uses this structure (btrfs_punch_hole_range()) does not work with
inline file extents at all (and it is not supposed to).

So just remove that field from the structure and use directly
sizeof(struct btrfs_file_extent_item) instead. Also assert that the
file extent type is not inline at btrfs_insert_clone_extent().

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.h   | 2 +-
 fs/btrfs/file.c    | 5 +++--
 fs/btrfs/inode.c   | 1 -
 fs/btrfs/reflink.c | 1 -
 4 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 67ca25daf9c8..3e648b0d5d60 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1189,8 +1189,8 @@ struct btrfs_clone_extent_info {
 	u64 data_offset;
 	u64 data_len;
 	u64 file_offset;
+	/* Pointer to a file extent item of type regular or prealloc. */
 	char *extent_buf;
-	u32 item_size;
 	/*
 	 * Set to true when attempting to replace a file range with a new extent
 	 * described by this structure, set to false when attempting to clone an
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 73827c9c7a70..28794a98778a 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2596,15 +2596,16 @@ static int btrfs_insert_clone_extent(struct btrfs_trans_handle *trans,
 	key.type = BTRFS_EXTENT_DATA_KEY;
 	key.offset = clone_info->file_offset;
 	ret = btrfs_insert_empty_item(trans, root, path, &key,
-				      clone_info->item_size);
+				      sizeof(struct btrfs_file_extent_item));
 	if (ret)
 		return ret;
 	leaf = path->nodes[0];
 	slot = path->slots[0];
 	write_extent_buffer(leaf, clone_info->extent_buf,
 			    btrfs_item_ptr_offset(leaf, slot),
-			    clone_info->item_size);
+			    sizeof(struct btrfs_file_extent_item));
 	extent = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
+	ASSERT(btrfs_file_extent_type(leaf, extent) != BTRFS_FILE_EXTENT_INLINE);
 	btrfs_set_file_extent_offset(leaf, extent, clone_info->data_offset);
 	btrfs_set_file_extent_num_bytes(leaf, extent, clone_len);
 	if (clone_info->is_new_extent)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a9a4c3d5984c..0281895268f7 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9616,7 +9616,6 @@ static struct btrfs_trans_handle *insert_prealloc_file_extent(
 	extent_info.data_len = len;
 	extent_info.file_offset = file_offset;
 	extent_info.extent_buf = (char *)&stack_fi;
-	extent_info.item_size = sizeof(stack_fi);
 	extent_info.is_new_extent = true;
 	extent_info.qgroup_reserved = ret;
 	extent_info.insertions = 0;
diff --git a/fs/btrfs/reflink.c b/fs/btrfs/reflink.c
index 3c03126b52b1..020da4d65b64 100644
--- a/fs/btrfs/reflink.c
+++ b/fs/btrfs/reflink.c
@@ -462,7 +462,6 @@ static int btrfs_clone(struct inode *src, struct inode *inode,
 			clone_info.data_len = datal;
 			clone_info.file_offset = new_key.offset;
 			clone_info.extent_buf = buf;
-			clone_info.item_size = size;
 			clone_info.is_new_extent = false;
 			ret = btrfs_punch_hole_range(inode, path, drop_start,
 					new_key.offset + datal - 1, &clone_info,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/5] btrfs: rename struct btrfs_clone_extent_info to a more generic name
  2020-09-08 10:27 [PATCH 0/5] btrfs: fix enospc and transaction aborts during fallocate fdmanana
  2020-09-08 10:27 ` [PATCH 1/5] btrfs: fix metadata reservation for fallocate that leads to transaction aborts fdmanana
  2020-09-08 10:27 ` [PATCH 2/5] btrfs: remove item_size member of struct btrfs_clone_extent_info fdmanana
@ 2020-09-08 10:27 ` fdmanana
  2020-09-10 14:48   ` Josef Bacik
  2020-09-08 10:27 ` [PATCH 4/5] btrfs: rename btrfs_punch_hole_range() " fdmanana
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: fdmanana @ 2020-09-08 10:27 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Filipe Manana

From: Filipe Manana <fdmanana@suse.com>

Now that we can use btrfs_clone_extent_info to convey information for a
new prealloc extent as well, and not just for existing extents that are
being cloned, rename it to btrfs_replace_extent_info, which reflects the
fact that this is now more generic and it is used to replace all existing
extents in a file range with the extent described by the structure.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.h   |  8 +++-
 fs/btrfs/file.c    | 92 +++++++++++++++++++++++-----------------------
 fs/btrfs/inode.c   |  2 +-
 fs/btrfs/reflink.c |  2 +-
 4 files changed, 54 insertions(+), 50 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 3e648b0d5d60..0b1dadd44e53 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1183,7 +1183,11 @@ struct btrfs_root {
 #endif
 };
 
-struct btrfs_clone_extent_info {
+/*
+ * Structure that conveys information about an extent that is going to replace
+ * all the extents in a file range.
+ */
+struct btrfs_replace_extent_info {
 	u64 disk_offset;
 	u64 disk_len;
 	u64 data_offset;
@@ -3072,7 +3076,7 @@ int btrfs_drop_extents(struct btrfs_trans_handle *trans,
 		       u64 end, int drop_cache);
 int btrfs_punch_hole_range(struct inode *inode, struct btrfs_path *path,
 			   const u64 start, const u64 end,
-			   struct btrfs_clone_extent_info *clone_info,
+			   struct btrfs_replace_extent_info *extent_info,
 			   struct btrfs_trans_handle **trans_out);
 int btrfs_mark_extent_written(struct btrfs_trans_handle *trans,
 			      struct btrfs_inode *inode, u64 start, u64 end);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 28794a98778a..7ac0a20119f3 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2573,8 +2573,8 @@ static int btrfs_punch_hole_lock_range(struct inode *inode,
 static int btrfs_insert_clone_extent(struct btrfs_trans_handle *trans,
 				     struct inode *inode,
 				     struct btrfs_path *path,
-				     struct btrfs_clone_extent_info *clone_info,
-				     const u64 clone_len)
+				     struct btrfs_replace_extent_info *extent_info,
+				     const u64 replace_len)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_root *root = BTRFS_I(inode)->root;
@@ -2585,67 +2585,67 @@ static int btrfs_insert_clone_extent(struct btrfs_trans_handle *trans,
 	struct btrfs_ref ref = { 0 };
 	int ret;
 
-	if (clone_len == 0)
+	if (replace_len == 0)
 		return 0;
 
-	if (clone_info->disk_offset == 0 &&
+	if (extent_info->disk_offset == 0 &&
 	    btrfs_fs_incompat(fs_info, NO_HOLES))
 		return 0;
 
 	key.objectid = btrfs_ino(BTRFS_I(inode));
 	key.type = BTRFS_EXTENT_DATA_KEY;
-	key.offset = clone_info->file_offset;
+	key.offset = extent_info->file_offset;
 	ret = btrfs_insert_empty_item(trans, root, path, &key,
 				      sizeof(struct btrfs_file_extent_item));
 	if (ret)
 		return ret;
 	leaf = path->nodes[0];
 	slot = path->slots[0];
-	write_extent_buffer(leaf, clone_info->extent_buf,
+	write_extent_buffer(leaf, extent_info->extent_buf,
 			    btrfs_item_ptr_offset(leaf, slot),
 			    sizeof(struct btrfs_file_extent_item));
 	extent = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
 	ASSERT(btrfs_file_extent_type(leaf, extent) != BTRFS_FILE_EXTENT_INLINE);
-	btrfs_set_file_extent_offset(leaf, extent, clone_info->data_offset);
-	btrfs_set_file_extent_num_bytes(leaf, extent, clone_len);
-	if (clone_info->is_new_extent)
+	btrfs_set_file_extent_offset(leaf, extent, extent_info->data_offset);
+	btrfs_set_file_extent_num_bytes(leaf, extent, replace_len);
+	if (extent_info->is_new_extent)
 		btrfs_set_file_extent_generation(leaf, extent, trans->transid);
 	btrfs_mark_buffer_dirty(leaf);
 	btrfs_release_path(path);
 
 	ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode),
-			clone_info->file_offset, clone_len);
+			extent_info->file_offset, replace_len);
 	if (ret)
 		return ret;
 
 	/* If it's a hole, nothing more needs to be done. */
-	if (clone_info->disk_offset == 0)
+	if (extent_info->disk_offset == 0)
 		return 0;
 
-	inode_add_bytes(inode, clone_len);
+	inode_add_bytes(inode, replace_len);
 
-	if (clone_info->is_new_extent && clone_info->insertions == 0) {
-		key.objectid = clone_info->disk_offset;
+	if (extent_info->is_new_extent && extent_info->insertions == 0) {
+		key.objectid = extent_info->disk_offset;
 		key.type = BTRFS_EXTENT_ITEM_KEY;
-		key.offset = clone_info->disk_len;
+		key.offset = extent_info->disk_len;
 		ret = btrfs_alloc_reserved_file_extent(trans, root,
 						       btrfs_ino(BTRFS_I(inode)),
-						       clone_info->file_offset,
-						       clone_info->qgroup_reserved,
+						       extent_info->file_offset,
+						       extent_info->qgroup_reserved,
 						       &key);
 	} else {
 		u64 ref_offset;
 
 		btrfs_init_generic_ref(&ref, BTRFS_ADD_DELAYED_REF,
-				       clone_info->disk_offset,
-				       clone_info->disk_len, 0);
-		ref_offset = clone_info->file_offset - clone_info->data_offset;
+				       extent_info->disk_offset,
+				       extent_info->disk_len, 0);
+		ref_offset = extent_info->file_offset - extent_info->data_offset;
 		btrfs_init_data_ref(&ref, root->root_key.objectid,
 				    btrfs_ino(BTRFS_I(inode)), ref_offset);
 		ret = btrfs_inc_extent_ref(trans, &ref);
 	}
 
-	clone_info->insertions++;
+	extent_info->insertions++;
 
 	return ret;
 }
@@ -2653,15 +2653,15 @@ static int btrfs_insert_clone_extent(struct btrfs_trans_handle *trans,
 /*
  * The respective range must have been previously locked, as well as the inode.
  * The end offset is inclusive (last byte of the range).
- * @clone_info is NULL for fallocate's hole punching and non-NULL for extent
- * cloning.
- * When cloning, we don't want to end up in a state where we dropped extents
- * without inserting a new one, so we must abort the transaction to avoid a
- * corruption.
+ * @extent_info is NULL for fallocate's hole punching and non-NULL when replacing
+ * the file range with an extent.
+ * When not punching a hole, we don't want to end up in a state where we dropped
+ * extents without inserting a new one, so we must abort the transaction to avoid
+ * a corruption.
  */
 int btrfs_punch_hole_range(struct inode *inode, struct btrfs_path *path,
 			   const u64 start, const u64 end,
-			   struct btrfs_clone_extent_info *clone_info,
+			   struct btrfs_replace_extent_info *extent_info,
 			   struct btrfs_trans_handle **trans_out)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
@@ -2690,10 +2690,10 @@ int btrfs_punch_hole_range(struct inode *inode, struct btrfs_path *path,
 	/*
 	 * 1 - update the inode
 	 * 1 - removing the extents in the range
-	 * 1 - adding the hole extent if no_holes isn't set or if we are cloning
-	 *     an extent
+	 * 1 - adding the hole extent if no_holes isn't set or if we are
+	 *     replacing the range with a new extent
 	 */
-	if (!btrfs_fs_incompat(fs_info, NO_HOLES) || clone_info)
+	if (!btrfs_fs_incompat(fs_info, NO_HOLES) || extent_info)
 		rsv_count = 3;
 	else
 		rsv_count = 2;
@@ -2723,7 +2723,7 @@ int btrfs_punch_hole_range(struct inode *inode, struct btrfs_path *path,
 			 * returned by __btrfs_drop_extents() without having
 			 * changed anything in the file.
 			 */
-			if (clone_info && !clone_info->is_new_extent &&
+			if (extent_info && !extent_info->is_new_extent &&
 			    ret && ret != -EOPNOTSUPP)
 				btrfs_abort_transaction(trans, ret);
 			break;
@@ -2731,7 +2731,7 @@ int btrfs_punch_hole_range(struct inode *inode, struct btrfs_path *path,
 
 		trans->block_rsv = &fs_info->trans_block_rsv;
 
-		if (!clone_info && cur_offset < drop_end &&
+		if (!extent_info && cur_offset < drop_end &&
 		    cur_offset < ino_size) {
 			ret = fill_holes(trans, BTRFS_I(inode), path,
 					cur_offset, drop_end);
@@ -2745,7 +2745,7 @@ int btrfs_punch_hole_range(struct inode *inode, struct btrfs_path *path,
 				btrfs_abort_transaction(trans, ret);
 				break;
 			}
-		} else if (!clone_info && cur_offset < drop_end) {
+		} else if (!extent_info && cur_offset < drop_end) {
 			/*
 			 * We are past the i_size here, but since we didn't
 			 * insert holes we need to clear the mapped area so we
@@ -2765,18 +2765,18 @@ int btrfs_punch_hole_range(struct inode *inode, struct btrfs_path *path,
 			}
 		}
 
-		if (clone_info && drop_end > clone_info->file_offset) {
-			u64 clone_len = drop_end - clone_info->file_offset;
+		if (extent_info && drop_end > extent_info->file_offset) {
+			u64 replace_len = drop_end - extent_info->file_offset;
 
 			ret = btrfs_insert_clone_extent(trans, inode, path,
-							clone_info, clone_len);
+							extent_info, replace_len);
 			if (ret) {
 				btrfs_abort_transaction(trans, ret);
 				break;
 			}
-			clone_info->data_len -= clone_len;
-			clone_info->data_offset += clone_len;
-			clone_info->file_offset += clone_len;
+			extent_info->data_len -= replace_len;
+			extent_info->data_offset += replace_len;
+			extent_info->file_offset += replace_len;
 		}
 
 		cur_offset = drop_end;
@@ -2800,7 +2800,7 @@ int btrfs_punch_hole_range(struct inode *inode, struct btrfs_path *path,
 		BUG_ON(ret);	/* shouldn't happen */
 		trans->block_rsv = rsv;
 
-		if (!clone_info) {
+		if (!extent_info) {
 			ret = find_first_non_hole(inode, &cur_offset, &len);
 			if (unlikely(ret < 0))
 				break;
@@ -2819,7 +2819,7 @@ int btrfs_punch_hole_range(struct inode *inode, struct btrfs_path *path,
 	 * than 16Mb would force the full fsync any way (when
 	 * try_release_extent_mapping() is invoked during page cache truncation.
 	 */
-	if (clone_info && !clone_info->is_new_extent)
+	if (extent_info && !extent_info->is_new_extent)
 		set_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
 			&BTRFS_I(inode)->runtime_flags);
 
@@ -2845,7 +2845,7 @@ int btrfs_punch_hole_range(struct inode *inode, struct btrfs_path *path,
 	 * (because it's useless) or if it represents a 0 bytes range (when
 	 * cur_offset == drop_end).
 	 */
-	if (!clone_info && cur_offset < ino_size && cur_offset < drop_end) {
+	if (!extent_info && cur_offset < ino_size && cur_offset < drop_end) {
 		ret = fill_holes(trans, BTRFS_I(inode), path,
 				cur_offset, drop_end);
 		if (ret) {
@@ -2853,7 +2853,7 @@ int btrfs_punch_hole_range(struct inode *inode, struct btrfs_path *path,
 			btrfs_abort_transaction(trans, ret);
 			goto out_trans;
 		}
-	} else if (!clone_info && cur_offset < drop_end) {
+	} else if (!extent_info && cur_offset < drop_end) {
 		/* See the comment in the loop above for the reasoning here. */
 		ret = btrfs_inode_clear_file_extent_range(BTRFS_I(inode),
 					cur_offset, drop_end - cur_offset);
@@ -2863,9 +2863,9 @@ int btrfs_punch_hole_range(struct inode *inode, struct btrfs_path *path,
 		}
 
 	}
-	if (clone_info) {
-		ret = btrfs_insert_clone_extent(trans, inode, path, clone_info,
-						clone_info->data_len);
+	if (extent_info) {
+		ret = btrfs_insert_clone_extent(trans, inode, path, extent_info,
+						extent_info->data_len);
 		if (ret) {
 			btrfs_abort_transaction(trans, ret);
 			goto out_trans;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0281895268f7..a67b80979c48 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9581,7 +9581,7 @@ static struct btrfs_trans_handle *insert_prealloc_file_extent(
 				       u64 file_offset)
 {
 	struct btrfs_file_extent_item stack_fi;
-	struct btrfs_clone_extent_info extent_info;
+	struct btrfs_replace_extent_info extent_info;
 	struct btrfs_trans_handle *trans = trans_in;
 	struct btrfs_path *path;
 	u64 start = ins->objectid;
diff --git a/fs/btrfs/reflink.c b/fs/btrfs/reflink.c
index 020da4d65b64..dc8b5397e198 100644
--- a/fs/btrfs/reflink.c
+++ b/fs/btrfs/reflink.c
@@ -439,7 +439,7 @@ static int btrfs_clone(struct inode *src, struct inode *inode,
 
 		if (type == BTRFS_FILE_EXTENT_REG ||
 		    type == BTRFS_FILE_EXTENT_PREALLOC) {
-			struct btrfs_clone_extent_info clone_info;
+			struct btrfs_replace_extent_info clone_info;
 
 			/*
 			 *    a  | --- range to clone ---|  b
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 4/5] btrfs: rename btrfs_punch_hole_range() to a more generic name
  2020-09-08 10:27 [PATCH 0/5] btrfs: fix enospc and transaction aborts during fallocate fdmanana
                   ` (2 preceding siblings ...)
  2020-09-08 10:27 ` [PATCH 3/5] btrfs: rename struct btrfs_clone_extent_info to a more generic name fdmanana
@ 2020-09-08 10:27 ` fdmanana
  2020-09-10 14:49   ` Josef Bacik
  2020-09-08 10:27 ` [PATCH 5/5] btrfs: rename btrfs_insert_clone_extent() " fdmanana
  2020-09-11 14:02 ` [PATCH 0/5] btrfs: fix enospc and transaction aborts during fallocate David Sterba
  5 siblings, 1 reply; 12+ messages in thread
From: fdmanana @ 2020-09-08 10:27 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Filipe Manana

From: Filipe Manana <fdmanana@suse.com>

The function btrfs_punch_hole_range() is now used to replace all the file
extents in a given file range with an extent described in the given struct
btrfs_replace_extent_info argument. This extent can either be an existing
extent that is being cloned or it can be a new extent (namely a prealloc
extent). When that argument is NULL it only punches a hole (drop alls the
existing extents) in the file range.

So rename the function to btrfs_replace_file_extents().

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.h   | 2 +-
 fs/btrfs/file.c    | 4 ++--
 fs/btrfs/inode.c   | 2 +-
 fs/btrfs/reflink.c | 4 ++--
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 0b1dadd44e53..74ddff3529df 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3074,7 +3074,7 @@ int __btrfs_drop_extents(struct btrfs_trans_handle *trans,
 int btrfs_drop_extents(struct btrfs_trans_handle *trans,
 		       struct btrfs_root *root, struct inode *inode, u64 start,
 		       u64 end, int drop_cache);
-int btrfs_punch_hole_range(struct inode *inode, struct btrfs_path *path,
+int btrfs_replace_file_extents(struct inode *inode, struct btrfs_path *path,
 			   const u64 start, const u64 end,
 			   struct btrfs_replace_extent_info *extent_info,
 			   struct btrfs_trans_handle **trans_out);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 7ac0a20119f3..241b34e44a6c 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2659,7 +2659,7 @@ static int btrfs_insert_clone_extent(struct btrfs_trans_handle *trans,
  * extents without inserting a new one, so we must abort the transaction to avoid
  * a corruption.
  */
-int btrfs_punch_hole_range(struct inode *inode, struct btrfs_path *path,
+int btrfs_replace_file_extents(struct inode *inode, struct btrfs_path *path,
 			   const u64 start, const u64 end,
 			   struct btrfs_replace_extent_info *extent_info,
 			   struct btrfs_trans_handle **trans_out)
@@ -3007,7 +3007,7 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 		goto out;
 	}
 
-	ret = btrfs_punch_hole_range(inode, path, lockstart, lockend, NULL,
+	ret = btrfs_replace_file_extents(inode, path, lockstart, lockend, NULL,
 				     &trans);
 	btrfs_free_path(path);
 	if (ret)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a67b80979c48..9f6fb0ff33e7 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9624,7 +9624,7 @@ static struct btrfs_trans_handle *insert_prealloc_file_extent(
 	if (!path)
 		return ERR_PTR(-ENOMEM);
 
-	ret = btrfs_punch_hole_range(inode, path, file_offset,
+	ret = btrfs_replace_file_extents(inode, path, file_offset,
 				     file_offset + len - 1, &extent_info,
 				     &trans);
 	btrfs_free_path(path);
diff --git a/fs/btrfs/reflink.c b/fs/btrfs/reflink.c
index dc8b5397e198..39b3269e5760 100644
--- a/fs/btrfs/reflink.c
+++ b/fs/btrfs/reflink.c
@@ -463,7 +463,7 @@ static int btrfs_clone(struct inode *src, struct inode *inode,
 			clone_info.file_offset = new_key.offset;
 			clone_info.extent_buf = buf;
 			clone_info.is_new_extent = false;
-			ret = btrfs_punch_hole_range(inode, path, drop_start,
+			ret = btrfs_replace_file_extents(inode, path, drop_start,
 					new_key.offset + datal - 1, &clone_info,
 					&trans);
 			if (ret)
@@ -533,7 +533,7 @@ static int btrfs_clone(struct inode *src, struct inode *inode,
 		btrfs_release_path(path);
 		path->leave_spinning = 0;
 
-		ret = btrfs_punch_hole_range(inode, path, last_dest_end,
+		ret = btrfs_replace_file_extents(inode, path, last_dest_end,
 				destoff + len - 1, NULL, &trans);
 		if (ret)
 			goto out;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 5/5] btrfs: rename btrfs_insert_clone_extent() to a more generic name
  2020-09-08 10:27 [PATCH 0/5] btrfs: fix enospc and transaction aborts during fallocate fdmanana
                   ` (3 preceding siblings ...)
  2020-09-08 10:27 ` [PATCH 4/5] btrfs: rename btrfs_punch_hole_range() " fdmanana
@ 2020-09-08 10:27 ` fdmanana
  2020-09-10 14:49   ` Josef Bacik
  2020-09-11 14:02 ` [PATCH 0/5] btrfs: fix enospc and transaction aborts during fallocate David Sterba
  5 siblings, 1 reply; 12+ messages in thread
From: fdmanana @ 2020-09-08 10:27 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Filipe Manana

From: Filipe Manana <fdmanana@suse.com>

Now that we use the same mechanism to replace all the extents in a file
range with either a hole, an existing extent (when cloning) or a new
extent (when using fallocate), the name of btrfs_insert_clone_extent()
no longer reflects its genericity.

So rename it to btrfs_insert_replace_extent(), since what it does is
to either insert an existing extent or a new extent into a file range.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/file.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 241b34e44a6c..32ceda264b7d 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2570,7 +2570,7 @@ static int btrfs_punch_hole_lock_range(struct inode *inode,
 	return 0;
 }
 
-static int btrfs_insert_clone_extent(struct btrfs_trans_handle *trans,
+static int btrfs_insert_replace_extent(struct btrfs_trans_handle *trans,
 				     struct inode *inode,
 				     struct btrfs_path *path,
 				     struct btrfs_replace_extent_info *extent_info,
@@ -2768,7 +2768,7 @@ int btrfs_replace_file_extents(struct inode *inode, struct btrfs_path *path,
 		if (extent_info && drop_end > extent_info->file_offset) {
 			u64 replace_len = drop_end - extent_info->file_offset;
 
-			ret = btrfs_insert_clone_extent(trans, inode, path,
+			ret = btrfs_insert_replace_extent(trans, inode, path,
 							extent_info, replace_len);
 			if (ret) {
 				btrfs_abort_transaction(trans, ret);
@@ -2864,7 +2864,7 @@ int btrfs_replace_file_extents(struct inode *inode, struct btrfs_path *path,
 
 	}
 	if (extent_info) {
-		ret = btrfs_insert_clone_extent(trans, inode, path, extent_info,
+		ret = btrfs_insert_replace_extent(trans, inode, path, extent_info,
 						extent_info->data_len);
 		if (ret) {
 			btrfs_abort_transaction(trans, ret);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/5] btrfs: fix metadata reservation for fallocate that leads to transaction aborts
  2020-09-08 10:27 ` [PATCH 1/5] btrfs: fix metadata reservation for fallocate that leads to transaction aborts fdmanana
@ 2020-09-10 14:48   ` Josef Bacik
  0 siblings, 0 replies; 12+ messages in thread
From: Josef Bacik @ 2020-09-10 14:48 UTC (permalink / raw)
  To: fdmanana, linux-btrfs; +Cc: Filipe Manana

On 9/8/20 6:27 AM, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> When doing an fallocate(), specially a zero range operation, we assume
> that reserving 3 units of metadata space is enough, that at most we touch
> one leaf in subvolume/fs tree for removing existing file extent items and
> inserting a new file extent item. This assumption is generally true for
> most common use cases. However when we end up needing to remove file extent
> items from multiple leaves, we can end up failing with -ENOSPC and abort
> the current transaction, turning the filesystem to RO mode. When this
> happens a stack trace like the following is dumped in dmesg/syslog:
> 
> [ 1500.620934] ------------[ cut here ]------------
> [ 1500.620938] BTRFS: Transaction aborted (error -28)
> [ 1500.620973] WARNING: CPU: 2 PID: 30807 at fs/btrfs/inode.c:9724 __btrfs_prealloc_file_range+0x512/0x570 [btrfs]
> [ 1500.620974] Modules linked in: btrfs intel_rapl_msr intel_rapl_common kvm_intel (...)
> [ 1500.621010] CPU: 2 PID: 30807 Comm: xfs_io Tainted: G        W         5.9.0-rc3-btrfs-next-67 #1
> [ 1500.621012] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> [ 1500.621023] RIP: 0010:__btrfs_prealloc_file_range+0x512/0x570 [btrfs]
> [ 1500.621026] Code: 8b 40 50 f0 48 (...)
> [ 1500.621028] RSP: 0018:ffffb05fc8803ca0 EFLAGS: 00010286
> [ 1500.621030] RAX: 0000000000000000 RBX: ffff9608af276488 RCX: 0000000000000000
> [ 1500.621032] RDX: 0000000000000001 RSI: 0000000000000027 RDI: 00000000ffffffff
> [ 1500.621033] RBP: ffffb05fc8803d90 R08: 0000000000000001 R09: 0000000000000001
> [ 1500.621035] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000003200000
> [ 1500.621037] R13: 00000000ffffffe4 R14: ffff9608af275fe8 R15: ffff9608af275f60
> [ 1500.621039] FS:  00007fb5b2368ec0(0000) GS:ffff9608b6600000(0000) knlGS:0000000000000000
> [ 1500.621041] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1500.621043] CR2: 00007fb5b2366fb8 CR3: 0000000202d38005 CR4: 00000000003706e0
> [ 1500.621046] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 1500.621047] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 1500.621049] Call Trace:
> [ 1500.621076]  btrfs_prealloc_file_range+0x10/0x20 [btrfs]
> [ 1500.621087]  btrfs_fallocate+0xccd/0x1280 [btrfs]
> [ 1500.621108]  vfs_fallocate+0x14d/0x290
> [ 1500.621112]  ksys_fallocate+0x3a/0x70
> [ 1500.621117]  __x64_sys_fallocate+0x1a/0x20
> [ 1500.621120]  do_syscall_64+0x33/0x80
> [ 1500.621123]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 1500.621126] RIP: 0033:0x7fb5b248c477
> [ 1500.621128] Code: 89 7c 24 08 (...)
> [ 1500.621130] RSP: 002b:00007ffc7bee9060 EFLAGS: 00000293 ORIG_RAX: 000000000000011d
> [ 1500.621132] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fb5b248c477
> [ 1500.621134] RDX: 0000000000000000 RSI: 0000000000000010 RDI: 0000000000000003
> [ 1500.621136] RBP: 0000557718faafd0 R08: 0000000000000000 R09: 0000000000000000
> [ 1500.621137] R10: 0000000003200000 R11: 0000000000000293 R12: 0000000000000010
> [ 1500.621139] R13: 0000557718faafb0 R14: 0000557718faa480 R15: 0000000000000003
> [ 1500.621151] irq event stamp: 1026217
> [ 1500.621154] hardirqs last  enabled at (1026223): [<ffffffffba965570>] console_unlock+0x500/0x5c0
> [ 1500.621156] hardirqs last disabled at (1026228): [<ffffffffba9654c7>] console_unlock+0x457/0x5c0
> [ 1500.621159] softirqs last  enabled at (1022486): [<ffffffffbb6003dc>] __do_softirq+0x3dc/0x606
> [ 1500.621161] softirqs last disabled at (1022477): [<ffffffffbb4010b2>] asm_call_on_stack+0x12/0x20
> [ 1500.621162] ---[ end trace 2955b08408d8b9d4 ]---
> [ 1500.621167] BTRFS: error (device sdj) in __btrfs_prealloc_file_range:9724: errno=-28 No space left
> 
> When we use fallocate() internally, for reserving an extent for a space
> cache, inode cache or relocation, we can't hit this problem since either
> there aren't any file extent items to remove from the subvolume tree or
> there is at most one.
> 
> When using plain fallocate() it's very unlikely, since that would require
> having many file extent items representing holes for the target range and
> crossing multiple leafs - we attempt to increase the range (merge) of such
> file extent items when punching holes, so at most we end up with 2 file
> extent items for holes at leaf boundaries.
> 
> However when using the zero range operation of fallocate() for a large
> range (100+ MiB for example) that's fairly easy to trigger. The following
> example reproducer triggers the issue:
> 
>    $ cat reproducer.sh
>    #!/bin/bash
> 
>    umount /dev/sdj &> /dev/null
>    mkfs.btrfs -f -n 16384 -O ^no-holes /dev/sdj > /dev/null
>    mount /dev/sdj /mnt/sdj
> 
>    # Create a 100M file with many file extent items. Punch a hole every 8K
>    # just to speedup the file creation - we could do 4K sequential writes
>    # followed by fsync (or O_SYNC) as well, but that takes a lot of time.
>    file_size=$((100 * 1024 * 1024))
>    xfs_io -f -c "pwrite -S 0xab -b 10M 0 $file_size" /mnt/sdj/foobar
>    for ((i = 0; i < $file_size; i += 8192)); do
>        xfs_io -c "fpunch $i 4096" /mnt/sdj/foobar
>    done
> 
>    # Force a transaction commit, so the zero range operation will be forced
>    # to COW all metadata extents it need to touch.
>    sync
> 
>    xfs_io -c "fzero 0 $file_size" /mnt/sdj/foobar
> 
>    umount /mnt/sdj
> 
>    $ ./reproducer.sh
>    wrote 104857600/104857600 bytes at offset 0
>    100 MiB, 10 ops; 0.0669 sec (1.458 GiB/sec and 149.3117 ops/sec)
>    fallocate: No space left on device
> 
>    $ dmesg
>    <shows the same stack trace pasted before>
> 
> To fix this use the existing infrastructure that hole punching and
> extent cloning use for replacing a file range with another extent. This
> deals with doing the removal of file extent items and inserting the new
> one using an incremental approach, reserving more space when needed and
> always ensuring we don't leave an implicit hole in the range in case
> we need to do multiple iterations and a crash happens between iterations.
> 
> A test case for fstests will follow up soon.
> 
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/5] btrfs: remove item_size member of struct btrfs_clone_extent_info
  2020-09-08 10:27 ` [PATCH 2/5] btrfs: remove item_size member of struct btrfs_clone_extent_info fdmanana
@ 2020-09-10 14:48   ` Josef Bacik
  0 siblings, 0 replies; 12+ messages in thread
From: Josef Bacik @ 2020-09-10 14:48 UTC (permalink / raw)
  To: fdmanana, linux-btrfs; +Cc: Filipe Manana

On 9/8/20 6:27 AM, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> The value of item_size of struct btrfs_clone_extent_info is always set to
> the size of a non-inline file extent item, and in fact the infratructure
> that uses this structure (btrfs_punch_hole_range()) does not work with
> inline file extents at all (and it is not supposed to).
> 
> So just remove that field from the structure and use directly
> sizeof(struct btrfs_file_extent_item) instead. Also assert that the
> file extent type is not inline at btrfs_insert_clone_extent().
> 
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/5] btrfs: rename struct btrfs_clone_extent_info to a more generic name
  2020-09-08 10:27 ` [PATCH 3/5] btrfs: rename struct btrfs_clone_extent_info to a more generic name fdmanana
@ 2020-09-10 14:48   ` Josef Bacik
  0 siblings, 0 replies; 12+ messages in thread
From: Josef Bacik @ 2020-09-10 14:48 UTC (permalink / raw)
  To: fdmanana, linux-btrfs; +Cc: Filipe Manana

On 9/8/20 6:27 AM, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> Now that we can use btrfs_clone_extent_info to convey information for a
> new prealloc extent as well, and not just for existing extents that are
> being cloned, rename it to btrfs_replace_extent_info, which reflects the
> fact that this is now more generic and it is used to replace all existing
> extents in a file range with the extent described by the structure.
> 
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 4/5] btrfs: rename btrfs_punch_hole_range() to a more generic name
  2020-09-08 10:27 ` [PATCH 4/5] btrfs: rename btrfs_punch_hole_range() " fdmanana
@ 2020-09-10 14:49   ` Josef Bacik
  0 siblings, 0 replies; 12+ messages in thread
From: Josef Bacik @ 2020-09-10 14:49 UTC (permalink / raw)
  To: fdmanana, linux-btrfs; +Cc: Filipe Manana

On 9/8/20 6:27 AM, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> The function btrfs_punch_hole_range() is now used to replace all the file
> extents in a given file range with an extent described in the given struct
> btrfs_replace_extent_info argument. This extent can either be an existing
> extent that is being cloned or it can be a new extent (namely a prealloc
> extent). When that argument is NULL it only punches a hole (drop alls the
> existing extents) in the file range.
> 
> So rename the function to btrfs_replace_file_extents().
> 
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 5/5] btrfs: rename btrfs_insert_clone_extent() to a more generic name
  2020-09-08 10:27 ` [PATCH 5/5] btrfs: rename btrfs_insert_clone_extent() " fdmanana
@ 2020-09-10 14:49   ` Josef Bacik
  0 siblings, 0 replies; 12+ messages in thread
From: Josef Bacik @ 2020-09-10 14:49 UTC (permalink / raw)
  To: fdmanana, linux-btrfs; +Cc: Filipe Manana

On 9/8/20 6:27 AM, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> Now that we use the same mechanism to replace all the extents in a file
> range with either a hole, an existing extent (when cloning) or a new
> extent (when using fallocate), the name of btrfs_insert_clone_extent()
> no longer reflects its genericity.
> 
> So rename it to btrfs_insert_replace_extent(), since what it does is
> to either insert an existing extent or a new extent into a file range.
> 
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/5] btrfs: fix enospc and transaction aborts during fallocate
  2020-09-08 10:27 [PATCH 0/5] btrfs: fix enospc and transaction aborts during fallocate fdmanana
                   ` (4 preceding siblings ...)
  2020-09-08 10:27 ` [PATCH 5/5] btrfs: rename btrfs_insert_clone_extent() " fdmanana
@ 2020-09-11 14:02 ` David Sterba
  5 siblings, 0 replies; 12+ messages in thread
From: David Sterba @ 2020-09-11 14:02 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs, Filipe Manana

On Tue, Sep 08, 2020 at 11:27:19AM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> When attempting to fallocate on a large file range with many file extent
> items, the operation can fail with ENOSPC when it shouldn't and, more
> critical, abort the transaction and turn the filesystem to RO mode.
> 
> First patch fixes the issue, the remaining just do some cleanups after it.
> 
> Filipe Manana (5):
>   btrfs: fix metadata reservation for fallocate that leads to
>     transaction aborts
>   btrfs: remove item_size member of struct btrfs_clone_extent_info
>   btrfs: rename struct btrfs_clone_extent_info to a more generic name
>   btrfs: rename btrfs_punch_hole_range() to a more generic name
>   btrfs: rename btrfs_insert_clone_extent() to a more generic name

Added to misc-next, thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-09-11 17:01 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-08 10:27 [PATCH 0/5] btrfs: fix enospc and transaction aborts during fallocate fdmanana
2020-09-08 10:27 ` [PATCH 1/5] btrfs: fix metadata reservation for fallocate that leads to transaction aborts fdmanana
2020-09-10 14:48   ` Josef Bacik
2020-09-08 10:27 ` [PATCH 2/5] btrfs: remove item_size member of struct btrfs_clone_extent_info fdmanana
2020-09-10 14:48   ` Josef Bacik
2020-09-08 10:27 ` [PATCH 3/5] btrfs: rename struct btrfs_clone_extent_info to a more generic name fdmanana
2020-09-10 14:48   ` Josef Bacik
2020-09-08 10:27 ` [PATCH 4/5] btrfs: rename btrfs_punch_hole_range() " fdmanana
2020-09-10 14:49   ` Josef Bacik
2020-09-08 10:27 ` [PATCH 5/5] btrfs: rename btrfs_insert_clone_extent() " fdmanana
2020-09-10 14:49   ` Josef Bacik
2020-09-11 14:02 ` [PATCH 0/5] btrfs: fix enospc and transaction aborts during fallocate David Sterba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).