* [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction
@ 2019-09-10 14:26 fdmanana
2019-09-11 16:04 ` Josef Bacik
` (4 more replies)
0 siblings, 5 replies; 14+ messages in thread
From: fdmanana @ 2019-09-10 14:26 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
Sometimes when fsync'ing a file we need to log that other inodes exist and
when we need to do that we acquire a reference on the inodes and then drop
that reference using iput() after logging them.
That generally is not a problem except if we end up doing the final iput()
(dropping the last reference) on the inode and that inode has a link count
of 0, which can happen in a very short time window if the logging path
gets a reference on the inode while it's being unlinked.
In that case we end up getting the eviction callback, btrfs_evict_inode(),
invoked through the iput() call chain which needs to drop all of the
inode's items from its subvolume btree, and in order to do that, it needs
to join a transaction at the helper function evict_refill_and_join().
However because the task previously started a transaction at the fsync
handler, btrfs_sync_file(), it has current->journal_info already pointing
to a transaction handle and therefore evict_refill_and_join() will get
that transaction handle from btrfs_join_transaction(). From this point on,
two different problems can happen:
1) evict_refill_and_join() will often change the transaction handle's
block reserve (->block_rsv) and set its ->bytes_reserved field to a
value greater than 0. If evict_refill_and_join() never commits the
transaction, the eviction handler ends up decreasing the reference
count (->use_count) of the transaction handle through the call to
btrfs_end_transaction(), and after that point we have a transaction
handle with a NULL ->block_rsv (which is the value prior to the
transaction join from evict_refill_and_join()) and a ->bytes_reserved
value greater than 0. If after the eviction/iput completes the inode
logging path hits an error or it decides that it must fallback to a
transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a
non-zero value from btrfs_log_dentry_safe(), and because of that
non-zero value it tries to commit the transaction using a handle with
a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes
the transaction commit hit an assertion failure at
btrfs_trans_release_metadata() because ->bytes_reserved is not zero but
the ->block_rsv is NULL. The produced stack trace for that is like the
following:
[192922.917158] assertion failed: !trans->bytes_reserved, file: fs/btrfs/transaction.c, line: 816
[192922.917553] ------------[ cut here ]------------
[192922.917922] kernel BUG at fs/btrfs/ctree.h:3532!
[192922.918310] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
[192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: G W 5.1.4-btrfs-next-47 #1
[192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
[192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs]
(...)
[192922.920925] RSP: 0018:ffffaebdc8a27da8 EFLAGS: 00010286
[192922.921315] RAX: 0000000000000051 RBX: ffff95c9c16a41c0 RCX: 0000000000000000
[192922.921692] RDX: 0000000000000000 RSI: ffff95cab6b16838 RDI: ffff95cab6b16838
[192922.922066] RBP: ffff95c9c16a41c0 R08: 0000000000000000 R09: 0000000000000000
[192922.922442] R10: ffffaebdc8a27e70 R11: 0000000000000000 R12: ffff95ca731a0980
[192922.922820] R13: 0000000000000000 R14: ffff95ca84c73338 R15: ffff95ca731a0ea8
[192922.923200] FS: 00007f337eda4e80(0000) GS:ffff95cab6b00000(0000) knlGS:0000000000000000
[192922.923579] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[192922.923948] CR2: 00007f337edad000 CR3: 00000001e00f6002 CR4: 00000000003606e0
[192922.924329] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[192922.924711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[192922.925105] Call Trace:
[192922.925505] btrfs_trans_release_metadata+0x10c/0x170 [btrfs]
[192922.925911] btrfs_commit_transaction+0x3e/0xaf0 [btrfs]
[192922.926324] btrfs_sync_file+0x44c/0x490 [btrfs]
[192922.926731] do_fsync+0x38/0x60
[192922.927138] __x64_sys_fdatasync+0x13/0x20
[192922.927543] do_syscall_64+0x60/0x1c0
[192922.927939] entry_SYSCALL_64_after_hwframe+0x49/0xbe
(...)
[192922.934077] ---[ end trace f00808b12068168f ]---
2) If evict_refill_and_join() decides to commit the transaction, it will
be able to do it, since the nested transaction join only increments the
transaction handle's ->use_count reference counter and it does not
prevent the transaction from getting committed. This means that after
eviction completes, the fsync logging path will be using a transaction
handle that refers to an already committed transaction. What happens
when using such a stale transaction can be unpredictable, we are at
least having a use-after-free on the transaction handle itself, since
the transaction commit will call kmem_cache_free() against the handle
regardless of its ->use_count value, or we can end up silently losing
all the updates to the log tree after that iput() in the logging path,
or using a transaction handle that in the meanwhile was allocated to
another task for a new transaction, etc, pretty much unpredictable
what can happen.
In order to fix both of them, instead of using iput() during logging, use
btrfs_add_delayed_iput(), so that the logging path of fsync never drops
the last reference on an inode, that step is offloaded to a safe context
(usually the cleaner kthread).
The assertion failure issue was sporadically triggered by the test case
generic/475 from fstests, which loads the dm error target while fsstress
is running, which lead to fsync failing while logging inodes with -EIO
errors and then trying later to commit the transaction, triggering the
assertion failure.
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/tree-log.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 6c8297bcfeb7..1bfd7e34f31e 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -4985,7 +4985,7 @@ static int log_conflicting_inodes(struct btrfs_trans_handle *trans,
BTRFS_I(inode),
LOG_OTHER_INODE_ALL,
0, LLONG_MAX, ctx);
- iput(inode);
+ btrfs_add_delayed_iput(inode);
}
}
continue;
@@ -5000,7 +5000,7 @@ static int log_conflicting_inodes(struct btrfs_trans_handle *trans,
ret = btrfs_log_inode(trans, root, BTRFS_I(inode),
LOG_OTHER_INODE, 0, LLONG_MAX, ctx);
if (ret) {
- iput(inode);
+ btrfs_add_delayed_iput(inode);
continue;
}
@@ -5009,7 +5009,7 @@ static int log_conflicting_inodes(struct btrfs_trans_handle *trans,
key.offset = 0;
ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
if (ret < 0) {
- iput(inode);
+ btrfs_add_delayed_iput(inode);
continue;
}
@@ -5056,7 +5056,7 @@ static int log_conflicting_inodes(struct btrfs_trans_handle *trans,
}
path->slots[0]++;
}
- iput(inode);
+ btrfs_add_delayed_iput(inode);
}
return ret;
@@ -5689,7 +5689,7 @@ static int log_new_dir_dentries(struct btrfs_trans_handle *trans,
}
if (btrfs_inode_in_log(BTRFS_I(di_inode), trans->transid)) {
- iput(di_inode);
+ btrfs_add_delayed_iput(di_inode);
break;
}
@@ -5701,7 +5701,7 @@ static int log_new_dir_dentries(struct btrfs_trans_handle *trans,
if (!ret &&
btrfs_must_commit_transaction(trans, BTRFS_I(di_inode)))
ret = 1;
- iput(di_inode);
+ btrfs_add_delayed_iput(di_inode);
if (ret)
goto next_dir_inode;
if (ctx->log_new_dentries) {
@@ -5848,7 +5848,7 @@ static int btrfs_log_all_parents(struct btrfs_trans_handle *trans,
if (!ret && ctx && ctx->log_new_dentries)
ret = log_new_dir_dentries(trans, root,
BTRFS_I(dir_inode), ctx);
- iput(dir_inode);
+ btrfs_add_delayed_iput(dir_inode);
if (ret)
goto out;
}
@@ -5891,7 +5891,7 @@ static int log_new_ancestors(struct btrfs_trans_handle *trans,
ret = btrfs_log_inode(trans, root, BTRFS_I(inode),
LOG_INODE_EXISTS,
0, LLONG_MAX, ctx);
- iput(inode);
+ btrfs_add_delayed_iput(inode);
if (ret)
return ret;
--
2.11.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction
2019-09-10 14:26 [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction fdmanana
@ 2019-09-11 16:04 ` Josef Bacik
2019-09-12 7:30 ` Sasha Levin
` (3 subsequent siblings)
4 siblings, 0 replies; 14+ messages in thread
From: Josef Bacik @ 2019-09-11 16:04 UTC (permalink / raw)
To: fdmanana; +Cc: linux-btrfs
On Tue, Sep 10, 2019 at 03:26:49PM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> Sometimes when fsync'ing a file we need to log that other inodes exist and
> when we need to do that we acquire a reference on the inodes and then drop
> that reference using iput() after logging them.
>
> That generally is not a problem except if we end up doing the final iput()
> (dropping the last reference) on the inode and that inode has a link count
> of 0, which can happen in a very short time window if the logging path
> gets a reference on the inode while it's being unlinked.
>
> In that case we end up getting the eviction callback, btrfs_evict_inode(),
> invoked through the iput() call chain which needs to drop all of the
> inode's items from its subvolume btree, and in order to do that, it needs
> to join a transaction at the helper function evict_refill_and_join().
> However because the task previously started a transaction at the fsync
> handler, btrfs_sync_file(), it has current->journal_info already pointing
> to a transaction handle and therefore evict_refill_and_join() will get
> that transaction handle from btrfs_join_transaction(). From this point on,
> two different problems can happen:
>
> 1) evict_refill_and_join() will often change the transaction handle's
> block reserve (->block_rsv) and set its ->bytes_reserved field to a
> value greater than 0. If evict_refill_and_join() never commits the
> transaction, the eviction handler ends up decreasing the reference
> count (->use_count) of the transaction handle through the call to
> btrfs_end_transaction(), and after that point we have a transaction
> handle with a NULL ->block_rsv (which is the value prior to the
> transaction join from evict_refill_and_join()) and a ->bytes_reserved
> value greater than 0. If after the eviction/iput completes the inode
> logging path hits an error or it decides that it must fallback to a
> transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a
> non-zero value from btrfs_log_dentry_safe(), and because of that
> non-zero value it tries to commit the transaction using a handle with
> a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes
> the transaction commit hit an assertion failure at
> btrfs_trans_release_metadata() because ->bytes_reserved is not zero but
> the ->block_rsv is NULL. The produced stack trace for that is like the
> following:
>
> [192922.917158] assertion failed: !trans->bytes_reserved, file: fs/btrfs/transaction.c, line: 816
> [192922.917553] ------------[ cut here ]------------
> [192922.917922] kernel BUG at fs/btrfs/ctree.h:3532!
> [192922.918310] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
> [192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: G W 5.1.4-btrfs-next-47 #1
> [192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
> [192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs]
> (...)
> [192922.920925] RSP: 0018:ffffaebdc8a27da8 EFLAGS: 00010286
> [192922.921315] RAX: 0000000000000051 RBX: ffff95c9c16a41c0 RCX: 0000000000000000
> [192922.921692] RDX: 0000000000000000 RSI: ffff95cab6b16838 RDI: ffff95cab6b16838
> [192922.922066] RBP: ffff95c9c16a41c0 R08: 0000000000000000 R09: 0000000000000000
> [192922.922442] R10: ffffaebdc8a27e70 R11: 0000000000000000 R12: ffff95ca731a0980
> [192922.922820] R13: 0000000000000000 R14: ffff95ca84c73338 R15: ffff95ca731a0ea8
> [192922.923200] FS: 00007f337eda4e80(0000) GS:ffff95cab6b00000(0000) knlGS:0000000000000000
> [192922.923579] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [192922.923948] CR2: 00007f337edad000 CR3: 00000001e00f6002 CR4: 00000000003606e0
> [192922.924329] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [192922.924711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [192922.925105] Call Trace:
> [192922.925505] btrfs_trans_release_metadata+0x10c/0x170 [btrfs]
> [192922.925911] btrfs_commit_transaction+0x3e/0xaf0 [btrfs]
> [192922.926324] btrfs_sync_file+0x44c/0x490 [btrfs]
> [192922.926731] do_fsync+0x38/0x60
> [192922.927138] __x64_sys_fdatasync+0x13/0x20
> [192922.927543] do_syscall_64+0x60/0x1c0
> [192922.927939] entry_SYSCALL_64_after_hwframe+0x49/0xbe
> (...)
> [192922.934077] ---[ end trace f00808b12068168f ]---
>
> 2) If evict_refill_and_join() decides to commit the transaction, it will
> be able to do it, since the nested transaction join only increments the
> transaction handle's ->use_count reference counter and it does not
> prevent the transaction from getting committed. This means that after
> eviction completes, the fsync logging path will be using a transaction
> handle that refers to an already committed transaction. What happens
> when using such a stale transaction can be unpredictable, we are at
> least having a use-after-free on the transaction handle itself, since
> the transaction commit will call kmem_cache_free() against the handle
> regardless of its ->use_count value, or we can end up silently losing
> all the updates to the log tree after that iput() in the logging path,
> or using a transaction handle that in the meanwhile was allocated to
> another task for a new transaction, etc, pretty much unpredictable
> what can happen.
>
> In order to fix both of them, instead of using iput() during logging, use
> btrfs_add_delayed_iput(), so that the logging path of fsync never drops
> the last reference on an inode, that step is offloaded to a safe context
> (usually the cleaner kthread).
>
> The assertion failure issue was sporadically triggered by the test case
> generic/475 from fstests, which loads the dm error target while fsstress
> is running, which lead to fsync failing while logging inodes with -EIO
> errors and then trying later to commit the transaction, triggering the
> assertion failure.
>
> CC: stable@vger.kernel.org # 4.4+
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Thanks,
Josef
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction
2019-09-10 14:26 [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction fdmanana
2019-09-11 16:04 ` Josef Bacik
@ 2019-09-12 7:30 ` Sasha Levin
2019-09-12 8:02 ` Filipe Manana
2019-09-16 12:59 ` Filipe Manana
2019-09-12 11:10 ` Nikolay Borisov
` (2 subsequent siblings)
4 siblings, 2 replies; 14+ messages in thread
From: Sasha Levin @ 2019-09-12 7:30 UTC (permalink / raw)
To: Sasha Levin, Filipe Manana, linux-btrfs; +Cc: stable
Hi,
[This is an automated email]
This commit has been processed because it contains a -stable tag.
The stable tag indicates that it's relevant for the following trees: 4.4+
The bot has tested the following trees: v5.2.14, v4.19.72, v4.14.143, v4.9.192, v4.4.192.
v5.2.14: Build OK!
v4.19.72: Failed to apply! Possible dependencies:
6b5fc433a7ad ("Btrfs: fix fsync after succession of renames of different files")
a3baaf0d786e ("Btrfs: fix fsync after succession of renames and unlink/rmdir")
b8aa330d2acb ("Btrfs: improve performance on fsync of files with multiple hardlinks")
v4.14.143: Failed to apply! Possible dependencies:
0d836392cadd ("Btrfs: fix mount failure after fsync due to hard link recreation")
1f250e929a9c ("Btrfs: fix log replay failure after unlink and link combination")
6b5fc433a7ad ("Btrfs: fix fsync after succession of renames of different files")
8d9e220ca084 ("btrfs: simplify IS_ERR/PTR_ERR checks")
a3baaf0d786e ("Btrfs: fix fsync after succession of renames and unlink/rmdir")
b8aa330d2acb ("Btrfs: improve performance on fsync of files with multiple hardlinks")
v4.9.192: Failed to apply! Possible dependencies:
0b246afa62b0 ("btrfs: root->fs_info cleanup, add fs_info convenience variables")
0d836392cadd ("Btrfs: fix mount failure after fsync due to hard link recreation")
1f250e929a9c ("Btrfs: fix log replay failure after unlink and link combination")
4791c8f19c45 ("btrfs: Make btrfs_check_ref_name_override take btrfs_inode")
6b5fc433a7ad ("Btrfs: fix fsync after succession of renames of different files")
a3baaf0d786e ("Btrfs: fix fsync after succession of renames and unlink/rmdir")
cf8cddd38bab ("btrfs: don't abuse REQ_OP_* flags for btrfs_map_block")
da17066c4047 ("btrfs: pull node/sector/stripe sizes out of root and into fs_info")
db0a669fb002 ("btrfs: Make btrfs_add_link take btrfs_inode")
de143792253e ("btrfs: struct btrfsic_state->root should be an fs_info")
fb456252d3d9 ("btrfs: root->fs_info cleanup, use fs_info->dev_root everywhere")
v4.4.192: Failed to apply! Possible dependencies:
0132761017e0 ("btrfs: fix string and comment grammatical issues and typos")
09cbfeaf1a5a ("mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros")
0b246afa62b0 ("btrfs: root->fs_info cleanup, add fs_info convenience variables")
0d836392cadd ("Btrfs: fix mount failure after fsync due to hard link recreation")
0e749e54244e ("dax: increase granularity of dax_clear_blocks() operations")
1f250e929a9c ("Btrfs: fix log replay failure after unlink and link combination")
44f714dae50a ("Btrfs: improve performance on fsync against new inode after rename/unlink")
4791c8f19c45 ("btrfs: Make btrfs_check_ref_name_override take btrfs_inode")
52db400fcd50 ("pmem, dax: clean up clear_pmem()")
6b5fc433a7ad ("Btrfs: fix fsync after succession of renames of different files")
781feef7e6be ("Btrfs: fix lockdep warning about log_mutex")
a3baaf0d786e ("Btrfs: fix fsync after succession of renames and unlink/rmdir")
b2e0d1625e19 ("dax: fix lifetime of in-kernel dax mappings with dax_map_atomic()")
bb7ab3b92e46 ("btrfs: Fix misspellings in comments.")
cf8cddd38bab ("btrfs: don't abuse REQ_OP_* flags for btrfs_map_block")
d1a5f2b4d8a1 ("block: use DAX for partition table reads")
db0a669fb002 ("btrfs: Make btrfs_add_link take btrfs_inode")
de143792253e ("btrfs: struct btrfsic_state->root should be an fs_info")
NOTE: The patch will not be queued to stable trees until it is upstream.
How should we proceed with this patch?
--
Thanks,
Sasha
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction
2019-09-12 7:30 ` Sasha Levin
@ 2019-09-12 8:02 ` Filipe Manana
2019-09-16 12:59 ` Filipe Manana
1 sibling, 0 replies; 14+ messages in thread
From: Filipe Manana @ 2019-09-12 8:02 UTC (permalink / raw)
To: Sasha Levin; +Cc: Filipe Manana, linux-btrfs, stable
On Thu, Sep 12, 2019 at 8:32 AM Sasha Levin <sashal@kernel.org> wrote:
>
> Hi,
>
> [This is an automated email]
>
> This commit has been processed because it contains a -stable tag.
> The stable tag indicates that it's relevant for the following trees: 4.4+
>
> The bot has tested the following trees: v5.2.14, v4.19.72, v4.14.143, v4.9.192, v4.4.192.
>
> v5.2.14: Build OK!
> v4.19.72: Failed to apply! Possible dependencies:
> 6b5fc433a7ad ("Btrfs: fix fsync after succession of renames of different files")
> a3baaf0d786e ("Btrfs: fix fsync after succession of renames and unlink/rmdir")
> b8aa330d2acb ("Btrfs: improve performance on fsync of files with multiple hardlinks")
>
> v4.14.143: Failed to apply! Possible dependencies:
> 0d836392cadd ("Btrfs: fix mount failure after fsync due to hard link recreation")
> 1f250e929a9c ("Btrfs: fix log replay failure after unlink and link combination")
> 6b5fc433a7ad ("Btrfs: fix fsync after succession of renames of different files")
> 8d9e220ca084 ("btrfs: simplify IS_ERR/PTR_ERR checks")
> a3baaf0d786e ("Btrfs: fix fsync after succession of renames and unlink/rmdir")
> b8aa330d2acb ("Btrfs: improve performance on fsync of files with multiple hardlinks")
>
> v4.9.192: Failed to apply! Possible dependencies:
> 0b246afa62b0 ("btrfs: root->fs_info cleanup, add fs_info convenience variables")
> 0d836392cadd ("Btrfs: fix mount failure after fsync due to hard link recreation")
> 1f250e929a9c ("Btrfs: fix log replay failure after unlink and link combination")
> 4791c8f19c45 ("btrfs: Make btrfs_check_ref_name_override take btrfs_inode")
> 6b5fc433a7ad ("Btrfs: fix fsync after succession of renames of different files")
> a3baaf0d786e ("Btrfs: fix fsync after succession of renames and unlink/rmdir")
> cf8cddd38bab ("btrfs: don't abuse REQ_OP_* flags for btrfs_map_block")
> da17066c4047 ("btrfs: pull node/sector/stripe sizes out of root and into fs_info")
> db0a669fb002 ("btrfs: Make btrfs_add_link take btrfs_inode")
> de143792253e ("btrfs: struct btrfsic_state->root should be an fs_info")
> fb456252d3d9 ("btrfs: root->fs_info cleanup, use fs_info->dev_root everywhere")
>
> v4.4.192: Failed to apply! Possible dependencies:
> 0132761017e0 ("btrfs: fix string and comment grammatical issues and typos")
> 09cbfeaf1a5a ("mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros")
> 0b246afa62b0 ("btrfs: root->fs_info cleanup, add fs_info convenience variables")
> 0d836392cadd ("Btrfs: fix mount failure after fsync due to hard link recreation")
> 0e749e54244e ("dax: increase granularity of dax_clear_blocks() operations")
> 1f250e929a9c ("Btrfs: fix log replay failure after unlink and link combination")
> 44f714dae50a ("Btrfs: improve performance on fsync against new inode after rename/unlink")
> 4791c8f19c45 ("btrfs: Make btrfs_check_ref_name_override take btrfs_inode")
> 52db400fcd50 ("pmem, dax: clean up clear_pmem()")
> 6b5fc433a7ad ("Btrfs: fix fsync after succession of renames of different files")
> 781feef7e6be ("Btrfs: fix lockdep warning about log_mutex")
> a3baaf0d786e ("Btrfs: fix fsync after succession of renames and unlink/rmdir")
> b2e0d1625e19 ("dax: fix lifetime of in-kernel dax mappings with dax_map_atomic()")
> bb7ab3b92e46 ("btrfs: Fix misspellings in comments.")
> cf8cddd38bab ("btrfs: don't abuse REQ_OP_* flags for btrfs_map_block")
> d1a5f2b4d8a1 ("block: use DAX for partition table reads")
> db0a669fb002 ("btrfs: Make btrfs_add_link take btrfs_inode")
> de143792253e ("btrfs: struct btrfsic_state->root should be an fs_info")
>
>
> NOTE: The patch will not be queued to stable trees until it is upstream.
>
> How should we proceed with this patch?
After it lands on Linus' tree, I'll try to send patch versions that
apply to different kernel releases.
Thanks.
>
> --
> Thanks,
> Sasha
--
Filipe David Manana,
“Whether you think you can, or you think you can't — you're right.”
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction
2019-09-10 14:26 [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction fdmanana
2019-09-11 16:04 ` Josef Bacik
2019-09-12 7:30 ` Sasha Levin
@ 2019-09-12 11:10 ` Nikolay Borisov
2019-09-12 11:24 ` Filipe Manana
2019-09-12 11:58 ` Josef Bacik
2019-09-12 12:18 ` Josef Bacik
4 siblings, 1 reply; 14+ messages in thread
From: Nikolay Borisov @ 2019-09-12 11:10 UTC (permalink / raw)
To: fdmanana, linux-btrfs
On 10.09.19 г. 17:26 ч., fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> Sometimes when fsync'ing a file we need to log that other inodes exist and
> when we need to do that we acquire a reference on the inodes and then drop
> that reference using iput() after logging them.
>
> That generally is not a problem except if we end up doing the final iput()
> (dropping the last reference) on the inode and that inode has a link count
> of 0, which can happen in a very short time window if the logging path
> gets a reference on the inode while it's being unlinked.
>
> In that case we end up getting the eviction callback, btrfs_evict_inode(),
> invoked through the iput() call chain which needs to drop all of the
> inode's items from its subvolume btree, and in order to do that, it needs
> to join a transaction at the helper function evict_refill_and_join().
> However because the task previously started a transaction at the fsync
> handler, btrfs_sync_file(), it has current->journal_info already pointing
> to a transaction handle and therefore evict_refill_and_join() will get
> that transaction handle from btrfs_join_transaction(). From this point on,
> two different problems can happen:
>
> 1) evict_refill_and_join() will often change the transaction handle's
> block reserve (->block_rsv) and set its ->bytes_reserved field to a
> value greater than 0. If evict_refill_and_join() never commits the
> transaction, the eviction handler ends up decreasing the reference
> count (->use_count) of the transaction handle through the call to
> btrfs_end_transaction(), and after that point we have a transaction
> handle with a NULL ->block_rsv (which is the value prior to the
> transaction join from evict_refill_and_join()) and a ->bytes_reserved
> value greater than 0. If after the eviction/iput completes the inode
> logging path hits an error or it decides that it must fallback to a
> transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a
> non-zero value from btrfs_log_dentry_safe(), and because of that
> non-zero value it tries to commit the transaction using a handle with
> a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes
> the transaction commit hit an assertion failure at
> btrfs_trans_release_metadata() because ->bytes_reserved is not zero but
> the ->block_rsv is NULL. The produced stack trace for that is like the
> following:
>
> [192922.917158] assertion failed: !trans->bytes_reserved, file: fs/btrfs/transaction.c, line: 816
> [192922.917553] ------------[ cut here ]------------
> [192922.917922] kernel BUG at fs/btrfs/ctree.h:3532!
> [192922.918310] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
> [192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: G W 5.1.4-btrfs-next-47 #1
> [192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
> [192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs]
> (...)
> [192922.920925] RSP: 0018:ffffaebdc8a27da8 EFLAGS: 00010286
> [192922.921315] RAX: 0000000000000051 RBX: ffff95c9c16a41c0 RCX: 0000000000000000
> [192922.921692] RDX: 0000000000000000 RSI: ffff95cab6b16838 RDI: ffff95cab6b16838
> [192922.922066] RBP: ffff95c9c16a41c0 R08: 0000000000000000 R09: 0000000000000000
> [192922.922442] R10: ffffaebdc8a27e70 R11: 0000000000000000 R12: ffff95ca731a0980
> [192922.922820] R13: 0000000000000000 R14: ffff95ca84c73338 R15: ffff95ca731a0ea8
> [192922.923200] FS: 00007f337eda4e80(0000) GS:ffff95cab6b00000(0000) knlGS:0000000000000000
> [192922.923579] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [192922.923948] CR2: 00007f337edad000 CR3: 00000001e00f6002 CR4: 00000000003606e0
> [192922.924329] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [192922.924711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [192922.925105] Call Trace:
> [192922.925505] btrfs_trans_release_metadata+0x10c/0x170 [btrfs]
> [192922.925911] btrfs_commit_transaction+0x3e/0xaf0 [btrfs]
> [192922.926324] btrfs_sync_file+0x44c/0x490 [btrfs]
> [192922.926731] do_fsync+0x38/0x60
> [192922.927138] __x64_sys_fdatasync+0x13/0x20
> [192922.927543] do_syscall_64+0x60/0x1c0
> [192922.927939] entry_SYSCALL_64_after_hwframe+0x49/0xbe
> (...)
> [192922.934077] ---[ end trace f00808b12068168f ]---
>
> 2) If evict_refill_and_join() decides to commit the transaction, it will
evict_refill_and_join only ever calls btrfs_join_transaction so it
cannot ever commit the transaction. Let's look into its sole caller,
btrfs_evict_inode, I see 3 cases where transactions handle is involved
in that function:
1. btrfs_commit_inode_delayed_inode - iot calls trans_join +
end_transaction => use_count = 2 -> no commit
2. btrfs_truncate_inode_items -> never calls end/commit transaction
3. btrfs_evict_inode itself only ever calls btrfs_end_transaction on a
handle that's obtained via btrfs_join_transaction, meaning that trans
handle's ->use_count will be 2. This in turn will hit:
if (refcount_read(&trans->use_count) > 1) {
refcount_dec(&trans->use_count);
trans->block_rsv = trans->orig_rsv;
return 0;
}
in __btrfs_end_transaction. That code is in direct contradiction with
what you describe next? Am I missing something here? If my analysis is
correct this implies case 1) above also cannot happen because
trans->block_rsv is set to NULL in __btrfs_end_transaction only if
use_count == 1.
> be able to do it, since the nested transaction join only increments the
> transaction handle's ->use_count reference counter and it does not
> prevent the transaction from getting committed. This means that after
> eviction completes, the fsync logging path will be using a transaction
> handle that refers to an already committed transaction. What happens
> when using such a stale transaction can be unpredictable, we are at
> least having a use-after-free on the transaction handle itself, since
> the transaction commit will call kmem_cache_free() against the handle
> regardless of its ->use_count value, or we can end up silently losing
> all the updates to the log tree after that iput() in the logging path,
> or using a transaction handle that in the meanwhile was allocated to
> another task for a new transaction, etc, pretty much unpredictable
> what can happen.
>
> In order to fix both of them, instead of using iput() during logging, use
> btrfs_add_delayed_iput(), so that the logging path of fsync never drops
> the last reference on an inode, that step is offloaded to a safe context
> (usually the cleaner kthread).
>
> The assertion failure issue was sporadically triggered by the test case
> generic/475 from fstests, which loads the dm error target while fsstress
> is running, which lead to fsync failing while logging inodes with -EIO
> errors and then trying later to commit the transaction, triggering the
> assertion failure.
>
> CC: stable@vger.kernel.org # 4.4+
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
> fs/btrfs/tree-log.c | 16 ++++++++--------
> 1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index 6c8297bcfeb7..1bfd7e34f31e 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -4985,7 +4985,7 @@ static int log_conflicting_inodes(struct btrfs_trans_handle *trans,
> BTRFS_I(inode),
> LOG_OTHER_INODE_ALL,
> 0, LLONG_MAX, ctx);
> - iput(inode);
> + btrfs_add_delayed_iput(inode);
> }
> }
> continue;
> @@ -5000,7 +5000,7 @@ static int log_conflicting_inodes(struct btrfs_trans_handle *trans,
> ret = btrfs_log_inode(trans, root, BTRFS_I(inode),
> LOG_OTHER_INODE, 0, LLONG_MAX, ctx);
> if (ret) {
> - iput(inode);
> + btrfs_add_delayed_iput(inode);
> continue;
> }
>
> @@ -5009,7 +5009,7 @@ static int log_conflicting_inodes(struct btrfs_trans_handle *trans,
> key.offset = 0;
> ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
> if (ret < 0) {
> - iput(inode);
> + btrfs_add_delayed_iput(inode);
> continue;
> }
>
> @@ -5056,7 +5056,7 @@ static int log_conflicting_inodes(struct btrfs_trans_handle *trans,
> }
> path->slots[0]++;
> }
> - iput(inode);
> + btrfs_add_delayed_iput(inode);
> }
>
> return ret;
> @@ -5689,7 +5689,7 @@ static int log_new_dir_dentries(struct btrfs_trans_handle *trans,
> }
>
> if (btrfs_inode_in_log(BTRFS_I(di_inode), trans->transid)) {
> - iput(di_inode);
> + btrfs_add_delayed_iput(di_inode);
> break;
> }
>
> @@ -5701,7 +5701,7 @@ static int log_new_dir_dentries(struct btrfs_trans_handle *trans,
> if (!ret &&
> btrfs_must_commit_transaction(trans, BTRFS_I(di_inode)))
> ret = 1;
> - iput(di_inode);
> + btrfs_add_delayed_iput(di_inode);
> if (ret)
> goto next_dir_inode;
> if (ctx->log_new_dentries) {
> @@ -5848,7 +5848,7 @@ static int btrfs_log_all_parents(struct btrfs_trans_handle *trans,
> if (!ret && ctx && ctx->log_new_dentries)
> ret = log_new_dir_dentries(trans, root,
> BTRFS_I(dir_inode), ctx);
> - iput(dir_inode);
> + btrfs_add_delayed_iput(dir_inode);
> if (ret)
> goto out;
> }
> @@ -5891,7 +5891,7 @@ static int log_new_ancestors(struct btrfs_trans_handle *trans,
> ret = btrfs_log_inode(trans, root, BTRFS_I(inode),
> LOG_INODE_EXISTS,
> 0, LLONG_MAX, ctx);
> - iput(inode);
> + btrfs_add_delayed_iput(inode);
> if (ret)
> return ret;
>
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction
2019-09-12 11:10 ` Nikolay Borisov
@ 2019-09-12 11:24 ` Filipe Manana
2019-09-12 11:35 ` Nikolay Borisov
0 siblings, 1 reply; 14+ messages in thread
From: Filipe Manana @ 2019-09-12 11:24 UTC (permalink / raw)
To: Nikolay Borisov; +Cc: linux-btrfs
On Thu, Sep 12, 2019 at 12:10 PM Nikolay Borisov <nborisov@suse.com> wrote:
>
>
>
> On 10.09.19 г. 17:26 ч., fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > Sometimes when fsync'ing a file we need to log that other inodes exist and
> > when we need to do that we acquire a reference on the inodes and then drop
> > that reference using iput() after logging them.
> >
> > That generally is not a problem except if we end up doing the final iput()
> > (dropping the last reference) on the inode and that inode has a link count
> > of 0, which can happen in a very short time window if the logging path
> > gets a reference on the inode while it's being unlinked.
> >
> > In that case we end up getting the eviction callback, btrfs_evict_inode(),
> > invoked through the iput() call chain which needs to drop all of the
> > inode's items from its subvolume btree, and in order to do that, it needs
> > to join a transaction at the helper function evict_refill_and_join().
> > However because the task previously started a transaction at the fsync
> > handler, btrfs_sync_file(), it has current->journal_info already pointing
> > to a transaction handle and therefore evict_refill_and_join() will get
> > that transaction handle from btrfs_join_transaction(). From this point on,
> > two different problems can happen:
> >
> > 1) evict_refill_and_join() will often change the transaction handle's
> > block reserve (->block_rsv) and set its ->bytes_reserved field to a
> > value greater than 0. If evict_refill_and_join() never commits the
> > transaction, the eviction handler ends up decreasing the reference
> > count (->use_count) of the transaction handle through the call to
> > btrfs_end_transaction(), and after that point we have a transaction
> > handle with a NULL ->block_rsv (which is the value prior to the
> > transaction join from evict_refill_and_join()) and a ->bytes_reserved
> > value greater than 0. If after the eviction/iput completes the inode
> > logging path hits an error or it decides that it must fallback to a
> > transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a
> > non-zero value from btrfs_log_dentry_safe(), and because of that
> > non-zero value it tries to commit the transaction using a handle with
> > a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes
> > the transaction commit hit an assertion failure at
> > btrfs_trans_release_metadata() because ->bytes_reserved is not zero but
> > the ->block_rsv is NULL. The produced stack trace for that is like the
> > following:
> >
> > [192922.917158] assertion failed: !trans->bytes_reserved, file: fs/btrfs/transaction.c, line: 816
> > [192922.917553] ------------[ cut here ]------------
> > [192922.917922] kernel BUG at fs/btrfs/ctree.h:3532!
> > [192922.918310] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
> > [192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: G W 5.1.4-btrfs-next-47 #1
> > [192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
> > [192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs]
> > (...)
> > [192922.920925] RSP: 0018:ffffaebdc8a27da8 EFLAGS: 00010286
> > [192922.921315] RAX: 0000000000000051 RBX: ffff95c9c16a41c0 RCX: 0000000000000000
> > [192922.921692] RDX: 0000000000000000 RSI: ffff95cab6b16838 RDI: ffff95cab6b16838
> > [192922.922066] RBP: ffff95c9c16a41c0 R08: 0000000000000000 R09: 0000000000000000
> > [192922.922442] R10: ffffaebdc8a27e70 R11: 0000000000000000 R12: ffff95ca731a0980
> > [192922.922820] R13: 0000000000000000 R14: ffff95ca84c73338 R15: ffff95ca731a0ea8
> > [192922.923200] FS: 00007f337eda4e80(0000) GS:ffff95cab6b00000(0000) knlGS:0000000000000000
> > [192922.923579] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [192922.923948] CR2: 00007f337edad000 CR3: 00000001e00f6002 CR4: 00000000003606e0
> > [192922.924329] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [192922.924711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [192922.925105] Call Trace:
> > [192922.925505] btrfs_trans_release_metadata+0x10c/0x170 [btrfs]
> > [192922.925911] btrfs_commit_transaction+0x3e/0xaf0 [btrfs]
> > [192922.926324] btrfs_sync_file+0x44c/0x490 [btrfs]
> > [192922.926731] do_fsync+0x38/0x60
> > [192922.927138] __x64_sys_fdatasync+0x13/0x20
> > [192922.927543] do_syscall_64+0x60/0x1c0
> > [192922.927939] entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > (...)
> > [192922.934077] ---[ end trace f00808b12068168f ]---
> >
> > 2) If evict_refill_and_join() decides to commit the transaction, it will
> evict_refill_and_join only ever calls btrfs_join_transaction so it
> cannot ever commit the transaction.
It can:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/inode.c?h=v5.3-rc8#n5399
> Let's look into its sole caller,
> btrfs_evict_inode, I see 3 cases where transactions handle is involved
> in that function:
>
> 1. btrfs_commit_inode_delayed_inode - iot calls trans_join +
> end_transaction => use_count = 2 -> no commit
>
> 2. btrfs_truncate_inode_items -> never calls end/commit transaction
>
> 3. btrfs_evict_inode itself only ever calls btrfs_end_transaction on a
> handle that's obtained via btrfs_join_transaction, meaning that trans
> handle's ->use_count will be 2. This in turn will hit:
>
> if (refcount_read(&trans->use_count) > 1) {
> refcount_dec(&trans->use_count);
>
> trans->block_rsv = trans->orig_rsv;
>
> return 0;
>
> }
>
> in __btrfs_end_transaction. That code is in direct contradiction with
> what you describe next?
No, it isn't.
I explicitly mention it in point 1):
"... the eviction handler ends up decreasing the reference
count (->use_count) of the transaction handle through the call to
btrfs_end_transaction(), and after that point we have a transaction
handle with a NULL ->block_rsv (which is the value prior to the
transaction join from evict_refill_and_join()) and a ->bytes_reserved
value greater than 0."
> Am I missing something here? If my analysis is
> correct this implies case 1) above also cannot happen because
> trans->block_rsv is set to NULL in __btrfs_end_transaction only if
> use_count == 1.
Again, go read the change log and the code more carefully.
Thanks.
>
> > be able to do it, since the nested transaction join only increments the
> > transaction handle's ->use_count reference counter and it does not
> > prevent the transaction from getting committed. This means that after
> > eviction completes, the fsync logging path will be using a transaction
> > handle that refers to an already committed transaction. What happens
> > when using such a stale transaction can be unpredictable, we are at
> > least having a use-after-free on the transaction handle itself, since
> > the transaction commit will call kmem_cache_free() against the handle
> > regardless of its ->use_count value, or we can end up silently losing
> > all the updates to the log tree after that iput() in the logging path,
> > or using a transaction handle that in the meanwhile was allocated to
> > another task for a new transaction, etc, pretty much unpredictable
> > what can happen.
> >
> > In order to fix both of them, instead of using iput() during logging, use
> > btrfs_add_delayed_iput(), so that the logging path of fsync never drops
> > the last reference on an inode, that step is offloaded to a safe context
> > (usually the cleaner kthread).
> >
> > The assertion failure issue was sporadically triggered by the test case
> > generic/475 from fstests, which loads the dm error target while fsstress
> > is running, which lead to fsync failing while logging inodes with -EIO
> > errors and then trying later to commit the transaction, triggering the
> > assertion failure.
> >
> > CC: stable@vger.kernel.org # 4.4+
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > ---
> > fs/btrfs/tree-log.c | 16 ++++++++--------
> > 1 file changed, 8 insertions(+), 8 deletions(-)
> >
> > diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> > index 6c8297bcfeb7..1bfd7e34f31e 100644
> > --- a/fs/btrfs/tree-log.c
> > +++ b/fs/btrfs/tree-log.c
> > @@ -4985,7 +4985,7 @@ static int log_conflicting_inodes(struct btrfs_trans_handle *trans,
> > BTRFS_I(inode),
> > LOG_OTHER_INODE_ALL,
> > 0, LLONG_MAX, ctx);
> > - iput(inode);
> > + btrfs_add_delayed_iput(inode);
> > }
> > }
> > continue;
> > @@ -5000,7 +5000,7 @@ static int log_conflicting_inodes(struct btrfs_trans_handle *trans,
> > ret = btrfs_log_inode(trans, root, BTRFS_I(inode),
> > LOG_OTHER_INODE, 0, LLONG_MAX, ctx);
> > if (ret) {
> > - iput(inode);
> > + btrfs_add_delayed_iput(inode);
> > continue;
> > }
> >
> > @@ -5009,7 +5009,7 @@ static int log_conflicting_inodes(struct btrfs_trans_handle *trans,
> > key.offset = 0;
> > ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
> > if (ret < 0) {
> > - iput(inode);
> > + btrfs_add_delayed_iput(inode);
> > continue;
> > }
> >
> > @@ -5056,7 +5056,7 @@ static int log_conflicting_inodes(struct btrfs_trans_handle *trans,
> > }
> > path->slots[0]++;
> > }
> > - iput(inode);
> > + btrfs_add_delayed_iput(inode);
> > }
> >
> > return ret;
> > @@ -5689,7 +5689,7 @@ static int log_new_dir_dentries(struct btrfs_trans_handle *trans,
> > }
> >
> > if (btrfs_inode_in_log(BTRFS_I(di_inode), trans->transid)) {
> > - iput(di_inode);
> > + btrfs_add_delayed_iput(di_inode);
> > break;
> > }
> >
> > @@ -5701,7 +5701,7 @@ static int log_new_dir_dentries(struct btrfs_trans_handle *trans,
> > if (!ret &&
> > btrfs_must_commit_transaction(trans, BTRFS_I(di_inode)))
> > ret = 1;
> > - iput(di_inode);
> > + btrfs_add_delayed_iput(di_inode);
> > if (ret)
> > goto next_dir_inode;
> > if (ctx->log_new_dentries) {
> > @@ -5848,7 +5848,7 @@ static int btrfs_log_all_parents(struct btrfs_trans_handle *trans,
> > if (!ret && ctx && ctx->log_new_dentries)
> > ret = log_new_dir_dentries(trans, root,
> > BTRFS_I(dir_inode), ctx);
> > - iput(dir_inode);
> > + btrfs_add_delayed_iput(dir_inode);
> > if (ret)
> > goto out;
> > }
> > @@ -5891,7 +5891,7 @@ static int log_new_ancestors(struct btrfs_trans_handle *trans,
> > ret = btrfs_log_inode(trans, root, BTRFS_I(inode),
> > LOG_INODE_EXISTS,
> > 0, LLONG_MAX, ctx);
> > - iput(inode);
> > + btrfs_add_delayed_iput(inode);
> > if (ret)
> > return ret;
> >
> >
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction
2019-09-12 11:24 ` Filipe Manana
@ 2019-09-12 11:35 ` Nikolay Borisov
0 siblings, 0 replies; 14+ messages in thread
From: Nikolay Borisov @ 2019-09-12 11:35 UTC (permalink / raw)
To: Filipe Manana; +Cc: linux-btrfs
On 12.09.19 г. 14:24 ч., Filipe Manana wrote:
> On Thu, Sep 12, 2019 at 12:10 PM Nikolay Borisov <nborisov@suse.com> wrote:
>>
>>
>>
>> On 10.09.19 г. 17:26 ч., fdmanana@kernel.org wrote:
>>> From: Filipe Manana <fdmanana@suse.com>
>>>
>>> Sometimes when fsync'ing a file we need to log that other inodes exist and
>>> when we need to do that we acquire a reference on the inodes and then drop
>>> that reference using iput() after logging them.
>>>
>>> That generally is not a problem except if we end up doing the final iput()
>>> (dropping the last reference) on the inode and that inode has a link count
>>> of 0, which can happen in a very short time window if the logging path
>>> gets a reference on the inode while it's being unlinked.
>>>
>>> In that case we end up getting the eviction callback, btrfs_evict_inode(),
>>> invoked through the iput() call chain which needs to drop all of the
>>> inode's items from its subvolume btree, and in order to do that, it needs
>>> to join a transaction at the helper function evict_refill_and_join().
>>> However because the task previously started a transaction at the fsync
>>> handler, btrfs_sync_file(), it has current->journal_info already pointing
>>> to a transaction handle and therefore evict_refill_and_join() will get
>>> that transaction handle from btrfs_join_transaction(). From this point on,
>>> two different problems can happen:
>>>
>>> 1) evict_refill_and_join() will often change the transaction handle's
>>> block reserve (->block_rsv) and set its ->bytes_reserved field to a
>>> value greater than 0. If evict_refill_and_join() never commits the
>>> transaction, the eviction handler ends up decreasing the reference
>>> count (->use_count) of the transaction handle through the call to
>>> btrfs_end_transaction(), and after that point we have a transaction
>>> handle with a NULL ->block_rsv (which is the value prior to the
>>> transaction join from evict_refill_and_join()) and a ->bytes_reserved
>>> value greater than 0. If after the eviction/iput completes the inode
>>> logging path hits an error or it decides that it must fallback to a
>>> transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a
>>> non-zero value from btrfs_log_dentry_safe(), and because of that
>>> non-zero value it tries to commit the transaction using a handle with
>>> a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes
>>> the transaction commit hit an assertion failure at
>>> btrfs_trans_release_metadata() because ->bytes_reserved is not zero but
>>> the ->block_rsv is NULL. The produced stack trace for that is like the
>>> following:
>>>
>>> [192922.917158] assertion failed: !trans->bytes_reserved, file: fs/btrfs/transaction.c, line: 816
>>> [192922.917553] ------------[ cut here ]------------
>>> [192922.917922] kernel BUG at fs/btrfs/ctree.h:3532!
>>> [192922.918310] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
>>> [192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: G W 5.1.4-btrfs-next-47 #1
>>> [192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
>>> [192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs]
>>> (...)
>>> [192922.920925] RSP: 0018:ffffaebdc8a27da8 EFLAGS: 00010286
>>> [192922.921315] RAX: 0000000000000051 RBX: ffff95c9c16a41c0 RCX: 0000000000000000
>>> [192922.921692] RDX: 0000000000000000 RSI: ffff95cab6b16838 RDI: ffff95cab6b16838
>>> [192922.922066] RBP: ffff95c9c16a41c0 R08: 0000000000000000 R09: 0000000000000000
>>> [192922.922442] R10: ffffaebdc8a27e70 R11: 0000000000000000 R12: ffff95ca731a0980
>>> [192922.922820] R13: 0000000000000000 R14: ffff95ca84c73338 R15: ffff95ca731a0ea8
>>> [192922.923200] FS: 00007f337eda4e80(0000) GS:ffff95cab6b00000(0000) knlGS:0000000000000000
>>> [192922.923579] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [192922.923948] CR2: 00007f337edad000 CR3: 00000001e00f6002 CR4: 00000000003606e0
>>> [192922.924329] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> [192922.924711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> [192922.925105] Call Trace:
>>> [192922.925505] btrfs_trans_release_metadata+0x10c/0x170 [btrfs]
>>> [192922.925911] btrfs_commit_transaction+0x3e/0xaf0 [btrfs]
>>> [192922.926324] btrfs_sync_file+0x44c/0x490 [btrfs]
>>> [192922.926731] do_fsync+0x38/0x60
>>> [192922.927138] __x64_sys_fdatasync+0x13/0x20
>>> [192922.927543] do_syscall_64+0x60/0x1c0
>>> [192922.927939] entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>> (...)
>>> [192922.934077] ---[ end trace f00808b12068168f ]---
>>>
>>> 2) If evict_refill_and_join() decides to commit the transaction, it will
>> evict_refill_and_join only ever calls btrfs_join_transaction so it
>> cannot ever commit the transaction.
>
> It can:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/inode.c?h=v5.3-rc8#n5399
Right, in 5.3 branch yes but in misc-next evict_and_join looks
completely different, that's where the confusion stems from. Thanks
<snip>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction
2019-09-10 14:26 [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction fdmanana
` (2 preceding siblings ...)
2019-09-12 11:10 ` Nikolay Borisov
@ 2019-09-12 11:58 ` Josef Bacik
2019-09-12 13:17 ` Filipe Manana
2019-09-12 12:18 ` Josef Bacik
4 siblings, 1 reply; 14+ messages in thread
From: Josef Bacik @ 2019-09-12 11:58 UTC (permalink / raw)
To: fdmanana; +Cc: linux-btrfs
On Tue, Sep 10, 2019 at 03:26:49PM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> Sometimes when fsync'ing a file we need to log that other inodes exist and
> when we need to do that we acquire a reference on the inodes and then drop
> that reference using iput() after logging them.
>
> That generally is not a problem except if we end up doing the final iput()
> (dropping the last reference) on the inode and that inode has a link count
> of 0, which can happen in a very short time window if the logging path
> gets a reference on the inode while it's being unlinked.
>
> In that case we end up getting the eviction callback, btrfs_evict_inode(),
> invoked through the iput() call chain which needs to drop all of the
> inode's items from its subvolume btree, and in order to do that, it needs
> to join a transaction at the helper function evict_refill_and_join().
> However because the task previously started a transaction at the fsync
> handler, btrfs_sync_file(), it has current->journal_info already pointing
> to a transaction handle and therefore evict_refill_and_join() will get
> that transaction handle from btrfs_join_transaction(). From this point on,
> two different problems can happen:
>
> 1) evict_refill_and_join() will often change the transaction handle's
> block reserve (->block_rsv) and set its ->bytes_reserved field to a
> value greater than 0. If evict_refill_and_join() never commits the
> transaction, the eviction handler ends up decreasing the reference
> count (->use_count) of the transaction handle through the call to
> btrfs_end_transaction(), and after that point we have a transaction
> handle with a NULL ->block_rsv (which is the value prior to the
> transaction join from evict_refill_and_join()) and a ->bytes_reserved
> value greater than 0. If after the eviction/iput completes the inode
> logging path hits an error or it decides that it must fallback to a
> transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a
> non-zero value from btrfs_log_dentry_safe(), and because of that
> non-zero value it tries to commit the transaction using a handle with
> a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes
> the transaction commit hit an assertion failure at
> btrfs_trans_release_metadata() because ->bytes_reserved is not zero but
> the ->block_rsv is NULL. The produced stack trace for that is like the
> following:
>
> [192922.917158] assertion failed: !trans->bytes_reserved, file: fs/btrfs/transaction.c, line: 816
> [192922.917553] ------------[ cut here ]------------
> [192922.917922] kernel BUG at fs/btrfs/ctree.h:3532!
> [192922.918310] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
> [192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: G W 5.1.4-btrfs-next-47 #1
> [192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
> [192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs]
> (...)
> [192922.920925] RSP: 0018:ffffaebdc8a27da8 EFLAGS: 00010286
> [192922.921315] RAX: 0000000000000051 RBX: ffff95c9c16a41c0 RCX: 0000000000000000
> [192922.921692] RDX: 0000000000000000 RSI: ffff95cab6b16838 RDI: ffff95cab6b16838
> [192922.922066] RBP: ffff95c9c16a41c0 R08: 0000000000000000 R09: 0000000000000000
> [192922.922442] R10: ffffaebdc8a27e70 R11: 0000000000000000 R12: ffff95ca731a0980
> [192922.922820] R13: 0000000000000000 R14: ffff95ca84c73338 R15: ffff95ca731a0ea8
> [192922.923200] FS: 00007f337eda4e80(0000) GS:ffff95cab6b00000(0000) knlGS:0000000000000000
> [192922.923579] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [192922.923948] CR2: 00007f337edad000 CR3: 00000001e00f6002 CR4: 00000000003606e0
> [192922.924329] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [192922.924711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [192922.925105] Call Trace:
> [192922.925505] btrfs_trans_release_metadata+0x10c/0x170 [btrfs]
> [192922.925911] btrfs_commit_transaction+0x3e/0xaf0 [btrfs]
> [192922.926324] btrfs_sync_file+0x44c/0x490 [btrfs]
> [192922.926731] do_fsync+0x38/0x60
> [192922.927138] __x64_sys_fdatasync+0x13/0x20
> [192922.927543] do_syscall_64+0x60/0x1c0
> [192922.927939] entry_SYSCALL_64_after_hwframe+0x49/0xbe
> (...)
> [192922.934077] ---[ end trace f00808b12068168f ]---
>
> 2) If evict_refill_and_join() decides to commit the transaction, it will
> be able to do it, since the nested transaction join only increments the
> transaction handle's ->use_count reference counter and it does not
> prevent the transaction from getting committed. This means that after
This brings up a good point, we should probably not allow the commit in this
case, or add an ASSERT(use_count == 1) or something, cause this would be bad.
Thanks,
Josef
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction
2019-09-10 14:26 [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction fdmanana
` (3 preceding siblings ...)
2019-09-12 11:58 ` Josef Bacik
@ 2019-09-12 12:18 ` Josef Bacik
2019-09-12 13:19 ` Filipe Manana
4 siblings, 1 reply; 14+ messages in thread
From: Josef Bacik @ 2019-09-12 12:18 UTC (permalink / raw)
To: fdmanana; +Cc: linux-btrfs
On Tue, Sep 10, 2019 at 03:26:49PM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> Sometimes when fsync'ing a file we need to log that other inodes exist and
> when we need to do that we acquire a reference on the inodes and then drop
> that reference using iput() after logging them.
>
> That generally is not a problem except if we end up doing the final iput()
> (dropping the last reference) on the inode and that inode has a link count
> of 0, which can happen in a very short time window if the logging path
> gets a reference on the inode while it's being unlinked.
>
> In that case we end up getting the eviction callback, btrfs_evict_inode(),
> invoked through the iput() call chain which needs to drop all of the
> inode's items from its subvolume btree, and in order to do that, it needs
> to join a transaction at the helper function evict_refill_and_join().
> However because the task previously started a transaction at the fsync
> handler, btrfs_sync_file(), it has current->journal_info already pointing
> to a transaction handle and therefore evict_refill_and_join() will get
> that transaction handle from btrfs_join_transaction(). From this point on,
> two different problems can happen:
>
> 1) evict_refill_and_join() will often change the transaction handle's
> block reserve (->block_rsv) and set its ->bytes_reserved field to a
> value greater than 0. If evict_refill_and_join() never commits the
> transaction, the eviction handler ends up decreasing the reference
> count (->use_count) of the transaction handle through the call to
> btrfs_end_transaction(), and after that point we have a transaction
> handle with a NULL ->block_rsv (which is the value prior to the
> transaction join from evict_refill_and_join()) and a ->bytes_reserved
> value greater than 0. If after the eviction/iput completes the inode
> logging path hits an error or it decides that it must fallback to a
> transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a
> non-zero value from btrfs_log_dentry_safe(), and because of that
> non-zero value it tries to commit the transaction using a handle with
> a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes
> the transaction commit hit an assertion failure at
> btrfs_trans_release_metadata() because ->bytes_reserved is not zero but
> the ->block_rsv is NULL. The produced stack trace for that is like the
> following:
>
> [192922.917158] assertion failed: !trans->bytes_reserved, file: fs/btrfs/transaction.c, line: 816
> [192922.917553] ------------[ cut here ]------------
> [192922.917922] kernel BUG at fs/btrfs/ctree.h:3532!
> [192922.918310] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
> [192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: G W 5.1.4-btrfs-next-47 #1
> [192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
> [192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs]
> (...)
> [192922.920925] RSP: 0018:ffffaebdc8a27da8 EFLAGS: 00010286
> [192922.921315] RAX: 0000000000000051 RBX: ffff95c9c16a41c0 RCX: 0000000000000000
> [192922.921692] RDX: 0000000000000000 RSI: ffff95cab6b16838 RDI: ffff95cab6b16838
> [192922.922066] RBP: ffff95c9c16a41c0 R08: 0000000000000000 R09: 0000000000000000
> [192922.922442] R10: ffffaebdc8a27e70 R11: 0000000000000000 R12: ffff95ca731a0980
> [192922.922820] R13: 0000000000000000 R14: ffff95ca84c73338 R15: ffff95ca731a0ea8
> [192922.923200] FS: 00007f337eda4e80(0000) GS:ffff95cab6b00000(0000) knlGS:0000000000000000
> [192922.923579] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [192922.923948] CR2: 00007f337edad000 CR3: 00000001e00f6002 CR4: 00000000003606e0
> [192922.924329] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [192922.924711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [192922.925105] Call Trace:
> [192922.925505] btrfs_trans_release_metadata+0x10c/0x170 [btrfs]
> [192922.925911] btrfs_commit_transaction+0x3e/0xaf0 [btrfs]
> [192922.926324] btrfs_sync_file+0x44c/0x490 [btrfs]
> [192922.926731] do_fsync+0x38/0x60
> [192922.927138] __x64_sys_fdatasync+0x13/0x20
> [192922.927543] do_syscall_64+0x60/0x1c0
> [192922.927939] entry_SYSCALL_64_after_hwframe+0x49/0xbe
> (...)
> [192922.934077] ---[ end trace f00808b12068168f ]---
>
> 2) If evict_refill_and_join() decides to commit the transaction, it will
> be able to do it, since the nested transaction join only increments the
> transaction handle's ->use_count reference counter and it does not
> prevent the transaction from getting committed. This means that after
> eviction completes, the fsync logging path will be using a transaction
> handle that refers to an already committed transaction. What happens
> when using such a stale transaction can be unpredictable, we are at
> least having a use-after-free on the transaction handle itself, since
> the transaction commit will call kmem_cache_free() against the handle
> regardless of its ->use_count value, or we can end up silently losing
> all the updates to the log tree after that iput() in the logging path,
> or using a transaction handle that in the meanwhile was allocated to
> another task for a new transaction, etc, pretty much unpredictable
> what can happen.
>
And talking it over with Nikolay I realized that since we're doing the commit
through the flushing this doesn't actually happen anymore.
may_commit_transaction() returns -EGAIN if we already have a trans handle open,
so hooray I made it safer by accident! But we definitely should follow up and
add an assert in btrfs_commit_transaction() to catch this case, cause holy shit
that's bad. Thanks,
Josef
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction
2019-09-12 11:58 ` Josef Bacik
@ 2019-09-12 13:17 ` Filipe Manana
0 siblings, 0 replies; 14+ messages in thread
From: Filipe Manana @ 2019-09-12 13:17 UTC (permalink / raw)
To: Josef Bacik; +Cc: linux-btrfs
On Thu, Sep 12, 2019 at 12:59 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> On Tue, Sep 10, 2019 at 03:26:49PM +0100, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > Sometimes when fsync'ing a file we need to log that other inodes exist and
> > when we need to do that we acquire a reference on the inodes and then drop
> > that reference using iput() after logging them.
> >
> > That generally is not a problem except if we end up doing the final iput()
> > (dropping the last reference) on the inode and that inode has a link count
> > of 0, which can happen in a very short time window if the logging path
> > gets a reference on the inode while it's being unlinked.
> >
> > In that case we end up getting the eviction callback, btrfs_evict_inode(),
> > invoked through the iput() call chain which needs to drop all of the
> > inode's items from its subvolume btree, and in order to do that, it needs
> > to join a transaction at the helper function evict_refill_and_join().
> > However because the task previously started a transaction at the fsync
> > handler, btrfs_sync_file(), it has current->journal_info already pointing
> > to a transaction handle and therefore evict_refill_and_join() will get
> > that transaction handle from btrfs_join_transaction(). From this point on,
> > two different problems can happen:
> >
> > 1) evict_refill_and_join() will often change the transaction handle's
> > block reserve (->block_rsv) and set its ->bytes_reserved field to a
> > value greater than 0. If evict_refill_and_join() never commits the
> > transaction, the eviction handler ends up decreasing the reference
> > count (->use_count) of the transaction handle through the call to
> > btrfs_end_transaction(), and after that point we have a transaction
> > handle with a NULL ->block_rsv (which is the value prior to the
> > transaction join from evict_refill_and_join()) and a ->bytes_reserved
> > value greater than 0. If after the eviction/iput completes the inode
> > logging path hits an error or it decides that it must fallback to a
> > transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a
> > non-zero value from btrfs_log_dentry_safe(), and because of that
> > non-zero value it tries to commit the transaction using a handle with
> > a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes
> > the transaction commit hit an assertion failure at
> > btrfs_trans_release_metadata() because ->bytes_reserved is not zero but
> > the ->block_rsv is NULL. The produced stack trace for that is like the
> > following:
> >
> > [192922.917158] assertion failed: !trans->bytes_reserved, file: fs/btrfs/transaction.c, line: 816
> > [192922.917553] ------------[ cut here ]------------
> > [192922.917922] kernel BUG at fs/btrfs/ctree.h:3532!
> > [192922.918310] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
> > [192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: G W 5.1.4-btrfs-next-47 #1
> > [192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
> > [192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs]
> > (...)
> > [192922.920925] RSP: 0018:ffffaebdc8a27da8 EFLAGS: 00010286
> > [192922.921315] RAX: 0000000000000051 RBX: ffff95c9c16a41c0 RCX: 0000000000000000
> > [192922.921692] RDX: 0000000000000000 RSI: ffff95cab6b16838 RDI: ffff95cab6b16838
> > [192922.922066] RBP: ffff95c9c16a41c0 R08: 0000000000000000 R09: 0000000000000000
> > [192922.922442] R10: ffffaebdc8a27e70 R11: 0000000000000000 R12: ffff95ca731a0980
> > [192922.922820] R13: 0000000000000000 R14: ffff95ca84c73338 R15: ffff95ca731a0ea8
> > [192922.923200] FS: 00007f337eda4e80(0000) GS:ffff95cab6b00000(0000) knlGS:0000000000000000
> > [192922.923579] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [192922.923948] CR2: 00007f337edad000 CR3: 00000001e00f6002 CR4: 00000000003606e0
> > [192922.924329] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [192922.924711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [192922.925105] Call Trace:
> > [192922.925505] btrfs_trans_release_metadata+0x10c/0x170 [btrfs]
> > [192922.925911] btrfs_commit_transaction+0x3e/0xaf0 [btrfs]
> > [192922.926324] btrfs_sync_file+0x44c/0x490 [btrfs]
> > [192922.926731] do_fsync+0x38/0x60
> > [192922.927138] __x64_sys_fdatasync+0x13/0x20
> > [192922.927543] do_syscall_64+0x60/0x1c0
> > [192922.927939] entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > (...)
> > [192922.934077] ---[ end trace f00808b12068168f ]---
> >
> > 2) If evict_refill_and_join() decides to commit the transaction, it will
> > be able to do it, since the nested transaction join only increments the
> > transaction handle's ->use_count reference counter and it does not
> > prevent the transaction from getting committed. This means that after
>
> This brings up a good point, we should probably not allow the commit in this
> case, or add an ASSERT(use_count == 1) or something, cause this would be bad.
> Thanks,
Yes, that should be separate change however.
I had warn_on(trans->use_count > 1) in the commit locally during
testing (didn't trigger in 4 days of fstests).
Thanks.
>
> Josef
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction
2019-09-12 12:18 ` Josef Bacik
@ 2019-09-12 13:19 ` Filipe Manana
2019-09-12 15:14 ` Josef Bacik
0 siblings, 1 reply; 14+ messages in thread
From: Filipe Manana @ 2019-09-12 13:19 UTC (permalink / raw)
To: Josef Bacik; +Cc: linux-btrfs
On Thu, Sep 12, 2019 at 1:18 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> On Tue, Sep 10, 2019 at 03:26:49PM +0100, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > Sometimes when fsync'ing a file we need to log that other inodes exist and
> > when we need to do that we acquire a reference on the inodes and then drop
> > that reference using iput() after logging them.
> >
> > That generally is not a problem except if we end up doing the final iput()
> > (dropping the last reference) on the inode and that inode has a link count
> > of 0, which can happen in a very short time window if the logging path
> > gets a reference on the inode while it's being unlinked.
> >
> > In that case we end up getting the eviction callback, btrfs_evict_inode(),
> > invoked through the iput() call chain which needs to drop all of the
> > inode's items from its subvolume btree, and in order to do that, it needs
> > to join a transaction at the helper function evict_refill_and_join().
> > However because the task previously started a transaction at the fsync
> > handler, btrfs_sync_file(), it has current->journal_info already pointing
> > to a transaction handle and therefore evict_refill_and_join() will get
> > that transaction handle from btrfs_join_transaction(). From this point on,
> > two different problems can happen:
> >
> > 1) evict_refill_and_join() will often change the transaction handle's
> > block reserve (->block_rsv) and set its ->bytes_reserved field to a
> > value greater than 0. If evict_refill_and_join() never commits the
> > transaction, the eviction handler ends up decreasing the reference
> > count (->use_count) of the transaction handle through the call to
> > btrfs_end_transaction(), and after that point we have a transaction
> > handle with a NULL ->block_rsv (which is the value prior to the
> > transaction join from evict_refill_and_join()) and a ->bytes_reserved
> > value greater than 0. If after the eviction/iput completes the inode
> > logging path hits an error or it decides that it must fallback to a
> > transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a
> > non-zero value from btrfs_log_dentry_safe(), and because of that
> > non-zero value it tries to commit the transaction using a handle with
> > a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes
> > the transaction commit hit an assertion failure at
> > btrfs_trans_release_metadata() because ->bytes_reserved is not zero but
> > the ->block_rsv is NULL. The produced stack trace for that is like the
> > following:
> >
> > [192922.917158] assertion failed: !trans->bytes_reserved, file: fs/btrfs/transaction.c, line: 816
> > [192922.917553] ------------[ cut here ]------------
> > [192922.917922] kernel BUG at fs/btrfs/ctree.h:3532!
> > [192922.918310] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
> > [192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: G W 5.1.4-btrfs-next-47 #1
> > [192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
> > [192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs]
> > (...)
> > [192922.920925] RSP: 0018:ffffaebdc8a27da8 EFLAGS: 00010286
> > [192922.921315] RAX: 0000000000000051 RBX: ffff95c9c16a41c0 RCX: 0000000000000000
> > [192922.921692] RDX: 0000000000000000 RSI: ffff95cab6b16838 RDI: ffff95cab6b16838
> > [192922.922066] RBP: ffff95c9c16a41c0 R08: 0000000000000000 R09: 0000000000000000
> > [192922.922442] R10: ffffaebdc8a27e70 R11: 0000000000000000 R12: ffff95ca731a0980
> > [192922.922820] R13: 0000000000000000 R14: ffff95ca84c73338 R15: ffff95ca731a0ea8
> > [192922.923200] FS: 00007f337eda4e80(0000) GS:ffff95cab6b00000(0000) knlGS:0000000000000000
> > [192922.923579] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [192922.923948] CR2: 00007f337edad000 CR3: 00000001e00f6002 CR4: 00000000003606e0
> > [192922.924329] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [192922.924711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [192922.925105] Call Trace:
> > [192922.925505] btrfs_trans_release_metadata+0x10c/0x170 [btrfs]
> > [192922.925911] btrfs_commit_transaction+0x3e/0xaf0 [btrfs]
> > [192922.926324] btrfs_sync_file+0x44c/0x490 [btrfs]
> > [192922.926731] do_fsync+0x38/0x60
> > [192922.927138] __x64_sys_fdatasync+0x13/0x20
> > [192922.927543] do_syscall_64+0x60/0x1c0
> > [192922.927939] entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > (...)
> > [192922.934077] ---[ end trace f00808b12068168f ]---
> >
> > 2) If evict_refill_and_join() decides to commit the transaction, it will
> > be able to do it, since the nested transaction join only increments the
> > transaction handle's ->use_count reference counter and it does not
> > prevent the transaction from getting committed. This means that after
> > eviction completes, the fsync logging path will be using a transaction
> > handle that refers to an already committed transaction. What happens
> > when using such a stale transaction can be unpredictable, we are at
> > least having a use-after-free on the transaction handle itself, since
> > the transaction commit will call kmem_cache_free() against the handle
> > regardless of its ->use_count value, or we can end up silently losing
> > all the updates to the log tree after that iput() in the logging path,
> > or using a transaction handle that in the meanwhile was allocated to
> > another task for a new transaction, etc, pretty much unpredictable
> > what can happen.
> >
>
> And talking it over with Nikolay I realized that since we're doing the commit
> through the flushing this doesn't actually happen anymore.
> may_commit_transaction() returns -EGAIN if we already have a trans handle open,
> so hooray I made it safer by accident! But we definitely should follow up and
> add an assert in btrfs_commit_transaction() to catch this case, cause holy shit
> that's bad. Thanks,
Yep, but problem 1) (assertion failure) is still valid after your
recent changes in misc-next.
Either way, this fix has to be added to stable releases to fix both
problems there.
Thanks.
>
> Josef
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction
2019-09-12 13:19 ` Filipe Manana
@ 2019-09-12 15:14 ` Josef Bacik
0 siblings, 0 replies; 14+ messages in thread
From: Josef Bacik @ 2019-09-12 15:14 UTC (permalink / raw)
To: Filipe Manana; +Cc: Josef Bacik, linux-btrfs
On Thu, Sep 12, 2019 at 02:19:55PM +0100, Filipe Manana wrote:
> On Thu, Sep 12, 2019 at 1:18 PM Josef Bacik <josef@toxicpanda.com> wrote:
> >
> > On Tue, Sep 10, 2019 at 03:26:49PM +0100, fdmanana@kernel.org wrote:
> > > From: Filipe Manana <fdmanana@suse.com>
> > >
> > > Sometimes when fsync'ing a file we need to log that other inodes exist and
> > > when we need to do that we acquire a reference on the inodes and then drop
> > > that reference using iput() after logging them.
> > >
> > > That generally is not a problem except if we end up doing the final iput()
> > > (dropping the last reference) on the inode and that inode has a link count
> > > of 0, which can happen in a very short time window if the logging path
> > > gets a reference on the inode while it's being unlinked.
> > >
> > > In that case we end up getting the eviction callback, btrfs_evict_inode(),
> > > invoked through the iput() call chain which needs to drop all of the
> > > inode's items from its subvolume btree, and in order to do that, it needs
> > > to join a transaction at the helper function evict_refill_and_join().
> > > However because the task previously started a transaction at the fsync
> > > handler, btrfs_sync_file(), it has current->journal_info already pointing
> > > to a transaction handle and therefore evict_refill_and_join() will get
> > > that transaction handle from btrfs_join_transaction(). From this point on,
> > > two different problems can happen:
> > >
> > > 1) evict_refill_and_join() will often change the transaction handle's
> > > block reserve (->block_rsv) and set its ->bytes_reserved field to a
> > > value greater than 0. If evict_refill_and_join() never commits the
> > > transaction, the eviction handler ends up decreasing the reference
> > > count (->use_count) of the transaction handle through the call to
> > > btrfs_end_transaction(), and after that point we have a transaction
> > > handle with a NULL ->block_rsv (which is the value prior to the
> > > transaction join from evict_refill_and_join()) and a ->bytes_reserved
> > > value greater than 0. If after the eviction/iput completes the inode
> > > logging path hits an error or it decides that it must fallback to a
> > > transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a
> > > non-zero value from btrfs_log_dentry_safe(), and because of that
> > > non-zero value it tries to commit the transaction using a handle with
> > > a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes
> > > the transaction commit hit an assertion failure at
> > > btrfs_trans_release_metadata() because ->bytes_reserved is not zero but
> > > the ->block_rsv is NULL. The produced stack trace for that is like the
> > > following:
> > >
> > > [192922.917158] assertion failed: !trans->bytes_reserved, file: fs/btrfs/transaction.c, line: 816
> > > [192922.917553] ------------[ cut here ]------------
> > > [192922.917922] kernel BUG at fs/btrfs/ctree.h:3532!
> > > [192922.918310] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
> > > [192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: G W 5.1.4-btrfs-next-47 #1
> > > [192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
> > > [192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs]
> > > (...)
> > > [192922.920925] RSP: 0018:ffffaebdc8a27da8 EFLAGS: 00010286
> > > [192922.921315] RAX: 0000000000000051 RBX: ffff95c9c16a41c0 RCX: 0000000000000000
> > > [192922.921692] RDX: 0000000000000000 RSI: ffff95cab6b16838 RDI: ffff95cab6b16838
> > > [192922.922066] RBP: ffff95c9c16a41c0 R08: 0000000000000000 R09: 0000000000000000
> > > [192922.922442] R10: ffffaebdc8a27e70 R11: 0000000000000000 R12: ffff95ca731a0980
> > > [192922.922820] R13: 0000000000000000 R14: ffff95ca84c73338 R15: ffff95ca731a0ea8
> > > [192922.923200] FS: 00007f337eda4e80(0000) GS:ffff95cab6b00000(0000) knlGS:0000000000000000
> > > [192922.923579] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [192922.923948] CR2: 00007f337edad000 CR3: 00000001e00f6002 CR4: 00000000003606e0
> > > [192922.924329] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > [192922.924711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > [192922.925105] Call Trace:
> > > [192922.925505] btrfs_trans_release_metadata+0x10c/0x170 [btrfs]
> > > [192922.925911] btrfs_commit_transaction+0x3e/0xaf0 [btrfs]
> > > [192922.926324] btrfs_sync_file+0x44c/0x490 [btrfs]
> > > [192922.926731] do_fsync+0x38/0x60
> > > [192922.927138] __x64_sys_fdatasync+0x13/0x20
> > > [192922.927543] do_syscall_64+0x60/0x1c0
> > > [192922.927939] entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > (...)
> > > [192922.934077] ---[ end trace f00808b12068168f ]---
> > >
> > > 2) If evict_refill_and_join() decides to commit the transaction, it will
> > > be able to do it, since the nested transaction join only increments the
> > > transaction handle's ->use_count reference counter and it does not
> > > prevent the transaction from getting committed. This means that after
> > > eviction completes, the fsync logging path will be using a transaction
> > > handle that refers to an already committed transaction. What happens
> > > when using such a stale transaction can be unpredictable, we are at
> > > least having a use-after-free on the transaction handle itself, since
> > > the transaction commit will call kmem_cache_free() against the handle
> > > regardless of its ->use_count value, or we can end up silently losing
> > > all the updates to the log tree after that iput() in the logging path,
> > > or using a transaction handle that in the meanwhile was allocated to
> > > another task for a new transaction, etc, pretty much unpredictable
> > > what can happen.
> > >
> >
> > And talking it over with Nikolay I realized that since we're doing the commit
> > through the flushing this doesn't actually happen anymore.
> > may_commit_transaction() returns -EGAIN if we already have a trans handle open,
> > so hooray I made it safer by accident! But we definitely should follow up and
> > add an assert in btrfs_commit_transaction() to catch this case, cause holy shit
> > that's bad. Thanks,
>
> Yep, but problem 1) (assertion failure) is still valid after your
> recent changes in misc-next.
> Either way, this fix has to be added to stable releases to fix both
> problems there.
>
Agreed, just adding context for the current changes. This patch is fine as it
is, just some follow up stuff to catch transaction commits under nested trans
handles. Thanks,
Josef
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction
2019-09-12 7:30 ` Sasha Levin
2019-09-12 8:02 ` Filipe Manana
@ 2019-09-16 12:59 ` Filipe Manana
2019-09-16 15:13 ` Greg KH
1 sibling, 1 reply; 14+ messages in thread
From: Filipe Manana @ 2019-09-16 12:59 UTC (permalink / raw)
To: Sasha Levin; +Cc: Filipe Manana, linux-btrfs, stable
[-- Attachment #1: Type: text/plain, Size: 4148 bytes --]
On Thu, Sep 12, 2019 at 8:32 AM Sasha Levin <sashal@kernel.org> wrote:
>
> Hi,
>
> [This is an automated email]
>
> This commit has been processed because it contains a -stable tag.
> The stable tag indicates that it's relevant for the following trees: 4.4+
>
> The bot has tested the following trees: v5.2.14, v4.19.72, v4.14.143, v4.9.192, v4.4.192.
>
> v5.2.14: Build OK!
> v4.19.72: Failed to apply! Possible dependencies:
> 6b5fc433a7ad ("Btrfs: fix fsync after succession of renames of different files")
> a3baaf0d786e ("Btrfs: fix fsync after succession of renames and unlink/rmdir")
> b8aa330d2acb ("Btrfs: improve performance on fsync of files with multiple hardlinks")
>
> v4.14.143: Failed to apply! Possible dependencies:
> 0d836392cadd ("Btrfs: fix mount failure after fsync due to hard link recreation")
> 1f250e929a9c ("Btrfs: fix log replay failure after unlink and link combination")
> 6b5fc433a7ad ("Btrfs: fix fsync after succession of renames of different files")
> 8d9e220ca084 ("btrfs: simplify IS_ERR/PTR_ERR checks")
> a3baaf0d786e ("Btrfs: fix fsync after succession of renames and unlink/rmdir")
> b8aa330d2acb ("Btrfs: improve performance on fsync of files with multiple hardlinks")
>
> v4.9.192: Failed to apply! Possible dependencies:
> 0b246afa62b0 ("btrfs: root->fs_info cleanup, add fs_info convenience variables")
> 0d836392cadd ("Btrfs: fix mount failure after fsync due to hard link recreation")
> 1f250e929a9c ("Btrfs: fix log replay failure after unlink and link combination")
> 4791c8f19c45 ("btrfs: Make btrfs_check_ref_name_override take btrfs_inode")
> 6b5fc433a7ad ("Btrfs: fix fsync after succession of renames of different files")
> a3baaf0d786e ("Btrfs: fix fsync after succession of renames and unlink/rmdir")
> cf8cddd38bab ("btrfs: don't abuse REQ_OP_* flags for btrfs_map_block")
> da17066c4047 ("btrfs: pull node/sector/stripe sizes out of root and into fs_info")
> db0a669fb002 ("btrfs: Make btrfs_add_link take btrfs_inode")
> de143792253e ("btrfs: struct btrfsic_state->root should be an fs_info")
> fb456252d3d9 ("btrfs: root->fs_info cleanup, use fs_info->dev_root everywhere")
>
> v4.4.192: Failed to apply! Possible dependencies:
> 0132761017e0 ("btrfs: fix string and comment grammatical issues and typos")
> 09cbfeaf1a5a ("mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros")
> 0b246afa62b0 ("btrfs: root->fs_info cleanup, add fs_info convenience variables")
> 0d836392cadd ("Btrfs: fix mount failure after fsync due to hard link recreation")
> 0e749e54244e ("dax: increase granularity of dax_clear_blocks() operations")
> 1f250e929a9c ("Btrfs: fix log replay failure after unlink and link combination")
> 44f714dae50a ("Btrfs: improve performance on fsync against new inode after rename/unlink")
> 4791c8f19c45 ("btrfs: Make btrfs_check_ref_name_override take btrfs_inode")
> 52db400fcd50 ("pmem, dax: clean up clear_pmem()")
> 6b5fc433a7ad ("Btrfs: fix fsync after succession of renames of different files")
> 781feef7e6be ("Btrfs: fix lockdep warning about log_mutex")
> a3baaf0d786e ("Btrfs: fix fsync after succession of renames and unlink/rmdir")
> b2e0d1625e19 ("dax: fix lifetime of in-kernel dax mappings with dax_map_atomic()")
> bb7ab3b92e46 ("btrfs: Fix misspellings in comments.")
> cf8cddd38bab ("btrfs: don't abuse REQ_OP_* flags for btrfs_map_block")
> d1a5f2b4d8a1 ("block: use DAX for partition table reads")
> db0a669fb002 ("btrfs: Make btrfs_add_link take btrfs_inode")
> de143792253e ("btrfs: struct btrfsic_state->root should be an fs_info")
>
>
> NOTE: The patch will not be queued to stable trees until it is upstream.
>
> How should we proceed with this patch?
So here follows, as attachments, patches for the following stable
versions: 4.4.193, 4.9.193, 4.14.144 and 4.19.73.
Thanks.
>
> --
> Thanks,
> Sasha
--
Filipe David Manana,
“Whether you think you can, or you think you can't — you're right.”
[-- Attachment #2: 4.4.193-Btrfs-fix-assertion-failure-during-fsync-and-use-of-.patch --]
[-- Type: text/x-patch, Size: 7226 bytes --]
From 842266d7437f7714a67ddb34680546f0268f27c9 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana@suse.com>
Date: Tue, 10 Sep 2019 15:26:49 +0100
Subject: [PATCH] Btrfs: fix assertion failure during fsync and use of stale
transaction
Sometimes when fsync'ing a file we need to log that other inodes exist and
when we need to do that we acquire a reference on the inodes and then drop
that reference using iput() after logging them.
That generally is not a problem except if we end up doing the final iput()
(dropping the last reference) on the inode and that inode has a link count
of 0, which can happen in a very short time window if the logging path
gets a reference on the inode while it's being unlinked.
In that case we end up getting the eviction callback, btrfs_evict_inode(),
invoked through the iput() call chain which needs to drop all of the
inode's items from its subvolume btree, and in order to do that, it needs
to join a transaction at the helper function evict_refill_and_join().
However because the task previously started a transaction at the fsync
handler, btrfs_sync_file(), it has current->journal_info already pointing
to a transaction handle and therefore evict_refill_and_join() will get
that transaction handle from btrfs_join_transaction(). From this point on,
two different problems can happen:
1) evict_refill_and_join() will often change the transaction handle's
block reserve (->block_rsv) and set its ->bytes_reserved field to a
value greater than 0. If evict_refill_and_join() never commits the
transaction, the eviction handler ends up decreasing the reference
count (->use_count) of the transaction handle through the call to
btrfs_end_transaction(), and after that point we have a transaction
handle with a NULL ->block_rsv (which is the value prior to the
transaction join from evict_refill_and_join()) and a ->bytes_reserved
value greater than 0. If after the eviction/iput completes the inode
logging path hits an error or it decides that it must fallback to a
transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a
non-zero value from btrfs_log_dentry_safe(), and because of that
non-zero value it tries to commit the transaction using a handle with
a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes
the transaction commit hit an assertion failure at
btrfs_trans_release_metadata() because ->bytes_reserved is not zero but
the ->block_rsv is NULL. The produced stack trace for that is like the
following:
[192922.917158] assertion failed: !trans->bytes_reserved, file: fs/btrfs/transaction.c, line: 816
[192922.917553] ------------[ cut here ]------------
[192922.917922] kernel BUG at fs/btrfs/ctree.h:3532!
[192922.918310] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
[192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: G W 5.1.4-btrfs-next-47 #1
[192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
[192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs]
(...)
[192922.920925] RSP: 0018:ffffaebdc8a27da8 EFLAGS: 00010286
[192922.921315] RAX: 0000000000000051 RBX: ffff95c9c16a41c0 RCX: 0000000000000000
[192922.921692] RDX: 0000000000000000 RSI: ffff95cab6b16838 RDI: ffff95cab6b16838
[192922.922066] RBP: ffff95c9c16a41c0 R08: 0000000000000000 R09: 0000000000000000
[192922.922442] R10: ffffaebdc8a27e70 R11: 0000000000000000 R12: ffff95ca731a0980
[192922.922820] R13: 0000000000000000 R14: ffff95ca84c73338 R15: ffff95ca731a0ea8
[192922.923200] FS: 00007f337eda4e80(0000) GS:ffff95cab6b00000(0000) knlGS:0000000000000000
[192922.923579] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[192922.923948] CR2: 00007f337edad000 CR3: 00000001e00f6002 CR4: 00000000003606e0
[192922.924329] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[192922.924711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[192922.925105] Call Trace:
[192922.925505] btrfs_trans_release_metadata+0x10c/0x170 [btrfs]
[192922.925911] btrfs_commit_transaction+0x3e/0xaf0 [btrfs]
[192922.926324] btrfs_sync_file+0x44c/0x490 [btrfs]
[192922.926731] do_fsync+0x38/0x60
[192922.927138] __x64_sys_fdatasync+0x13/0x20
[192922.927543] do_syscall_64+0x60/0x1c0
[192922.927939] entry_SYSCALL_64_after_hwframe+0x49/0xbe
(...)
[192922.934077] ---[ end trace f00808b12068168f ]---
2) If evict_refill_and_join() decides to commit the transaction, it will
be able to do it, since the nested transaction join only increments the
transaction handle's ->use_count reference counter and it does not
prevent the transaction from getting committed. This means that after
eviction completes, the fsync logging path will be using a transaction
handle that refers to an already committed transaction. What happens
when using such a stale transaction can be unpredictable, we are at
least having a use-after-free on the transaction handle itself, since
the transaction commit will call kmem_cache_free() against the handle
regardless of its ->use_count value, or we can end up silently losing
all the updates to the log tree after that iput() in the logging path,
or using a transaction handle that in the meanwhile was allocated to
another task for a new transaction, etc, pretty much unpredictable
what can happen.
In order to fix both of them, instead of using iput() during logging, use
btrfs_add_delayed_iput(), so that the logging path of fsync never drops
the last reference on an inode, that step is offloaded to a safe context
(usually the cleaner kthread).
The assertion failure issue was sporadically triggered by the test case
generic/475 from fstests, which loads the dm error target while fsstress
is running, which lead to fsync failing while logging inodes with -EIO
errors and then trying later to commit the transaction, triggering the
assertion failure.
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
fs/btrfs/tree-log.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 57a46093656a..f9c3907bf159 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -5133,7 +5133,7 @@ process_leaf:
}
if (btrfs_inode_in_log(di_inode, trans->transid)) {
- iput(di_inode);
+ btrfs_add_delayed_iput(di_inode);
continue;
}
@@ -5143,7 +5143,7 @@ process_leaf:
btrfs_release_path(path);
ret = btrfs_log_inode(trans, root, di_inode,
log_mode, 0, LLONG_MAX, ctx);
- iput(di_inode);
+ btrfs_add_delayed_iput(di_inode);
if (ret)
goto next_dir_inode;
if (ctx->log_new_dentries) {
@@ -5281,7 +5281,7 @@ static int btrfs_log_all_parents(struct btrfs_trans_handle *trans,
ret = btrfs_log_inode(trans, root, dir_inode,
LOG_INODE_ALL, 0, LLONG_MAX, ctx);
- iput(dir_inode);
+ btrfs_add_delayed_iput(dir_inode);
if (ret)
goto out;
}
--
2.16.4
[-- Attachment #3: 4.9.193-Btrfs-fix-assertion-failure-during-fsync-and-use-of-.patch --]
[-- Type: text/x-patch, Size: 7626 bytes --]
From ba94b074ad2b346f59a43fb62e908e4f8bb943e6 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana@suse.com>
Date: Tue, 10 Sep 2019 15:26:49 +0100
Subject: [PATCH] Btrfs: fix assertion failure during fsync and use of stale
transaction
Sometimes when fsync'ing a file we need to log that other inodes exist and
when we need to do that we acquire a reference on the inodes and then drop
that reference using iput() after logging them.
That generally is not a problem except if we end up doing the final iput()
(dropping the last reference) on the inode and that inode has a link count
of 0, which can happen in a very short time window if the logging path
gets a reference on the inode while it's being unlinked.
In that case we end up getting the eviction callback, btrfs_evict_inode(),
invoked through the iput() call chain which needs to drop all of the
inode's items from its subvolume btree, and in order to do that, it needs
to join a transaction at the helper function evict_refill_and_join().
However because the task previously started a transaction at the fsync
handler, btrfs_sync_file(), it has current->journal_info already pointing
to a transaction handle and therefore evict_refill_and_join() will get
that transaction handle from btrfs_join_transaction(). From this point on,
two different problems can happen:
1) evict_refill_and_join() will often change the transaction handle's
block reserve (->block_rsv) and set its ->bytes_reserved field to a
value greater than 0. If evict_refill_and_join() never commits the
transaction, the eviction handler ends up decreasing the reference
count (->use_count) of the transaction handle through the call to
btrfs_end_transaction(), and after that point we have a transaction
handle with a NULL ->block_rsv (which is the value prior to the
transaction join from evict_refill_and_join()) and a ->bytes_reserved
value greater than 0. If after the eviction/iput completes the inode
logging path hits an error or it decides that it must fallback to a
transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a
non-zero value from btrfs_log_dentry_safe(), and because of that
non-zero value it tries to commit the transaction using a handle with
a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes
the transaction commit hit an assertion failure at
btrfs_trans_release_metadata() because ->bytes_reserved is not zero but
the ->block_rsv is NULL. The produced stack trace for that is like the
following:
[192922.917158] assertion failed: !trans->bytes_reserved, file: fs/btrfs/transaction.c, line: 816
[192922.917553] ------------[ cut here ]------------
[192922.917922] kernel BUG at fs/btrfs/ctree.h:3532!
[192922.918310] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
[192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: G W 5.1.4-btrfs-next-47 #1
[192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
[192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs]
(...)
[192922.920925] RSP: 0018:ffffaebdc8a27da8 EFLAGS: 00010286
[192922.921315] RAX: 0000000000000051 RBX: ffff95c9c16a41c0 RCX: 0000000000000000
[192922.921692] RDX: 0000000000000000 RSI: ffff95cab6b16838 RDI: ffff95cab6b16838
[192922.922066] RBP: ffff95c9c16a41c0 R08: 0000000000000000 R09: 0000000000000000
[192922.922442] R10: ffffaebdc8a27e70 R11: 0000000000000000 R12: ffff95ca731a0980
[192922.922820] R13: 0000000000000000 R14: ffff95ca84c73338 R15: ffff95ca731a0ea8
[192922.923200] FS: 00007f337eda4e80(0000) GS:ffff95cab6b00000(0000) knlGS:0000000000000000
[192922.923579] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[192922.923948] CR2: 00007f337edad000 CR3: 00000001e00f6002 CR4: 00000000003606e0
[192922.924329] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[192922.924711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[192922.925105] Call Trace:
[192922.925505] btrfs_trans_release_metadata+0x10c/0x170 [btrfs]
[192922.925911] btrfs_commit_transaction+0x3e/0xaf0 [btrfs]
[192922.926324] btrfs_sync_file+0x44c/0x490 [btrfs]
[192922.926731] do_fsync+0x38/0x60
[192922.927138] __x64_sys_fdatasync+0x13/0x20
[192922.927543] do_syscall_64+0x60/0x1c0
[192922.927939] entry_SYSCALL_64_after_hwframe+0x49/0xbe
(...)
[192922.934077] ---[ end trace f00808b12068168f ]---
2) If evict_refill_and_join() decides to commit the transaction, it will
be able to do it, since the nested transaction join only increments the
transaction handle's ->use_count reference counter and it does not
prevent the transaction from getting committed. This means that after
eviction completes, the fsync logging path will be using a transaction
handle that refers to an already committed transaction. What happens
when using such a stale transaction can be unpredictable, we are at
least having a use-after-free on the transaction handle itself, since
the transaction commit will call kmem_cache_free() against the handle
regardless of its ->use_count value, or we can end up silently losing
all the updates to the log tree after that iput() in the logging path,
or using a transaction handle that in the meanwhile was allocated to
another task for a new transaction, etc, pretty much unpredictable
what can happen.
In order to fix both of them, instead of using iput() during logging, use
btrfs_add_delayed_iput(), so that the logging path of fsync never drops
the last reference on an inode, that step is offloaded to a safe context
(usually the cleaner kthread).
The assertion failure issue was sporadically triggered by the test case
generic/475 from fstests, which loads the dm error target while fsstress
is running, which lead to fsync failing while logging inodes with -EIO
errors and then trying later to commit the transaction, triggering the
assertion failure.
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
fs/btrfs/tree-log.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 02bb7b52cb36..65e1eaa5df84 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -4846,7 +4846,7 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans,
err = btrfs_log_inode(trans, root, other_inode,
LOG_OTHER_INODE,
0, LLONG_MAX, ctx);
- iput(other_inode);
+ btrfs_add_delayed_iput(other_inode);
if (err)
goto out_unlock;
else
@@ -5264,7 +5264,7 @@ static int log_new_dir_dentries(struct btrfs_trans_handle *trans,
}
if (btrfs_inode_in_log(di_inode, trans->transid)) {
- iput(di_inode);
+ btrfs_add_delayed_iput(di_inode);
break;
}
@@ -5276,7 +5276,7 @@ static int log_new_dir_dentries(struct btrfs_trans_handle *trans,
if (!ret &&
btrfs_must_commit_transaction(trans, di_inode))
ret = 1;
- iput(di_inode);
+ btrfs_add_delayed_iput(di_inode);
if (ret)
goto next_dir_inode;
if (ctx->log_new_dentries) {
@@ -5422,7 +5422,7 @@ static int btrfs_log_all_parents(struct btrfs_trans_handle *trans,
if (!ret && ctx && ctx->log_new_dentries)
ret = log_new_dir_dentries(trans, root,
dir_inode, ctx);
- iput(dir_inode);
+ btrfs_add_delayed_iput(dir_inode);
if (ret)
goto out;
}
--
2.16.4
[-- Attachment #4: 4.14.144-Btrfs-fix-assertion-failure-during-fsync-and-use-of-.patch --]
[-- Type: text/x-patch, Size: 7616 bytes --]
From ca5970ba21c15efd6f1e2a66ca3cd1e7c3332761 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana@suse.com>
Date: Tue, 10 Sep 2019 15:26:49 +0100
Subject: [PATCH] Btrfs: fix assertion failure during fsync and use of stale
transaction
Sometimes when fsync'ing a file we need to log that other inodes exist and
when we need to do that we acquire a reference on the inodes and then drop
that reference using iput() after logging them.
That generally is not a problem except if we end up doing the final iput()
(dropping the last reference) on the inode and that inode has a link count
of 0, which can happen in a very short time window if the logging path
gets a reference on the inode while it's being unlinked.
In that case we end up getting the eviction callback, btrfs_evict_inode(),
invoked through the iput() call chain which needs to drop all of the
inode's items from its subvolume btree, and in order to do that, it needs
to join a transaction at the helper function evict_refill_and_join().
However because the task previously started a transaction at the fsync
handler, btrfs_sync_file(), it has current->journal_info already pointing
to a transaction handle and therefore evict_refill_and_join() will get
that transaction handle from btrfs_join_transaction(). From this point on,
two different problems can happen:
1) evict_refill_and_join() will often change the transaction handle's
block reserve (->block_rsv) and set its ->bytes_reserved field to a
value greater than 0. If evict_refill_and_join() never commits the
transaction, the eviction handler ends up decreasing the reference
count (->use_count) of the transaction handle through the call to
btrfs_end_transaction(), and after that point we have a transaction
handle with a NULL ->block_rsv (which is the value prior to the
transaction join from evict_refill_and_join()) and a ->bytes_reserved
value greater than 0. If after the eviction/iput completes the inode
logging path hits an error or it decides that it must fallback to a
transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a
non-zero value from btrfs_log_dentry_safe(), and because of that
non-zero value it tries to commit the transaction using a handle with
a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes
the transaction commit hit an assertion failure at
btrfs_trans_release_metadata() because ->bytes_reserved is not zero but
the ->block_rsv is NULL. The produced stack trace for that is like the
following:
[192922.917158] assertion failed: !trans->bytes_reserved, file: fs/btrfs/transaction.c, line: 816
[192922.917553] ------------[ cut here ]------------
[192922.917922] kernel BUG at fs/btrfs/ctree.h:3532!
[192922.918310] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
[192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: G W 5.1.4-btrfs-next-47 #1
[192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
[192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs]
(...)
[192922.920925] RSP: 0018:ffffaebdc8a27da8 EFLAGS: 00010286
[192922.921315] RAX: 0000000000000051 RBX: ffff95c9c16a41c0 RCX: 0000000000000000
[192922.921692] RDX: 0000000000000000 RSI: ffff95cab6b16838 RDI: ffff95cab6b16838
[192922.922066] RBP: ffff95c9c16a41c0 R08: 0000000000000000 R09: 0000000000000000
[192922.922442] R10: ffffaebdc8a27e70 R11: 0000000000000000 R12: ffff95ca731a0980
[192922.922820] R13: 0000000000000000 R14: ffff95ca84c73338 R15: ffff95ca731a0ea8
[192922.923200] FS: 00007f337eda4e80(0000) GS:ffff95cab6b00000(0000) knlGS:0000000000000000
[192922.923579] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[192922.923948] CR2: 00007f337edad000 CR3: 00000001e00f6002 CR4: 00000000003606e0
[192922.924329] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[192922.924711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[192922.925105] Call Trace:
[192922.925505] btrfs_trans_release_metadata+0x10c/0x170 [btrfs]
[192922.925911] btrfs_commit_transaction+0x3e/0xaf0 [btrfs]
[192922.926324] btrfs_sync_file+0x44c/0x490 [btrfs]
[192922.926731] do_fsync+0x38/0x60
[192922.927138] __x64_sys_fdatasync+0x13/0x20
[192922.927543] do_syscall_64+0x60/0x1c0
[192922.927939] entry_SYSCALL_64_after_hwframe+0x49/0xbe
(...)
[192922.934077] ---[ end trace f00808b12068168f ]---
2) If evict_refill_and_join() decides to commit the transaction, it will
be able to do it, since the nested transaction join only increments the
transaction handle's ->use_count reference counter and it does not
prevent the transaction from getting committed. This means that after
eviction completes, the fsync logging path will be using a transaction
handle that refers to an already committed transaction. What happens
when using such a stale transaction can be unpredictable, we are at
least having a use-after-free on the transaction handle itself, since
the transaction commit will call kmem_cache_free() against the handle
regardless of its ->use_count value, or we can end up silently losing
all the updates to the log tree after that iput() in the logging path,
or using a transaction handle that in the meanwhile was allocated to
another task for a new transaction, etc, pretty much unpredictable
what can happen.
In order to fix both of them, instead of using iput() during logging, use
btrfs_add_delayed_iput(), so that the logging path of fsync never drops
the last reference on an inode, that step is offloaded to a safe context
(usually the cleaner kthread).
The assertion failure issue was sporadically triggered by the test case
generic/475 from fstests, which loads the dm error target while fsstress
is running, which lead to fsync failing while logging inodes with -EIO
errors and then trying later to commit the transaction, triggering the
assertion failure.
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
fs/btrfs/tree-log.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index bfd7c89c8d92..f8ef2e3fbf63 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -5122,7 +5122,7 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans,
BTRFS_I(other_inode),
LOG_OTHER_INODE, 0, LLONG_MAX,
ctx);
- iput(other_inode);
+ btrfs_add_delayed_iput(other_inode);
if (err)
goto out_unlock;
else
@@ -5539,7 +5539,7 @@ static int log_new_dir_dentries(struct btrfs_trans_handle *trans,
}
if (btrfs_inode_in_log(BTRFS_I(di_inode), trans->transid)) {
- iput(di_inode);
+ btrfs_add_delayed_iput(di_inode);
break;
}
@@ -5551,7 +5551,7 @@ static int log_new_dir_dentries(struct btrfs_trans_handle *trans,
if (!ret &&
btrfs_must_commit_transaction(trans, BTRFS_I(di_inode)))
ret = 1;
- iput(di_inode);
+ btrfs_add_delayed_iput(di_inode);
if (ret)
goto next_dir_inode;
if (ctx->log_new_dentries) {
@@ -5698,7 +5698,7 @@ static int btrfs_log_all_parents(struct btrfs_trans_handle *trans,
if (!ret && ctx && ctx->log_new_dentries)
ret = log_new_dir_dentries(trans, root,
BTRFS_I(dir_inode), ctx);
- iput(dir_inode);
+ btrfs_add_delayed_iput(dir_inode);
if (ret)
goto out;
}
--
2.16.4
[-- Attachment #5: 4.19.73-Btrfs-fix-assertion-failure-during-fsync-and-use-of-.patch --]
[-- Type: text/x-patch, Size: 7616 bytes --]
From 958cb34cec8138e2adf330ebcbfbff19639a8981 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana@suse.com>
Date: Tue, 10 Sep 2019 15:26:49 +0100
Subject: [PATCH] Btrfs: fix assertion failure during fsync and use of stale
transaction
Sometimes when fsync'ing a file we need to log that other inodes exist and
when we need to do that we acquire a reference on the inodes and then drop
that reference using iput() after logging them.
That generally is not a problem except if we end up doing the final iput()
(dropping the last reference) on the inode and that inode has a link count
of 0, which can happen in a very short time window if the logging path
gets a reference on the inode while it's being unlinked.
In that case we end up getting the eviction callback, btrfs_evict_inode(),
invoked through the iput() call chain which needs to drop all of the
inode's items from its subvolume btree, and in order to do that, it needs
to join a transaction at the helper function evict_refill_and_join().
However because the task previously started a transaction at the fsync
handler, btrfs_sync_file(), it has current->journal_info already pointing
to a transaction handle and therefore evict_refill_and_join() will get
that transaction handle from btrfs_join_transaction(). From this point on,
two different problems can happen:
1) evict_refill_and_join() will often change the transaction handle's
block reserve (->block_rsv) and set its ->bytes_reserved field to a
value greater than 0. If evict_refill_and_join() never commits the
transaction, the eviction handler ends up decreasing the reference
count (->use_count) of the transaction handle through the call to
btrfs_end_transaction(), and after that point we have a transaction
handle with a NULL ->block_rsv (which is the value prior to the
transaction join from evict_refill_and_join()) and a ->bytes_reserved
value greater than 0. If after the eviction/iput completes the inode
logging path hits an error or it decides that it must fallback to a
transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a
non-zero value from btrfs_log_dentry_safe(), and because of that
non-zero value it tries to commit the transaction using a handle with
a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes
the transaction commit hit an assertion failure at
btrfs_trans_release_metadata() because ->bytes_reserved is not zero but
the ->block_rsv is NULL. The produced stack trace for that is like the
following:
[192922.917158] assertion failed: !trans->bytes_reserved, file: fs/btrfs/transaction.c, line: 816
[192922.917553] ------------[ cut here ]------------
[192922.917922] kernel BUG at fs/btrfs/ctree.h:3532!
[192922.918310] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
[192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: G W 5.1.4-btrfs-next-47 #1
[192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
[192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs]
(...)
[192922.920925] RSP: 0018:ffffaebdc8a27da8 EFLAGS: 00010286
[192922.921315] RAX: 0000000000000051 RBX: ffff95c9c16a41c0 RCX: 0000000000000000
[192922.921692] RDX: 0000000000000000 RSI: ffff95cab6b16838 RDI: ffff95cab6b16838
[192922.922066] RBP: ffff95c9c16a41c0 R08: 0000000000000000 R09: 0000000000000000
[192922.922442] R10: ffffaebdc8a27e70 R11: 0000000000000000 R12: ffff95ca731a0980
[192922.922820] R13: 0000000000000000 R14: ffff95ca84c73338 R15: ffff95ca731a0ea8
[192922.923200] FS: 00007f337eda4e80(0000) GS:ffff95cab6b00000(0000) knlGS:0000000000000000
[192922.923579] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[192922.923948] CR2: 00007f337edad000 CR3: 00000001e00f6002 CR4: 00000000003606e0
[192922.924329] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[192922.924711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[192922.925105] Call Trace:
[192922.925505] btrfs_trans_release_metadata+0x10c/0x170 [btrfs]
[192922.925911] btrfs_commit_transaction+0x3e/0xaf0 [btrfs]
[192922.926324] btrfs_sync_file+0x44c/0x490 [btrfs]
[192922.926731] do_fsync+0x38/0x60
[192922.927138] __x64_sys_fdatasync+0x13/0x20
[192922.927543] do_syscall_64+0x60/0x1c0
[192922.927939] entry_SYSCALL_64_after_hwframe+0x49/0xbe
(...)
[192922.934077] ---[ end trace f00808b12068168f ]---
2) If evict_refill_and_join() decides to commit the transaction, it will
be able to do it, since the nested transaction join only increments the
transaction handle's ->use_count reference counter and it does not
prevent the transaction from getting committed. This means that after
eviction completes, the fsync logging path will be using a transaction
handle that refers to an already committed transaction. What happens
when using such a stale transaction can be unpredictable, we are at
least having a use-after-free on the transaction handle itself, since
the transaction commit will call kmem_cache_free() against the handle
regardless of its ->use_count value, or we can end up silently losing
all the updates to the log tree after that iput() in the logging path,
or using a transaction handle that in the meanwhile was allocated to
another task for a new transaction, etc, pretty much unpredictable
what can happen.
In order to fix both of them, instead of using iput() during logging, use
btrfs_add_delayed_iput(), so that the logging path of fsync never drops
the last reference on an inode, that step is offloaded to a safe context
(usually the cleaner kthread).
The assertion failure issue was sporadically triggered by the test case
generic/475 from fstests, which loads the dm error target while fsstress
is running, which lead to fsync failing while logging inodes with -EIO
errors and then trying later to commit the transaction, triggering the
assertion failure.
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
fs/btrfs/tree-log.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 08c5afa06aee..e561eb475339 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -5107,7 +5107,7 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans,
BTRFS_I(other_inode),
LOG_OTHER_INODE, 0, LLONG_MAX,
ctx);
- iput(other_inode);
+ btrfs_add_delayed_iput(other_inode);
if (err)
goto out_unlock;
else
@@ -5519,7 +5519,7 @@ static int log_new_dir_dentries(struct btrfs_trans_handle *trans,
}
if (btrfs_inode_in_log(BTRFS_I(di_inode), trans->transid)) {
- iput(di_inode);
+ btrfs_add_delayed_iput(di_inode);
break;
}
@@ -5531,7 +5531,7 @@ static int log_new_dir_dentries(struct btrfs_trans_handle *trans,
if (!ret &&
btrfs_must_commit_transaction(trans, BTRFS_I(di_inode)))
ret = 1;
- iput(di_inode);
+ btrfs_add_delayed_iput(di_inode);
if (ret)
goto next_dir_inode;
if (ctx->log_new_dentries) {
@@ -5678,7 +5678,7 @@ static int btrfs_log_all_parents(struct btrfs_trans_handle *trans,
if (!ret && ctx && ctx->log_new_dentries)
ret = log_new_dir_dentries(trans, root,
BTRFS_I(dir_inode), ctx);
- iput(dir_inode);
+ btrfs_add_delayed_iput(dir_inode);
if (ret)
goto out;
}
--
2.16.4
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction
2019-09-16 12:59 ` Filipe Manana
@ 2019-09-16 15:13 ` Greg KH
0 siblings, 0 replies; 14+ messages in thread
From: Greg KH @ 2019-09-16 15:13 UTC (permalink / raw)
To: Filipe Manana; +Cc: Sasha Levin, Filipe Manana, linux-btrfs, stable
On Mon, Sep 16, 2019 at 01:59:47PM +0100, Filipe Manana wrote:
> On Thu, Sep 12, 2019 at 8:32 AM Sasha Levin <sashal@kernel.org> wrote:
> >
> > Hi,
> >
> > [This is an automated email]
> >
> > This commit has been processed because it contains a -stable tag.
> > The stable tag indicates that it's relevant for the following trees: 4.4+
> >
> > The bot has tested the following trees: v5.2.14, v4.19.72, v4.14.143, v4.9.192, v4.4.192.
> >
> > v5.2.14: Build OK!
> > v4.19.72: Failed to apply! Possible dependencies:
> > 6b5fc433a7ad ("Btrfs: fix fsync after succession of renames of different files")
> > a3baaf0d786e ("Btrfs: fix fsync after succession of renames and unlink/rmdir")
> > b8aa330d2acb ("Btrfs: improve performance on fsync of files with multiple hardlinks")
> >
> > v4.14.143: Failed to apply! Possible dependencies:
> > 0d836392cadd ("Btrfs: fix mount failure after fsync due to hard link recreation")
> > 1f250e929a9c ("Btrfs: fix log replay failure after unlink and link combination")
> > 6b5fc433a7ad ("Btrfs: fix fsync after succession of renames of different files")
> > 8d9e220ca084 ("btrfs: simplify IS_ERR/PTR_ERR checks")
> > a3baaf0d786e ("Btrfs: fix fsync after succession of renames and unlink/rmdir")
> > b8aa330d2acb ("Btrfs: improve performance on fsync of files with multiple hardlinks")
> >
> > v4.9.192: Failed to apply! Possible dependencies:
> > 0b246afa62b0 ("btrfs: root->fs_info cleanup, add fs_info convenience variables")
> > 0d836392cadd ("Btrfs: fix mount failure after fsync due to hard link recreation")
> > 1f250e929a9c ("Btrfs: fix log replay failure after unlink and link combination")
> > 4791c8f19c45 ("btrfs: Make btrfs_check_ref_name_override take btrfs_inode")
> > 6b5fc433a7ad ("Btrfs: fix fsync after succession of renames of different files")
> > a3baaf0d786e ("Btrfs: fix fsync after succession of renames and unlink/rmdir")
> > cf8cddd38bab ("btrfs: don't abuse REQ_OP_* flags for btrfs_map_block")
> > da17066c4047 ("btrfs: pull node/sector/stripe sizes out of root and into fs_info")
> > db0a669fb002 ("btrfs: Make btrfs_add_link take btrfs_inode")
> > de143792253e ("btrfs: struct btrfsic_state->root should be an fs_info")
> > fb456252d3d9 ("btrfs: root->fs_info cleanup, use fs_info->dev_root everywhere")
> >
> > v4.4.192: Failed to apply! Possible dependencies:
> > 0132761017e0 ("btrfs: fix string and comment grammatical issues and typos")
> > 09cbfeaf1a5a ("mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros")
> > 0b246afa62b0 ("btrfs: root->fs_info cleanup, add fs_info convenience variables")
> > 0d836392cadd ("Btrfs: fix mount failure after fsync due to hard link recreation")
> > 0e749e54244e ("dax: increase granularity of dax_clear_blocks() operations")
> > 1f250e929a9c ("Btrfs: fix log replay failure after unlink and link combination")
> > 44f714dae50a ("Btrfs: improve performance on fsync against new inode after rename/unlink")
> > 4791c8f19c45 ("btrfs: Make btrfs_check_ref_name_override take btrfs_inode")
> > 52db400fcd50 ("pmem, dax: clean up clear_pmem()")
> > 6b5fc433a7ad ("Btrfs: fix fsync after succession of renames of different files")
> > 781feef7e6be ("Btrfs: fix lockdep warning about log_mutex")
> > a3baaf0d786e ("Btrfs: fix fsync after succession of renames and unlink/rmdir")
> > b2e0d1625e19 ("dax: fix lifetime of in-kernel dax mappings with dax_map_atomic()")
> > bb7ab3b92e46 ("btrfs: Fix misspellings in comments.")
> > cf8cddd38bab ("btrfs: don't abuse REQ_OP_* flags for btrfs_map_block")
> > d1a5f2b4d8a1 ("block: use DAX for partition table reads")
> > db0a669fb002 ("btrfs: Make btrfs_add_link take btrfs_inode")
> > de143792253e ("btrfs: struct btrfsic_state->root should be an fs_info")
> >
> >
> > NOTE: The patch will not be queued to stable trees until it is upstream.
> >
> > How should we proceed with this patch?
>
> So here follows, as attachments, patches for the following stable
> versions: 4.4.193, 4.9.193, 4.14.144 and 4.19.73.
Thanks for these backports, now queued up!
greg k-h
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2019-09-16 15:13 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-10 14:26 [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction fdmanana
2019-09-11 16:04 ` Josef Bacik
2019-09-12 7:30 ` Sasha Levin
2019-09-12 8:02 ` Filipe Manana
2019-09-16 12:59 ` Filipe Manana
2019-09-16 15:13 ` Greg KH
2019-09-12 11:10 ` Nikolay Borisov
2019-09-12 11:24 ` Filipe Manana
2019-09-12 11:35 ` Nikolay Borisov
2019-09-12 11:58 ` Josef Bacik
2019-09-12 13:17 ` Filipe Manana
2019-09-12 12:18 ` Josef Bacik
2019-09-12 13:19 ` Filipe Manana
2019-09-12 15:14 ` Josef Bacik
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).