linux-ext4.vger.kernel.org archive mirror
* [PATCH 0/2] ext4: Fix possible fs corruption due to xattr races
@ 2022-06-06 14:28 Jan Kara
  2022-06-06 14:28 ` [PATCH 1/2] mbcache: Add functions to invalidate entries and wait for entry users Jan Kara
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Jan Kara @ 2022-06-06 14:28 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Ritesh Harjani, Jan Kara

Hello,

I've tracked down the culprit of the jbd2 assertion Ritesh reported to me. In
the end it does not have much to do with jbd2 but rather points to a subtle
race in xattr code between xattr block reuse and xattr block freeing that can
result in fs corruption during journal replay. See patch 2/2 for more details.
These patches fix the problem. I have to say I'm not too happy with the special
mbcache interface I had to add because it just requires too deep knowledge of
how things work internally to get things right. If you get it wrong, you'll
have subtle races like above. But I didn't find a more transparent way to
fix this race. If someone has ideas, suggestions are welcome!

								Honza


* [PATCH 1/2] mbcache: Add functions to invalidate entries and wait for entry users
  2022-06-06 14:28 [PATCH 0/2] ext4: Fix possible fs corruption due to xattr races Jan Kara
@ 2022-06-06 14:28 ` Jan Kara
  2022-06-06 14:28 ` [PATCH 2/2] ext4: Fix race when reusing xattr blocks Jan Kara
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Jan Kara @ 2022-06-06 14:28 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Ritesh Harjani, Jan Kara, stable

Add functions that allow invalidating an entry in the cache while
keeping the last reference to it, and waiting until the remaining users
of a cache entry are gone. These functions will be used by ext4 to fix
races with xattr block sharing.
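
For illustration, the intended calling pattern for the two new functions
looks roughly like this (a minimal sketch, not part of the patch;
free_backing_block() is a hypothetical placeholder for whatever the
filesystem does with the block once no cached user can reuse it):

#include <linux/mbcache.h>

static void drop_shared_block(struct mb_cache *cache, u32 hash, u64 blocknr)
{
	struct mb_cache_entry *ce;

	/* Unhash the entry so that no new user can find and reuse the block. */
	ce = mb_cache_entry_invalidate(cache, hash, blocknr);
	if (!ce)
		return;
	/* Sleep until we hold the only remaining reference to the entry. */
	mb_cache_entry_wait_unused(ce);
	/* Nobody can be reusing the block anymore, release it. */
	free_backing_block(blocknr);	/* hypothetical helper */
	/* Drop the last reference, freeing the entry itself. */
	mb_cache_entry_put(cache, ce);
}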

CC: stable@vger.kernel.org
Fixes: 82939d7999df ("ext4: convert to mbcache2")
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/mbcache.c            | 47 ++++++++++++++++++++++++++++++++++++-----
 include/linux/mbcache.h | 11 +++++++++-
 2 files changed, 52 insertions(+), 6 deletions(-)

diff --git a/fs/mbcache.c b/fs/mbcache.c
index 97c54d3a2227..b81ea7c6fa21 100644
--- a/fs/mbcache.c
+++ b/fs/mbcache.c
@@ -125,6 +125,21 @@ void __mb_cache_entry_free(struct mb_cache_entry *entry)
 }
 EXPORT_SYMBOL(__mb_cache_entry_free);
 
+/*
+ * mb_cache_entry_wait_unused - wait to be the last user of the entry
+ *
+ * @entry - entry to work on
+ *
+ * Wait to be the last user of the entry.
+ */
+void mb_cache_entry_wait_unused(struct mb_cache_entry *entry)
+{
+	WARN_ON_ONCE(!list_empty(&entry->e_list));
+	WARN_ON_ONCE(!hlist_bl_unhashed(&entry->e_hash_list));
+	wait_var_event(&entry->e_refcnt, atomic_read(&entry->e_refcnt) == 1);
+}
+EXPORT_SYMBOL(mb_cache_entry_wait_unused);
+
 static struct mb_cache_entry *__entry_find(struct mb_cache *cache,
 					   struct mb_cache_entry *entry,
 					   u32 key)
@@ -217,14 +232,18 @@ struct mb_cache_entry *mb_cache_entry_get(struct mb_cache *cache, u32 key,
 }
 EXPORT_SYMBOL(mb_cache_entry_get);
 
-/* mb_cache_entry_delete - remove a cache entry
+/* mb_cache_entry_invalidate - invalidate mbcache entry
  * @cache - cache we work with
  * @key - key
  * @value - value
  *
- * Remove entry from cache @cache with key @key and value @value.
+ * Invalidate entry in cache @cache with key @key and value @value. The entry
+ * is removed from the hash and LRU so there can be no new users of it. The
+ * invalidated entry is returned so that the caller can drop the last
+ * reference (perhaps after waiting for remaining users).
  */
-void mb_cache_entry_delete(struct mb_cache *cache, u32 key, u64 value)
+struct mb_cache_entry *mb_cache_entry_invalidate(struct mb_cache *cache,
+						 u32 key, u64 value)
 {
 	struct hlist_bl_node *node;
 	struct hlist_bl_head *head;
@@ -246,11 +265,29 @@ void mb_cache_entry_delete(struct mb_cache *cache, u32 key, u64 value)
 				atomic_dec(&entry->e_refcnt);
 			}
 			spin_unlock(&cache->c_list_lock);
-			mb_cache_entry_put(cache, entry);
-			return;
+			return entry;
 		}
 	}
 	hlist_bl_unlock(head);
+
+	return NULL;
+}
+EXPORT_SYMBOL(mb_cache_entry_invalidate);
+
+/* mb_cache_entry_delete - delete mbcache entry
+ * @cache - cache we work with
+ * @key - key
+ * @value - value
+ *
+ * Delete entry in cache @cache with key @key and value @value.
+ */
+void mb_cache_entry_delete(struct mb_cache *cache, u32 key, u64 value)
+{
+	struct mb_cache_entry *entry;
+
+	entry = mb_cache_entry_invalidate(cache, key, value);
+	if (entry)
+		mb_cache_entry_put(cache, entry);
 }
 EXPORT_SYMBOL(mb_cache_entry_delete);
 
diff --git a/include/linux/mbcache.h b/include/linux/mbcache.h
index 20f1e3ff6013..4e6a5e05e78b 100644
--- a/include/linux/mbcache.h
+++ b/include/linux/mbcache.h
@@ -30,15 +30,24 @@ void mb_cache_destroy(struct mb_cache *cache);
 int mb_cache_entry_create(struct mb_cache *cache, gfp_t mask, u32 key,
 			  u64 value, bool reusable);
 void __mb_cache_entry_free(struct mb_cache_entry *entry);
+void mb_cache_entry_wait_unused(struct mb_cache_entry *entry);
 static inline int mb_cache_entry_put(struct mb_cache *cache,
 				     struct mb_cache_entry *entry)
 {
-	if (!atomic_dec_and_test(&entry->e_refcnt))
+	unsigned int cnt;
+
+	cnt = atomic_dec_return(&entry->e_refcnt);
+	if (cnt > 0) {
+		if (cnt == 1)
+			wake_up_var(&entry->e_refcnt);
 		return 0;
+	}
 	__mb_cache_entry_free(entry);
 	return 1;
 }
 
+struct mb_cache_entry *mb_cache_entry_invalidate(struct mb_cache *cache,
+						 u32 key, u64 value);
 void mb_cache_entry_delete(struct mb_cache *cache, u32 key, u64 value);
 struct mb_cache_entry *mb_cache_entry_get(struct mb_cache *cache, u32 key,
 					  u64 value);
-- 
2.35.3



* [PATCH 2/2] ext4: Fix race when reusing xattr blocks
  2022-06-06 14:28 [PATCH 0/2] ext4: Fix possible fs corruption due to xattr races Jan Kara
  2022-06-06 14:28 ` [PATCH 1/2] mbcache: Add functions to invalidate entries and wait for entry users Jan Kara
@ 2022-06-06 14:28 ` Jan Kara
  2022-06-07 12:22 ` [PATCH 0/2] ext4: Fix possible fs corruption due to xattr races Lukas Czerner
  2022-06-08  4:51 ` Ritesh Harjani
  3 siblings, 0 replies; 7+ messages in thread
From: Jan Kara @ 2022-06-06 14:28 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Ritesh Harjani, Jan Kara, stable

When ext4_xattr_block_set() decides to remove xattr block the following
race can happen:

CPU1					CPU2
ext4_xattr_block_set()			ext4_xattr_release_block()
  new_bh = ext4_xattr_block_cache_find()

					  lock_buffer(bh);
					  ref = le32_to_cpu(BHDR(bh)->h_refcount);
					  if (ref == 1) {
					    ...
					    mb_cache_entry_delete();
					    unlock_buffer(bh);
					    ext4_free_blocks();
					      ...
					      ext4_forget(..., bh, ...);
						jbd2_journal_revoke(..., bh);

  ext4_journal_get_write_access(..., new_bh, ...)
    do_get_write_access()
      jbd2_journal_cancel_revoke(..., new_bh);

Later the code in ext4_xattr_block_set() finds out the block got freed
and cancels reuse of the block, but the revoke stays canceled, so if
the block gets reused and the journal is later replayed, the filesystem
can get corrupted. If the race works out slightly differently, we can
also hit assertions in the jbd2 code. Fix the problem by waiting for
users of the mbcache entry (so that any xattr block reuse gets
canceled) before we free the xattr block; this prevents the problematic
race with the ext4_journal_get_write_access() call.
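
Condensed, the ordering on the freeing side after this patch looks roughly
like the sketch below (simplified from ext4_xattr_release_block(); error
handling and the ea_inode path are left out):

	lock_buffer(bh);
	/*
	 * Unhash the cache entry under the buffer lock so that
	 * ext4_xattr_block_set() can reliably detect the freed block.
	 */
	ce = mb_cache_entry_invalidate(ea_block_cache, hash, bh->b_blocknr);
	get_bh(bh);
	unlock_buffer(bh);

	if (ce) {
		/* Wait until no reuse attempt holds the cache entry anymore... */
		mb_cache_entry_wait_unused(ce);
		mb_cache_entry_put(ea_block_cache, ce);
	}
	/* ...and only then free (and thus revoke) the xattr block. */
	ext4_free_blocks(handle, inode, bh, 0, 1,
			 EXT4_FREE_BLOCKS_METADATA | EXT4_FREE_BLOCKS_FORGET);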

CC: stable@vger.kernel.org
Fixes: 82939d7999df ("ext4: convert to mbcache2")
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/xattr.c | 40 +++++++++++++++++++++++++++++++++-------
 1 file changed, 33 insertions(+), 7 deletions(-)

diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 042325349098..4eeeb3db618f 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -1241,17 +1241,29 @@ ext4_xattr_release_block(handle_t *handle, struct inode *inode,
 	hash = le32_to_cpu(BHDR(bh)->h_hash);
 	ref = le32_to_cpu(BHDR(bh)->h_refcount);
 	if (ref == 1) {
+		struct mb_cache_entry *ce = NULL;
+
 		ea_bdebug(bh, "refcount now=0; freeing");
 		/*
 		 * This must happen under buffer lock for
 		 * ext4_xattr_block_set() to reliably detect freed block
 		 */
 		if (ea_block_cache)
-			mb_cache_entry_delete(ea_block_cache, hash,
-					      bh->b_blocknr);
+			ce = mb_cache_entry_invalidate(ea_block_cache, hash,
+						       bh->b_blocknr);
 		get_bh(bh);
 		unlock_buffer(bh);
 
+		if (ce) {
+			/*
+			 * Wait for outstanding users of xattr entry so that we
+			 * know they don't try to reuse xattr block before we
+			 * free it - that revokes the block from the journal
+			 * which upsets jbd2_journal_get_write_access()
+			 */
+			mb_cache_entry_wait_unused(ce);
+			mb_cache_entry_put(ea_block_cache, ce);
+		}
 		if (ext4_has_feature_ea_inode(inode->i_sb))
 			ext4_xattr_inode_dec_ref_all(handle, inode, bh,
 						     BFIRST(bh),
@@ -1847,7 +1859,7 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
 	struct buffer_head *new_bh = NULL;
 	struct ext4_xattr_search s_copy = bs->s;
 	struct ext4_xattr_search *s = &s_copy;
-	struct mb_cache_entry *ce = NULL;
+	struct mb_cache_entry *ce = NULL, *old_ce = NULL;
 	int error = 0;
 	struct mb_cache *ea_block_cache = EA_BLOCK_CACHE(inode);
 	struct inode *ea_inode = NULL, *tmp_inode;
@@ -1871,11 +1883,15 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
 			/*
 			 * This must happen under buffer lock for
 			 * ext4_xattr_block_set() to reliably detect modified
-			 * block
+			 * block. We keep our own entry reference so that
+			 * we can wait for outstanding users of the entry
+			 * before freeing the xattr block.
 			 */
-			if (ea_block_cache)
-				mb_cache_entry_delete(ea_block_cache, hash,
-						      bs->bh->b_blocknr);
+			if (ea_block_cache) {
+				old_ce = mb_cache_entry_invalidate(
+						ea_block_cache, hash,
+						bs->bh->b_blocknr);
+			}
 			ea_bdebug(bs->bh, "modifying in-place");
 			error = ext4_xattr_set_entry(i, s, handle, inode,
 						     true /* is_block */);
@@ -2127,6 +2143,14 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
 	if (bs->bh && bs->bh != new_bh) {
 		struct ext4_xattr_inode_array *ea_inode_array = NULL;
 
+		/*
+		 * Wait for outstanding users of xattr entry so that we know
+		 * they don't try to reuse xattr block before we free it - that
+		 * revokes the block from the journal which upsets
+		 * jbd2_journal_get_write_access()
+		 */
+		if (old_ce)
+			mb_cache_entry_wait_unused(old_ce);
 		ext4_xattr_release_block(handle, inode, bs->bh,
 					 &ea_inode_array,
 					 0 /* extra_credits */);
@@ -2151,6 +2175,8 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
 	}
 	if (ce)
 		mb_cache_entry_put(ea_block_cache, ce);
+	if (old_ce)
+		mb_cache_entry_put(ea_block_cache, old_ce);
 	brelse(new_bh);
 	if (!(bs->bh && s->base == bs->bh->b_data))
 		kfree(s->base);
-- 
2.35.3



* Re: [PATCH 0/2] ext4: Fix possible fs corruption due to xattr races
  2022-06-06 14:28 [PATCH 0/2] ext4: Fix possible fs corruption due to xattr races Jan Kara
  2022-06-06 14:28 ` [PATCH 1/2] mbcache: Add functions to invalidate entries and wait for entry users Jan Kara
  2022-06-06 14:28 ` [PATCH 2/2] ext4: Fix race when reusing xattr blocks Jan Kara
@ 2022-06-07 12:22 ` Lukas Czerner
  2022-06-08  4:51 ` Ritesh Harjani
  3 siblings, 0 replies; 7+ messages in thread
From: Lukas Czerner @ 2022-06-07 12:22 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ted Tso, linux-ext4, Ritesh Harjani

On Mon, Jun 06, 2022 at 04:28:45PM +0200, Jan Kara wrote:
> Hello,
> 
> I've tracked down the culprit of the jbd2 assertion Ritesh reported to me. In
> the end it does not have much to do with jbd2 but rather points to a subtle
> race in xattr code between xattr block reuse and xattr block freeing that can
> result in fs corruption during journal replay. See patch 2/2 for more details.
> These patches fix the problem. I have to say I'm not too happy with the special
> mbcache interface I had to add because it just requires too deep knowledge of
> how things work internally to get things right. If you get it wrong, you'll
> have subtle races like above. But I didn't find a more transparent way to
> fix this race. If someone has ideas, suggestions are welcome!
> 
> 								Honza

I haven't given too much thought to finding a better way to fix the
race, but this seems like an OK solution to me.

You can add to the series

Reviewed-by: Lukas Czerner <lczerner@redhat.com>



* Re: [PATCH 0/2] ext4: Fix possible fs corruption due to xattr races
  2022-06-06 14:28 [PATCH 0/2] ext4: Fix possible fs corruption due to xattr races Jan Kara
                   ` (2 preceding siblings ...)
  2022-06-07 12:22 ` [PATCH 0/2] ext4: Fix possible fs corruption due to xattr races Lukas Czerner
@ 2022-06-08  4:51 ` Ritesh Harjani
  2022-06-08  9:54   ` Jan Kara
  3 siblings, 1 reply; 7+ messages in thread
From: Ritesh Harjani @ 2022-06-08  4:51 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ted Tso, linux-ext4

On 22/06/06 04:28PM, Jan Kara wrote:
> Hello,
>
> I've tracked down the culprit of the jbd2 assertion Ritesh reported to me. In

Hello Jan,

Thanks for working on the problem and identifying the race.

> the end it does not have much to do with jbd2 but rather points to a subtle
> race in xattr code between xattr block reuse and xattr block freeing that can
> result in fs corruption during journal replay. See patch 2/2 for more details.
> These patches fix the problem. I have to say I'm not too happy with the special

While I was still reviewing this patch set, I thought of giving it a try
with some stress testing for xattrs (given that this is the sort of race
that is not always easy to track down).

It seems it is easy to recreate the crash with the stress-ng xattr test (even
with your patches included):
	stress-ng --xattr 16 --timeout=10000s

Hope this might help further narrow down the problem.

root@qemu:/home/qemu# [  257.862064] ------------[ cut here ]------------
[  257.862834] kernel BUG at fs/jbd2/revoke.c:460!
[  257.863461] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[  257.864084] CPU: 0 PID: 1499 Comm: stress-ng-xattr Not tainted 5.18.0-rc5+
#102
[  257.864973] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
[  257.865972] RIP: 0010:jbd2_journal_cancel_revoke+0x12c/0x170
[  257.866606] Code: 49 89 44 24 08 e8 b4 bf d8 00 48 8b 3d 2d f9 29 02 4c 89 e6
e8 c5 81 ea ff 48 8b 73 18 4c 89 ef e8 39 f8
[  257.868547] RSP: 0018:ffffc9000170b9c0 EFLAGS: 00010286
[  257.869106] RAX: ffff888101cadb00 RBX: ffff888121cb9f08 RCX: 0000000000000000
[  257.869837] RDX: 0000000000000001 RSI: 000000000000242d RDI: 00000000ffffffff
[  257.870552] RBP: ffffc9000170b9e0 R08: ffffffff82cf2f20 R09: ffff888108831e10
[  257.871264] R10: 00000000000000bb R11: 000000000105d68a R12: ffff888108831e10
[  257.871977] R13: ffff888120937000 R14: ffff888108928500 R15: ffff888108831e18
[  257.872689] FS:  00007ffff6b4dc00(0000) GS:ffff88842fc00000(0000)
knlGS:0000000000000000
[  257.873528] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  257.874101] CR2: 0000555556361220 CR3: 00000001201dc000 CR4: 00000000000006f0
[  257.874813] Call Trace:
[  257.875077]  <TASK>
[  257.875313]  do_get_write_access+0x3d9/0x460
[  257.875753]  jbd2_journal_get_write_access+0x54/0x80
[  257.876260]  __ext4_journal_get_write_access+0x8b/0x1b0
[  257.876805]  ? ext4_dirty_inode+0x70/0x80
[  257.877255]  ext4_xattr_block_set+0x935/0xfb0
[  257.877709]  ext4_xattr_set_handle+0x5c8/0x680
[  257.878159]  ext4_xattr_set+0xd5/0x180
[  257.878540]  ext4_xattr_user_set+0x35/0x40
[  257.878957]  __vfs_removexattr+0x5a/0x70
[  257.879373]  __vfs_removexattr_locked+0xc5/0x160
[  257.879846]  vfs_removexattr+0x5b/0x100
[  257.880235]  removexattr+0x61/0x90
[  257.880611]  ? kvm_clock_read+0x18/0x30
[  257.881023]  ? kvm_clock_get_cycles+0x9/0x10
[  257.881492]  ? ktime_get+0x3e/0xa0
[  257.881856]  ? native_apic_msr_read+0x40/0x40
[  257.882302]  ? lapic_next_event+0x21/0x30
[  257.882716]  ? clockevents_program_event+0x8f/0xe0
[  257.883206]  ? hrtimer_update_next_event+0x4b/0x70
[  257.883698]  ? debug_smp_processor_id+0x17/0x20
[  257.884181]  ? preempt_count_add+0x4d/0xc0
[  257.884605]  __x64_sys_fremovexattr+0x82/0xb0
[  257.885063]  do_syscall_64+0x3b/0x90
[  257.885495]  entry_SYSCALL_64_after_hwframe+0x44/0xae
<...>
[  257.892816]  </TASK>

-ritesh

> mbcache interface I had to add because it just requires too deep knowledge of
> how things work internally to get things right. If you get it wrong, you'll
> have subtle races like above. But I didn't find a more transparent way to
> fix this race. If someone has ideas, suggestions are welcome!
>
> 								Honza


* Re: [PATCH 0/2] ext4: Fix possible fs corruption due to xattr races
  2022-06-08  4:51 ` Ritesh Harjani
@ 2022-06-08  9:54   ` Jan Kara
  2022-06-08 15:02     ` Jan Kara
  0 siblings, 1 reply; 7+ messages in thread
From: Jan Kara @ 2022-06-08  9:54 UTC (permalink / raw)
  To: Ritesh Harjani; +Cc: Jan Kara, Ted Tso, linux-ext4

On Wed 08-06-22 10:21:00, Ritesh Harjani wrote:
> On 22/06/06 04:28PM, Jan Kara wrote:
> > Hello,
> >
> > I've tracked down the culprit of the jbd2 assertion Ritesh reported to me. In
> 
> Hello Jan,
> 
> Thanks for working on the problem and identifying the race.
> 
> > the end it does not have much to do with jbd2 but rather points to a subtle
> > race in xattr code between xattr block reuse and xattr block freeing that can
> > result in fs corruption during journal replay. See patch 2/2 for more details.
> > These patches fix the problem. I have to say I'm not too happy with the special
> 
> While I was still reviewing this patch set, I thought of giving it a try
> with some stress testing for xattrs (given that this is the sort of race
> that is not always easy to track down).
> 
> It seems it is easy to recreate the crash with the stress-ng xattr test (even
> with your patches included):
> 	stress-ng --xattr 16 --timeout=10000s
> 
> Hope this might help further narrow down the problem.

Drat. I was actually running "stress-ng --xattr
<some-number-I-dont-remember>" to test my patches and it
didn't reproduce the crash for me within 5 minutes or so. Let me try harder
and thanks for the testing!

								Honza

> 
> root@qemu:/home/qemu# [  257.862064] ------------[ cut here ]------------
> [  257.862834] kernel BUG at fs/jbd2/revoke.c:460!
> [  257.863461] invalid opcode: 0000 [#1] PREEMPT SMP PTI
> [  257.864084] CPU: 0 PID: 1499 Comm: stress-ng-xattr Not tainted 5.18.0-rc5+
> #102
> [  257.864973] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> 1.13.0-1ubuntu1.1 04/01/2014
> [  257.865972] RIP: 0010:jbd2_journal_cancel_revoke+0x12c/0x170
> [  257.866606] Code: 49 89 44 24 08 e8 b4 bf d8 00 48 8b 3d 2d f9 29 02 4c 89 e6
> e8 c5 81 ea ff 48 8b 73 18 4c 89 ef e8 39 f8
> [  257.868547] RSP: 0018:ffffc9000170b9c0 EFLAGS: 00010286
> [  257.869106] RAX: ffff888101cadb00 RBX: ffff888121cb9f08 RCX: 0000000000000000
> [  257.869837] RDX: 0000000000000001 RSI: 000000000000242d RDI: 00000000ffffffff
> [  257.870552] RBP: ffffc9000170b9e0 R08: ffffffff82cf2f20 R09: ffff888108831e10
> [  257.871264] R10: 00000000000000bb R11: 000000000105d68a R12: ffff888108831e10
> [  257.871977] R13: ffff888120937000 R14: ffff888108928500 R15: ffff888108831e18
> [  257.872689] FS:  00007ffff6b4dc00(0000) GS:ffff88842fc00000(0000)
> knlGS:0000000000000000
> [  257.873528] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  257.874101] CR2: 0000555556361220 CR3: 00000001201dc000 CR4: 00000000000006f0
> [  257.874813] Call Trace:
> [  257.875077]  <TASK>
> [  257.875313]  do_get_write_access+0x3d9/0x460
> [  257.875753]  jbd2_journal_get_write_access+0x54/0x80
> [  257.876260]  __ext4_journal_get_write_access+0x8b/0x1b0
> [  257.876805]  ? ext4_dirty_inode+0x70/0x80
> [  257.877255]  ext4_xattr_block_set+0x935/0xfb0
> [  257.877709]  ext4_xattr_set_handle+0x5c8/0x680
> [  257.878159]  ext4_xattr_set+0xd5/0x180
> [  257.878540]  ext4_xattr_user_set+0x35/0x40
> [  257.878957]  __vfs_removexattr+0x5a/0x70
> [  257.879373]  __vfs_removexattr_locked+0xc5/0x160
> [  257.879846]  vfs_removexattr+0x5b/0x100
> [  257.880235]  removexattr+0x61/0x90
> [  257.880611]  ? kvm_clock_read+0x18/0x30
> [  257.881023]  ? kvm_clock_get_cycles+0x9/0x10
> [  257.881492]  ? ktime_get+0x3e/0xa0
> [  257.881856]  ? native_apic_msr_read+0x40/0x40
> [  257.882302]  ? lapic_next_event+0x21/0x30
> [  257.882716]  ? clockevents_program_event+0x8f/0xe0
> [  257.883206]  ? hrtimer_update_next_event+0x4b/0x70
> [  257.883698]  ? debug_smp_processor_id+0x17/0x20
> [  257.884181]  ? preempt_count_add+0x4d/0xc0
> [  257.884605]  __x64_sys_fremovexattr+0x82/0xb0
> [  257.885063]  do_syscall_64+0x3b/0x90
> [  257.885495]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> <...>
> [  257.892816]  </TASK>
> 
> -ritesh
> 
> > mbcache interface I had to add because it just requires too deep knowledge of
> > how things work internally to get things right. If you get it wrong, you'll
> > have subtle races like above. But I didn't find a more transparent way to
> > fix this race. If someone has ideas, suggestions are welcome!
> >
> > 								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH 0/2] ext4: Fix possible fs corruption due to xattr races
  2022-06-08  9:54   ` Jan Kara
@ 2022-06-08 15:02     ` Jan Kara
  0 siblings, 0 replies; 7+ messages in thread
From: Jan Kara @ 2022-06-08 15:02 UTC (permalink / raw)
  To: Ritesh Harjani; +Cc: Jan Kara, Ted Tso, linux-ext4

On Wed 08-06-22 11:54:25, Jan Kara wrote:
> On Wed 08-06-22 10:21:00, Ritesh Harjani wrote:
> > On 22/06/06 04:28PM, Jan Kara wrote:
> > > Hello,
> > >
> > > I've tracked down the culprit of the jbd2 assertion Ritesh reported to me. In
> > 
> > Hello Jan,
> > 
> > Thanks for working on the problem and identifying the race.
> > 
> > > the end it does not have much to do with jbd2 but rather points to a subtle
> > > race in xattr code between xattr block reuse and xattr block freeing that can
> > > result in fs corruption during journal replay. See patch 2/2 for more details.
> > > These patches fix the problem. I have to say I'm not too happy with the special
> > 
> > While I was still reviewing this patch set, I thought of giving it a try
> > with some stress testing for xattrs (given that this is the sort of race
> > that is not always easy to track down).
> > 
> > It seems it is easy to recreate the crash with the stress-ng xattr test (even
> > with your patches included):
> > 	stress-ng --xattr 16 --timeout=10000s
> > 
> > Hope this might help further narrow down the problem.
> 
> Drat. I was actually running "stress-ng --xattr
> <some-number-I-dont-remember>" to test my patches and it
> didn't reproduce the crash for me within 5 minutes or so. Let me try harder
> and thanks for the testing!

Indeed, I forgot to enable JBD2_DEBUG in my kernel config and so the
assertion was not triggering for me. Now I can reproduce the issue even
with my patches (although it takes longer to reproduce) so I'm digging more
into it.
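
(For reference: since that assertion only triggers with jbd2 debugging
compiled in, reproducing it presumably needs something like

	CONFIG_JBD2_DEBUG=y

in the kernel config.)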

							Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


Thread overview: 7+ messages
2022-06-06 14:28 [PATCH 0/2] ext4: Fix possible fs corruption due to xattr races Jan Kara
2022-06-06 14:28 ` [PATCH 1/2] mbcache: Add functions to invalidate entries and wait for entry users Jan Kara
2022-06-06 14:28 ` [PATCH 2/2] ext4: Fix race when reusing xattr blocks Jan Kara
2022-06-07 12:22 ` [PATCH 0/2] ext4: Fix possible fs corruption due to xattr races Lukas Czerner
2022-06-08  4:51 ` Ritesh Harjani
2022-06-08  9:54   ` Jan Kara
2022-06-08 15:02     ` Jan Kara
