All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: linux-xfs@vger.kernel.org
Subject: [PATCH 1/4] xfs: log worker needs to start before intent/unlink recovery
Date: Wed,  9 Mar 2022 12:55:09 +1100	[thread overview]
Message-ID: <20220309015512.2648074-2-david@fromorbit.com> (raw)
In-Reply-To: <20220309015512.2648074-1-david@fromorbit.com>

From: Dave Chinner <dchinner@redhat.com>

After 963 iterations of generic/530, it deadlocked during recovery
on a pinned inode cluster buffer like so:

XFS (pmem1): Starting recovery (logdev: internal)
INFO: task kworker/8:0:306037 blocked for more than 122 seconds.
      Not tainted 5.17.0-rc6-dgc+ #975
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/8:0     state:D stack:13024 pid:306037 ppid:     2 flags:0x00004000
Workqueue: xfs-inodegc/pmem1 xfs_inodegc_worker
Call Trace:
 <TASK>
 __schedule+0x30d/0x9e0
 schedule+0x55/0xd0
 schedule_timeout+0x114/0x160
 __down+0x99/0xf0
 down+0x5e/0x70
 xfs_buf_lock+0x36/0xf0
 xfs_buf_find+0x418/0x850
 xfs_buf_get_map+0x47/0x380
 xfs_buf_read_map+0x54/0x240
 xfs_trans_read_buf_map+0x1bd/0x490
 xfs_imap_to_bp+0x4f/0x70
 xfs_iunlink_map_ino+0x66/0xd0
 xfs_iunlink_map_prev.constprop.0+0x148/0x2f0
 xfs_iunlink_remove_inode+0xf2/0x1d0
 xfs_inactive_ifree+0x1a3/0x900
 xfs_inode_unlink+0xcc/0x210
 xfs_inodegc_worker+0x1ac/0x2f0
 process_one_work+0x1ac/0x390
 worker_thread+0x56/0x3c0
 kthread+0xf6/0x120
 ret_from_fork+0x1f/0x30
 </TASK>
task:mount           state:D stack:13248 pid:324509 ppid:324233 flags:0x00004000
Call Trace:
 <TASK>
 __schedule+0x30d/0x9e0
 schedule+0x55/0xd0
 schedule_timeout+0x114/0x160
 __down+0x99/0xf0
 down+0x5e/0x70
 xfs_buf_lock+0x36/0xf0
 xfs_buf_find+0x418/0x850
 xfs_buf_get_map+0x47/0x380
 xfs_buf_read_map+0x54/0x240
 xfs_trans_read_buf_map+0x1bd/0x490
 xfs_imap_to_bp+0x4f/0x70
 xfs_iget+0x300/0xb40
 xlog_recover_process_one_iunlink+0x4c/0x170
 xlog_recover_process_iunlinks.isra.0+0xee/0x130
 xlog_recover_finish+0x57/0x110
 xfs_log_mount_finish+0xfc/0x1e0
 xfs_mountfs+0x540/0x910
 xfs_fs_fill_super+0x495/0x850
 get_tree_bdev+0x171/0x270
 xfs_fs_get_tree+0x15/0x20
 vfs_get_tree+0x24/0xc0
 path_mount+0x304/0xba0
 __x64_sys_mount+0x108/0x140
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
 </TASK>
task:xfsaild/pmem1   state:D stack:14544 pid:324525 ppid:     2 flags:0x00004000
Call Trace:
 <TASK>
 __schedule+0x30d/0x9e0
 schedule+0x55/0xd0
 io_schedule+0x4b/0x80
 xfs_buf_wait_unpin+0x9e/0xf0
 __xfs_buf_submit+0x14a/0x230
 xfs_buf_delwri_submit_buffers+0x107/0x280
 xfs_buf_delwri_submit_nowait+0x10/0x20
 xfsaild+0x27e/0x9d0
 kthread+0xf6/0x120
 ret_from_fork+0x1f/0x30

We have the mount process waiting on an inode cluster buffer read,
inodegc doing unlink waiting on the same inode cluster buffer, and
the AIL push thread blocked in writeback waiting for the inode to
become unpinned.

What has happened here is that the AIL push thread has raced with
the inodegc process modifying, committing and pinning the inode
cluster buffer here in xfs_buf_delwri_submit_buffers() here:

	blk_start_plug(&plug);
	list_for_each_entry_safe(bp, n, buffer_list, b_list) {
		if (!wait_list) {
			if (xfs_buf_ispinned(bp)) {
				pinned++;
				continue;
			}
Here >>>>>>
			if (!xfs_buf_trylock(bp))
				continue;

Basically, the AIL has found the buffer wasn't pinned and got the
lock without blocking, but then the buffer was pinned. This implies
the processing here was pre-empted between the pin check and the
lock, because the pin count can only be increased while holding the
buffer locked. Hence when it has gone to submit the IO, it has
blocked waiting for the buffer to be unpinned.

With all executing threads now waiting on the buffer to be unpinned,
we normally get out of situations like this via the background log
worker issuing a log force which will unpinned stuck buffers like
this. But at this point in recovery, we haven't started the log
worker. In fact, the first thing we do after processing intents and
unlinked inodes is *start the log worker*. IOWs, we start it too
late to have it break deadlocks like this.

Avoid this and any other similar deadlock vectors in intent and
unlinked inode recovery by starting the log worker before we recover
intents and unlinked inodes. This part of recovery runs as though
the filesystem is fully active, so we really should have the same
infrastructure running as we normally do at runtime.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 89fec9a18c34..ffd928cf9a9a 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -812,10 +812,9 @@ xfs_log_mount_finish(
 	 * mount failure occurs.
 	 */
 	mp->m_super->s_flags |= SB_ACTIVE;
+	xfs_log_work_queue(mp);
 	if (xlog_recovery_needed(log))
 		error = xlog_recover_finish(log);
-	if (!error)
-		xfs_log_work_queue(mp);
 	mp->m_super->s_flags &= ~SB_ACTIVE;
 	evict_inodes(mp->m_super);
 
-- 
2.33.0


  reply	other threads:[~2022-03-09  1:55 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-09  1:55 PATCH [0/4 V2] xfs: log recovery hang fixes Dave Chinner
2022-03-09  1:55 ` Dave Chinner [this message]
2022-03-10 23:46   ` [PATCH 1/4] xfs: log worker needs to start before intent/unlink recovery Darrick J. Wong
2022-03-11  0:13     ` Dave Chinner
2022-03-09  1:55 ` [PATCH 2/4] xfs: check buffer pin state after locking in delwri_submit Dave Chinner
2022-03-09  1:55 ` [PATCH 3/4] xfs: xfs_ail_push_all_sync() stalls when racing with updates Dave Chinner
2022-03-09  1:55 ` [PATCH 4/4] xfs: async CIL flushes need pending pushes to be made stable Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220309015512.2648074-2-david@fromorbit.com \
    --to=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.