From: Donald Buczek <buczek@molgen.mpg.de>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
it+linux-xfs@molgen.mpg.de
Subject: Re: v5.10.1 xfs deadlock
Date: Tue, 29 Dec 2020 00:13:33 +0100 [thread overview]
Message-ID: <1705b481-16db-391e-48a8-a932d1f137e7@molgen.mpg.de> (raw)
In-Reply-To: <066cb9e2-f583-b2c7-f42c-861568d38e2f@molgen.mpg.de>
On 27.12.20 18:34, Donald Buczek wrote:
> On 18.12.20 19:35, Donald Buczek wrote:
>> On 18.12.20 16:35, Brian Foster wrote:
>>> On Thu, Dec 17, 2020 at 10:30:37PM +0100, Donald Buczek wrote:
>>>> On 17.12.20 20:43, Brian Foster wrote:
>>>>> On Thu, Dec 17, 2020 at 06:44:51PM +0100, Donald Buczek wrote:
>>>>>> Dear xfs developer,
>>>>>>
>>>>>> I was doing some testing on a Linux 5.10.1 system with two 100 TB xfs filesystems on md raid6 raids.
>>>>>>
>>>>>> The stress test was essentially `cp -a`ing a Linux source repository with two threads in parallel on each filesystem.
>>>>>>
>>>>>> After about on hour, the processes to one filesystem (md1) blocked, 30 minutes later the process to the other filesystem (md0) did.
>>>>>>
>>>>>> root 7322 2167 0 Dec16 pts/1 00:00:06 cp -a /jbod/M8068/scratch/linux /jbod/M8068/scratch/1/linux.018.TMP
>>>>>> root 7329 2169 0 Dec16 pts/1 00:00:05 cp -a /jbod/M8068/scratch/linux /jbod/M8068/scratch/2/linux.019.TMP
>>>>>> root 13856 2170 0 Dec16 pts/1 00:00:08 cp -a /jbod/M8067/scratch/linux /jbod/M8067/scratch/2/linux.028.TMP
>>>>>> root 13899 2168 0 Dec16 pts/1 00:00:05 cp -a /jbod/M8067/scratch/linux /jbod/M8067/scratch/1/linux.027.TMP
>>>>>>
>>>
>>> Do you have any indication of whether these workloads actually hung or
>>> just became incredibly slow?
>>
>> There is zero progress. iostat doesn't show any I/O on any of the block devices (md or member)
>>
>>>>>> Some info from the system (all stack traces, slabinfo) is available here: https://owww.molgen.mpg.de/~buczek/2020-12-16.info.txt
>>>>>>
>>>>>> It stands out, that there are many (549 for md0, but only 10 for md1) "xfs-conv" threads all with stacks like this
>>>>>>
>>>>>> [<0>] xfs_log_commit_cil+0x6cc/0x7c0
>>>>>> [<0>] __xfs_trans_commit+0xab/0x320
>>>>>> [<0>] xfs_iomap_write_unwritten+0xcb/0x2e0
>>>>>> [<0>] xfs_end_ioend+0xc6/0x110
>>>>>> [<0>] xfs_end_io+0xad/0xe0
>>>>>> [<0>] process_one_work+0x1dd/0x3e0
>>>>>> [<0>] worker_thread+0x2d/0x3b0
>>>>>> [<0>] kthread+0x118/0x130
>>>>>> [<0>] ret_from_fork+0x22/0x30
>>>>>>
>>>>>> xfs_log_commit_cil+0x6cc is
>>>>>>
>>>>>> xfs_log_commit_cil()
>>>>>> xlog_cil_push_background(log)
>>>>>> xlog_wait(&cil->xc_push_wait, &cil->xc_push_lock);
>>>>>>
>>>
>>> This looks like the transaction commit throttling code. That was
>>> introduced earlier this year in v5.7 via commit 0e7ab7efe7745 ("xfs:
>>> Throttle commits on delayed background CIL push"). The purpose of that
>>> change was to prevent the CIL from growing too large. FWIW, I don't
>>> recall that being a functional problem so it should be possible to
>>> simply remove that blocking point and see if that avoids the problem or
>>> if we simply stall out somewhere else, if you wanted to give that a
>>> test.
>>
>> Will do. Before trying with this commit reverted, I will repeat the test without any change to see if the problem is reproducible at all.
>
> I'm now able to reliably reproduce the deadlock with a little less complex setup (e.g. with only one filesystem involved). One key to that was to run the test against a freshly created filesystem (mkfs).
>
> And, yes, you are right: When I revert ef565ab8cc2e ("xfs: Throttle commits on delayed background CIL push") and 7ee6dfa2a245 ("xfs: fix use-after-free on CIL context on shutdown") the deadlock seems to be gone.
Hash Correction: 0e7ab7efe774 ("xfs: Throttle commits on delayed background CIL push") and c7f87f3984cf ("xfs: fix use-after-free on CIL context on shutdown")
When I tried to remove only the wait:
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -936,7 +936,6 @@ xlog_cil_push_background(
if (cil->xc_ctx->space_used >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log)) {
trace_xfs_log_cil_wait(log, cil->xc_ctx->ticket);
ASSERT(cil->xc_ctx->space_used < log->l_logsize);
- xlog_wait(&cil->xc_push_wait, &cil->xc_push_lock);
return;
}
the system got into a critical overload situation with many kworker threads at 100% CPU and became totally unresponsive after a while, requiring sysrq triggered reboot.
I noticed, that 7ee6dfa2a245 ("xfs: fix use-after-free on CIL context on shutdown") put the `wake_up_all(&ctx->push_wait)` into an additional conditional `if (ctx->space_used >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log))` - the same as is used to decide whether the background push is triggered at all. But in the deadlock situation, I've seen waiters waiting on cil->xc_push_wait with ctx->space_used being zero with the debugger:
$L_003: (struct xfs_cil *) 0xffff889886db0700 : {
xc_cil = {next = 0xffff889886db0708, prev = 0xffff889886db0708} /* empty */
xc_ctx = 0xffff888e2cb6dd80 /* --> $L_004 */
xc_push_seq = 31
xc_committing = {next = 0xffff889886db0790, prev = 0xffff889886db0790} /* empty */
xc_current_sequence = 32
xc_push_wait = {
head = {next = 0xffffc9000dbcbd48, prev = 0xffffc90021a1bd20}
}
}
$L_004: (struct xfs_cil_ctx *) 0xffff888e2cb6dd80 : {
sequence = 32
start_lsn = 0
commit_lsn = 0
space_used = 0
committing = {next = 0xffff888e2cb6ddd8, prev = 0xffff888e2cb6ddd8} /* empty */
}
So, out of curiosity I've tried
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -670,8 +670,7 @@ xlog_cil_push_work(
/*
* Wake up any background push waiters now this context is being pushed.
*/
- if (ctx->space_used >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log))
- wake_up_all(&cil->xc_push_wait);
+ wake_up_all(&cil->xc_push_wait);
/*
* Check if we've anything to push. If there is nothing, then we don't
and this seems to fix the problem. But I don't understand, how this is possible.
next prev parent reply other threads:[~2020-12-28 23:14 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-12-17 17:44 v5.10.1 xfs deadlock Donald Buczek
2020-12-17 19:43 ` Brian Foster
2020-12-17 21:30 ` Donald Buczek
2020-12-18 15:35 ` Brian Foster
2020-12-18 18:35 ` Donald Buczek
2020-12-27 17:34 ` Donald Buczek
2020-12-28 23:13 ` Donald Buczek [this message]
2020-12-29 23:56 ` [PATCH] xfs: Wake CIL push waiters more reliably Donald Buczek
2020-12-30 22:16 ` Dave Chinner
2020-12-31 11:48 ` Donald Buczek
2020-12-31 21:59 ` Dave Chinner
2021-01-02 19:12 ` Donald Buczek
2021-01-02 22:44 ` Dave Chinner
2021-01-03 16:03 ` Donald Buczek
2021-01-07 22:19 ` Dave Chinner
2021-01-09 14:39 ` Donald Buczek
2021-01-04 16:23 ` Brian Foster
2021-01-07 21:54 ` Dave Chinner
2021-01-08 16:56 ` Brian Foster
2021-01-11 16:38 ` Brian Foster
2021-01-13 21:53 ` Dave Chinner
2021-02-15 13:36 ` Donald Buczek
2021-02-16 11:18 ` Brian Foster
2021-02-16 12:40 ` Donald Buczek
2021-01-13 21:44 ` Dave Chinner
[not found] ` <20201230024642.2171-1-hdanton@sina.com>
2020-12-30 16:54 ` Donald Buczek
2020-12-18 21:49 ` v5.10.1 xfs deadlock Dave Chinner
2020-12-21 12:22 ` Donald Buczek
2020-12-27 17:22 ` Donald Buczek
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1705b481-16db-391e-48a8-a932d1f137e7@molgen.mpg.de \
--to=buczek@molgen.mpg.de \
--cc=bfoster@redhat.com \
--cc=it+linux-xfs@molgen.mpg.de \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).