All of lore.kernel.org
 help / color / mirror / Atom feed
From: Donald Buczek <buczek@molgen.mpg.de>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	it+linux-xfs@molgen.mpg.de
Subject: Re: v5.10.1 xfs deadlock
Date: Tue, 29 Dec 2020 00:13:33 +0100	[thread overview]
Message-ID: <1705b481-16db-391e-48a8-a932d1f137e7@molgen.mpg.de> (raw)
In-Reply-To: <066cb9e2-f583-b2c7-f42c-861568d38e2f@molgen.mpg.de>



On 27.12.20 18:34, Donald Buczek wrote:
> On 18.12.20 19:35, Donald Buczek wrote:
>> On 18.12.20 16:35, Brian Foster wrote:
>>> On Thu, Dec 17, 2020 at 10:30:37PM +0100, Donald Buczek wrote:
>>>> On 17.12.20 20:43, Brian Foster wrote:
>>>>> On Thu, Dec 17, 2020 at 06:44:51PM +0100, Donald Buczek wrote:
>>>>>> Dear xfs developer,
>>>>>>
>>>>>> I was doing some testing on a Linux 5.10.1 system with two 100 TB xfs filesystems on md raid6 raids.
>>>>>>
>>>>>> The stress test was essentially `cp -a`ing a Linux source repository with two threads in parallel on each filesystem.
>>>>>>
>>>>>> After about on hour, the processes to one filesystem (md1) blocked, 30 minutes later the process to the other filesystem (md0) did.
>>>>>>
>>>>>>       root      7322  2167  0 Dec16 pts/1    00:00:06 cp -a /jbod/M8068/scratch/linux /jbod/M8068/scratch/1/linux.018.TMP
>>>>>>       root      7329  2169  0 Dec16 pts/1    00:00:05 cp -a /jbod/M8068/scratch/linux /jbod/M8068/scratch/2/linux.019.TMP
>>>>>>       root     13856  2170  0 Dec16 pts/1    00:00:08 cp -a /jbod/M8067/scratch/linux /jbod/M8067/scratch/2/linux.028.TMP
>>>>>>       root     13899  2168  0 Dec16 pts/1    00:00:05 cp -a /jbod/M8067/scratch/linux /jbod/M8067/scratch/1/linux.027.TMP
>>>>>>
>>>
>>> Do you have any indication of whether these workloads actually hung or
>>> just became incredibly slow?
>>
>> There is zero progress. iostat doesn't show any I/O on any of the block devices (md or member)
>>
>>>>>> Some info from the system (all stack traces, slabinfo) is available here: https://owww.molgen.mpg.de/~buczek/2020-12-16.info.txt
>>>>>>
>>>>>> It stands out, that there are many (549 for md0, but only 10 for md1)  "xfs-conv" threads all with stacks like this
>>>>>>
>>>>>>       [<0>] xfs_log_commit_cil+0x6cc/0x7c0
>>>>>>       [<0>] __xfs_trans_commit+0xab/0x320
>>>>>>       [<0>] xfs_iomap_write_unwritten+0xcb/0x2e0
>>>>>>       [<0>] xfs_end_ioend+0xc6/0x110
>>>>>>       [<0>] xfs_end_io+0xad/0xe0
>>>>>>       [<0>] process_one_work+0x1dd/0x3e0
>>>>>>       [<0>] worker_thread+0x2d/0x3b0
>>>>>>       [<0>] kthread+0x118/0x130
>>>>>>       [<0>] ret_from_fork+0x22/0x30
>>>>>>
>>>>>> xfs_log_commit_cil+0x6cc is
>>>>>>
>>>>>>     xfs_log_commit_cil()
>>>>>>       xlog_cil_push_background(log)
>>>>>>         xlog_wait(&cil->xc_push_wait, &cil->xc_push_lock);
>>>>>>
>>>
>>> This looks like the transaction commit throttling code. That was
>>> introduced earlier this year in v5.7 via commit 0e7ab7efe7745 ("xfs:
>>> Throttle commits on delayed background CIL push"). The purpose of that
>>> change was to prevent the CIL from growing too large. FWIW, I don't
>>> recall that being a functional problem so it should be possible to
>>> simply remove that blocking point and see if that avoids the problem or
>>> if we simply stall out somewhere else, if you wanted to give that a
>>> test.
>>
>> Will do. Before trying with this commit reverted, I will repeat the test without any change to see if the problem is reproducible at all.
> 
> I'm now able to reliably reproduce the deadlock with a little less complex setup (e.g. with only one filesystem involved). One key to that was to run the test against a freshly created filesystem (mkfs).
> 
> And, yes, you are right: When I revert ef565ab8cc2e ("xfs: Throttle commits on delayed background CIL push") and 7ee6dfa2a245 ("xfs: fix use-after-free on CIL context on shutdown") the deadlock seems to be gone.

Hash Correction: 0e7ab7efe774 ("xfs: Throttle commits on delayed background CIL push") and c7f87f3984cf ("xfs: fix use-after-free on CIL context on shutdown")

When I tried to remove only the wait:

     --- a/fs/xfs/xfs_log_cil.c
     +++ b/fs/xfs/xfs_log_cil.c
     @@ -936,7 +936,6 @@ xlog_cil_push_background(
             if (cil->xc_ctx->space_used >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log)) {
                     trace_xfs_log_cil_wait(log, cil->xc_ctx->ticket);
                     ASSERT(cil->xc_ctx->space_used < log->l_logsize);
     -               xlog_wait(&cil->xc_push_wait, &cil->xc_push_lock);
                     return;
             }

the system got into a critical overload situation with many kworker threads at 100% CPU and became totally unresponsive after a while, requiring sysrq triggered reboot.

I noticed, that 7ee6dfa2a245 ("xfs: fix use-after-free on CIL context on shutdown") put the `wake_up_all(&ctx->push_wait)` into an additional conditional `if (ctx->space_used >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log))` - the same as is used to decide whether the background push is triggered at all. But in the deadlock situation, I've seen waiters waiting on cil->xc_push_wait with ctx->space_used being zero with the debugger:

     $L_003: (struct xfs_cil *) 0xffff889886db0700 : {
         xc_cil = {next = 0xffff889886db0708, prev = 0xffff889886db0708}    /* empty */
         xc_ctx = 0xffff888e2cb6dd80 /* --> $L_004 */
         xc_push_seq = 31
         xc_committing = {next = 0xffff889886db0790, prev = 0xffff889886db0790}    /* empty */
         xc_current_sequence = 32
         xc_push_wait = {
             head = {next = 0xffffc9000dbcbd48, prev = 0xffffc90021a1bd20}
         }
     }
     $L_004: (struct xfs_cil_ctx *) 0xffff888e2cb6dd80 : {
         sequence = 32
         start_lsn = 0
         commit_lsn = 0
         space_used = 0
         committing = {next = 0xffff888e2cb6ddd8, prev = 0xffff888e2cb6ddd8}    /* empty */
     }


So, out of curiosity I've tried

     --- a/fs/xfs/xfs_log_cil.c
     +++ b/fs/xfs/xfs_log_cil.c
     @@ -670,8 +670,7 @@ xlog_cil_push_work(
             /*
              * Wake up any background push waiters now this context is being pushed.
              */
     -       if (ctx->space_used >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log))
     -               wake_up_all(&cil->xc_push_wait);
     +       wake_up_all(&cil->xc_push_wait);
      
             /*
              * Check if we've anything to push. If there is nothing, then we don't

and this seems to fix the problem. But I don't understand, how this is possible.

  reply	other threads:[~2020-12-28 23:14 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-17 17:44 v5.10.1 xfs deadlock Donald Buczek
2020-12-17 19:43 ` Brian Foster
2020-12-17 21:30   ` Donald Buczek
2020-12-18 15:35     ` Brian Foster
2020-12-18 18:35       ` Donald Buczek
2020-12-27 17:34         ` Donald Buczek
2020-12-28 23:13           ` Donald Buczek [this message]
2020-12-29 23:56             ` [PATCH] xfs: Wake CIL push waiters more reliably Donald Buczek
2020-12-30 22:16               ` Dave Chinner
2020-12-31 11:48                 ` Donald Buczek
2020-12-31 21:59                   ` Dave Chinner
2021-01-02 19:12                     ` Donald Buczek
2021-01-02 22:44                       ` Dave Chinner
2021-01-03 16:03                         ` Donald Buczek
2021-01-07 22:19                           ` Dave Chinner
2021-01-09 14:39                             ` Donald Buczek
2021-01-04 16:23                 ` Brian Foster
2021-01-07 21:54                   ` Dave Chinner
2021-01-08 16:56                     ` Brian Foster
2021-01-11 16:38                       ` Brian Foster
2021-01-13 21:53                         ` Dave Chinner
2021-02-15 13:36                           ` Donald Buczek
2021-02-16 11:18                             ` Brian Foster
2021-02-16 12:40                               ` Donald Buczek
2021-01-13 21:44                       ` Dave Chinner
     [not found]             ` <20201230024642.2171-1-hdanton@sina.com>
2020-12-30 16:54               ` Donald Buczek
2020-12-18 21:49 ` v5.10.1 xfs deadlock Dave Chinner
2020-12-21 12:22   ` Donald Buczek
2020-12-27 17:22     ` Donald Buczek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1705b481-16db-391e-48a8-a932d1f137e7@molgen.mpg.de \
    --to=buczek@molgen.mpg.de \
    --cc=bfoster@redhat.com \
    --cc=it+linux-xfs@molgen.mpg.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.