linux-xfs.vger.kernel.org archive mirror
From: Brian Foster <bfoster@redhat.com>
To: Donald Buczek <buczek@molgen.mpg.de>
Cc: linux-xfs@vger.kernel.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	it+linux-xfs@molgen.mpg.de
Subject: Re: v5.10.1 xfs deadlock
Date: Thu, 17 Dec 2020 14:43:17 -0500
Message-ID: <20201217194317.GD2507317@bfoster>
In-Reply-To: <b8da4aed-ee44-5d9f-88dc-3d32f0298564@molgen.mpg.de>

On Thu, Dec 17, 2020 at 06:44:51PM +0100, Donald Buczek wrote:
> Dear xfs developer,
> 
> I was doing some testing on a Linux 5.10.1 system with two 100 TB xfs filesystems on md raid6 raids.
> 
> The stress test was essentially `cp -a`ing a Linux source repository with two threads in parallel on each filesystem.
> 
> After about an hour, the processes writing to one filesystem (md1) blocked; 30 minutes later, the processes writing to the other filesystem (md0) did as well.
> 
>     root      7322  2167  0 Dec16 pts/1    00:00:06 cp -a /jbod/M8068/scratch/linux /jbod/M8068/scratch/1/linux.018.TMP
>     root      7329  2169  0 Dec16 pts/1    00:00:05 cp -a /jbod/M8068/scratch/linux /jbod/M8068/scratch/2/linux.019.TMP
>     root     13856  2170  0 Dec16 pts/1    00:00:08 cp -a /jbod/M8067/scratch/linux /jbod/M8067/scratch/2/linux.028.TMP
>     root     13899  2168  0 Dec16 pts/1    00:00:05 cp -a /jbod/M8067/scratch/linux /jbod/M8067/scratch/1/linux.027.TMP
> 
> Some info from the system (all stack traces, slabinfo) is available here: https://owww.molgen.mpg.de/~buczek/2020-12-16.info.txt
> 
> It stands out that there are many "xfs-conv" threads (549 for md0, but only 10 for md1), all with stacks like this:
> 
>     [<0>] xfs_log_commit_cil+0x6cc/0x7c0
>     [<0>] __xfs_trans_commit+0xab/0x320
>     [<0>] xfs_iomap_write_unwritten+0xcb/0x2e0
>     [<0>] xfs_end_ioend+0xc6/0x110
>     [<0>] xfs_end_io+0xad/0xe0
>     [<0>] process_one_work+0x1dd/0x3e0
>     [<0>] worker_thread+0x2d/0x3b0
>     [<0>] kthread+0x118/0x130
>     [<0>] ret_from_fork+0x22/0x30
> 
> xfs_log_commit_cil+0x6cc is
> 
>   xfs_log_commit_cil()
>     xlog_cil_push_background(log)
>       xlog_wait(&cil->xc_push_wait, &cil->xc_push_lock);
> 
> Some other threads, including the four "cp" commands, are also blocked at xfs_log_commit_cil+0x6cc.
> 
> There is also a single "flush" process for each md device with this stack signature:
> 
>     [<0>] xfs_map_blocks+0xbf/0x400
>     [<0>] iomap_do_writepage+0x15e/0x880
>     [<0>] write_cache_pages+0x175/0x3f0
>     [<0>] iomap_writepages+0x1c/0x40
>     [<0>] xfs_vm_writepages+0x59/0x80
>     [<0>] do_writepages+0x4b/0xe0
>     [<0>] __writeback_single_inode+0x42/0x300
>     [<0>] writeback_sb_inodes+0x198/0x3f0
>     [<0>] __writeback_inodes_wb+0x5e/0xc0
>     [<0>] wb_writeback+0x246/0x2d0
>     [<0>] wb_workfn+0x26e/0x490
>     [<0>] process_one_work+0x1dd/0x3e0
>     [<0>] worker_thread+0x2d/0x3b0
>     [<0>] kthread+0x118/0x130
>     [<0>] ret_from_fork+0x22/0x30
> 
> xfs_map_blocks+0xbf is the
> 
>     xfs_ilock(ip, XFS_ILOCK_SHARED);
> 
> in xfs_map_blocks().
> 
> The system is low on free memory
> 
>     MemTotal:       197587764 kB
>     MemFree:          2196496 kB
>     MemAvailable:   189895408 kB
> 
> but responsive.
> 
> I have an out-of-tree driver for the HBA (smartpqi 2.1.6-005, pulled from linux-scsi), but it is unlikely that this blocking is related to that, because the md block devices themselves are responsive (`xxd /dev/md0`).
> 
> I can keep the system in this state for a while. Is there any idea what went wrong, or what data I could collect from the running system to help? I have full debug info and could walk lists or retrieve data structures with gdb.
> 

It might be useful to dump the values under /sys/fs/xfs/<dev>/log/* for
each fs to get an idea of the state of the logs as well...
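
One way to collect that, as a minimal sketch rather than a tested tool: the script below walks /sys/fs/xfs/*/log/ and prints every attribute file it finds, without assuming which attributes a given kernel exposes. The function name dump_xfs_log_state and its sysfs_root parameter are made up for illustration.

```python
#!/usr/bin/env python3
"""Dump every attribute under /sys/fs/xfs/<dev>/log/ for each XFS filesystem."""
from pathlib import Path


def dump_xfs_log_state(sysfs_root="/sys/fs/xfs"):
    """Return {device: {attribute: value}} for each <dev>/log directory found.

    The function name and the sysfs_root parameter are illustrative
    assumptions; only the /sys/fs/xfs/<dev>/log/ layout comes from the thread.
    """
    state = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        # No XFS sysfs tree (or a wrong path): nothing to report.
        return state
    for log_dir in sorted(root.glob("*/log")):
        if not log_dir.is_dir():
            continue
        attrs = {}
        for attr in sorted(log_dir.iterdir()):
            if not attr.is_file():
                continue
            try:
                attrs[attr.name] = attr.read_text().strip()
            except OSError as err:
                # Keep going so one unreadable attribute does not hide the rest.
                attrs[attr.name] = f"<unreadable: {err}>"
        # The directory above "log" is named after the block device (e.g. md0).
        state[log_dir.parent.name] = attrs
    return state


if __name__ == "__main__":
    for dev, attrs in dump_xfs_log_state().items():
        for name, value in attrs.items():
            print(f"{dev}/log/{name}: {value}")
```

Running it a few times some seconds apart and comparing the output would show whether the log state of either filesystem is still changing.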

Brian

> Best
>   Donald
> 

