From: Donald Buczek <buczek@molgen.mpg.de>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	it+linux-xfs@molgen.mpg.de
Subject: Re: v5.10.1 xfs deadlock
Date: Sun, 27 Dec 2020 18:22:53 +0100
Message-ID: <38614fd9-dc3f-15f9-e922-0b3d80b3b6c4@molgen.mpg.de>
In-Reply-To: <c89ecac6-f5bb-52b8-4fcd-9098983cdf2e@molgen.mpg.de>

On 21.12.20 13:22, Donald Buczek wrote:
> On 18.12.20 22:49, Dave Chinner wrote:
>> On Thu, Dec 17, 2020 at 06:44:51PM +0100, Donald Buczek wrote:
>>> Dear xfs developer,
>>>
>>> I was doing some testing on a Linux 5.10.1 system with two 100 TB xfs filesystems on md raid6 raids.
>>>
>>> The stress test was essentially `cp -a`ing a Linux source repository with two threads in parallel on each filesystem.
>>>
>>> After about an hour, the processes writing to one filesystem (md1) blocked; 30 minutes later the processes writing to the other filesystem (md0) blocked as well.
>>>
>>>      root      7322  2167  0 Dec16 pts/1    00:00:06 cp -a /jbod/M8068/scratch/linux /jbod/M8068/scratch/1/linux.018.TMP
>>>      root      7329  2169  0 Dec16 pts/1    00:00:05 cp -a /jbod/M8068/scratch/linux /jbod/M8068/scratch/2/linux.019.TMP
>>>      root     13856  2170  0 Dec16 pts/1    00:00:08 cp -a /jbod/M8067/scratch/linux /jbod/M8067/scratch/2/linux.028.TMP
>>>      root     13899  2168  0 Dec16 pts/1    00:00:05 cp -a /jbod/M8067/scratch/linux /jbod/M8067/scratch/1/linux.027.TMP
>>>
>>> Some info from the system (all stack traces, slabinfo) is available here: https://owww.molgen.mpg.de/~buczek/2020-12-16.info.txt
>>>
>>> It stands out that there are many "xfs-conv" threads (549 for md0, but only 10 for md1), all with stacks like this:
>>>
>>>      [<0>] xfs_log_commit_cil+0x6cc/0x7c0
>>>      [<0>] __xfs_trans_commit+0xab/0x320
>>>      [<0>] xfs_iomap_write_unwritten+0xcb/0x2e0
>>>      [<0>] xfs_end_ioend+0xc6/0x110
>>>      [<0>] xfs_end_io+0xad/0xe0
>>>      [<0>] process_one_work+0x1dd/0x3e0
>>>      [<0>] worker_thread+0x2d/0x3b0
>>>      [<0>] kthread+0x118/0x130
>>>      [<0>] ret_from_fork+0x22/0x30
>>>
>>> xfs_log_commit_cil+0x6cc is
>>>
>>>    xfs_log_commit_cil()
>>>      xlog_cil_push_background(log)
>>>        xlog_wait(&cil->xc_push_wait, &cil->xc_push_lock);
>>>
>>> Some other threads, including the four "cp" commands, are also blocking at xfs_log_commit_cil+0x6cc.
>>>
>>> There is also a single "flush" process for each md device with this stack signature:
>>>
>>>      [<0>] xfs_map_blocks+0xbf/0x400
>>>      [<0>] iomap_do_writepage+0x15e/0x880
>>>      [<0>] write_cache_pages+0x175/0x3f0
>>>      [<0>] iomap_writepages+0x1c/0x40
>>>      [<0>] xfs_vm_writepages+0x59/0x80
>>>      [<0>] do_writepages+0x4b/0xe0
>>>      [<0>] __writeback_single_inode+0x42/0x300
>>>      [<0>] writeback_sb_inodes+0x198/0x3f0
>>>      [<0>] __writeback_inodes_wb+0x5e/0xc0
>>>      [<0>] wb_writeback+0x246/0x2d0
>>>      [<0>] wb_workfn+0x26e/0x490
>>>      [<0>] process_one_work+0x1dd/0x3e0
>>>      [<0>] worker_thread+0x2d/0x3b0
>>>      [<0>] kthread+0x118/0x130
>>>      [<0>] ret_from_fork+0x22/0x30
>>>
>>> xfs_map_blocks+0xbf is the
>>>
>>>      xfs_ilock(ip, XFS_ILOCK_SHARED);
>>>
>>> in xfs_map_blocks().
>>
>> Can you post the entire dmesg output after running
>> 'echo w > /proc/sysrq-trigger' to dump all the blocked threads to
>> dmesg?
>>
>>> I have an out-of-tree driver for the HBA (smartpqi 2.1.6-005
>>> pulled from linux-scsi), but it is unlikely that this blocking is
>>> related to it, because the md block devices themselves are
>>> responsive (`xxd /dev/md0`).
>>
>> My bet is that the OOT driver/hardware had dropped a log IO on the
>> floor - XFS is waiting for the CIL push to complete, and I'm betting
>> that is stuck waiting for iclog IO completion while writing the CIL
>> to the journal. The sysrq output will tell us if this is the case,
>> so that's the first place to look.
> 
> I think you are right here, and I'm sorry for blaming the wrong layer.

I was too fast to accept that. The non-zero "inflight" counters of the underlying block devices could not be reproduced; they are usually "0 0" when the filesystem is deadlocked.

To make sure, I've also added my own "bio requested" and "bio completed" atomic counters to the two submit_bio calls and to xlog_bio_end_io in xfs_log.c, and when the system is deadlocked, the requests and completions do match:

     sudo gdb linux/vmlinux /proc/kcore  -ex 'print xxx_bio_requested' -ex 'print xxx_bio_completed'
     [...]
     $1 = {counter = 34723}
     $2 = {counter = 34723}

So at least log writes are not lost.

Best
   Donald

> I've got the system into another (though slightly different) zero-progress situation. This time it happened on md0 only, while md1 was still working.
> 
> I think this should be proof that the failure is in the block layer of the member disks:
> 
>      root:deadbird:/scratch/local/# for f in /sys/devices/virtual/block/md?/md/rd*/block/inflight;do echo $f: $(cat $f);done
>      /sys/devices/virtual/block/md0/md/rd0/block/inflight: 1 0
>      /sys/devices/virtual/block/md0/md/rd1/block/inflight: 1 0
>      /sys/devices/virtual/block/md0/md/rd10/block/inflight: 1 0
>      /sys/devices/virtual/block/md0/md/rd11/block/inflight: 1 0
>      /sys/devices/virtual/block/md0/md/rd12/block/inflight: 1 0
>      /sys/devices/virtual/block/md0/md/rd13/block/inflight: 1 0
>      /sys/devices/virtual/block/md0/md/rd14/block/inflight: 1 0
>      /sys/devices/virtual/block/md0/md/rd15/block/inflight: 1 0
>      /sys/devices/virtual/block/md0/md/rd2/block/inflight: 1 0
>      /sys/devices/virtual/block/md0/md/rd3/block/inflight: 1 0
>      /sys/devices/virtual/block/md0/md/rd4/block/inflight: 1 0
>      /sys/devices/virtual/block/md0/md/rd5/block/inflight: 1 0
>      /sys/devices/virtual/block/md0/md/rd6/block/inflight: 1 0
>      /sys/devices/virtual/block/md0/md/rd7/block/inflight: 1 0
>      /sys/devices/virtual/block/md0/md/rd8/block/inflight: 1 0
>      /sys/devices/virtual/block/md0/md/rd9/block/inflight: 1 0
>      /sys/devices/virtual/block/md1/md/rd0/block/inflight: 0 0
>      /sys/devices/virtual/block/md1/md/rd1/block/inflight: 0 0
>      /sys/devices/virtual/block/md1/md/rd10/block/inflight: 0 0
>      /sys/devices/virtual/block/md1/md/rd11/block/inflight: 0 0
>      /sys/devices/virtual/block/md1/md/rd12/block/inflight: 0 0
>      /sys/devices/virtual/block/md1/md/rd13/block/inflight: 0 0
>      /sys/devices/virtual/block/md1/md/rd14/block/inflight: 0 0
>      /sys/devices/virtual/block/md1/md/rd15/block/inflight: 0 0
>      /sys/devices/virtual/block/md1/md/rd2/block/inflight: 0 0
>      /sys/devices/virtual/block/md1/md/rd3/block/inflight: 0 0
>      /sys/devices/virtual/block/md1/md/rd4/block/inflight: 0 0
>      /sys/devices/virtual/block/md1/md/rd5/block/inflight: 0 0
>      /sys/devices/virtual/block/md1/md/rd6/block/inflight: 0 0
>      /sys/devices/virtual/block/md1/md/rd7/block/inflight: 0 0
>      /sys/devices/virtual/block/md1/md/rd8/block/inflight: 0 0
>      /sys/devices/virtual/block/md1/md/rd9/block/inflight: 0 0
> 
> Best
> 
>    Donald
> 
>>
>> Cheers,
>>
>> Dave.
>>

-- 
Donald Buczek
buczek@molgen.mpg.de
Tel: +49 30 8413 1433

