nvdimm.lists.linux.dev archive mirror
From: Dan Williams <dan.j.williams@intel.com>
To: Jan Kara <jack@suse.cz>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	brho@google.com, Matthew Wilcox <willy@infradead.org>,
	linux-nvdimm <linux-nvdimm@lists.01.org>
Subject: Re: [PATCH] dax: Fix deadlock in dax_lock_mapping_entry()
Date: Thu, 27 Sep 2018 11:22:22 -0700
Message-ID: <CAPcyv4hut3P5fiNHgjpf4Qg_nBo9JKkovCdrUnzXyji=ey8zJw@mail.gmail.com>
In-Reply-To: <20180927134129.GB8800@quack2.suse.cz>

On Thu, Sep 27, 2018 at 6:41 AM Jan Kara <jack@suse.cz> wrote:
>
> On Thu 27-09-18 06:28:43, Matthew Wilcox wrote:
> > On Thu, Sep 27, 2018 at 01:23:32PM +0200, Jan Kara wrote:
> > > When dax_lock_mapping_entry() has to sleep to obtain entry lock, it will
> > > fail to unlock mapping->i_pages spinlock and thus immediately deadlock
> > > against itself when retrying to grab the entry lock again. Fix the
> > > problem by unlocking mapping->i_pages before retrying.
> >
> > It seems weird that xfstests doesn't provoke this ...
>
> The function currently gets called only from mm/memory-failure.c. And yes,
> we are lacking DAX hwpoison error tests in fstests...

I have an item on my backlog to port the ndctl unit test that does
memory_failure() injection against ext4 over to fstests. That said, I've
been investigating a deadlock on ext4 triggered by this test. When I saw
this patch I hoped it was the root cause, but the test is still failing
for me. Vishal is able to pass the test on his system, so the failure
appears to be timing dependent. I'm running this patch on top of 4.19-rc5
and still seeing the following deadlock.

    EXT4-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
    EXT4-fs (pmem0): mounted filesystem with ordered data mode. Opts: dax
    Injecting memory failure for pfn 0x208900 at process virtual address 0x7f5872900000
    Memory failure: 0x208900: Killing dax-pmd:7095 due to hardware memory corruption
    Memory failure: 0x208900: recovery action for dax page: Recovered
    watchdog: BUG: soft lockup - CPU#35 stuck for 22s! [dax-pmd:7095]
    [..]
    irq event stamp: 121911146
    hardirqs last  enabled at (121911145): [<ffffffff81aa1bd9>] _raw_spin_unlock_irq+0x29/0x40
    hardirqs last disabled at (121911146): [<ffffffff810037a3>] trace_hardirqs_off_thunk+0x1a/0x1c
    softirqs last  enabled at (78238674): [<ffffffff81e0032e>] __do_softirq+0x32e/0x428
    softirqs last disabled at (78238627): [<ffffffff810bc6f6>] irq_exit+0xf6/0x100
    CPU: 35 PID: 7095 Comm: dax-pmd Tainted: G           OE 4.19.0-rc5+ #2394
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
    RIP: 0010:lock_release+0x134/0x2a0
    [..]
    Call Trace:
     find_get_entries+0x299/0x3c0
     pagevec_lookup_entries+0x1a/0x30
     dax_layout_busy_page+0x9c/0x280
     ? __lock_acquire+0x12fa/0x1310
     ext4_break_layouts+0x48/0x100
     ? ext4_punch_hole+0x108/0x5a0
     ext4_punch_hole+0x110/0x5a0
     ext4_fallocate+0x189/0xa40
     ? rcu_read_lock_sched_held+0x6b/0x80
     ? rcu_sync_lockdep_assert+0x2e/0x60
     vfs_fallocate+0x13f/0x270

The same test against xfs is not failing for me. I have been looking
for some focused time to dig into this.
