From: Jan Kara <jack@suse.cz>
To: Dan Williams <dan.j.williams@intel.com>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>, brho@google.com,
    Jan Kara <jack@suse.cz>, Matthew Wilcox <willy@infradead.org>,
    linux-nvdimm <linux-nvdimm@lists.01.org>
Subject: Re: [PATCH] dax: Fix deadlock in dax_lock_mapping_entry()
Date: Thu, 4 Oct 2018 18:27:48 +0200
Message-ID: <20181004162748.GI28384@quack2.suse.cz>
In-Reply-To: <CAPcyv4hut3P5fiNHgjpf4Qg_nBo9JKkovCdrUnzXyji=ey8zJw@mail.gmail.com>

On Thu 27-09-18 11:22:22, Dan Williams wrote:
> On Thu, Sep 27, 2018 at 6:41 AM Jan Kara <jack@suse.cz> wrote:
> >
> > On Thu 27-09-18 06:28:43, Matthew Wilcox wrote:
> > > On Thu, Sep 27, 2018 at 01:23:32PM +0200, Jan Kara wrote:
> > > > When dax_lock_mapping_entry() has to sleep to obtain entry lock, it will
> > > > fail to unlock mapping->i_pages spinlock and thus immediately deadlock
> > > > against itself when retrying to grab the entry lock again. Fix the
> > > > problem by unlocking mapping->i_pages before retrying.
> > >
> > > It seems weird that xfstests doesn't provoke this ...
> >
> > The function currently gets called only from mm/memory-failure.c. And yes,
> > we are lacking DAX hwpoison error tests in fstests...
>
> I have an item on my backlog to port the ndctl unit test that does
> memory_failure() injection vs ext4 over to fstests. That said I've
> been investigating a deadlock on ext4 caused by this test. When I saw
> this patch I hoped it was root cause, but the test is still failing
> for me. Vishal is able to pass the test on his system, so the failure
> mode is timing dependent. I'm running this patch on top of -rc5 and
> still seeing the following deadlock.

I went through the code but I don't see where the problem could be. How can
I run that test? Is KVM enough or do I need hardware with AEP dimms?

								Honza

>
> EXT4-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
> EXT4-fs (pmem0): mounted filesystem with ordered data mode. Opts: dax
> Injecting memory failure for pfn 0x208900 at process virtual
> address 0x7f5872900000
> Memory failure: 0x208900: Killing dax-pmd:7095 due to hardware
> memory corruption
> Memory failure: 0x208900: recovery action for dax page: Recovered
> watchdog: BUG: soft lockup - CPU#35 stuck for 22s! [dax-pmd:7095]
> [..]
> irq event stamp: 121911146
> hardirqs last enabled at (121911145): [<ffffffff81aa1bd9>]
> _raw_spin_unlock_irq+0x29/0x40 hardirqs last disabled at
> (121911146): [<ffffffff810037a3>] trace_hardirqs_off_thunk+0x1a/0x1c
> softirqs last enabled at (78238674): [<ffffffff81e0032e>]
> __do_softirq+0x32e/0x428
> softirqs last disabled at (78238627): [<ffffffff810bc6f6>]
> irq_exit+0xf6/0x100
> CPU: 35 PID: 7095 Comm: dax-pmd Tainted: G OE
> 4.19.0-rc5+ #2394
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
> RIP: 0010:lock_release+0x134/0x2a0
> [..]
> Call Trace:
> find_get_entries+0x299/0x3c0
> pagevec_lookup_entries+0x1a/0x30
> dax_layout_busy_page+0x9c/0x280
> ? __lock_acquire+0x12fa/0x1310
> ext4_break_layouts+0x48/0x100
> ? ext4_punch_hole+0x108/0x5a0
> ext4_punch_hole+0x110/0x5a0
> ext4_fallocate+0x189/0xa40
> ? rcu_read_lock_sched_held+0x6b/0x80
> ? rcu_sync_lockdep_assert+0x2e/0x60
> vfs_fallocate+0x13f/0x270
>
> The same test against xfs is not failing for me. I have been seeking
> some focus time to dig in on this.

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
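
For readers following along, the self-deadlock described in the quoted patch description (retrying the entry lock while mapping->i_pages is still held) can be modelled in userspace roughly as below. This is only a toy sketch and not the kernel code or the actual patch: struct toy_mapping and every toy_* function are hypothetical names, the "spinlock" is a plain flag that reports a would-be deadlock instead of spinning, and the "sleep" simply pretends the other holder released the entry.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for mapping->i_pages plus a single entry; NOT kernel code. */
struct toy_mapping {
	bool i_pages_locked;	/* models the i_pages spinlock */
	bool entry_locked;	/* models the per-entry lock */
};

/* Returns false where a real non-recursive spinlock would spin forever. */
static bool toy_lock(struct toy_mapping *m)
{
	if (m->i_pages_locked)
		return false;
	m->i_pages_locked = true;
	return true;
}

static void toy_unlock(struct toy_mapping *m)
{
	assert(m->i_pages_locked);
	m->i_pages_locked = false;
}

/* Models sleeping until the current holder drops the entry lock. */
static void toy_wait_for_entry(struct toy_mapping *m)
{
	m->entry_locked = false;
}

/*
 * Returns 0 once the entry lock is taken, -1 if the retry would deadlock.
 * unlock_before_retry selects the behaviour the fix describes.
 */
static int toy_lock_entry(struct toy_mapping *m, bool unlock_before_retry)
{
	for (;;) {
		if (!toy_lock(m))
			return -1;	/* lock still held from the previous pass */
		if (!m->entry_locked) {
			m->entry_locked = true;
			toy_unlock(m);
			return 0;
		}
		/* Entry is busy: we have to sleep and then retry. */
		if (unlock_before_retry)
			toy_unlock(m);
		toy_wait_for_entry(m);
	}
}
```

In this model, calling toy_lock_entry() on a mapping whose entry starts out locked succeeds when unlock_before_retry is true and reports the self-deadlock when it is false. An entry that is free on the first pass never hits the retry path, so both variants behave the same there. The model says nothing about the separate ext4 soft lockup reported above.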