From: Oleg Nesterov <oleg@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>, Al Viro <viro@zeniv.linux.org.uk>,
Nikolay Borisov <kernel@kyup.com>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
fstests@vger.kernel.org
Subject: Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths
Date: Fri, 7 Oct 2016 19:15:18 +0200 [thread overview]
Message-ID: <20161007171517.GA23721@redhat.com> (raw)
In-Reply-To: <20161006215920.GE9806@dastard>
On 10/07, Dave Chinner wrote:
>
> On Thu, Oct 06, 2016 at 07:17:58PM +0200, Oleg Nesterov wrote:
> > Probably a false positive? Although when I look at the comment above xfs_sync_sb()
> > I think that something like the change below may make sense, but I know absolutely
> > nothing about fs/ and XFS in particular.
> >
> > Oleg.
> >
> >
> > --- x/fs/xfs/xfs_trans.c
> > +++ x/fs/xfs/xfs_trans.c
> > @@ -245,7 +245,8 @@ xfs_trans_alloc(
> > atomic_inc(&mp->m_active_trans);
> >
> > tp = kmem_zone_zalloc(xfs_trans_zone,
> > - (flags & XFS_TRANS_NOFS) ? KM_NOFS : KM_SLEEP);
> > + (flags & (XFS_TRANS_NOFS | XFS_TRANS_NO_WRITECOUNT))
> > + ? KM_NOFS : KM_SLEEP);
> > tp->t_magic = XFS_TRANS_HEADER_MAGIC;
> > tp->t_flags = flags;
> > tp->t_mountp = mp;
>
> Brief examination says the caller should set XFS_TRANS_NOFS, not change
> the implementation to make the XFS_TRANS_NO_WRITECOUNT flag also imply
> XFS_TRANS_NOFS.
I didn't mean that the change above could fix the problem, and I don't really
understand your suggestion. Obviously, any GFP_FS allocation in the xfs_fs_freeze()
path will trigger the same warning.
I added this hack
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1333,10 +1333,15 @@ xfs_fs_freeze(
struct super_block *sb)
{
struct xfs_mount *mp = XFS_M(sb);
+ int ret;
+ current->flags |= PF_FSTRANS; // tell kmem_flags_convert() to remove GFP_FS
xfs_save_resvblks(mp);
xfs_quiesce_attr(mp);
- return xfs_sync_sb(mp, true);
+ ret = xfs_sync_sb(mp, true);
+ current->flags &= ~PF_FSTRANS;
+
+ return ret;
}
just for testing purposes, and after that I got another warning, included below.
I haven't read it carefully yet, but _at first glance_ it looks like the lock
inversion uncovered by 2/2, although I could easily be wrong. Can
cancel_delayed_work_sync(l_work) under sb_internal hang if xfs_log_worker()
waits for this rwsem?
Oleg.
=========================================================
[ INFO: possible irq lock inversion dependency detected ]
4.8.0+ #10 Tainted: G W
---------------------------------------------------------
kswapd0/32 just changed the state of lock:
(sb_internal){++++.?}, at: [<ffffffffb92925c7>] __sb_start_write+0xb7/0xf0
but this lock took another, RECLAIM_FS-unsafe lock in the past:
((&(&log->l_work)->work)){+.+.+.}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock((&(&log->l_work)->work));
local_irq_disable();
lock(sb_internal);
lock((&(&log->l_work)->work));
<Interrupt>
lock(sb_internal);
*** DEADLOCK ***
no locks held by kswapd0/32.
the shortest dependencies between 2nd lock and 1st lock:
-> ((&(&log->l_work)->work)){+.+.+.} ops: 803 {
HARDIRQ-ON-W at:
[<ffffffffb9107ef1>] __lock_acquire+0x611/0x1870
[<ffffffffb91097dd>] lock_acquire+0x10d/0x200
[<ffffffffb90cad49>] process_one_work+0x1c9/0x690
[<ffffffffb90cb25e>] worker_thread+0x4e/0x480
[<ffffffffb90d1d13>] kthread+0xf3/0x110
[<ffffffffb9a2146f>] ret_from_fork+0x1f/0x40
SOFTIRQ-ON-W at:
[<ffffffffb9107f23>] __lock_acquire+0x643/0x1870
[<ffffffffb91097dd>] lock_acquire+0x10d/0x200
[<ffffffffb90cad49>] process_one_work+0x1c9/0x690
[<ffffffffb90cb25e>] worker_thread+0x4e/0x480
[<ffffffffb90d1d13>] kthread+0xf3/0x110
[<ffffffffb9a2146f>] ret_from_fork+0x1f/0x40
RECLAIM_FS-ON-W at:
[<ffffffffb910735f>] mark_held_locks+0x6f/0xa0
[<ffffffffb910a5f3>] lockdep_trace_alloc+0xd3/0x120
[<ffffffffb92603bf>] kmem_cache_alloc+0x2f/0x280
[<ffffffffb941f5e1>] kmem_zone_alloc+0x81/0x120
[<ffffffffb941ec4c>] xfs_trans_alloc+0x6c/0x130
[<ffffffffb93ef649>] xfs_sync_sb+0x39/0x80
[<ffffffffb9423914>] xfs_log_worker+0xd4/0xf0
[<ffffffffb90cad70>] process_one_work+0x1f0/0x690
[<ffffffffb90cb25e>] worker_thread+0x4e/0x480
[<ffffffffb90d1d13>] kthread+0xf3/0x110
[<ffffffffb9a2146f>] ret_from_fork+0x1f/0x40
INITIAL USE at:
[<ffffffffb9107b2b>] __lock_acquire+0x24b/0x1870
[<ffffffffb91097dd>] lock_acquire+0x10d/0x200
[<ffffffffb90cad49>] process_one_work+0x1c9/0x690
[<ffffffffb90cb25e>] worker_thread+0x4e/0x480
[<ffffffffb90d1d13>] kthread+0xf3/0x110
[<ffffffffb9a2146f>] ret_from_fork+0x1f/0x40
}
... key at: [<ffffffffbb133758>] __key.130101+0x0/0x8
... acquired at:
[<ffffffffb91097dd>] lock_acquire+0x10d/0x200
[<ffffffffb90c981c>] flush_work+0x4c/0x2c0
[<ffffffffb90cbe01>] __cancel_work_timer+0x131/0x1f0
[<ffffffffb90cbef3>] cancel_delayed_work_sync+0x13/0x20
[<ffffffffb94245a4>] xfs_log_quiesce+0x34/0x3b0
[<ffffffffb9419c6d>] xfs_quiesce_attr+0x6d/0xb0
[<ffffffffb9419ce3>] xfs_fs_freeze+0x33/0x50
[<ffffffffb9291eff>] freeze_super+0xcf/0x190
[<ffffffffb92a528f>] do_vfs_ioctl+0x55f/0x6c0
[<ffffffffb92a5469>] SyS_ioctl+0x79/0x90
[<ffffffffb9a2123c>] entry_SYSCALL_64_fastpath+0x1f/0xbd
-> (sb_internal){++++.?} ops: 28351355 {
HARDIRQ-ON-W at:
[<ffffffffb9107ef1>] __lock_acquire+0x611/0x1870
[<ffffffffb91097dd>] lock_acquire+0x10d/0x200
[<ffffffffb9a1e3a6>] down_write+0x36/0x70
[<ffffffffb9102fb3>] percpu_down_write+0x33/0x100
[<ffffffffb9291eed>] freeze_super+0xbd/0x190
[<ffffffffb92a528f>] do_vfs_ioctl+0x55f/0x6c0
[<ffffffffb92a5469>] SyS_ioctl+0x79/0x90
[<ffffffffb9a2123c>] entry_SYSCALL_64_fastpath+0x1f/0xbd
HARDIRQ-ON-R at:
[<ffffffffb9107bed>] __lock_acquire+0x30d/0x1870
[<ffffffffb91097dd>] lock_acquire+0x10d/0x200
[<ffffffffb91030c8>] percpu_down_read_trylock+0x48/0xa0
[<ffffffffb929258a>] __sb_start_write+0x7a/0xf0
[<ffffffffb941ecc3>] xfs_trans_alloc+0xe3/0x130
[<ffffffffb940dcba>] xfs_vn_update_time+0x7a/0x210
[<ffffffffb92b1038>] touch_atime+0xa8/0xd0
[<ffffffffb91ed573>] generic_file_read_iter+0x703/0x8d0
[<ffffffffb9401513>] xfs_file_buffered_aio_read+0x63/0x170
[<ffffffffb9401688>] xfs_file_read_iter+0x68/0xc0
[<ffffffffb928de90>] __vfs_read+0xe0/0x150
[<ffffffffb928efb5>] vfs_read+0x95/0x140
[<ffffffffb92904e8>] SyS_read+0x58/0xc0
[<ffffffffb9003edc>] do_syscall_64+0x6c/0x1e0
[<ffffffffb9a212ff>] return_from_SYSCALL_64+0x0/0x7a
SOFTIRQ-ON-W at:
[<ffffffffb9107f23>] __lock_acquire+0x643/0x1870
[<ffffffffb91097dd>] lock_acquire+0x10d/0x200
[<ffffffffb9a1e3a6>] down_write+0x36/0x70
[<ffffffffb9102fb3>] percpu_down_write+0x33/0x100
[<ffffffffb9291eed>] freeze_super+0xbd/0x190
[<ffffffffb92a528f>] do_vfs_ioctl+0x55f/0x6c0
[<ffffffffb92a5469>] SyS_ioctl+0x79/0x90
[<ffffffffb9a2123c>] entry_SYSCALL_64_fastpath+0x1f/0xbd
SOFTIRQ-ON-R at:
[<ffffffffb9107f23>] __lock_acquire+0x643/0x1870
[<ffffffffb91097dd>] lock_acquire+0x10d/0x200
[<ffffffffb91030c8>] percpu_down_read_trylock+0x48/0xa0
[<ffffffffb929258a>] __sb_start_write+0x7a/0xf0
[<ffffffffb941ecc3>] xfs_trans_alloc+0xe3/0x130
[<ffffffffb940dcba>] xfs_vn_update_time+0x7a/0x210
[<ffffffffb92b1038>] touch_atime+0xa8/0xd0
[<ffffffffb91ed573>] generic_file_read_iter+0x703/0x8d0
[<ffffffffb9401513>] xfs_file_buffered_aio_read+0x63/0x170
[<ffffffffb9401688>] xfs_file_read_iter+0x68/0xc0
[<ffffffffb928de90>] __vfs_read+0xe0/0x150
[<ffffffffb928efb5>] vfs_read+0x95/0x140
[<ffffffffb92904e8>] SyS_read+0x58/0xc0
[<ffffffffb9003edc>] do_syscall_64+0x6c/0x1e0
[<ffffffffb9a212ff>] return_from_SYSCALL_64+0x0/0x7a
IN-RECLAIM_FS-R at:
[<ffffffffb9107c4d>] __lock_acquire+0x36d/0x1870
[<ffffffffb91097dd>] lock_acquire+0x10d/0x200
[<ffffffffb9102ecc>] percpu_down_read+0x3c/0x90
[<ffffffffb92925c7>] __sb_start_write+0xb7/0xf0
[<ffffffffb941ecc3>] xfs_trans_alloc+0xe3/0x130
[<ffffffffb940d157>] xfs_iomap_write_allocate+0x1f7/0x380
[<ffffffffb93f17fa>] xfs_map_blocks+0x22a/0x380
[<ffffffffb93f2f78>] xfs_do_writepage+0x188/0x6c0
[<ffffffffb93f34eb>] xfs_vm_writepage+0x3b/0x70
[<ffffffffb9204a20>] pageout.isra.46+0x190/0x380
[<ffffffffb9207d1b>] shrink_page_list+0x9ab/0xa70
[<ffffffffb9208602>] shrink_inactive_list+0x252/0x5d0
[<ffffffffb920928f>] shrink_node_memcg+0x5af/0x790
[<ffffffffb9209551>] shrink_node+0xe1/0x320
[<ffffffffb920aa47>] kswapd+0x387/0x8b0
[<ffffffffb90d1d13>] kthread+0xf3/0x110
[<ffffffffb9a2146f>] ret_from_fork+0x1f/0x40
RECLAIM_FS-ON-R at:
[<ffffffffb910735f>] mark_held_locks+0x6f/0xa0
[<ffffffffb910a5f3>] lockdep_trace_alloc+0xd3/0x120
[<ffffffffb92603bf>] kmem_cache_alloc+0x2f/0x280
[<ffffffffb941f5e1>] kmem_zone_alloc+0x81/0x120
[<ffffffffb941ec4c>] xfs_trans_alloc+0x6c/0x130
[<ffffffffb940dcba>] xfs_vn_update_time+0x7a/0x210
[<ffffffffb92b1038>] touch_atime+0xa8/0xd0
[<ffffffffb91ed573>] generic_file_read_iter+0x703/0x8d0
[<ffffffffb9401513>] xfs_file_buffered_aio_read+0x63/0x170
[<ffffffffb9401688>] xfs_file_read_iter+0x68/0xc0
[<ffffffffb928de90>] __vfs_read+0xe0/0x150
[<ffffffffb928efb5>] vfs_read+0x95/0x140
[<ffffffffb92904e8>] SyS_read+0x58/0xc0
[<ffffffffb9003edc>] do_syscall_64+0x6c/0x1e0
[<ffffffffb9a212ff>] return_from_SYSCALL_64+0x0/0x7a
INITIAL USE at:
[<ffffffffb9107b2b>] __lock_acquire+0x24b/0x1870
[<ffffffffb91097dd>] lock_acquire+0x10d/0x200
[<ffffffffb91030c8>] percpu_down_read_trylock+0x48/0xa0
[<ffffffffb929258a>] __sb_start_write+0x7a/0xf0
[<ffffffffb941ecc3>] xfs_trans_alloc+0xe3/0x130
[<ffffffffb940dcba>] xfs_vn_update_time+0x7a/0x210
[<ffffffffb92b1038>] touch_atime+0xa8/0xd0
[<ffffffffb91ed573>] generic_file_read_iter+0x703/0x8d0
[<ffffffffb9401513>] xfs_file_buffered_aio_read+0x63/0x170
[<ffffffffb9401688>] xfs_file_read_iter+0x68/0xc0
[<ffffffffb928de90>] __vfs_read+0xe0/0x150
[<ffffffffb928efb5>] vfs_read+0x95/0x140
[<ffffffffb92904e8>] SyS_read+0x58/0xc0
[<ffffffffb9003edc>] do_syscall_64+0x6c/0x1e0
[<ffffffffb9a212ff>] return_from_SYSCALL_64+0x0/0x7a
}
... key at: [<ffffffffba0b1a00>] xfs_fs_type+0x60/0x80
... acquired at:
[<ffffffffb91061d1>] check_usage_forwards+0x151/0x160
[<ffffffffb9106fea>] mark_lock+0x34a/0x650
[<ffffffffb9107c4d>] __lock_acquire+0x36d/0x1870
[<ffffffffb91097dd>] lock_acquire+0x10d/0x200
[<ffffffffb9102ecc>] percpu_down_read+0x3c/0x90
[<ffffffffb92925c7>] __sb_start_write+0xb7/0xf0
[<ffffffffb941ecc3>] xfs_trans_alloc+0xe3/0x130
[<ffffffffb940d157>] xfs_iomap_write_allocate+0x1f7/0x380
[<ffffffffb93f17fa>] xfs_map_blocks+0x22a/0x380
[<ffffffffb93f2f78>] xfs_do_writepage+0x188/0x6c0
[<ffffffffb93f34eb>] xfs_vm_writepage+0x3b/0x70
[<ffffffffb9204a20>] pageout.isra.46+0x190/0x380
[<ffffffffb9207d1b>] shrink_page_list+0x9ab/0xa70
[<ffffffffb9208602>] shrink_inactive_list+0x252/0x5d0
[<ffffffffb920928f>] shrink_node_memcg+0x5af/0x790
[<ffffffffb9209551>] shrink_node+0xe1/0x320
[<ffffffffb920aa47>] kswapd+0x387/0x8b0
[<ffffffffb90d1d13>] kthread+0xf3/0x110
[<ffffffffb9a2146f>] ret_from_fork+0x1f/0x40
stack backtrace:
CPU: 0 PID: 32 Comm: kswapd0 Tainted: G W 4.8.0+ #10
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
0000000000000086 0000000020cdca74 ffff880136c5b490 ffffffffb95b7a73
ffffffffbae7ed80 ffff880136c5b4f0 ffff880136c5b4d0 ffffffffb91e7c5a
ffff880136c3aee8 ffff880136c3aee8 ffff880136c3a680 ffffffffb9e8bbdd
Call Trace:
[<ffffffffb95b7a73>] dump_stack+0x85/0xc2
[<ffffffffb91e7c5a>] print_irq_inversion_bug.part.36+0x1a4/0x1b0
[<ffffffffb91061d1>] check_usage_forwards+0x151/0x160
[<ffffffffb9106fea>] mark_lock+0x34a/0x650
[<ffffffffb9106080>] ? print_shortest_lock_dependencies+0x1a0/0x1a0
[<ffffffffb9107c4d>] __lock_acquire+0x36d/0x1870
[<ffffffffb9107d55>] ? __lock_acquire+0x475/0x1870
[<ffffffffb91097dd>] lock_acquire+0x10d/0x200
[<ffffffffb92925c7>] ? __sb_start_write+0xb7/0xf0
[<ffffffffb9102ecc>] percpu_down_read+0x3c/0x90
[<ffffffffb92925c7>] ? __sb_start_write+0xb7/0xf0
[<ffffffffb92925c7>] __sb_start_write+0xb7/0xf0
[<ffffffffb941ecc3>] xfs_trans_alloc+0xe3/0x130
[<ffffffffb940d157>] xfs_iomap_write_allocate+0x1f7/0x380
[<ffffffffb93f16b3>] ? xfs_map_blocks+0xe3/0x380
[<ffffffffb91268b8>] ? rcu_read_lock_sched_held+0x58/0x60
[<ffffffffb93f17fa>] xfs_map_blocks+0x22a/0x380
[<ffffffffb93f2f78>] xfs_do_writepage+0x188/0x6c0
[<ffffffffb93f34eb>] xfs_vm_writepage+0x3b/0x70
[<ffffffffb9204a20>] pageout.isra.46+0x190/0x380
[<ffffffffb9207d1b>] shrink_page_list+0x9ab/0xa70
[<ffffffffb9208602>] shrink_inactive_list+0x252/0x5d0
[<ffffffffb920928f>] shrink_node_memcg+0x5af/0x790
[<ffffffffb9209551>] shrink_node+0xe1/0x320
[<ffffffffb920aa47>] kswapd+0x387/0x8b0
[<ffffffffb920a6c0>] ? mem_cgroup_shrink_node+0x2e0/0x2e0
[<ffffffffb90d1d13>] kthread+0xf3/0x110
[<ffffffffb9a2146f>] ret_from_fork+0x1f/0x40
[<ffffffffb90d1c20>] ? kthread_create_on_node+0x230/0x230