Linux-XFS Archive on lore.kernel.org
 help / color / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Qu Wenruo <quwenruo@cn.fujitsu.com>,
	Michal Hocko <mhocko@kernel.org>,
	xfs@oss.sgi.com
Subject: Re: Xfs lockdep warning with for-dave-for-4.6 branch
Date: Thu, 12 May 2016 18:03:21 +1000
Message-ID: <20160512080321.GA18496@dastard> (raw)
In-Reply-To: <20160512055756.GE6648@birch.djwong.org>

[ cc Michal Hocko, just so he can see a lockdep reclaim state false
positive ]

On Wed, May 11, 2016 at 10:57:56PM -0700, Darrick J. Wong wrote:
> On Thu, May 12, 2016 at 01:53:44PM +0800, Qu Wenruo wrote:
> > Hi Darrick,
> > 
> > Thanks for your reflink work for xfs first, that's quite a good and useful
> > feature, also helps to debug btrfs problems.
> > (Without that, there is no good reference for reflink behavior)
> > 
> > But when testing your for-dave-for-4.6 branch, even I'm just testing btrfs
> > with xfstests, kernel report some strange lockdep from xfs:
.....
> > =================================
> > [ INFO: inconsistent lock state ]
> > 4.5.0-rc2+ #4 Tainted: G           O
> > ---------------------------------
> > inconsistent {RECLAIM_FS-ON-R} -> {IN-RECLAIM_FS-W} usage.
> > kswapd0/543 [HC0[0]:SC0[0]:HE1:SE1] takes:
> >  (&xfs_nondir_ilock_class){++++-+}, at: [<ffffffffa00781f7>]
> > xfs_ilock+0x177/0x200 [xfs]
> > {RECLAIM_FS-ON-R} state was registered at:
> >   [<ffffffff8110f369>] mark_held_locks+0x79/0xa0
> >   [<ffffffff81113a43>] lockdep_trace_alloc+0xb3/0x100
> >   [<ffffffff81224623>] kmem_cache_alloc+0x33/0x230
> >   [<ffffffffa008acc1>] kmem_zone_alloc+0x81/0x120 [xfs]
> >   [<ffffffffa005456e>] xfs_refcountbt_init_cursor+0x3e/0xa0 [xfs]
> >   [<ffffffffa0053455>] __xfs_refcount_find_shared+0x75/0x580 [xfs]
> >   [<ffffffffa00539e4>] xfs_refcount_find_shared+0x84/0xb0 [xfs]
> >   [<ffffffffa005dcb8>] xfs_getbmap+0x608/0x8c0 [xfs]
> >   [<ffffffffa007634b>] xfs_vn_fiemap+0xab/0xc0 [xfs]
> >   [<ffffffff81244208>] do_vfs_ioctl+0x498/0x670
> >   [<ffffffff81244459>] SyS_ioctl+0x79/0x90
> >   [<ffffffff81847cd7>] entry_SYSCALL_64_fastpath+0x12/0x6f

Ignoring whether reflink should be doing anything or not, that's a
"xfs_refcountbt_init_cursor() gets called both outside and inside
transactions" lockdep false positive case. The problem here is
lockdep has seen this allocation from within a transaction, hence a
GFP_NOFS allocation, and now it's seeing it in a GFP_KERNEL context.
Also note that we have an active reference to this inode.

> > irq event stamp: 510775
> > hardirqs last  enabled at (510775): [<ffffffff812245d0>]
> > __slab_alloc+0x50/0x70
> > hardirqs last disabled at (510774): [<ffffffff812245ae>]
> > __slab_alloc+0x2e/0x70
> > softirqs last  enabled at (510506): [<ffffffff810c8ea8>]
> > __do_softirq+0x358/0x430
> > softirqs last disabled at (510489): [<ffffffff810c911d>] irq_exit+0xad/0xb0
> > 
> > other info that might help us debug this:
> >  Possible unsafe locking scenario:
> > 
> >        CPU0
> >        ----
> >   lock(&xfs_nondir_ilock_class);
> >   <Interrupt>
> >     lock(&xfs_nondir_ilock_class);

So, because the reclaim annotations overload the interrupt level
detections and it's seen the inode ilock been taken in reclaim
("interrupt") context, this triggers a reclaim context warning where
it thinks it is unsafe to do this allocation in GFP_KERNEL context
holding the inode ilock...

> >  *** DEADLOCK ***
> > 
> > 3 locks held by kswapd0/543:
> >  #0:  (shrinker_rwsem){++++..}, at: [<ffffffff811e0b78>]
> > shrink_slab.part.63.constprop.79+0x48/0x4e0
> >  #1:  (&type->s_umount_key#26){++++++}, at: [<ffffffff81232ffb>]
> > trylock_super+0x1b/0x50
> >  #2:  (sb_internal){.+.+.?}, at: [<ffffffff812327f4>]
> > __sb_start_write+0xb4/0xf0
> > 
> > stack backtrace:
> > CPU: 0 PID: 543 Comm: kswapd0 Tainted: G           O    4.5.0-rc2+ #4
> > Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox
> > 12/01/2006
> >  ffffffff82a34f10 ffff88003aa078d0 ffffffff813a14f9 ffff88003d8551c0
> >  ffff88003aa07920 ffffffff8110ec65 0000000000000000 0000000000000001
> >  ffff880000000001 000000000000000b 0000000000000008 ffff88003d855aa0
> > Call Trace:
> >  [<ffffffff813a14f9>] dump_stack+0x4b/0x72
> >  [<ffffffff8110ec65>] print_usage_bug+0x215/0x240
> >  [<ffffffff8110ee85>] mark_lock+0x1f5/0x660
> >  [<ffffffff8110e100>] ? print_shortest_lock_dependencies+0x1a0/0x1a0
> >  [<ffffffff811102e0>] __lock_acquire+0xa80/0x1e50
> >  [<ffffffff8122474e>] ? kmem_cache_alloc+0x15e/0x230
> >  [<ffffffffa008acc1>] ? kmem_zone_alloc+0x81/0x120 [xfs]
> >  [<ffffffff811122e8>] lock_acquire+0xd8/0x1e0
> >  [<ffffffffa00781f7>] ? xfs_ilock+0x177/0x200 [xfs]
> >  [<ffffffffa0083a70>] ? xfs_reflink_cancel_cow_range+0x150/0x300 [xfs]
> >  [<ffffffff8110aace>] down_write_nested+0x5e/0xc0
> >  [<ffffffffa00781f7>] ? xfs_ilock+0x177/0x200 [xfs]
> >  [<ffffffffa00781f7>] xfs_ilock+0x177/0x200 [xfs]
> >  [<ffffffffa0083a70>] xfs_reflink_cancel_cow_range+0x150/0x300 [xfs]
> >  [<ffffffffa0085bdc>] xfs_fs_evict_inode+0xdc/0x1e0 [xfs]
> >  [<ffffffff8124d7d5>] evict+0xc5/0x190
> >  [<ffffffff8124d8d9>] dispose_list+0x39/0x60
> >  [<ffffffff8124eb2b>] prune_icache_sb+0x4b/0x60
> >  [<ffffffff8123317f>] super_cache_scan+0x14f/0x1a0
> >  [<ffffffff811e0d19>] shrink_slab.part.63.constprop.79+0x1e9/0x4e0
> >  [<ffffffff811e50ee>] shrink_zone+0x15e/0x170
> >  [<ffffffff811e5ef1>] kswapd+0x4f1/0xa80
> >  [<ffffffff811e5a00>] ? zone_reclaim+0x230/0x230
> >  [<ffffffff810e6882>] kthread+0xf2/0x110
> >  [<ffffffff810e6790>] ? kthread_create_on_node+0x220/0x220
> >  [<ffffffff8184803f>] ret_from_fork+0x3f/0x70
> >  [<ffffffff810e6790>] ? kthread_create_on_node+0x220/0x220
> > hrtimer: interrupt took 4824925 ns

The reclaim -> lock context that it's complaining about here is on
an inode being reclaimed - it has no active references and so, by
definition, cannot deadlock with a context holding an active
reference to an inode ilock. Hence there cannot possibly be a
deadlock here, but we can't tell lockdep that easily in any way
without going back to the bad old ways of creating a new lockdep
class for inode ilocks the moment they enter ->evict. This then
disables "entire lifecycle" lockdep checking on the xfs inode ilock,
which is why we got rid of it in the first place.

Our "traditional" solution to this is to add KM_NOFS to the
allocation to force GFP_NOFS allocation in all contexts so that we
shut up lockdep and people don't keep endlessly reporting the same
non-problem.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  reply index

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-12  5:53 Qu Wenruo
2016-05-12  5:57 ` Darrick J. Wong
2016-05-12  8:03   ` Dave Chinner [this message]
2016-05-13 16:03     ` Michal Hocko
2016-05-16 10:41       ` Peter Zijlstra
2016-05-16 13:05         ` Michal Hocko
2016-05-16 13:25           ` Peter Zijlstra
2016-05-16 23:10             ` Dave Chinner
2016-05-17 14:49               ` Peter Zijlstra
2016-05-17 22:35                 ` Dave Chinner
2016-05-18  7:20                   ` Peter Zijlstra
2016-05-18  8:25                     ` Michal Hocko
2016-05-18  9:49                       ` Peter Zijlstra
2016-05-18 11:31                         ` Michal Hocko
2016-05-19  8:11                   ` Peter Zijlstra
2016-05-20  0:17                     ` Dave Chinner
2016-06-01 13:17                       ` Michal Hocko
2016-06-01 18:16                         ` Peter Zijlstra
2016-06-02 14:50                           ` Michal Hocko
2016-06-02 15:11                             ` Peter Zijlstra
2016-06-02 15:46                               ` Michal Hocko
2016-06-02 23:22                                 ` Dave Chinner
2016-06-06 12:20                                   ` Michal Hocko
2016-06-15  7:21                                     ` Dave Chinner
2016-06-21 14:26                                       ` Michal Hocko
2016-06-22  1:03                                         ` Dave Chinner
2016-06-22 12:38                                           ` Michal Hocko
2016-06-22 22:58                                             ` Dave Chinner
2016-06-23 11:35                                               ` Michal Hocko
2016-10-06 13:04                           ` Michal Hocko
2016-10-17 13:49                             ` Michal Hocko
2016-10-19  0:33                             ` Dave Chinner
2016-10-19  5:30                               ` Dave Chinner
2016-10-19  8:33                             ` Peter Zijlstra
2016-10-19 12:06                               ` Michal Hocko
2016-10-19 21:49                                 ` Dave Chinner
2016-10-20  7:15                                   ` Michal Hocko

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160512080321.GA18496@dastard \
    --to=david@fromorbit.com \
    --cc=darrick.wong@oracle.com \
    --cc=mhocko@kernel.org \
    --cc=quwenruo@cn.fujitsu.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-XFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-xfs/0 linux-xfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-xfs linux-xfs/ https://lore.kernel.org/linux-xfs \
		linux-xfs@vger.kernel.org
	public-inbox-index linux-xfs

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-xfs


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git