linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Qu Wenruo <quwenruo@cn.fujitsu.com>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	xfs@oss.sgi.com, linux-mm@kvack.org,
	Michal Hocko <mhocko@kernel.org>, Ingo Molnar <mingo@kernel.org>
Subject: Re: Xfs lockdep warning with for-dave-for-4.6 branch
Date: Fri, 20 May 2016 10:17:14 +1000	[thread overview]
Message-ID: <20160520001714.GC26977@dastard> (raw)
In-Reply-To: <20160519081146.GS3193@twins.programming.kicks-ass.net>

On Thu, May 19, 2016 at 10:11:46AM +0200, Peter Zijlstra wrote:
> On Wed, May 18, 2016 at 08:35:49AM +1000, Dave Chinner wrote:
> > On Tue, May 17, 2016 at 04:49:12PM +0200, Peter Zijlstra wrote:
> > > On Tue, May 17, 2016 at 09:10:56AM +1000, Dave Chinner wrote:
> > > 
> > > > The reason we don't have lock clases for the ilock is that we aren't
> > > > supposed to call memory reclaim with that lock held in exclusive
> > > > mode. This is because reclaim can run transactions, and that may
> > > > need to flush dirty inodes to make progress. Flushing dirty inode
> > > > requires taking the ilock in shared mode.
> > > > 
> > > > In the code path that was reported, we hold the ilock in /shared/
> > > > mode with no transaction context (we are doing a read-only
> > > > operation). This means we can run transactions in memory reclaim
> > > > because a) we can't deadlock on the inode we hold locks on, and b)
> > > > transaction reservations will be able to make progress as we don't
> > > > hold any locks it can block on.
> > > 
> > > Just to clarify; I read the above as that we cannot block on recursive
> > > shared locks, is this correct?
> > > 
> > > Because we can in fact block on down_read()+down_read() just fine, so if
> > > you're assuming that, then something's busted.
> > 
> > The transaction reservation path will run down_read_trylock() on the
> > inode, not down_read(). Hence if there are no pending writers, it
> > will happily take the lock twice and make progress, otherwise it
> > will skip the inode and there's no deadlock.  If there's a pending
> > writer, then we have another context that is already in a
> > transaction context and has already pushed the item, hence it is
> > only in the scope of the current push because IO hasn't completed
> > yet and removed it from the list.
> > 
> > > Otherwise, I'm not quite reading it right, which is, given the
> > > complexity of that stuff, entirely possible.
> > 
> > There's a maze of dark, grue-filled twisty passages here...
> 
> OK; I might need a bit more again.
> 
> So now the code does something like:
> 
> 	down_read(&i_lock);		-- lockdep marks lock as held
> 	kmalloc(GFP_KERNEL);		-- lockdep marks held locks as ENABLED_RECLAIM_FS
> 	  --> reclaim()
> 	     down_read_trylock(&i_lock); -- lockdep does _NOT_ mark as USED_IN_RECLAIM_FS
> 
> Right?

In the path that can deadlock the log, yes. It's actually way more
complex than the above, because the down_read_trylock(&i_lock) that
matters is run in a completely separate, async kthread that
xfs_trans_reserve() will block waiting for.

process context				xfsaild kthread(*)
---------------				------------------
down_read(&i_lock);		-- lockdep marks lock as held
kmalloc(GFP_KERNEL);		-- lockdep marks held locks as ENABLED_RECLAIM_FS
  --> reclaim()
     xfs_trans_reserve()
     ....
	  xfs_trans_push_ail()	---- called if no space in the log to kick the xfsaild into action
	  ....
       xlog_grant_head_wait()	---- blocks waiting for log space
       .....

					xfsaild_push()   ----- iterates AIL
					  grabs log item
					    lock log item
	>>>>>>>>>>>>>>>>>>>>>		      down_read_trylock(&i_lock);
					      format item into buffer
					      add to dirty buffer list
					  ....
					  submit dirty buffer list for IO
					    buffer IO started
					.....
					<async IO completion context>
					buffer callbacks
					  mark inode clean
					  remove inode from AIL
					  move tail of log forwards
					    wake grant head waiters
	<woken by log tail moving>
	<log space available>
	transaction reservation granted
     .....
     down_write(some other inode ilock)
     <modify some other inode>
     xfs_trans_commit
     .....

(*) xfsaild runs with PF_MEMALLOC context.

The problem is that if the ilock is held exclusively at GFP_KERNEL
time, the xfsaild cannot lock the inode to flush it, so if that
inode pins the tail of the log then we can't make space available
for xfs_trans_reserve and there is the deadlock.

Once xfs_trans_reserve completes, however, we'll take the ilock on
*some other inode*, and that's where the "it can't be the inode we
currently hold locked because we have references to it" and
henceit's safe to have a pattern like:

down_read(&i_lock);		-- lockdep marks lock as held
kmalloc(GFP_KERNEL);		-- lockdep marks held locks as ENABLED_RECLAIM_FS
  --> reclaim()
    down_write(&ilock)

because the lock within reclaim context is completely unrelated to
the lock we already hold.

Lockdep can't possibly know about this because the deadlock involves
locking contexts that *aren't doing anything wrong within their own
contexts*. It's only when you add the dependency of log space
reservation requirements needed to make forwards progress that
there's then an issue with locking and reclaim.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  reply	other threads:[~2016-05-20  0:25 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-12  5:53 Xfs lockdep warning with for-dave-for-4.6 branch Qu Wenruo
2016-05-12  5:57 ` Darrick J. Wong
2016-05-12  8:03   ` Dave Chinner
2016-05-13 16:03     ` Michal Hocko
2016-05-16 10:41       ` Peter Zijlstra
2016-05-16 13:05         ` Michal Hocko
2016-05-16 13:25           ` Peter Zijlstra
2016-05-16 23:10             ` Dave Chinner
2016-05-17 14:49               ` Peter Zijlstra
2016-05-17 22:35                 ` Dave Chinner
2016-05-18  7:20                   ` Peter Zijlstra
2016-05-18  8:25                     ` Michal Hocko
2016-05-18  9:49                       ` Peter Zijlstra
2016-05-18 11:31                         ` Michal Hocko
2016-05-19  8:11                   ` Peter Zijlstra
2016-05-20  0:17                     ` Dave Chinner [this message]
2016-06-01 13:17                       ` Michal Hocko
2016-06-01 18:16                         ` Peter Zijlstra
2016-06-02 14:50                           ` Michal Hocko
2016-06-02 15:11                             ` Peter Zijlstra
2016-06-02 15:46                               ` Michal Hocko
2016-06-02 23:22                                 ` Dave Chinner
2016-06-06 12:20                                   ` Michal Hocko
2016-06-15  7:21                                     ` Dave Chinner
2016-06-21 14:26                                       ` Michal Hocko
2016-06-22  1:03                                         ` Dave Chinner
2016-06-22 12:38                                           ` Michal Hocko
2016-06-22 22:58                                             ` Dave Chinner
2016-06-23 11:35                                               ` Michal Hocko
2016-10-06 13:04                           ` Michal Hocko
2016-10-17 13:49                             ` Michal Hocko
2016-10-19  0:33                             ` Dave Chinner
2016-10-19  5:30                               ` Dave Chinner
2016-10-19  8:33                             ` Peter Zijlstra
2016-10-19 12:06                               ` Michal Hocko
2016-10-19 21:49                                 ` Dave Chinner
2016-10-20  7:15                                   ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160520001714.GC26977@dastard \
    --to=david@fromorbit.com \
    --cc=darrick.wong@oracle.com \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=quwenruo@cn.fujitsu.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).