All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Jan Kara <jack@suse.cz>
Cc: Oleg Nesterov <oleg@redhat.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/4] change sb_writers to use percpu_rw_semaphore
Date: Tue, 11 Aug 2015 08:41:54 +1000	[thread overview]
Message-ID: <20150810224154.GK3902@dastard> (raw)
In-Reply-To: <20150810145942.GF3768@quack.suse.cz>

On Mon, Aug 10, 2015 at 04:59:42PM +0200, Jan Kara wrote:
> On Fri 07-08-15 21:55:52, Oleg Nesterov wrote:
> > I'll try to re-check/re-test, but so far I think that this and the
> > previous series are correct. However 3/4 from the previous series
> > (attached at the end just in case) uncovers (I think) some problems
> > in xfs locking.
....
> 
> Hum, I had a look at the lockdep report below and it's one of the
> peculiarities of the freeze protection. For record let me repeat the full
> argument for why I don't think there's a possibility for a real deadlock.
> Feel free to skip to the end if you get bored.  
> 
> One would like to construct the lock chain as:
> 
> CPU0 (chown foo dir)	CPU1 (readdir dir)	CPU2 (page fault)
> process Y		process X, thread 0	process X, thread 1
> 
> 			get ILOCK for dir
> gets freeze protection
> starts transaction in xfs_setattr_nonsize
> waits to get ILOCK on 'dir'
> 						get mmap_sem for X
> 			wait for mmap_sem for process X
> 			  in filldir()
> 						wait for freeze protection in
> 						  xfs_page_mkwrite
> 
> and CPU3 then being in freeze_super() blocking CPU2 and waiting for CPU0 to
> finish it's freeze-protected section. But this cannot happen. The reason is
> that we block writers level-by-level and thus while there are writers at
> level X, we do not block writers at level X+1. So in this particular case
> freeze_super() will block waiting for CPU0 to finish its freeze protected
> section while CPU2 is free to continue.
> 
> In general we have a chain like
> 
> freeze L0 -> freeze L1 -> freeze L2 -> ILOCK -> mmap_sem --\
>                 A                                          |
>                 \------------------------------------------/
> 
> But since ILOCK is always acquired with freeze protection at L0 and we can
> block at L1 only after there are no writers at L0, this loop can never
> happen.
> 
> Note that if we use the property of freezing that lock at level X+1 cannot
> block when we hold lock at level X, we can as well simplify the dependency
> graph and track in it only the lowest level of freeze lock that is
> currently acquired (since the levels above it cannot block and do not in
> any way influence blocking of other processes either and thus are
> irrelevant for the purpose of deadlock detection). Then the dependency
> graph we'd get would be:
> 
> freeze L0 -> ILOCK -> mmap_sem -> freeze L1
> 
> and we have a nice acyclic graph we like to see... So probably we have to
> hack the lockdep instrumentation some more and just don't tell lockdep
> about freeze locks at higher levels if we already hold a lock at lower
> level. Thoughts?

The XFS directory ilock->filldir->might_fault locking path has been
generating false positives in quite a lot of places because of
things we do on one side of the mmap_sem in filesystem paths vs
thigs we do on the other side of the mmap_sem in the page fault
path.

I'm getting sick of these stupid lockdep false positives. I think I
need to rework the XFS readdir locking patch I wrote a while back:

http://oss.sgi.com/archives/xfs/2014-03/msg00146.html

At the time it wasn't clear what the best way forward was. Right now
I think the method I originally used (IOLOCK for directory data and
hence readdir, ILOCK for everything else) will be sufficient; if we
need to do anything smarter that can be addressed further down the
road.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2015-08-10 22:41 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-22 21:15 [PATCH 0/4] change sb_writers to use percpu_rw_semaphore Oleg Nesterov
2015-07-22 21:15 ` [PATCH 1/4] percpu-rwsem: introduce percpu_down_read_trylock() Oleg Nesterov
2015-07-22 21:15 ` [PATCH 2/4] percpu-rwsem: introduce percpu_rwsem_release() and percpu_rwsem_acquire() Oleg Nesterov
2015-07-31 10:20   ` Peter Zijlstra
2015-08-03 15:40     ` Oleg Nesterov
2015-07-22 21:15 ` [PATCH 3/4] shift percpu_counter_destroy() into destroy_super_work() Oleg Nesterov
2015-07-28  8:36   ` Jan Kara
2015-07-22 21:15 ` [PATCH 4/4] change sb_writers to use percpu_rw_semaphore Oleg Nesterov
2015-07-22 21:34   ` Oleg Nesterov
2015-07-28  8:34   ` Jan Kara
2015-08-03 17:30     ` Oleg Nesterov
2015-08-07 19:54   ` Oleg Nesterov
2015-08-07 19:55 ` [PATCH 0/4] " Oleg Nesterov
2015-08-10 14:59   ` Jan Kara
2015-08-10 22:41     ` Dave Chinner [this message]
2015-08-11 13:16       ` Oleg Nesterov
2015-08-11 13:29         ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150810224154.GK3902@dastard \
    --to=david@fromorbit.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oleg@redhat.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.