All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>, Boaz Harrosh <openosd@gmail.com>,
	Amir Goldstein <amir73il@gmail.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Kent Overstreet <kent.overstreet@gmail.com>,
	Dave Chinner <dchinner@redhat.com>,
	"Darrick J . Wong" <darrick.wong@oracle.com>,
	Christoph Hellwig <hch@lst.de>,
	Matthew Wilcox <willy@infradead.org>,
	Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
	linux-xfs <linux-xfs@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Josef Bacik <josef@toxicpanda.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: pagecache locking
Date: Wed, 10 Jul 2019 10:41:53 +0200	[thread overview]
Message-ID: <20190710084153.GA962@quack2.suse.cz> (raw)
In-Reply-To: <20190709234712.GL7689@dread.disaster.area>

On Wed 10-07-19 09:47:12, Dave Chinner wrote:
> On Mon, Jul 08, 2019 at 03:31:14PM +0200, Jan Kara wrote:
> > I'd be really careful with nesting range locks. You can have nasty
> > situations like:
> > 
> > Thread 1		Thread 2
> > read_lock(0,1000)	
> > 			write_lock(500,1500) -> blocks due to read lock
> > read_lock(1001,1500) -> blocks due to write lock (or you have to break
> >   fairness and then deal with starvation issues).
> >
> > So once you allow nesting of range locks, you have to very carefully define
> > what is and what is not allowed.
> 
> Yes. I do understand the problem with rwsem read nesting and how
> that can translate to reange locks.
> 
> That's why my range locks don't even try to block on other pending
> waiters. The case where read nesting vs write might occur like above
> is something like copy_file_range() vs truncate, but otherwise for
> IO locks we simply don't have arbitrarily deep nesting of range
> locks.
> 
> i.e. for your example my range lock would result in:
> 
> Thread 1		Thread 2
> read_lock(0,1000)	
> 			write_lock(500,1500)
> 			<finds conflicting read lock>
> 			<marks read lock as having a write waiter>
> 			<parks on range lock wait list>
> <...>
> read_lock_nested(1001,1500)
> <no overlapping range in tree>
> <gets nested range lock>
> 
> <....>
> read_unlock(1001,1500)	<stays blocked because nothing is waiting
> 		         on (1001,1500) so no wakeup>
> <....>
> read_unlock(0,1000)
> <sees write waiter flag, runs wakeup>
> 			<gets woken>
> 			<retries write lock>
> 			<write lock gained>
> 
> IOWs, I'm not trying to complicate the range lock implementation
> with complex stuff like waiter fairness or anti-starvation semantics
> at this point in time. Waiters simply don't impact whether a new lock
> can be gained or not, and hence the limitations of rwsem semantics
> don't apply.
> 
> If such functionality is necessary (and I suspect it will be to
> prevent AIO from delaying truncate and fallocate-like operations
> indefinitely) then I'll add a "barrier" lock type (e.g.
> range_lock_write_barrier()) that will block new range lock attempts
> across it's span.
> 
> However, because this can cause deadlocks like the above, a barrier
> lock will not block new *_lock_nested() or *_trylock() calls, hence
> allowing runs of nested range locking to complete and drain without
> deadlocking on a conflicting barrier range. And that still can't be
> modelled by existing rwsem semantics....

Clever :). Thanks for explanation.

> > That's why in my range lock implementation
> > ages back I've decided to treat range lock as a rwsem for deadlock
> > verification purposes.
> 
> IMO, treating a range lock as a rwsem for deadlock purposes defeats
> the purpose of adding range locks in the first place. The
> concurrency models are completely different, and some of the
> limitations on rwsems are a result of implementation choices rather
> than limitations of a rwsem construct.

Well, even a range lock that cannot nest allows concurrent non-overlapping
read and write to the same file which rwsem doesn't allow. But I agree that
your nesting variant offers more (but also at the cost of higher complexity
of the lock semantics).

> In reality I couldn't care less about what lockdep can or can't
> verify. I've always said lockdep is a crutch for people who don't
> understand locks and the concurrency model of the code they
> maintain. That obviously extends to the fact that lockdep
> verification limitations should not limit what we allow new locking
> primitives to do.

I didn't say we have to constrain new locking primitives to what lockdep
can support. It is just a locking correctness verification tool so naturally
lockdep should be taught to what it needs to know about any locking scheme
we come up with. And sometimes it is just too much effort which is why e.g.
page lock still doesn't have lockdep coverage.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply	other threads:[~2019-07-10  8:41 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-10 19:14 bcachefs status update (it's done cooking; let's get this sucker merged) Kent Overstreet
2019-06-10 19:14 ` [PATCH 01/12] Compiler Attributes: add __flatten Kent Overstreet
2019-06-12 17:16   ` Greg KH
2019-06-10 19:14 ` [PATCH 02/12] locking: SIX locks (shared/intent/exclusive) Kent Overstreet
2019-06-10 19:14 ` [PATCH 03/12] mm: pagecache add lock Kent Overstreet
2019-06-10 19:14 ` [PATCH 04/12] mm: export find_get_pages() Kent Overstreet
2019-06-10 19:14 ` [PATCH 05/12] fs: insert_inode_locked2() Kent Overstreet
2019-06-10 19:14 ` [PATCH 06/12] fs: factor out d_mark_tmpfile() Kent Overstreet
2019-06-10 19:14 ` [PATCH 07/12] Propagate gfp_t when allocating pte entries from __vmalloc Kent Overstreet
2019-06-10 19:14 ` [PATCH 08/12] block: Add some exports for bcachefs Kent Overstreet
2019-06-10 19:14 ` [PATCH 09/12] bcache: optimize continue_at_nobarrier() Kent Overstreet
2019-06-10 19:14 ` [PATCH 10/12] bcache: move closures to lib/ Kent Overstreet
2019-06-11 10:25   ` Coly Li
2019-06-13  7:28   ` Christoph Hellwig
2019-06-13 11:04     ` Kent Overstreet
2019-06-10 19:14 ` [PATCH 11/12] closures: closure_wait_event() Kent Overstreet
2019-06-11 10:25   ` Coly Li
2019-06-12 17:17   ` Greg KH
2019-06-10 19:14 ` [PATCH 12/12] closures: fix a race on wakeup from closure_sync Kent Overstreet
2019-07-16 10:47   ` Coly Li
2019-07-18  7:46     ` Coly Li
2019-07-22 17:22       ` Kent Overstreet
2020-11-24 23:07         ` Marc Smith
2020-11-25 18:10           ` Marc Smith
2019-06-10 20:46 ` bcachefs status update (it's done cooking; let's get this sucker merged) Linus Torvalds
2019-06-11  1:17   ` Kent Overstreet
2019-06-11  4:33     ` Dave Chinner
2019-06-12 16:21       ` Kent Overstreet
2019-06-12 23:02         ` Dave Chinner
2019-06-13 18:36           ` pagecache locking (was: bcachefs status update) merged) Kent Overstreet
2019-06-13 21:13             ` Andreas Dilger
2019-06-13 21:21               ` Kent Overstreet
2019-06-14  0:35                 ` Dave Chinner
2019-06-13 23:55             ` Dave Chinner
2019-06-14  2:30               ` Linus Torvalds
2019-06-14  7:30                 ` Dave Chinner
2019-06-15  1:15                   ` Linus Torvalds
2019-06-14  3:08               ` Linus Torvalds
2019-06-15  4:01                 ` Linus Torvalds
2019-06-17 22:47                   ` Dave Chinner
2019-06-17 23:38                     ` Linus Torvalds
2019-06-18  4:21                       ` Amir Goldstein
2019-06-19 10:38                         ` Jan Kara
2019-06-19 22:37                           ` Dave Chinner
2019-07-03  0:04                             ` pagecache locking Boaz Harrosh
     [not found]                               ` <DM6PR19MB250857CB8A3A1C8279D6F2F3C5FB0@DM6PR19MB2508.namprd19.prod.outlook.com>
2019-07-03  1:25                                 ` Boaz Harrosh
2019-07-05 23:31                               ` Dave Chinner
2019-07-07 15:05                                 ` Boaz Harrosh
2019-07-07 23:55                                   ` Dave Chinner
2019-07-08 13:31                                 ` Jan Kara
2019-07-09 23:47                                   ` Dave Chinner
2019-07-10  8:41                                     ` Jan Kara [this message]
2019-06-14 17:08               ` pagecache locking (was: bcachefs status update) merged) Kent Overstreet
2019-06-19  8:21           ` bcachefs status update (it's done cooking; let's get this sucker merged) Jan Kara
2019-07-03  1:04             ` [PATCH] mm: Support madvise_willneed override by Filesystems Boaz Harrosh
2019-07-03 17:21               ` Jan Kara
2019-07-03 18:03                 ` Boaz Harrosh
2019-06-11  4:55     ` bcachefs status update (it's done cooking; let's get this sucker merged) Linus Torvalds
2019-06-11 14:26       ` Matthew Wilcox
2019-06-11  4:10   ` Dave Chinner
2019-06-11  4:39     ` Linus Torvalds
2019-06-11  7:10       ` Dave Chinner
2019-06-12  2:07         ` Linus Torvalds
2019-07-03  5:59 ` Stefan K
2020-06-10 11:02 ` Stefan K
2020-06-10 11:32   ` Kent Overstreet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190710084153.GA962@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=amir73il@gmail.com \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=dchinner@redhat.com \
    --cc=hch@lst.de \
    --cc=josef@toxicpanda.com \
    --cc=kent.overstreet@gmail.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=openosd@gmail.com \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.