From: Jan Kara <jack@suse.cz>
To: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 0/6 RFC] Mapping range lock
Date: Wed, 6 Feb 2013 20:25:34 +0100	[thread overview]
Message-ID: <20130206192534.GB11254@quack.suse.cz> (raw)
In-Reply-To: <20130205232512.GR2667@dastard>

On Wed 06-02-13 10:25:12, Dave Chinner wrote:
> On Mon, Feb 04, 2013 at 01:38:31PM +0100, Jan Kara wrote:
> > On Thu 31-01-13 16:07:57, Andrew Morton wrote:
> > > > c) i_mutex doesn't allow any parallelism of operations using it and some
> > > >    filesystems work around this for specific cases (e.g. DIO reads). Using
> > > >    range locking allows for concurrent operations (e.g. writes, DIO) on
> > > >    different parts of the file. Of course, range locking itself isn't
> > > >    enough to make the parallelism possible. Filesystems still have to
> > > >    somehow deal with the concurrency when manipulating inode allocation
> > > >    data. But the range locking at least provides a common VFS mechanism for
> > > >    the serialization the VFS itself needs, and it's up to each filesystem to
> > > >    serialize more if it needs to.
> > > 
> > > That would be useful to end-users, but I'm having trouble predicting
> > > *how* useful.
> >   As Zheng said, there are people interested in this for DIO. Currently
> > filesystems each invent their own tweaks to avoid the serialization at
> > least for the easiest cases.
> 
> The thing is, this won't replace the locking those filesystems use
> to parallelise DIO - it just adds another layer of locking they'll
> need to use. The locks filesystems like XFS use to serialise IO
> against hole punch also serialise against many more internal
> functions and so if these range locks don't have the same capability
> we're going to have to retain those locks even after the range locks
> are introduced. It basically means we're going to have two layers
> of range locks - one for IO sanity and atomicity, and then this
> layer just for hole punch vs mmap.
> 
> As I've said before, what we really need in XFS is IO range locks
> because we need to be able to serialise operations against IO in
> progress, not page cache operations in progress.
  Hmm, I'm not sure I follow you here. The mapping tree lock + PageLocked +
PageWriteback serialize all IO for the part of the file underlying the page.
I.e., at most one of truncate (punch hole), DIO, writeback, buffered write,
buffered read, or page fault can run on that part of the file. So how come that
doesn't provide enough serialization for XFS?

Ah, is it the problem that if two threads do overlapping buffered writes
to a file then we can end up with data mixed from the two writes (if we
didn't have something like i_mutex)?

> IOWs, locking at
> the mapping tree level does not provide the right exclusion
> semantics we need to get rid of the existing filesystem locking that
> allows concurrent IO to be managed.  Hence the XFS IO path locking
> suddenly becomes 4 locks deep:
> 
> 	i_mutex
> 	  XFS_IOLOCK_{SHARED,EXCL}
> 	    mapping range lock
> 	      XFS_ILOCK_{SHARED,EXCL}
> 
> That's because the buffered IO path uses per-page lock ranges and to
> provide atomicity of read vs write, read vs truncate, etc we still
> need to use the XFS_IOLOCK_EXCL to provide this functionality.
>
> Hence I really think we need to be driving this lock outwards to
> where the i_mutex currently sits, turning it into an *IO range
> lock*, and not an inner-level mapping range lock. i.e. flattening the
> locking to:
> 
> 	io_range_lock(off, len)
> 	  fs internal inode metadata modification lock
  If I get you right, your IO range lock would be more or less what the
current mapping tree lock is for DIO and truncate, but for buffered IO you'd
want to release it only after the whole read / write is finished? That is
possible to do if we keep the per-page granularity, but that granularity is
exactly what you seem to have problems with.

> Yes, I know this causes problems with mmap and locking orders, but
> perhaps we should be trying to get that fixed first because it
> simplifies the whole locking schema we need for filesystems to
> behave sanely. i.e. shouldn't we be aiming to simplify things
> as we rework locking rather than making them more complex?
  Yes. I was looking at how we could free filesystems from mmap_sem locking
issues but, as Al Viro put it, the use of mmap_sem is a mess which isn't
easy to untangle. I want to have a look at it but I fear it's going to be a
long haul...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


Thread overview: 46+ messages
2013-01-31 21:49 [PATCH 0/6 RFC] Mapping range lock Jan Kara
2013-01-31 21:49 ` [PATCH 1/6] lib: Implement range locks Jan Kara
2013-01-31 23:57   ` Andrew Morton
2013-02-04 16:41     ` Jan Kara
2013-02-11  5:42   ` Michel Lespinasse
2013-02-11 10:27     ` Jan Kara
2013-02-11 11:03       ` Michel Lespinasse
2013-02-11 12:58         ` Jan Kara
2013-01-31 21:49 ` [PATCH 2/6] fs: Take mapping lock in generic read paths Jan Kara
2013-01-31 23:59   ` Andrew Morton
2013-02-04 12:47     ` Jan Kara
2013-02-08 14:59       ` Jan Kara
2013-01-31 21:49 ` [PATCH 3/6] fs: Provide function to take mapping lock in buffered write path Jan Kara
2013-01-31 21:49 ` [PATCH 4/6] fs: Don't call dio_cleanup() before submitting all bios Jan Kara
2013-01-31 21:49 ` [PATCH 5/6] fs: Take mapping lock during direct IO Jan Kara
2013-01-31 21:49 ` [PATCH 6/6] ext3: Convert ext3 to use mapping lock Jan Kara
2013-02-01  0:07 ` [PATCH 0/6 RFC] Mapping range lock Andrew Morton
2013-02-04  9:29   ` Zheng Liu
2013-02-04 12:38   ` Jan Kara
2013-02-05 23:25     ` Dave Chinner
2013-02-06 19:25       ` Jan Kara [this message]
2013-02-07  2:43         ` Dave Chinner
2013-02-07 11:06           ` Jan Kara
