From: Dan Williams <dan.j.williams@intel.com>
To: Jan Kara <jack@suse.cz>
Cc: Latchesar Ionkov <lucho@ionkov.net>,
	Trond Myklebust <trond.myklebust@primarydata.com>,
	Linux MM <linux-mm@kvack.org>, Christoph Hellwig <hch@lst.de>,
	linux-cifs@vger.kernel.org,
	Matthew Wilcox <mawilcox@microsoft.com>,
	Andrey Ryabinin <aryabinin@virtuozzo.com>,
	Eric Van Hensbergen <ericvh@gmail.com>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	v9fs-developer@lists.sourceforge.net,
	Jens Axboe <axboe@kernel.dk>,
	linux-nfs@vger.kernel.org,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	samba-technical@lists.samba.org,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Steve French <sfrench@samba.org>,
	Alexey Kuznetsov <kuznet@virtuozzo.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Ron Minnich <rminnich@sandia.gov>,
	Andrew Morton <akpm@linux-foundation.org>,
	Anna Schumaker <anna.schumaker@netapp.com>
Subject: Re: [PATCH 2/2] dax: fix data corruption due to stale mmap reads
Date: Mon, 1 May 2017 15:59:10 -0700
Message-ID: <CAPcyv4jWWW6FUC_4f-FunPBva4TY2bdn7FQb+9nhVvA3zx6DiQ@mail.gmail.com>
In-Reply-To: <20170427072659.GA29789@quack2.suse.cz>

On Thu, Apr 27, 2017 at 12:26 AM, Jan Kara <jack@suse.cz> wrote:
> On Wed 26-04-17 16:52:36, Ross Zwisler wrote:
>> On Wed, Apr 26, 2017 at 10:52:35AM +0200, Jan Kara wrote:
>> > On Tue 25-04-17 16:59:36, Ross Zwisler wrote:
>> > > On Tue, Apr 25, 2017 at 01:10:43PM +0200, Jan Kara wrote:
>> > > <>
>> > > > Hmm, but now thinking more about it, I have a hard time figuring out why
>> > > > write vs. fault cannot actually still race:
>> > > >
>> > > > CPU1 - write(2)                         CPU2 - read fault
>> > > >
>> > > >                                         dax_iomap_pte_fault()
>> > > >                                           ->iomap_begin() - sees hole
>> > > > dax_iomap_rw()
>> > > >   iomap_apply()
>> > > >     ->iomap_begin - allocates blocks
>> > > >     dax_iomap_actor()
>> > > >       invalidate_inode_pages2_range()
>> > > >         - there's nothing to invalidate
>> > > >                                           grab_mapping_entry()
>> > > >                                           - we add zero page in the radix
>> > > >                                             tree & map it to page tables
>> > > >
>> > > > Similarly, a read vs. write fault may end up racing in the wrong way and try
>> > > > to replace an already existing exceptional entry with a hole page?
>> > >
>> > > Yep, this race seems real to me, too.  This seems very much like the issues
>> > > that exist when a thread is doing direct I/O.  One thread is doing I/O to an
>> > > intermediate buffer (the page cache in the direct I/O case, the zero page for
>> > > us), and the other is going around it directly to media, so they can get out
>> > > of sync.
>> > >
>> > > IIRC the direct I/O code looked something like:
>> > >
>> > > 1/ invalidate existing mappings
>> > > 2/ do direct I/O to media
>> > > 3/ invalidate mappings again, just in case.  This should be cheap if there
>> > >    weren't any conflicting faults, and it makes sure any new allocations we
>> > >    made get faulted in on the next access.
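
For reference, that 1/2/3 sequence is roughly the shape of the generic direct
I/O write path. A simplified C sketch of it (modeled loosely on
generic_file_direct_write(); error handling and edge cases omitted, and the
function name dio_style_write is made up for illustration):

#include <linux/fs.h>
#include <linux/pagemap.h>
#include <linux/uio.h>

/*
 * Simplified sketch of the invalidate / write / invalidate pattern
 * described above; not the actual kernel code.
 */
static ssize_t dio_style_write(struct kiocb *iocb, struct iov_iter *from)
{
	struct address_space *mapping = iocb->ki_filp->f_mapping;
	loff_t pos = iocb->ki_pos;
	size_t count = iov_iter_count(from);
	pgoff_t start = pos >> PAGE_SHIFT;
	pgoff_t end = (pos + count - 1) >> PAGE_SHIFT;
	ssize_t written;

	/* 1/ flush dirty pages and drop existing mappings over the range */
	filemap_write_and_wait_range(mapping, pos, pos + count - 1);
	invalidate_inode_pages2_range(mapping, start, end);

	/* 2/ do the I/O directly to media */
	written = mapping->a_ops->direct_IO(iocb, from);

	/*
	 * 3/ invalidate again in case a racing fault instantiated pages over
	 *    blocks that were just allocated; cheap if nothing raced.
	 */
	if (written > 0)
		invalidate_inode_pages2_range(mapping, start, end);

	return written;
}
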
>> >
>> > Yeah, the problem is that people generally expect weird behavior when they
>> > mix direct and buffered I/O (let alone mmap); however, everyone expects
>> > standard read(2) and write(2) to be completely coherent with mmap(2).
>>
>> Yep, fair enough.
>>
>> > > I guess one option would be to replicate that logic in the DAX I/O path, or we
>> > > could try to enhance our locking so that page faults can't race with I/O, since
>> > > both can allocate blocks.
>> >
>> > Put abstractly, the problem is that we have the radix tree (and the page
>> > tables) caching block mapping information, and the operation "read the block
>> > mapping, store it in the radix tree" is not serialized in any way against
>> > other block allocations, so the information can be out of date by the time
>> > we store it.
>> >
>> > One way to solve this would be to move the ->iomap_begin call in the fault
>> > paths under the entry lock, although that would mean I have to redo how ext4
>> > handles DAX faults, because with the current code it would create a lock
>> > inversion wrt transaction start.
>>
>> I don't think this alone is enough to save us.  The I/O path doesn't currently
>> take any DAX radix tree entry locks, so our race would just become:
>>
>> CPU1 - write(2)                               CPU2 - read fault
>>
>>                                       dax_iomap_pte_fault()
>>                                         grab_mapping_entry() // newly moved
>>                                         ->iomap_begin() - sees hole
>> dax_iomap_rw()
>>   iomap_apply()
>>     ->iomap_begin - allocates blocks
>>     dax_iomap_actor()
>>       invalidate_inode_pages2_range()
>>         - there's nothing to invalidate
>>                                         - we add zero page in the radix
>>                                           tree & map it to page tables
>>
>> In their current form I don't think we want to take DAX radix tree entry locks
>> in the I/O path because that would effectively serialize I/O over a given
>> radix tree entry. For a 2MiB entry, for example, all I/O to that 2MiB range
>> would be serialized.
>
> Note that invalidate_inode_pages2_range() will see the entry created by
> grab_mapping_entry() on CPU2 and block waiting for its lock, and this is
> exactly what stops the race. invalidate_inode_pages2_range() effectively
> makes sure there isn't any page fault in progress for the given range...
>
> Also note that writes to a file are serialized by i_rwsem anyway (and at
> least serialization of writes to an overlapping range is required by POSIX),
> so this doesn't add any more serialization than we already have.
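
Spelling that out against the interleaving above (a sketch of the intended
ordering only, not of any particular patch):

CPU1 - write(2)                         CPU2 - read fault

                                        dax_iomap_pte_fault()
                                          grab_mapping_entry()
                                          - creates and locks the radix
                                            tree entry
                                          ->iomap_begin() - sees hole
dax_iomap_rw()
  iomap_apply()
    ->iomap_begin - allocates blocks
    dax_iomap_actor()
      invalidate_inode_pages2_range()
        - sees CPU2's locked entry and
          blocks until the fault is done
                                          - map the zero page, unlock the
                                            entry
      - proceed and invalidate the
        now-stale zero page entry, so a
        later fault sees the new blocks
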
>
>> > Another solution would be to grab i_mmap_sem for write when doing a write
>> > fault of a page, and similarly have it grabbed for writing when doing
>> > write(2). This would scale rather poorly, but if we later replaced it with a
>> > range lock (Davidlohr has already posted a nice implementation of one) it
>> > wouldn't be as bad. But I guess option 1) is better...
>>
>> The best idea I had for handling this is similar: convert the radix tree
>> entry locks into what are essentially reader/writer locks.  I/O and faults
>> that don't modify the block mapping could just take read-level locks, and
>> could all run concurrently.  I/O or faults that modify a block mapping
>> would take a write lock, and serialize with other writers and readers.
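
Purely as an illustration of that read/write split (a hypothetical, single
per-mapping rwsem stands in for the per-entry or range locks actually being
discussed, and the function names are made up):

#include <linux/rwsem.h>

static DECLARE_RWSEM(dax_mapping_sem);	/* hypothetical; one per mapping */

/* I/O or fault that only uses an already-established block mapping */
static void dax_op_read_mapping(void)
{
	down_read(&dax_mapping_sem);	/* many of these can run concurrently */
	/* ... look up the mapping, copy data to/from media ... */
	up_read(&dax_mapping_sem);
}

/* I/O or fault that may allocate blocks and change the mapping */
static void dax_op_change_mapping(void)
{
	down_write(&dax_mapping_sem);	/* excludes both readers and writers */
	/* ... allocate blocks, update the radix tree and page tables ... */
	up_write(&dax_mapping_sem);
}
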
>
> Well, this would be difficult to implement inside the radix tree (not
> enough bits in the entry), so you'd have to go for some external locking
> primitive anyway. And if you do that, the read-write range lock Davidlohr
> has implemented is what you describe - well, we could also have a radix
> tree with rwsems, but I suspect the overhead of maintaining that would be
> too large. It would require a larger rewrite than reusing the entry locks
> as I suggest above, though, and it isn't an obvious performance win for
> realistic workloads either, so I'd like to see some performance numbers
> before going that way. It likely improves the situation where processes
> race to fault the same page for which we already know the block mapping,
> but I'm not sure that translates into any measurable performance win for
> workloads on a DAX filesystem.

I'm also concerned about inventing new / fancy radix infrastructure
when we're already in the space of needing struct page for any
non-trivial usage of DAX. As Kirill's transparent-huge-page page cache
implementation matures, I'd be interested in looking at a transition
path away from radix locking towards something that is shared with the
common-case page cache locking.