* [PATCH] xfs: don't invalidate whole file on DAX read/write
@ 2016-08-02 23:40 Dave Chinner
  2016-08-03  9:25 ` Christoph Hellwig
  2016-08-03 15:34 ` Jan Kara
  0 siblings, 2 replies; 4+ messages in thread
From: Dave Chinner @ 2016-08-02 23:40 UTC
  To: xfs; +Cc: jack

From: Dave Chinner <dchinner@redhat.com>

When we do DAX IO, we try to invalidate the entire page cache held
on the file. This is incorrect as it will trash the entire mapping
tree that now tracks dirty state in exceptional entries in the radix
tree slots.

What we are trying to do is remove cached pages (e.g. from reads
into holes) that sit in the radix tree over the range we are about
to write to. Hence we should just limit the invalidation to the
range we are about to overwrite.

Reported-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_file.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index ed95e5b..e612a02 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -741,9 +741,20 @@ xfs_file_dax_write(
 	 * page is inserted into the pagecache when we have to serve a write
 	 * fault on a hole.  It should never be dirtied and can simply be
 	 * dropped from the pagecache once we get real data for the page.
+	 *
+	 * XXX: This is racy against mmap, and there's nothing we can do about
+	 * it. dax_do_io() should really do this invalidation internally as
+	 * it will know if we've allocated over a hole for this specific IO and
+	 * if so it needs to update the mapping tree and invalidate existing
+	 * PTEs over the newly allocated range. Remove this invalidation when
+	 * dax_do_io() is fixed up.
 	 */
 	if (mapping->nrpages) {
-		ret = invalidate_inode_pages2(mapping);
+		loff_t end = iocb->ki_pos + iov_iter_count(from) - 1;
+
+		ret = invalidate_inode_pages2_range(mapping,
+						    iocb->ki_pos >> PAGE_SHIFT,
+						    end >> PAGE_SHIFT);
 		WARN_ON_ONCE(ret);
 	}
 
-- 
2.8.0.rc3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: [PATCH] xfs: don't invalidate whole file on DAX read/write
  2016-08-02 23:40 [PATCH] xfs: don't invalidate whole file on DAX read/write Dave Chinner
@ 2016-08-03  9:25 ` Christoph Hellwig
  2016-08-03 15:34 ` Jan Kara
  1 sibling, 0 replies; 4+ messages in thread
From: Christoph Hellwig @ 2016-08-03  9:25 UTC
  To: Dave Chinner; +Cc: jack, xfs

On Wed, Aug 03, 2016 at 09:40:26AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When we do DAX IO, we try to invalidate the entire page cache held
> on the file. This is incorrect as it will trash the entire mapping
> tree that now tracks dirty state in exceptional entries in the radix
> tree slots.
> 
> What we are trying to do is remove cached pages (e.g. from reads
> into holes) that sit in the radix tree over the range we are about
> to write to. Hence we should just limit the invalidation to the
> range we are about to overwrite.

Looks fine (for a broad definition of "fine"):


Reviewed-by: Christoph Hellwig <hch@lst.de>

> +	 * XXX: This is racy against mmap, and there's nothing we can do about
> +	 * it. dax_do_io() should really do this invalidation internally as
> +	 * it will know if we've allocated over a hole for this specific IO and
> +	 * if so it needs to update the mapping tree and invalidate existing
> +	 * PTEs over the newly allocated range. Remove this invalidation when
> +	 * dax_do_io() is fixed up.

FYI, I've got a basically working version of an iomap based DAX I/O path
(still fails a few corner cases), and I'll see if I can add that to it.


* Re: [PATCH] xfs: don't invalidate whole file on DAX read/write
  2016-08-02 23:40 [PATCH] xfs: don't invalidate whole file on DAX read/write Dave Chinner
  2016-08-03  9:25 ` Christoph Hellwig
@ 2016-08-03 15:34 ` Jan Kara
  2016-08-03 22:32   ` Dave Chinner
  1 sibling, 1 reply; 4+ messages in thread
From: Jan Kara @ 2016-08-03 15:34 UTC
  To: Dave Chinner; +Cc: jack, xfs

On Wed 03-08-16 09:40:26, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When we do DAX IO, we try to invalidate the entire page cache held
> on the file. This is incorrect as it will trash the entire mapping
> tree that now tracks dirty state in exceptional entries in the radix
> tree slots.
> 
> What we are trying to do is remove cached pages (e.g. from reads
> into holes) that sit in the radix tree over the range we are about
> to write to. Hence we should just limit the invalidation to the
> range we are about to overwrite.

The patch looks good. Just one comment below.

> 
> Reported-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_file.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index ed95e5b..e612a02 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -741,9 +741,20 @@ xfs_file_dax_write(
>  	 * page is inserted into the pagecache when we have to serve a write
>  	 * fault on a hole.  It should never be dirtied and can simply be
>  	 * dropped from the pagecache once we get real data for the page.
> +	 *
> +	 * XXX: This is racy against mmap, and there's nothing we can do about
> +	 * it. dax_do_io() should really do this invalidation internally as
> +	 * it will know if we've allocated over a hole for this specific IO and
> +	 * if so it needs to update the mapping tree and invalidate existing
> +	 * PTEs over the newly allocated range. Remove this invalidation when
> +	 * dax_do_io() is fixed up.

And would it be OK for XFS if dax_do_io() actually invalidated page cache /
PTEs under just XFS_IOLOCK_SHARED? Because currently you seem to be careful
to call invalidate_inode_pages2() only when holding the lock exclusively
and then demote it to a shared one when calling dax_do_io().

								Honza

>  	 */
>  	if (mapping->nrpages) {
> -		ret = invalidate_inode_pages2(mapping);
> +		loff_t end = iocb->ki_pos + iov_iter_count(from) - 1;
> +
> +		ret = invalidate_inode_pages2_range(mapping,
> +						    iocb->ki_pos >> PAGE_SHIFT,
> +						    end >> PAGE_SHIFT);
>  		WARN_ON_ONCE(ret);
>  	}
>  
> -- 
> 2.8.0.rc3
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH] xfs: don't invalidate whole file on DAX read/write
  2016-08-03 15:34 ` Jan Kara
@ 2016-08-03 22:32   ` Dave Chinner
  0 siblings, 0 replies; 4+ messages in thread
From: Dave Chinner @ 2016-08-03 22:32 UTC
  To: Jan Kara; +Cc: xfs

On Wed, Aug 03, 2016 at 05:34:37PM +0200, Jan Kara wrote:
> On Wed 03-08-16 09:40:26, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > When we do DAX IO, we try to invalidate the entire page cache held
> > on the file. This is incorrect as it will trash the entire mapping
> > tree that now tracks dirty state in exceptional entries in the radix
> > tree slots.
> > 
> > What we are trying to do is remove cached pages (e.g. from reads
> > into holes) that sit in the radix tree over the range we are about
> > to write to. Hence we should just limit the invalidation to the
> > range we are about to overwrite.
> 
> The patch looks good. Just one comment below.
> 
> > 
> > Reported-by: Jan Kara <jack@suse.cz>
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/xfs_file.c | 13 ++++++++++++-
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > index ed95e5b..e612a02 100644
> > --- a/fs/xfs/xfs_file.c
> > +++ b/fs/xfs/xfs_file.c
> > @@ -741,9 +741,20 @@ xfs_file_dax_write(
> >  	 * page is inserted into the pagecache when we have to serve a write
> >  	 * fault on a hole.  It should never be dirtied and can simply be
> >  	 * dropped from the pagecache once we get real data for the page.
> > +	 *
> > +	 * XXX: This is racy against mmap, and there's nothing we can do about
> > +	 * it. dax_do_io() should really do this invalidation internally as
> > +	 * it will know if we've allocated over a hole for this specific IO and
> > +	 * if so it needs to update the mapping tree and invalidate existing
> > +	 * PTEs over the newly allocated range. Remove this invalidation when
> > +	 * dax_do_io() is fixed up.
> 
> And would it be OK for XFS if dax_do_io() actually invalidated page cache /
> PTEs under just XFS_IOLOCK_SHARED? Because currently you seem to be careful
> to call invalidate_inode_pages2() only when holding the lock exclusively
> and then demote it to a shared one when calling dax_do_io().

That really only exists to prevent multiple IOs trying to do
invalidation at once. In the direct IO code, we don't want multiple
page cache flushers running at once - one is enough - so we
serialise on that state knowing that once the invalidation is done
the remaining EXCL lock waiters will pass straight through.

For DAX, I don't think that's a problem - the invalidation is
ranged, and it's unlikely there will be overlaps, and mapping/pte
invalidation is done under fine grained locks so we don't have to
worry about races there, either. So it seems fine to me to do this
under a SHARED lock. It will still serialise against truncate and
other extent manipulation operations, and that's mainly what we care
about here.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

