* [RFC PATCH] iomap: report collisions between directio and buffered writes to userspace
@ 2017-11-14 21:46 Darrick J. Wong
  2017-11-15 12:12 ` Brian Foster
  2017-11-15 13:16 ` Holger Hoffstätte
  0 siblings, 2 replies; 7+ messages in thread
From: Darrick J. Wong @ 2017-11-14 21:46 UTC (permalink / raw)
  To: xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

If two programs simultaneously try to write to the same part of a file
via direct IO and buffered IO, there's a chance that the post-diowrite
pagecache invalidation will fail on the dirty page.  When this happens,
the dio write succeeded, which means that the page cache is no longer
coherent with the disk!  Programs are not supposed to mix IO types and
this is a clear case of data corruption, so store an EIO which will be
reflected to userspace during the next fsync.  Get rid of the WARN_ON
to assuage the fuzz-tester complaints.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/iomap.c |   19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index d4801f8..61b2eca 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -710,6 +710,13 @@ struct iomap_dio {
 	};
 };
 
+static void iomap_warn_stale_pagecache(struct inode *inode)
+{
+	errseq_set(&inode->i_mapping->wb_err, -EIO);
+	pr_crit_ratelimited("Stale pagecache contents after collision "
+			    "between direct and buffered write!\n");
+}
+
 static ssize_t iomap_dio_complete(struct iomap_dio *dio)
 {
 	struct kiocb *iocb = dio->iocb;
@@ -752,7 +759,8 @@ static ssize_t iomap_dio_complete(struct iomap_dio *dio)
 		err = invalidate_inode_pages2_range(inode->i_mapping,
 				offset >> PAGE_SHIFT,
 				(offset + dio->size - 1) >> PAGE_SHIFT);
-		WARN_ON_ONCE(err);
+		if (err)
+			iomap_warn_stale_pagecache(inode);
 	}
 
 	inode_dio_end(file_inode(iocb->ki_filp));
@@ -1011,9 +1019,16 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	if (ret)
 		goto out_free_dio;
 
+	/*
+	 * Try to invalidate cache pages for the range we're direct
+	 * writing.  If this invalidation fails, tough, the write will
+	 * still work, but racing two incompatible write paths is a
+	 * pretty crazy thing to do, so we don't support it 100%.
+	 */
 	ret = invalidate_inode_pages2_range(mapping,
 			start >> PAGE_SHIFT, end >> PAGE_SHIFT);
-	WARN_ON_ONCE(ret);
+	if (ret)
+		iomap_warn_stale_pagecache(inode);
 	ret = 0;
 
 	if (iov_iter_rw(iter) == WRITE && !is_sync_kiocb(iocb) &&
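
The invalidation calls in the patch turn a byte range into an inclusive range of page indices by shifting with PAGE_SHIFT. The following is a minimal userspace sketch of that arithmetic, not kernel code. It assumes 4 KiB pages, and the names SKETCH_PAGE_SHIFT, struct page_range and byte_range_to_pages are hypothetical.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages. The kernel's PAGE_SHIFT depends on the architecture. */
#define SKETCH_PAGE_SHIFT 12

struct page_range {
	uint64_t first;	/* index of the first page touched */
	uint64_t last;	/* index of the last page touched (inclusive) */
};

/*
 * Map the byte range [offset, offset + size) to inclusive page indices,
 * mirroring the "offset >> PAGE_SHIFT" and
 * "(offset + size - 1) >> PAGE_SHIFT" expressions in the patch.
 * The caller must pass size > 0.
 */
struct page_range byte_range_to_pages(uint64_t offset, uint64_t size)
{
	struct page_range r;

	r.first = offset >> SKETCH_PAGE_SHIFT;
	r.last = (offset + size - 1) >> SKETCH_PAGE_SHIFT;
	return r;
}
```

Under the 4 KiB assumption, a 2-byte write at offset 4095 touches pages 0 and 1, so both pages fall inside the range to invalidate.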


* Re: [RFC PATCH] iomap: report collisions between directio and buffered writes to userspace
  2017-11-14 21:46 [RFC PATCH] iomap: report collisions between directio and buffered writes to userspace Darrick J. Wong
@ 2017-11-15 12:12 ` Brian Foster
  2017-11-15 18:46   ` Darrick J. Wong
  2017-11-15 13:16 ` Holger Hoffstätte
  1 sibling, 1 reply; 7+ messages in thread
From: Brian Foster @ 2017-11-15 12:12 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs

On Tue, Nov 14, 2017 at 01:46:25PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> If two programs simultaneously try to write to the same part of a file
> via direct IO and buffered IO, there's a chance that the post-diowrite
> pagecache invalidation will fail on the dirty page.  When this happens,
> the dio write succeeded, which means that the page cache is no longer
> coherent with the disk!  Programs are not supposed to mix IO types and
> this is a clear case of data corruption, so store an EIO which will be
> reflected to userspace during the next fsync.  Get rid of the WARN_ON
> to assuage the fuzz-tester complaints.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/iomap.c |   19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/iomap.c b/fs/iomap.c
> index d4801f8..61b2eca 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -710,6 +710,13 @@ struct iomap_dio {
>  	};
>  };
>  
> +static void iomap_warn_stale_pagecache(struct inode *inode)
> +{
> +	errseq_set(&inode->i_mapping->wb_err, -EIO);
> +	pr_crit_ratelimited("Stale pagecache contents after collision "
> +			    "between direct and buffered write!\n");
> +}

Is stale pagecache always necessarily the end result of the race? For
example, is it possible that the page is under writeback and is about to
overwrite the range just written by the dio? Or what about one of those
weird cases where we check for whether the page mapping has changed down
in the invalidate code? I'm wondering if it's appropriate to set an
error if any such other cases are possible.

As a nit, I guess I'd just prefer a bit more generic of a warning
message. E.g., something like:

"Cache invalidation failure on direct I/O. Possible data corruption due
to collision with buffered I/O!"

... but feel free to rephrase that however. Otherwise that bit seems
reasonable enough to me.

Brian

> +
>  static ssize_t iomap_dio_complete(struct iomap_dio *dio)
>  {
>  	struct kiocb *iocb = dio->iocb;
> @@ -752,7 +759,8 @@ static ssize_t iomap_dio_complete(struct iomap_dio *dio)
>  		err = invalidate_inode_pages2_range(inode->i_mapping,
>  				offset >> PAGE_SHIFT,
>  				(offset + dio->size - 1) >> PAGE_SHIFT);
> -		WARN_ON_ONCE(err);
> +		if (err)
> +			iomap_warn_stale_pagecache(inode);
>  	}
>  
>  	inode_dio_end(file_inode(iocb->ki_filp));
> @@ -1011,9 +1019,16 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>  	if (ret)
>  		goto out_free_dio;
>  
> +	/*
> +	 * Try to invalidate cache pages for the range we're direct
> +	 * writing.  If this invalidation fails, tough, the write will
> +	 * still work, but racing two incompatible write paths is a
> +	 * pretty crazy thing to do, so we don't support it 100%.
> +	 */
>  	ret = invalidate_inode_pages2_range(mapping,
>  			start >> PAGE_SHIFT, end >> PAGE_SHIFT);
> -	WARN_ON_ONCE(ret);
> +	if (ret)
> +		iomap_warn_stale_pagecache(inode);
>  	ret = 0;
>  
>  	if (iov_iter_rw(iter) == WRITE && !is_sync_kiocb(iocb) &&
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [RFC PATCH] iomap: report collisions between directio and buffered writes to userspace
  2017-11-14 21:46 [RFC PATCH] iomap: report collisions between directio and buffered writes to userspace Darrick J. Wong
  2017-11-15 12:12 ` Brian Foster
@ 2017-11-15 13:16 ` Holger Hoffstätte
  2017-11-15 18:54   ` Darrick J. Wong
  1 sibling, 1 reply; 7+ messages in thread
From: Holger Hoffstätte @ 2017-11-15 13:16 UTC (permalink / raw)
  To: Darrick J. Wong, xfs

On 11/14/17 22:46, Darrick J. Wong wrote:
(snip)
> +static void iomap_warn_stale_pagecache(struct inode *inode)
> +{
> +	errseq_set(&inode->i_mapping->wb_err, -EIO);
> +	pr_crit_ratelimited("Stale pagecache contents after collision "
> +			    "between direct and buffered write!\n");
> +}

In this form the error message is IMHO useless since it tells me
neither the file in question nor the misbehaving application.
"Something went wrong somewhere" is not actionable information
and in practice will only be ignored.

Since you already have the inode in question at hand, print at least
the full path + filename so that it's clear where things are going
wrong. Usually that will let people deduce which application is
misbehaving.

-h


* Re: [RFC PATCH] iomap: report collisions between directio and buffered writes to userspace
  2017-11-15 12:12 ` Brian Foster
@ 2017-11-15 18:46   ` Darrick J. Wong
  0 siblings, 0 replies; 7+ messages in thread
From: Darrick J. Wong @ 2017-11-15 18:46 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Wed, Nov 15, 2017 at 07:12:28AM -0500, Brian Foster wrote:
> On Tue, Nov 14, 2017 at 01:46:25PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > If two programs simultaneously try to write to the same part of a file
> > via direct IO and buffered IO, there's a chance that the post-diowrite
> > pagecache invalidation will fail on the dirty page.  When this happens,
> > the dio write succeeded, which means that the page cache is no longer
> > coherent with the disk!  Programs are not supposed to mix IO types and
> > this is a clear case of data corruption, so store an EIO which will be
> > reflected to userspace during the next fsync.  Get rid of the WARN_ON
> > to assuage the fuzz-tester complaints.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/iomap.c |   19 +++++++++++++++++--
> >  1 file changed, 17 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/iomap.c b/fs/iomap.c
> > index d4801f8..61b2eca 100644
> > --- a/fs/iomap.c
> > +++ b/fs/iomap.c
> > @@ -710,6 +710,13 @@ struct iomap_dio {
> >  	};
> >  };
> >  
> > +static void iomap_warn_stale_pagecache(struct inode *inode)
> > +{
> > +	errseq_set(&inode->i_mapping->wb_err, -EIO);
> > +	pr_crit_ratelimited("Stale pagecache contents after collision "
> > +			    "between direct and buffered write!\n");
> > +}
> 
> Is stale pagecache always necessarily the end result of the race? For
> example, is it possible that the page is under writeback and is about to
> overwrite the range just written by the dio? Or what about one of those
> weird cases where we check for whether the page mapping has changed down
> in the invalidate code? I'm wondering if it's appropriate to set an
> error if any such other cases are possible.
> 
> As a nit, I guess I'd just prefer a bit more generic of a warning
> message. E.g., something like:
> 
> "Cache invalidation failure on direct I/O. Possible data corruption due
> to collision with buffered I/O!"
> 
> ... but feel free to rephrase that however. Otherwise that bit seems
> reasonable enough to me.

Sure, that seems like a more accurate description of what's going on anyway.

--D

> Brian
> 
> > +
> >  static ssize_t iomap_dio_complete(struct iomap_dio *dio)
> >  {
> >  	struct kiocb *iocb = dio->iocb;
> > @@ -752,7 +759,8 @@ static ssize_t iomap_dio_complete(struct iomap_dio *dio)
> >  		err = invalidate_inode_pages2_range(inode->i_mapping,
> >  				offset >> PAGE_SHIFT,
> >  				(offset + dio->size - 1) >> PAGE_SHIFT);
> > -		WARN_ON_ONCE(err);
> > +		if (err)
> > +			iomap_warn_stale_pagecache(inode);
> >  	}
> >  
> >  	inode_dio_end(file_inode(iocb->ki_filp));
> > @@ -1011,9 +1019,16 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> >  	if (ret)
> >  		goto out_free_dio;
> >  
> > +	/*
> > +	 * Try to invalidate cache pages for the range we're direct
> > +	 * writing.  If this invalidation fails, tough, the write will
> > +	 * still work, but racing two incompatible write paths is a
> > +	 * pretty crazy thing to do, so we don't support it 100%.
> > +	 */
> >  	ret = invalidate_inode_pages2_range(mapping,
> >  			start >> PAGE_SHIFT, end >> PAGE_SHIFT);
> > -	WARN_ON_ONCE(ret);
> > +	if (ret)
> > +		iomap_warn_stale_pagecache(inode);
> >  	ret = 0;
> >  
> >  	if (iov_iter_rw(iter) == WRITE && !is_sync_kiocb(iocb) &&

* Re: [RFC PATCH] iomap: report collisions between directio and buffered writes to userspace
  2017-11-15 13:16 ` Holger Hoffstätte
@ 2017-11-15 18:54   ` Darrick J. Wong
  2017-11-15 19:35     ` Holger Hoffstätte
  0 siblings, 1 reply; 7+ messages in thread
From: Darrick J. Wong @ 2017-11-15 18:54 UTC (permalink / raw)
  To: Holger Hoffstätte; +Cc: xfs

On Wed, Nov 15, 2017 at 02:16:01PM +0100, Holger Hoffstätte wrote:
> On 11/14/17 22:46, Darrick J. Wong wrote:
> (snip)
> > +static void iomap_warn_stale_pagecache(struct inode *inode)
> > +{
> > +	errseq_set(&inode->i_mapping->wb_err, -EIO);
> > +	pr_crit_ratelimited("Stale pagecache contents after collision "
> > +			    "between direct and buffered write!\n");
> > +}
> 
> In this form the error message is IMHO useless since it tells me
> neither the file in question nor the misbehaving application.
> "Something went wrong somewhere" is not actionable information
> and in practice will only be ignored.
> 
> Since you already have the inode in question at hand, print at least
> the full path + filename so that it's clear where things are going
> wrong. Usually that will let people deduce which application is
> misbehaving.

The whole point of the errseq_set call in this patch is to record the
write collision so that all the writers of this file will receive an EIO
the next time they try to flush the file.  You can pinpoint exactly
which fd(s) in which application(s) caused the problem.  The old dmesg
spew only captured which program issued the dio write.
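
The "every writer sees the error once" behaviour described above can be illustrated with a toy userspace model. This is only a sketch of the idea and not the kernel's errseq_t implementation, which packs the error and a counter into a single word and has extra rules about unseen errors. All toy_* names are hypothetical; they loosely mirror errseq_set(), errseq_sample() and errseq_check_and_advance().

```c
#include <errno.h>

/* Toy stand-in for errseq_t: the latest error plus a change counter. */
struct toy_errseq {
	int err;		/* most recent error, 0 if none */
	unsigned int seq;	/* bumped each time an error is recorded */
};

/* Record an error on the mapping so that every observer can notice it. */
void toy_errseq_set(struct toy_errseq *e, int err)
{
	e->err = err;
	e->seq++;
}

/* Take a cursor, roughly as an open file description would at open time. */
unsigned int toy_errseq_sample(const struct toy_errseq *e)
{
	return e->seq;
}

/*
 * Return the recorded error if one was set since *since was taken,
 * then advance the cursor so that the same observer is not told twice.
 */
int toy_errseq_check_and_advance(const struct toy_errseq *e,
				 unsigned int *since)
{
	if (*since == e->seq)
		return 0;
	*since = e->seq;
	return e->err;
}
```

In this model, two cursors sampled before a toy_errseq_set(&m, -EIO) call each get -EIO on their next check and 0 on the check after that, which is the per-fd reporting the reply relies on.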

(And the whole point of this patch is to see what people think about
that change of behavior w.r.t. us no longer letting userspace silently
corrupt the file...)

--D

> -h

* Re: [RFC PATCH] iomap: report collisions between directio and buffered writes to userspace
  2017-11-15 18:54   ` Darrick J. Wong
@ 2017-11-15 19:35     ` Holger Hoffstätte
  2017-11-15 20:53       ` Dave Chinner
  0 siblings, 1 reply; 7+ messages in thread
From: Holger Hoffstätte @ 2017-11-15 19:35 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs

On 11/15/17 19:54, Darrick J. Wong wrote:
> On Wed, Nov 15, 2017 at 02:16:01PM +0100, Holger Hoffstätte wrote:
>> On 11/14/17 22:46, Darrick J. Wong wrote:
>> (snip)
>>> +static void iomap_warn_stale_pagecache(struct inode *inode)
>>> +{
>>> +	errseq_set(&inode->i_mapping->wb_err, -EIO);
>>> +	pr_crit_ratelimited("Stale pagecache contents after collision "
>>> +			    "between direct and buffered write!\n");
>>> +}
>>
>> In this form the error message is IMHO useless since it tells me
>> neither the file in question nor the misbehaving application.
>> "Something went wrong somewhere" is not actionable information
>> and in practice will only be ignored.
>>
>> Since you already have the inode in question at hand, print at least
>> the full path + filename so that it's clear where things are going
>> wrong. Usually that will let people deduce which application is
>> misbehaving.
> 
> The whole point of the errseq_set call in this patch is to record the
> write collision so that all the writers of this file will receive an EIO
> the next time they try to flush the file.  You can pinpoint exactly
> which fd(s) in which application(s) caused the problem.  The old dmesg
> spew only captured which program issued the dio write.

Then what is the use of printing the message? I'm not arguing against
handling the error, which is of course much better than silent corruption;
I'm asking what the point of the message is because it doesn't tell
anything actionable after the fact. If you really want to rely on
applications handling this condition (which I agree is the right thing
to do!) then there is simply no need for the message; if I found it in
the log one day, I'd have no idea what to do about it. That's all.
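
For reference, "applications handling this condition" amounts to checking the return value of fsync() on the application side. A minimal sketch follows; the helper name flush_and_check is hypothetical, and what to do about an EIO (rewrite, re-read, abort) is left to the caller.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Flush fd to stable storage.
 * Returns 0 on success or a negative errno value on failure; with the
 * proposed patch, -EIO could also mean that a direct write collided
 * with a buffered write on the same file.
 */
int flush_and_check(int fd)
{
	if (fsync(fd) == 0)
		return 0;
	return -errno;
}
```

A caller that gets -EIO back should treat the file contents as suspect instead of assuming the earlier writes landed as intended.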

> (And the whole point of this patch is to see what people think about
> that change of behavior w.r.t. us no longer letting userspace silently
> corrupt the file...)

No argument there!

-h


* Re: [RFC PATCH] iomap: report collisions between directio and buffered writes to userspace
  2017-11-15 19:35     ` Holger Hoffstätte
@ 2017-11-15 20:53       ` Dave Chinner
  0 siblings, 0 replies; 7+ messages in thread
From: Dave Chinner @ 2017-11-15 20:53 UTC (permalink / raw)
  To: Holger Hoffstätte; +Cc: Darrick J. Wong, xfs

On Wed, Nov 15, 2017 at 08:35:57PM +0100, Holger Hoffstätte wrote:
> On 11/15/17 19:54, Darrick J. Wong wrote:
> > On Wed, Nov 15, 2017 at 02:16:01PM +0100, Holger Hoffstätte wrote:
> >> On 11/14/17 22:46, Darrick J. Wong wrote:
> >> (snip)
> >>> +static void iomap_warn_stale_pagecache(struct inode *inode)
> >>> +{
> >>> +	errseq_set(&inode->i_mapping->wb_err, -EIO);
> >>> +	pr_crit_ratelimited("Stale pagecache contents after collision "
> >>> +			    "between direct and buffered write!\n");
> >>> +}
> >>
> >> In this form the error message is IMHO useless since it tells me
> >> neither the file in question nor the misbehaving application.
> >> "Something went wrong somewhere" is not actionable information
> >> and in practice will only be ignored.
> >>
> >> Since you already have the inode in question at hand, print at least
> >> the full path + filename so that it's clear where things are going
> >> wrong. Usually that will let people deduce which application is
> >> misbehaving.
> > 
> > The whole point of the errseq_set call in this patch is to record the
> > write collision so that all the writers of this file will receive an EIO
> > the next time they try to flush the file.  You can pinpoint exactly
> > which fd(s) in which application(s) caused the problem.  The old dmesg
> > spew only captured which program issued the dio write.
> 
> Then what is the use of printing the message? I'm not arguing against
> handling the error, which is of course much better than silent corruption;
> I'm asking what the point of the message is because it doesn't tell
> anything actionable after the fact. If you really want to rely on
> applications handling this condition (which I agree is the right thing
> to do!) then there is simply no need for the message; if I found it in
> the log one day, I'd have no idea what to do about it. That's all.

The message is there for the people that have to triage reports of
spurious EIO errors at the application level, or if the app ignores
the errors, reports of data corruption being detected. The output
lets us know the likely trigger of the problem rather than having to
start searching for phantom data corruption vectors that don't
exist.

If we add the process name to the error message, then we have pretty
much all we need to correlate the "app misbehaving/data corrupted"
report with the cause...
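
One possible shape for such a message, combining the wording suggested earlier in the thread with a process name, is sketched below as a userspace formatter. It is only an illustration: the function name, the inode-number field and the exact layout are assumptions, and in the kernel the name would presumably come from current->comm and be printed via pr_crit_ratelimited().

```c
#include <stdio.h>

/*
 * Format a triage message into buf.
 * comm is the process name and ino the inode number; both are
 * hypothetical inputs chosen for this sketch.
 * Returns what snprintf() returns: the length the full message needs,
 * excluding the terminating NUL.
 */
int format_collision_warning(char *buf, size_t len,
			     const char *comm, unsigned long ino)
{
	return snprintf(buf, len,
			"Cache invalidation failure on direct I/O "
			"(comm: %s, inode: %lu). Possible data corruption "
			"due to collision with buffered I/O!",
			comm, ino);
}
```

With the name and an inode identifier in the log line, a report of spurious EIO or corrupted data can be matched to the process that issued the colliding write.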

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
