All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>, linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/4] xfs: force writes to delalloc regions to unwritten
Date: Thu, 21 Mar 2019 08:10:19 +1100	[thread overview]
Message-ID: <20190320211019.GS23020@dastard> (raw)
In-Reply-To: <20190320120204.GA28268@bfoster>

On Wed, Mar 20, 2019 at 08:02:05AM -0400, Brian Foster wrote:
> On Wed, Mar 20, 2019 at 07:25:37AM +1100, Dave Chinner wrote:
> > > > The broader point is that by controlling writeback ordering, we can
> > > > explicitly demonstrate stale data exposure. If fixed properly, we should
> > > > never expect stale data exposure even in the event of out of order
> > > > writeback completion, regardless of the cause.
> > > 
> > > Agree.  Even if s_f_r is a giant trash fire, if we're exposing stale
> > > disk contents it's game over anyway and we ought to fix it.
> > 
> > Well, that's exactly the problem with SFR, isn't it? The dirty data
> > is outside the range the user asks to sync and the implementation
> > doesn't go anywhere near the filesystem so we can't actually do the
> > right thing here. So we are left to either hack fixes around a
> > shitty, broken interface and everyone pays the penalty, or we ignore
> > it. We've simply ignored it for the past 10+ years because it really
> > can't be used safely for it's intended purposes (as everyone who
> > tries to use it finds out eventually)...
> > 
> 
> Just to make this abundantly clear, there is nothing unique, magic or
> special required of sync_file_range() to reproduce this stale data
> exposure. See the following variation of a command sequence from the
> fstest I posted the other day:
> 
> ...
> # xfs_io -fc "pwrite 0 68k" -c fsync -c "pwrite 96k 4k" \
> 	-c "fpunch 96k 4k" <file>
> ...
> # hexdump <file>
> 0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
> *
> 0011000 abab abab abab abab abab abab abab abab
> *
> 0018000 0000 0000 0000 0000 0000 0000 0000 0000
> *
> 0019000
> 
> Same problem, and it can be reproduced just as easily with a reflink or
> even an overlapping direct I/O.

How does overlapping direct IO cause stale data exposure given that
it uses unwritten extent allocation? That, of course, is ignoring
the fact that the result of overlapping concurent direct IO is, by
definition, undefined and an application bug if it occurs....

> I haven't confirmed, but I suspect with
> enough effort background writeback is susceptible to this problem as
> well.

Well, the similarity here is with the NULL files problems and using
XFS_ITRUNCATE case to avoid NULL files after truncating them down.
i.e.  if that flag is set, we do extra flushing where appropriate to
ensure ithe data at risk of being lost hits the disk before the loss
mechanism can be triggered.

i.e. this is the mirror equivalent of XFS_ITRUNCATE - it is the
truncate-up flushing case in that we need to wait for the zeroing to
hit the disk before we change the file size.

Eseentially, what this says to me is we need to track when we
extended the file via a write beyond EOF via an inode flag. Then if we need to
flush a data range from the file for integrity reasons (such as a
fpnuch) and this flag is set, we ignore the "start" parameter of the
range and flush from the start of the file instead. i.e. we
guarantee that any dirty zeroed pages between the start of the file
and the end of the range we are operating on gets written back. This
will capture all these "didn't flush regions zeroed on file
extension" problems in one go, without needing to change the way we
do allocation or EOF zeroing.

I suspect we could just use the XFS_ITRUNCATE flag for this - set it
in the EOF zeroing code, check it in xfs_flush_unmap_range() and
other places that do range based flushing....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2019-03-20 21:10 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-14 21:28 [PATCH 0/4] xfs: various rough fixes Darrick J. Wong
2019-03-14 21:29 ` [PATCH 1/4] xfs: only free posteof blocks on first close Darrick J. Wong
2019-03-15  3:42   ` Dave Chinner
2019-03-14 21:29 ` [PATCH 2/4] xfs: force writes to delalloc regions to unwritten Darrick J. Wong
2019-03-14 23:08   ` Dave Chinner
2019-03-15  3:13     ` Darrick J. Wong
2019-03-15  3:40       ` Dave Chinner
2019-03-15 12:29         ` Brian Foster
2019-03-17 22:40           ` Dave Chinner
2019-03-18 12:40             ` Brian Foster
2019-03-18 20:35               ` Dave Chinner
2019-03-18 21:50                 ` Brian Foster
2019-03-19 18:02                   ` Darrick J. Wong
2019-03-19 20:25                     ` Dave Chinner
2019-03-20 12:02                       ` Brian Foster
2019-03-20 21:10                         ` Dave Chinner [this message]
2019-03-21 12:27                           ` Brian Foster
2019-03-22  1:52                             ` Dave Chinner
2019-03-22 12:46                               ` Brian Foster
2019-04-03 22:38                                 ` Darrick J. Wong
2019-04-04 12:50                                   ` Brian Foster
2019-04-04 15:22                                     ` Darrick J. Wong
2019-04-04 16:16                                       ` Brian Foster
2019-04-04 22:08                                     ` Dave Chinner
2019-04-05 11:24                                       ` Brian Foster
2019-04-05 15:12                                         ` Darrick J. Wong
2019-04-08  0:17                                           ` Dave Chinner
2019-04-08 12:02                                             ` Brian Foster
2019-03-14 21:29 ` [PATCH 3/4] xfs: trace transaction reservation logcount overflow Darrick J. Wong
2019-03-15 12:30   ` Brian Foster
2019-03-14 21:29 ` [PATCH 4/4] xfs: avoid reflink end cow deadlock Darrick J. Wong
2019-03-15 12:31   ` Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190320211019.GS23020@dastard \
    --to=david@fromorbit.com \
    --cc=bfoster@redhat.com \
    --cc=darrick.wong@oracle.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.