All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 4/5] xfs: log forces imply data device cache flushes
Date: Thu, 22 Jul 2021 16:13:46 -0700	[thread overview]
Message-ID: <20210722231346.GP559212@magnolia> (raw)
In-Reply-To: <20210722221258.GQ664593@dread.disaster.area>

On Fri, Jul 23, 2021 at 08:12:58AM +1000, Dave Chinner wrote:
> On Thu, Jul 22, 2021 at 12:30:18PM -0700, Darrick J. Wong wrote:
> > On Thu, Jul 22, 2021 at 11:53:34AM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > After fixing the tail_lsn vs cache flush race, generic/482 continued
> > > to fail in a similar way where cache flushes were missing before
> > > iclog FUA writes. Tracing of iclog state changes during the fsstress
> > 
> > Heh. ;)
> ....
> > > +		 * xlog_cil_force_seq() call, but there are other writers still
> > > +		 * accessing it so it hasn't been pushed to disk yet. Like the
> > > +		 * ACTIVE case above, we need to make sure caches are flushed
> > > +		 * when this iclog is written.
> > > +		 */
> > > +		iclog->ic_flags |= XLOG_ICL_NEED_FLUSH | XLOG_ICL_NEED_FUA;
> > > +		if (log_flushed)
> > > +			*log_flushed = 1;
> > > +		break;
> > > +	default:
> > > +		/*
> > > +		 * The entire checkpoint was written by the CIL force and is on
> > > +		 * it's way to disk already. It will be stable when it
> > 
> > s/it's/its/
> > 
> > So now that we're at the end of this series, what are the rules for when
> > when issue cache flushes and FUA writes?
> > 
> > - Writing the unmount record always flushes the data and log devices.
> >   Does it need to flush the rt device too?  I guess xfs_free_buftarg
> >   does that.
> 
> Correct. RT device behaviour is unchanged as it only contains data
> and data is already guaranteed to be on stable storage before we
> write the unmount record.
> 
> > - Start an async flush of the data device when doing CIL push work so
> >   that anything the AIL wrote to disk (and pushed the tail) is persisted
> >   before we assign a tail to the log record that we write at the end?
> > 
> > - If any other AIL work completes (and pushes the tail ahead) by the
> >   time we actually write the log record, flush the data device a second
> >   time?
> 
> Yes.
> 
> > - If a log checkpoint spans multiple iclogs, flush the *log* device
> >   before writing the iclog with the commit record in it.
> 
> Yes. And for internal logs we have the natural optimisation that
> these two cases are handled by same cache flush and so for large
> checkpoints on internal logs we don't see lot tail update races.
> 
> > - Any time we write an iclog that commits a checkpoint, write that
> >   record with FUA to ensure it's persisted.
> 
> *nod*
> 
> > - If we're forcing the log to disk as part of an integrity operation
> >   (fsync, syncfs, etc.) then issue cache flushes for ... each? iclog
> >   written to disk?  And use FUA for that write too?
> 
> This is where it gets messy, because log forces are not based around
> checkpoint completions. Hence we have no idea what is actually in
> the iclog we are flushing and so must treat them all as if they
> contain a commit record, close off a multi-iclog checkpoint, and
> might have raced with a log tail update. We don't know - and can't
> know from the iclog state - which conditions exist and so we have to
> assume that at least one of the above states exist for any ACTIVE or
> WANT_SYNC iclog we end flushing or up waiting on.
> 
> If the iclog is already on it's way to disk, and it contains a
> commit record, then the cache flush requirements for
> metadata/journal ordering have already been met and we don't need to
> do anything other than wait. But if we have to flush the iclog or
> wait for a flush by a third party, we need to ensure that cache
> flushes occur so that the log force semantics are upheld.
> 
> If the iclog doesn't contain a commit record (i.e. a log force in
> the middle of a new, racing checkpoint write) we don't actually care
> if the iclog contains flushes or not, because a crash immediately
> after the log force won't actually recover the checkpoint contained
> in that iclog. From the log force perspective, the iclog contains
> future changes, so we don't care about whether it can be recovered.
> But we don't know this, so we have to issue cache flushes on every
> iclog we flush from the log force code.
> 
> This is why I mentioned that the log force code needs to be turned
> inside out to guarantee CIL checkpoints are flushed and stable
> rather than iclogs. We care about whole checkpoints being
> recoverable, not whether some random iclog in the middle of a
> checkpoint write is stable....

<nod> Ok, that's kinda what I thought -- the first few "where do we
flush?" cases were pretty straightforward, and the last one is murky.

With all the random little changes applied,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

  reply	other threads:[~2021-07-22 23:13 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-22  1:53 [PATCH 0/5] xfs: fix log cache flush regressions Dave Chinner
2021-07-22  1:53 ` [PATCH 1/5] xfs: flush data dev on external log write Dave Chinner
2021-07-22  6:41   ` Christoph Hellwig
2021-07-22 15:52   ` Darrick J. Wong
2021-07-22  1:53 ` [PATCH 2/5] xfs: external logs need to flush data device Dave Chinner
2021-07-22  6:48   ` Christoph Hellwig
2021-07-22 18:14   ` Darrick J. Wong
2021-07-22 21:45     ` Dave Chinner
2021-07-22 23:10       ` Darrick J. Wong
2021-07-22  1:53 ` [PATCH 3/5] xfs: fix ordering violation between cache flushes and tail updates Dave Chinner
2021-07-22  7:06   ` Christoph Hellwig
2021-07-22  7:28     ` Dave Chinner
2021-07-22 19:12     ` Darrick J. Wong
2021-07-22  1:53 ` [PATCH 4/5] xfs: log forces imply data device cache flushes Dave Chinner
2021-07-22  7:14   ` Christoph Hellwig
2021-07-22  7:32     ` Dave Chinner
2021-07-22 19:30   ` Darrick J. Wong
2021-07-22 22:12     ` Dave Chinner
2021-07-22 23:13       ` Darrick J. Wong [this message]
2021-07-22  1:53 ` [PATCH 5/5] xfs: avoid unnecessary waits in xfs_log_force_lsn() Dave Chinner
2021-07-22  7:15   ` Christoph Hellwig
2021-07-22 19:13   ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210722231346.GP559212@magnolia \
    --to=djwong@kernel.org \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.