From: Dave Chinner <david@fromorbit.com>
To: linux-xfs@vger.kernel.org
Subject: Re: Musings over REQ_PREFLUSH and REQ_FUA in journal IO
Date: Tue, 26 Jan 2021 10:58:56 +1100
Message-ID: <20210125235856.GH4662@dread.disaster.area>
In-Reply-To: <20210125061422.GF4662@dread.disaster.area>

On Mon, Jan 25, 2021 at 05:14:22PM +1100, Dave Chinner wrote:
> Hi folks,
> 
> I've been thinking a little about the way we use cache flushes
> recently and I was thinking about how we do journal writes and
> whether we need to issue as many cache flushes as we currently do.

....

> And then I wondered if we could apply the same logic to
> post-journal write cache flushes (REQ_FUA) that guarantee that the
> journal writes are stable before we allow writeback of the metadata
> in that LSN range (i.e. once they are unpinned). Again, we have a
> completion to submission ordering requirement here, only this time
> it is journal IO completion to metadata IO submission.
> 
> IOWs, I think the same observation about the log head and the AIL
> writeback mechanism can be made here: we only need to ensure a cache
> flush occurs before we start writing back metadata at an LSN higher
> than the journal head at the time of the last cache flush. The first
> iclog write of the last CIL checkpoint will have ensured all
> metadata lower than the LSN of the CIL checkpoint is stable, hence
> we only need to concern ourselves about metadata at the same LSN as
> that checkpoint. Checkpoint completion will unpin that metadata, but
> we still need a cache flush to guarantee ordering at the stable
> storage level.
> 
> Hence we can use an on-demand AIL traversal cache flush to ensure
> we have journal-to-metadata ordering. This will be much rarer than
> using FUA for every iclog write, and should be of similar
> order of gains to the REQ_PREFLUSH optimisation.
> 
> FWIW, because we use checksums to detect complete checkpoints in
> the journal now, we don't actually need to use FUA writes to
> guarantee they hit stable storage. We don't have a guarantee in what
> order they will hit the disk (even with FUA), so the only thing that
> the FUA write gains us is that on some hardware it elides the need
> for a post-write cache flush. Hence I don't think we need REQ_FUA,
> either.

I think that this can be greatly simplified. We simply use
REQ_PREFLUSH | REQ_FUA on all commit records that close off a
transaction. The pre-flush can be used to guarantee that all the
preceding log writes have completed to the journal, then the commit
record is written w/ FUA, guaranteeing the entire checkpoint is on
stable storage before we run the checkpoint completion callbacks
that unpin the dirty items and insert them into the AIL. This means
we don't need to modify the AIL at all, and all the metadata vs
journal ordering is still maintained entirely within the journal.
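
As a rough illustration of the flag selection described above, here is
a user-space sketch. It is not XFS code: the SKETCH_* constants and both
function names are made up for this example and do not match the
kernel's REQ_* values. It assumes the commit record always lands in the
last iclog of a checkpoint, so only that iclog asks for a pre-flush and
FUA:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the block layer flags, NOT the kernel values. */
#define SKETCH_REQ_PREFLUSH	(1u << 0)
#define SKETCH_REQ_FUA		(1u << 1)

/*
 * Pick the write flags for one iclog. Only an iclog carrying a commit
 * record gets a cache flush before the write and a FUA write; every
 * other iclog in the checkpoint is written as a plain write.
 */
static unsigned int sketch_iclog_write_flags(bool has_commit_record)
{
	if (has_commit_record)
		return SKETCH_REQ_PREFLUSH | SKETCH_REQ_FUA;
	return 0;
}

/*
 * Count the cache flushes issued for a checkpoint that spans nr_iclogs
 * iclogs, assuming the commit record is in the last one.
 */
static unsigned int sketch_count_preflushes(unsigned int nr_iclogs)
{
	unsigned int i, flushes = 0;

	assert(nr_iclogs > 0);
	for (i = 0; i < nr_iclogs; i++) {
		bool is_commit = (i == nr_iclogs - 1);

		if (sketch_iclog_write_flags(is_commit) & SKETCH_REQ_PREFLUSH)
			flushes++;
	}
	return flushes;
}
```

In this model a checkpoint that spans four iclogs issues a single cache
flush, on the commit record write, rather than one per iclog.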

The only additional complexity is that we have to separate the
commit record into a new iclog from the rest of the checkpoint,
unless the checkpoint fits entirely inside a single iclog. I don't
think this is hard to do - once we've written the commit record, we
can probably hold a reference to the iclog it was written to, which
prevents that iclog from being flushed until we release the
reference.
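
The split rule above could be modelled roughly like this. Again a
user-space sketch under loud assumptions: struct sketch_layout,
sketch_plan_checkpoint() and the byte-count parameters are hypothetical,
and the model ignores iclog headers and iclogs shared between
checkpoints. It only captures "commit record gets its own iclog unless
the whole checkpoint fits in one":

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of how one checkpoint is laid out in iclogs. */
struct sketch_layout {
	size_t	body_iclogs;		/* iclogs holding the checkpoint body */
	bool	commit_in_own_iclog;	/* commit record split into a new iclog? */
};

/*
 * Decide the layout for a checkpoint whose body is body_bytes long and
 * whose commit record is commit_bytes long, given iclogs of iclog_bytes.
 */
static struct sketch_layout
sketch_plan_checkpoint(size_t body_bytes, size_t commit_bytes,
		       size_t iclog_bytes)
{
	struct sketch_layout plan;

	assert(iclog_bytes > 0 && commit_bytes <= iclog_bytes);

	if (body_bytes + commit_bytes <= iclog_bytes) {
		/* Everything fits in one iclog, no split needed. */
		plan.body_iclogs = 1;
		plan.commit_in_own_iclog = false;
		return plan;
	}

	/* Body fills as many iclogs as it needs, rounded up. */
	plan.body_iclogs = (body_bytes + iclog_bytes - 1) / iclog_bytes;
	/* The commit record then goes into a separate, later iclog. */
	plan.commit_in_own_iclog = true;
	return plan;
}
```

With 32768 byte iclogs, a 1000 byte body plus a 100 byte commit record
stays in one iclog, while a 70000 byte body takes three iclogs and the
commit record goes into a fourth.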

> The only explicit ordering we really have are log forces. As long as
> log forces issue a cache flush when they are left pending by CIL
> transaction completion, we shouldn't require anything more here. The
> situation is similar to the AIL requirement...

This won't be a concern with the above change, because the commit
mechanism provides the same guarantees about stable journal contents
as it does now...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
