All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: linux-xfs@vger.kernel.org
Subject: [PATCH 0/13] xfs: rewrite xlog_write()
Date: Wed, 24 Feb 2021 17:34:46 +1100	[thread overview]
Message-ID: <20210224063459.3436852-1-david@fromorbit.com> (raw)

Hi folks,

xlog_write() is code that causes severe eye bleeding. It's extremely
difficult to understand the way it is structured, and extremely easy
to break because of all the weird parameters it passes between
functions that do very non-obvious things. state is set in
xlog_write_finish_copy() that is carried across both outer and inner
loop iterations that is used by xlog_write_setup_copy(), which also
sets state that xlog_write_finish_copy() needs. The way iclog space
was obtained affects the accounting logic that ends up being passed
to xlog_state_finish_copy(). The code that handles commit iclogs is
spread over multiple functions and is obfuscated by the set/finish
copy code.

It's just a mess.

It's also extremely inefficient.

For all the complexity created by having to keep track of partial
copy state for loop continuation, this is a *rare* occurrence. On a
256kB iclog buffer, we can copy a couple of thousand regions (e.g.
inode log format + inode core regions) without hitting the partial
copy case once. IOWs, we don't need to track partial copy state for
the vast majority of region copy operations we perform. It's all
unnecessary overhead.

Not to mention that before we even start the copy, we walk every log
vector and log iovec to count the op headers needed, the amount of
space consumed by the log vector chain, and record every log region
in a small debug array that overflows and is emptied every 15
regions. Given we often see chains with hundreds of thousands of
entries in them, this is all pure overhead given that the CIL code
already counts the vectors and can trivially count the size of the
log vector chain at the same time.

Then there is the opheaders that xlog_write() adds before every
region. This is why it needs to count headers, because we have
to account for this space they use in the journal. We've already
reserved this space when committing items to the CIL, but we still
require xlog_write() to add these fixed size headers to log regions
and account for the space in the CIL context log ticket.

Hence we have complexity tracking and reserving CIL space at
transaction commit time, as well as complexity formatting and
accounting for this space when the CIL writes them to the log. This
can be done much more efficiently by adding the opheaders to the log
regions at item formatting time, largely removing the need to do
anything but update opheader lengths and flags in xlog_write().

This opens up a bunch of other optimisations in the item formatting
code that can reduce the number of regions we need to track, but
that is not a topic this patch set tries to address.

The simplification of the accounting and reservation of space for
log opheaders during the transaction commit path allows for further
optimisations of that fast path. Namely, it makes it possible to
move to per-cpu accounting and tracking of the CIL. That is not a
topic this patchset tries to address, but will be sent in a followup
patchset soon.

The simplifications in opheader management allow us to implement a
fast path for xlog_write() that does almost nothing but iterate the
log vector chain doing memcpy() and advancing the iclog data
pointer. The only time we need to worry about partial copies is when
a log vector will not wholly fit within the current iclog. By
writing a slow path that just handles a single log vector that needs
to be split across multiple iclogs, we get rid of all the difficult
to follow logic. That is what this patchset implements.

Overall, we go from 3 iterations of the log vector chain to two, the
majority of the copies hit the xlog_write_single() fast path, the
xlog_write() code is much simpler, the handling of iclog state is
much cleaner, and all the partial copy state is contained within a
single function. The fast path loops are much smaller and tighter,
the regions we memcpy() are larger, and so overall the code is much
more efficient.

Unfortunately, this code is complex and full of subtle stuff that
takes a *lot* of time and effort to understand. Hence I feel that
these changes aren't actually properly reviewable by anyone. If
someone else presented this patchset to me, I'd be pretty much say
the same thing, because to understand all the subtle corner cases in
xlog_write() takes weeks of bashing your head repeatedly into said
corner cases.

But, really, that's why I've rewritten the code. I think the code
I've written is much easier to understand and there's less of it.
The compiled code is smaller and faster. It passes fstests. And it
opens more avenues for future improvement of the log write code
that would otherwise have to do with the mess of xlog_write()....

Thoughts, comments?

Cheers,

Dave.

Diffstat:
 fs/xfs/xfs_log.c      | 722 +++++++++++++++++++++-----------------------------
 fs/xfs/xfs_log.h      |  74 ++++--
 fs/xfs/xfs_log_cil.c  | 159 +++++++----
 fs/xfs/xfs_log_priv.h |  29 +-
 fs/xfs/xfs_trans.c    |   6 +-
 5 files changed, 458 insertions(+), 532 deletions(-)



             reply	other threads:[~2021-02-24  6:36 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-24  6:34 Dave Chinner [this message]
2021-02-24  6:34 ` [PATCH 01/13] xfs: factor out the CIL transaction header building Dave Chinner
2021-02-25  9:12   ` Christoph Hellwig
2021-02-24  6:34 ` [PATCH 02/13] xfs: only CIL pushes require a start record Dave Chinner
2021-02-25  9:15   ` Christoph Hellwig
2021-02-25 22:11     ` Dave Chinner
2021-02-24  6:34 ` [PATCH 03/13] xfs: embed the xlog_op_header in the unmount record Dave Chinner
2021-02-25  9:26   ` Christoph Hellwig
2021-02-24  6:34 ` [PATCH 04/13] " Dave Chinner
2021-02-25  9:38   ` Christoph Hellwig
2021-02-25 22:13     ` Dave Chinner
2021-02-26  2:57   ` Darrick J. Wong
2021-02-26  4:23     ` Dave Chinner
2021-02-24  6:34 ` [PATCH 05/13] xfs: log tickets don't need log client id Dave Chinner
2021-02-25  9:20   ` Christoph Hellwig
2021-02-24  6:34 ` [PATCH 06/13] xfs: move log iovec alignment to preparation function Dave Chinner
2021-02-25  9:39   ` Christoph Hellwig
2021-02-24  6:34 ` [PATCH 07/13] xfs: reserve space and initialise xlog_op_header in item formatting Dave Chinner
2021-02-25 18:27   ` Christoph Hellwig
2021-02-25 22:16     ` Dave Chinner
2021-02-24  6:34 ` [PATCH 08/13] xfs: log ticket region debug is largely useless Dave Chinner
2021-02-25  9:21   ` Christoph Hellwig
2021-02-24  6:34 ` [PATCH 09/13] xfs: pass lv chain length and size into xlog_write() Dave Chinner
2021-02-25 18:30   ` Christoph Hellwig
2021-02-24  6:34 ` [PATCH 10/13] xfs: introduce xlog_write_single() Dave Chinner
2021-02-25 18:43   ` Christoph Hellwig
2021-02-25 22:21     ` Dave Chinner
2021-02-24  6:34 ` [PATCH 11/13] xfs:_introduce xlog_write_partial() Dave Chinner
2021-02-25 18:54   ` Christoph Hellwig
2021-02-25 22:23     ` Dave Chinner
2021-02-24  6:34 ` [PATCH 12/13] xfs: xlog_write() no longer needs contwr state Dave Chinner
2021-02-24  6:34 ` [PATCH 13/13] xfs: CIL context doesn't need to count iovecs Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210224063459.3436852-1-david@fromorbit.com \
    --to=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.