From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 33/39] xfs: Add order IDs to log items in CIL
Date: Thu, 3 Jun 2021 12:13:30 +1000
Message-ID: <20210603021330.GL664593@dread.disaster.area>
In-Reply-To: <20210603004914.GC26402@locust>

On Wed, Jun 02, 2021 at 05:49:14PM -0700, Darrick J. Wong wrote:
> On Thu, Jun 03, 2021 at 10:16:22AM +1000, Dave Chinner wrote:
> > On Thu, May 27, 2021 at 12:00:23PM -0700, Darrick J. Wong wrote:
> > > On Wed, May 19, 2021 at 10:13:11PM +1000, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@redhat.com>
> > > > 
> > > > Before we split the ordered CIL up into per cpu lists, we need a
> > > > mechanism to track the order of the items in the CIL. We need to do
> > > > this because there are rules around the order in which related items
> > > > must physically appear in the log even inside a single checkpoint
> > > > transaction.
> > > > 
> > > > An example of this is intents - an intent must appear in the log
> > > > before it's intent done record so taht log recovery can cancel the
> > > 
> > > s/taht/that/
> > > 
> > > > intent correctly. If we have these two records misordered in the
> > > > CIL, then they will not be recovered correctly by journal replay.
> > > > 
> > > > We also will not be able to move items to the tail of
> > > > the CIL list when they are relogged, hence the log items will need
> > > > some mechanism to allow the correct log item order to be recreated
> > > > before we write log items to the journal.
> > > > 
> > > > Hence we need to have a mechanism for recording global order of
> > > > transactions in the log items so that we can recover that order
> > > > from un-ordered per-cpu lists.
> > > > 
> > > > Do this with a simple monotonic increasing commit counter in the CIL
> > > > context. Each log item in the transaction gets stamped with the
> > > > current commit order ID before it is added to the CIL. If the item
> > > > is already in the CIL, leave it where it is instead of moving it to
> > > > the tail of the list and instead sort the list before we start the
> > > > push work.
> > > > 
> > > > XXX: list_sort() under the cil_ctx_lock held exclusive starts
> > > > hurting at >16 threads. Front end commits are waiting on the push
> > > > to switch contexts much longer. The item order id should likely be
> > > > moved into the logvecs when they are detached from the items, then
> > > > the sort can be done on the logvec after the cil_ctx_lock has been
> > > > released. logvecs will need to use a list_head for this rather than
> > > > a single linked list like they do now....
> > > 
> > > ...which I guess happens in patch 35 now?
> > 
> > Right. I'll just remove this from the commit message.
> > 
> > > > @@ -780,6 +780,26 @@ xlog_cil_build_trans_hdr(
> > > >  	tic->t_curr_res -= lvhdr->lv_bytes;
> > > >  }
> > > >  
> > > > +/*
> > > > + * CIL item reordering compare function. We want to order in ascending ID order,
> > > > + * but we want to leave items with the same ID in the order they were added to
> > > 
> > > When do we have items with the same id?
> > 
> > All the items in a single transaction have the same id. The order id
> > increments before we tag all the items in the transaction and insert
> > them into the CIL.
> > 
> > > I guess that happens if we have multiple transactions adding items to
> > > the cil at the same time?  I guess that's not a big deal since each of
> > > those threads will hold a disjoint set of locks, so even if the order
> > > ids are the same for a bunch of items, they're never going to be
> > > touching the same AG/inode/metadata object, right?
> > >
> > > If that's correct, then:
> > > Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> > 
> > 
> > While true, it's not the way this works so I won't immediately
> > accept your RVB. The reason for not changing the ordering within a
> > single transaction is actually intent logging.  i.e. this:
> > 
> > > > + * the list. This is important for operations like reflink where we log 4 order
> > > > + * dependent intents in a single transaction when we overwrite an existing
> > > > + * shared extent with a new shared extent. i.e. BUI(unmap), CUI(drop),
> > > > + * CUI (inc), BUI(remap)...
> > 
> > There's a specific order of operations that recovery must run these
> > intents in, and so if we re-order them here in the CIL they'll be
> > out of order in the log and recovery will replay the intents in the
> > wrong order. Replaying the intents in the wrong order results in
> > corruption warnings and assert failures during log recovery, hence
> > the constraint of not re-ordering items within the same transaction.
> 
> <ding> lightbulb comes on.  I think I understood this better the last
> time I read all these patches. :/
> 
> Basically, for each item that can be attached to a transaction, you're
> assigning it an "order id" that is a monotonically increasing counter
> that (roughly) records the last time the item was committed.  Certain
> items (like inodes) can be relogged and committed multiple times in
> rapid fire succession, in which case the order_id will get bumped
> forward.

Effectively, yes.
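
To make the stamping concrete, here is a minimal userspace sketch, not
the code from the patch. The names (demo_ctx, demo_item, demo_commit)
are made up for illustration, and locking/atomics are ignored. The only
things taken from this thread are that the counter is bumped once per
transaction before the items are tagged, that every item in that
transaction gets the same value, and that a relogged item has its ID
moved forward.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the CIL context and a log item. */
struct demo_ctx {
	uint32_t	order_id;	/* monotonic commit counter */
};

struct demo_item {
	uint32_t	order_id;	/* ID of the last commit that logged this item */
};

/*
 * Stamp every item in one transaction with a single new order ID.
 * The counter is incremented once, before any item is tagged, so all
 * items in the transaction share the same ID. An item that was already
 * stamped by an earlier commit simply has its ID overwritten.
 */
static uint32_t
demo_commit(struct demo_ctx *ctx, struct demo_item **items, size_t nr)
{
	uint32_t	id = ++ctx->order_id;
	size_t		i;

	for (i = 0; i < nr; i++)
		items[i]->order_id = id;
	return id;
}
```

Committing {a, b} and then {b, c} against a zeroed context leaves a at
ID 1 and both b and c at ID 2, which is the "bumped forward on relog"
behaviour described above.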

> In the /next/ patch you'll change the cil item list to be per-cpu and
> only splice the mess together at cil push time.  For that to work
> properly, you have to re-sort that resulting list in commit order (aka
> the order_id) to keep the items in order of commit.
> 
> For items *within* a transaction, you take advantage of the property
> of list_sort that it won't reorder items with cmp(a, b) == 0, which
> means that all the intents logged to a transaction will maintain the
> same order that the author of higher level code wrote into the software.

Correct.
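
As a rough illustration of why a stable sort is enough here, the sketch
below uses a plain stable insertion sort over an array as a stand-in
for list_sort(); it is not the kernel implementation. The struct, its
seq field (the position at which the item was originally added) and the
function names are hypothetical. The compare function looks at the
order ID only, so items from the same transaction compare equal and are
never moved past each other.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical log item: an order ID plus its original insert position. */
struct sort_item {
	uint32_t	order_id;
	int		seq;
};

/* Compare on order ID only; 0 means "leave these two as they are". */
static int
sort_item_cmp(const struct sort_item *a, const struct sort_item *b)
{
	if (a->order_id < b->order_id)
		return -1;
	return a->order_id > b->order_id;
}

/*
 * Stable insertion sort: an element only moves past neighbours that
 * compare strictly greater, so equal-ID items keep their input order.
 */
static void
sort_items_stable(struct sort_item *v, size_t nr)
{
	size_t	i, j;

	for (i = 1; i < nr; i++) {
		struct sort_item tmp = v[i];

		for (j = i; j > 0 && sort_item_cmp(&v[j - 1], &tmp) > 0; j--)
			v[j] = v[j - 1];
		v[j] = tmp;
	}
}
```

If four intents with ID 2 are interleaved with one item with ID 1, the
ID 1 item moves to the front and the four intents stay in the order
they were logged.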

> Question: xlog_cil_push_work zeroes the order_id of pushed log items.
> Is there any potential problem here when ctx->order_id wraps around to
> zero?  I think the answer is that we'll move on to a new cil context
> long before we hit 2^32-1 transactions?

Yes. At the moment, the max transaction rate is about 800k/s, which
means it'd take a couple of hours to run 4 billion transactions. So
we're in no danger of overrunning the number of transactions in a CIL
commit any time soon. And if we ever get near that, we can just bump
the counter to a 64 bit value...
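
Spelling out that arithmetic as a back-of-the-envelope sketch (the
helper name is made up; 800k/s is the figure quoted above, and the
counter is assumed to be 32 bits wide as in the question): 2^32 commits
at 800,000 per second is about 5368 seconds, i.e. roughly an hour and a
half of sustained commits into a single context.

```c
#include <stdint.h>

/* Whole seconds until a 32-bit counter wraps at a given commit rate. */
static uint64_t
order_id_wrap_seconds(uint64_t commits_per_sec)
{
	return ((uint64_t)UINT32_MAX + 1) / commits_per_sec;
}
```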

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
