All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/2] xfs: shutdown in intent recovery has non-intent items in the AIL
Date: Tue, 22 Mar 2022 09:41:23 +1100	[thread overview]
Message-ID: <20220321224123.GK1544202@dread.disaster.area> (raw)
In-Reply-To: <20220321215236.GK8224@magnolia>

On Mon, Mar 21, 2022 at 02:52:36PM -0700, Darrick J. Wong wrote:
> On Mon, Mar 21, 2022 at 12:23:29PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > generic/388 triggered a failure in RUI recovery due to a corrupted
> > btree record and the system then locked up hard due to a subsequent
> > assert failure while holding a spinlock cancelling intents:
> > 
> >  XFS (pmem1): Corruption of in-memory data (0x8) detected at xfs_do_force_shutdown+0x1a/0x20 (fs/xfs/xfs_trans.c:964).  Shutting down filesystem.
> >  XFS (pmem1): Please unmount the filesystem and rectify the problem(s)
> >  XFS: Assertion failed: !xlog_item_is_intent(lip), file: fs/xfs/xfs_log_recover.c, line: 2632
> >  Call Trace:
> >   <TASK>
> >   xlog_recover_cancel_intents.isra.0+0xd1/0x120
> >   xlog_recover_finish+0xb9/0x110
> >   xfs_log_mount_finish+0x15a/0x1e0
> >   xfs_mountfs+0x540/0x910
> >   xfs_fs_fill_super+0x476/0x830
> >   get_tree_bdev+0x171/0x270
> >   ? xfs_init_fs_context+0x1e0/0x1e0
> >   xfs_fs_get_tree+0x15/0x20
> >   vfs_get_tree+0x24/0xc0
> >   path_mount+0x304/0xba0
> >   ? putname+0x55/0x60
> >   __x64_sys_mount+0x108/0x140
> >   do_syscall_64+0x35/0x80
> >   entry_SYSCALL_64_after_hwframe+0x44/0xae
> > 
> > Essentially, there's dirty metadata in the AIL from intent recovery
> > transactions, so when we go to cancel the remaining intents we assume
> > that all objects after the first non-intent log item in the AIL are
> > not intents.
> > 
> > This is not true. Intent recovery can log new intents to continue
> > the operations the original intent could not complete in a single
> > transaction. The new intents are committed before they are deferred,
> > which means if the CIL commits in the background they will get
> > inserted into the AIL at the head.
> 
> Like you, I wonder how I never hit this.  Maybe I've never hit a
> corrupted rmap btree record during recovery?
> 
> So I guess what we're tripping over is a sequence of items in the AIL
> that looks something like this?
> 
> 0. <recovered non intent items>
> 1. <recovered intent item>
> 2. <new non-intent item>
> 3. <new intent items>

Mostly correct, but not quite. At the end of the first phase of
recovery (i.e. reading the log and extracting/recovering individual
items), the AIL looks like this:

0. <incomplete recovered intents>

It has nothing else in it, just the intents that need recovery.

In the second phase of recovery, we start processing those intents,
walking the above list and calling ->recover on them. IIUC,
recovering a BUI can create CUI and RUI intents, recovering a CUI
can create both RUI and EFI intents, and recovering a RUI can create
EFI intents.

Now, processing these newly created chained intents is deferred
after committing them to the *CIL*, so only the initial intents in 0
above are replayed because we check the AIL contents again. If there
is enough space in the journal and CIL to aggregate all these new
intents and objects, they will never reach the AIL before the
initial intent recovery has completed, and the check will never
fail.  Hence I think the actual logged contents check is rarely, if
ever, run properly because nothing has reached the AIL in normal
cases.

However, if the CIL -ever- commits during intent processing, then
the AIL check is completely invalid - the AIL can contain buffers,
inodes, new intents, etc in any order:

0. <incomplete recovered intents>
1. <non intent items>
2. <new chained/deferred intents>
3. <non intent items>
4. <new chained/deferred intents>
5. <non intent items>
....

> So we speed along the AIL list, dealing with the <0> items until we get
> to <1>.  We recover <1>, which generates <2> and <3>.  Next, the
> debugging code thinks we've hit the end of the list of recovered items,
> and therefore it can keep walking the AIL and that it will only find
> items like <2>.  Unfortunately, it finds the new intent <3> and trips?

Effectively, yes.

> Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Thanks!

-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2022-03-21 22:50 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-21  1:23 [PATCH 0/2] xfs: more shutdown/recovery fixes Dave Chinner
2022-03-21  1:23 ` [PATCH 1/2] xfs: aborting inodes on shutdown may need buffer lock Dave Chinner
2022-03-21 21:56   ` Darrick J. Wong
2022-03-21 22:43     ` Dave Chinner
2022-03-21  1:23 ` [PATCH 2/2] xfs: shutdown in intent recovery has non-intent items in the AIL Dave Chinner
2022-03-21 21:52   ` Darrick J. Wong
2022-03-21 22:41     ` Dave Chinner [this message]
2022-03-22  2:35 ` [PATCH 0/2] xfs: more shutdown/recovery fixes Darrick J. Wong
2022-03-22  3:26   ` Dave Chinner
2022-03-22 16:23     ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220321224123.GK1544202@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.