All of lore.kernel.org
 help / color / mirror / Atom feed
From: Amir Goldstein <amir73il@gmail.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-xfs <linux-xfs@vger.kernel.org>,
	Hou Tao <houtao1@huawei.com>, Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH] xfs: log recovery should replay deferred ops in order
Date: Thu, 23 Nov 2017 12:46:45 +0200	[thread overview]
Message-ID: <CAOQ4uxhHM7B_fG89if2DcLd65zFgvLBbp+RkPwKWGuD0swQBhQ@mail.gmail.com> (raw)
In-Reply-To: <20171122182949.GH2135@magnolia>

On Wed, Nov 22, 2017 at 8:29 PM, Darrick J. Wong
<darrick.wong@oracle.com> wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> As part of testing log recovery with dm_log_writes, Amir Goldstein
> discovered an error in the deferred ops recovery that lead to corruption
> of the filesystem metadata if a reflink+rmap filesystem happened to shut
> down midway through a CoW remap:
>
> "This is what happens [after failed log recovery]:
>
> "Phase 1 - find and verify superblock...
> "Phase 2 - using internal log
> "        - zero log...
> "        - scan filesystem freespace and inode maps...
> "        - found root inode chunk
> "Phase 3 - for each AG...
> "        - scan (but don't clear) agi unlinked lists...
> "        - process known inodes and perform inode discovery...
> "        - agno = 0
> "data fork in regular inode 134 claims CoW block 376
> "correcting nextents for inode 134
> "bad data fork in inode 134
> "would have cleared inode 134"
>
> Hou Tao dissected the log contents of exactly such a crash:
>
> "According to the implementation of xfs_defer_finish(), these ops should
> be completed in the following sequence:
>
> "Have been done:
> "(1) CUI: Oper (160)
> "(2) BUI: Oper (161)
> "(3) CUD: Oper (194), for CUI Oper (160)
> "(4) RUI A: Oper (197), free rmap [0x155, 2, -9]
>
> "Should be done:
> "(5) BUD: for BUI Oper (161)
> "(6) RUI B: add rmap [0x155, 2, 137]
> "(7) RUD: for RUI A
> "(8) RUD: for RUI B
>
> "Actually be done by xlog_recover_process_intents()
> "(5) BUD: for BUI Oper (161)
> "(6) RUI B: add rmap [0x155, 2, 137]
> "(7) RUD: for RUI B
> "(8) RUD: for RUI A
>
> "So the rmap entry [0x155, 2, -9] for COW should be freed firstly,
> then a new rmap entry [0x155, 2, 137] will be added. However, as we can see
> from the log record in post_mount.log (generated after umount) and the trace
> print, the new rmap entry [0x155, 2, 137] are added firstly, then the rmap
> entry [0x155, 2, -9] are freed."
>
> When reconstructing the internal log state from the log items found on
> disk, it's required that deferred ops replay in exactly the same order
> that they would have had the filesystem not gone down.  However,
> replaying unfinished deferred ops can create /more/ deferred ops.  These
> new deferred ops are finished in the wrong order.  This causes fs
> corruption and replay crashes, so let's create a single defer_ops to
> handle the subsequent ops created during replay, then use one single
> transaction at the end of log recovery to ensure that everything is
> replayed in the same order as they're supposed to be.
>
> Reported-by: Amir Goldstein <amir73il@gmail.com>
> Analyzed-by: Hou Tao <houtao1@huawei.com>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

You may add:
Tested-by: Amir Goldstein <amir73il@gmail.com>

  reply	other threads:[~2017-11-23 10:46 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-22 18:29 [PATCH] xfs: log recovery should replay deferred ops in order Darrick J. Wong
2017-11-23 10:46 ` Amir Goldstein [this message]
2017-11-23 13:36 ` Christoph Hellwig
2017-11-27 17:50   ` Darrick J. Wong
2017-11-30 20:31 ` Amir Goldstein

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAOQ4uxhHM7B_fG89if2DcLd65zFgvLBbp+RkPwKWGuD0swQBhQ@mail.gmail.com \
    --to=amir73il@gmail.com \
    --cc=darrick.wong@oracle.com \
    --cc=hch@lst.de \
    --cc=houtao1@huawei.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.