From: Eric Sandeen <sandeen@sandeen.net>
To: Jan Kara <jack@suse.cz>, Pingfan Liu <kernelfans@gmail.com>
Cc: Eric Sandeen <esandeen@redhat.com>,
"Darrick J. Wong" <darrick.wong@oracle.com>,
Dave Chinner <dchinner@redhat.com>,
linux-xfs@vger.kernel.org, Jan Kara <jack@suse.com>,
linux-fsdevel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
Hari Bathini <hbathini@linux.ibm.com>
Subject: Re: [PATCH] xfs: introduce "metasync" api to sync metadata to fsblock
Date: Mon, 14 Oct 2019 08:23:39 -0500
Message-ID: <d3ffa114-8b73-90dc-8ba6-3f44f47135d7@sandeen.net>
In-Reply-To: <20191014094311.GD5939@quack2.suse.cz>

On 10/14/19 4:43 AM, Jan Kara wrote:
> On Mon 14-10-19 16:33:15, Pingfan Liu wrote:
>> On Sun, Oct 13, 2019 at 09:34:17AM -0700, Darrick J. Wong wrote:
>>> On Sun, Oct 13, 2019 at 10:37:00PM +0800, Pingfan Liu wrote:
>>>> When using fadump (firmware-assisted dump) mode on powerpc, a mismatch
>>>> between grub xfs driver and kernel xfs driver has been observed. Note:
>>>> fadump boots up in the following sequence: firmware -> grub reads kernel
>>>> and initramfs -> kernel boots.
>>>>
>>>> The process to reproduce this mismatch:
>>>> - On powerpc, boot kernel with fadump=on and edit /etc/kdump.conf.
>>>> - Replacing "path /var/crash" with "path /var/crashnew", then, "kdumpctl
>>>> restart" to rebuild the initramfs. Detail about the rebuilding looks
>>>> like: mkdumprd /boot/initramfs-`uname -r`.img.tmp;
>>>> mv /boot/initramfs-`uname -r`.img.tmp /boot/initramfs-`uname -r`.img
>>>> sync
>>>> - "echo c >/proc/sysrq-trigger".
>>>>
>>>> The result:
>>>> The dump image will not be saved under /var/crashnew/* as expected, but
>>>> still saved under /var/crash.
>>>>
>>>> The root cause:
>>>> As Eric pointed out, on xfs 'sync' ensures consistency by writing
>>>> metadata back to the xlog, but not necessarily to fsblock. This raises an
>>>> issue if grub cannot replay the xlog before accessing the xfs files. Since
>>>> the above dir entry of initramfs should be saved as inline data with
>>>> xfs_inode, xfs_fs_sync_fs() does not guarantee it is written to fsblock.
>>>>
>>>> umount can be used to write metadata to fsblock, but the filesystem cannot
>>>> be umounted if still in use.
>>>>
>>>> There are two places to fix this mismatch, either grub or xfs. It may be
>>>> easier to do this on the xfs side by introducing an interface to flush
>>>> metadata to fsblock explicitly.
>>>>
>>>> With this patch, metadata can be written to fsblock by:
>>>> # update AIL
>>>> sync
>>>> # newly introduced interface to flush metadata to fsblock
>>>> mount -o remount,metasync mountpoint
>>>
>>> I think this ought to be an ioctl or some sort of generic call since the
>>> jbd2 filesystems (ext3, ext4, ocfs2) suffer from the same "$BOOTLOADER
>>> is too dumb to recover logs but still wants to write to the fs"
>>> checkpointing problem.
>> Yes, a syscall sounds more reasonable.
>>>
>>> (Or maybe we should just put all that stuff in a vfat filesystem, I
>>> don't know...)
>> I think it is unavoidable to involve each fs' implementation. What
>> about introducing an interface sync_to_fsblock(struct super_block *sb) in
>> the struct super_operations, then letting each fs manage its own case?
>
> Well, we already have a way to achieve what you need: fsfreeze.
> Traditionally, that is guaranteed to put fs into a "clean" state very much
> equivalent to the fs being unmounted and that seems to be what the
> bootloader wants so that it can access the filesystem without worrying
> about some recovery details. So do you see any problem with replacing
> 'sync' in your example above with 'fsfreeze /boot && fsfreeze -u /boot'?
>
> Honza

The problem with fsfreeze is that if the device you want to quiesce is, say,
the root fs, freeze isn't really a good option.

But the other thing I want to highlight about this approach is that it does not
solve the root problem: something is trying to read the block device without
first replaying the log.

A call such as the proposal here is only going to leave consistent metadata at
the time the call returns; at any time after that, all guarantees are off again,
so the problem hasn't been solved.

-Eric
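
For reference, the freeze/thaw cycle Jan suggests could be scripted roughly
as below. This is only a sketch under assumptions: quiesce_fs and the DRY_RUN
switch are made-up names for illustration, it assumes util-linux fsfreeze(8)
(which wants an explicit -f to freeze) and root privileges, and it inherits
both caveats above: it is a poor fit for the root fs, and the on-disk metadata
is only known to be clean at the moment of the thaw.

```shell
#!/bin/sh
# Sketch: freeze and immediately thaw a mounted filesystem so that, as
# described in the thread, it passes through an "unmount-like" clean state.
# quiesce_fs and DRY_RUN are hypothetical names, not part of any tool.
quiesce_fs() {
    mnt=$1

    # With DRY_RUN set, only print the commands that would be run.
    if [ -n "${DRY_RUN:-}" ]; then run=echo; else run=; fi

    # Freeze: blocks new writes and flushes the fs to a clean state.
    $run fsfreeze -f "$mnt" || return 1

    # Thaw right away; any write after this point can dirty the log again.
    $run fsfreeze -u "$mnt"
}
```

Something like "DRY_RUN=1 quiesce_fs /boot" prints the two fsfreeze commands
without touching the filesystem; dropping DRY_RUN runs them for real.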