Re: [PATCH 1/1] xfs: fallback to readonly during recovery

From: Brian Foster <bfoster@redhat.com>
To: Eric Sandeen <sandeen@sandeen.net>
Cc: Aaron Sierra <asierra@xes-inc.com>,
	Vincent Fazio <vfazio@xes-inc.com>,
	linux-xfs@vger.kernel.org
Subject: Re: [PATCH 1/1] xfs: fallback to readonly during recovery
Date: Tue, 11 Feb 2020 07:55:04 -0500
Message-ID: <20200211125504.GA2951@bfoster>
In-Reply-To: <400031d2-dbcb-a0de-338d-9a11f97c795c@sandeen.net>

On Mon, Feb 10, 2020 at 05:40:03PM -0600, Eric Sandeen wrote:
> On 2/10/20 4:31 PM, Aaron Sierra wrote:
> >> From: "Eric Sandeen" <sandeen@sandeen.net>
> >> Sent: Monday, February 10, 2020 3:43:50 PM
> > 
> >> On 2/10/20 3:10 PM, Vincent Fazio wrote:
> >>> Previously, XFS would fail to mount if there was an error during log
> >>> recovery. This can occur as a result of inevitable I/O errors when
> >>> trying to apply the log on read-only ATA devices since the ATA layer
> >>> does not support reporting a device as read-only.
> >>>
> >>> Now, if there's an error during log recovery, fall back to norecovery
> >>> mode and mark the filesystem as read-only in the XFS and VFS layers.
> >>>
> >>> This roughly approximates the 'errors=remount-ro' mount option in ext4
> >>> but is implicit and the scope only covers errors during log recovery.
> >>> Since XFS is the default filesystem for some distributions, this change
> >>> allows users to continue to use XFS on these read-only ATA devices.
> >>
> >> What is the workload or scenario where you need this behavior?
> >>
> >> I'm not a big fan of ~silently mounting a filesystem with latent errors,
> >> tbh, but maybe you can explain a bit more about the problem you're solving
> >> here?
> > 
> > Hi Eric,
> > 
> > We use SSDs from multiple vendors that can be configured at power-on (via
> > GPIO) to be read-write or write-protected. When write-protected we get I/O
> > errors for any writes that reach the device. We believe that behavior is
> > correct.
> > 
> > We have found that XFS fails during log recovery even when the log is clean
> > (apparently due to metadata writes immediately before actual recovery).
> 
> There should be no log recovery if it's clean ...
> 
> And I don't see that here - a clean log on a readonly device simply mounts
> RO for me by default, with no muss, no fuss.
> 
> # mkfs.xfs -f fsfile
> ...
> # losetup /dev/loop0 fsfile
> # mount /dev/loop0 mnt
> # touch mnt/blah
> # umount mnt
> # blockdev --setro /dev/loop0
> # dd if=/dev/zero of=/dev/loop0 bs=4k count=1
> dd: error writing ‘/dev/loop0’: Operation not permitted
> # mount /dev/loop0 mnt
> mount: /dev/loop0 is write-protected, mounting read-only
> # dmesg
> [  419.941649] /dev/loop0: Can't open blockdev
> [  419.947106] XFS (loop0): Mounting V5 Filesystem
> [  419.952895] XFS (loop0): Ending clean mount
> # uname -r
> 5.5.0
> 
> > Vincent and I believe that mounting read-only without recovery should be
> > fine even when the log is not clean, since the filesystem will be consistent,
> > even if out-of-date.
> 
> I think that you may be making too many assumptions here, i.e. that "log
> recovery failure leaves the filesystem in a consistent state" - and that
> may not be true in all cases.
> 
> IOWS, transitioning to a new RO state for your particular case may be safe,
> but I'm not sure that's universally true for all log replay failures.
> 

Agreed. Just to double down on this bit, this is definitely a misguided
assumption. Generally speaking, XFS logging places ordering rules on
metadata writes to the filesystem such that we can guarantee we can
always recover to a consistent point after a crash. By skipping recovery
of a dirty log, you are actively bypassing that mechanism.

For example, if a filesystem transaction modifies several objects, those
objects are logged in a transaction and committed to the physical log.
Once the transaction is committed to the physical log, the individual
objects are free to be written back in any arbitrary order because of
the transactional guarantee that log recovery provides. So nothing
prevents one object from being written back while another is reused (and
re-pinned), and a crash at that point leaves the on-disk metadata in a
corrupted state. Log recovery is required to update the associated
metadata objects and make the fs consistent again.
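
To make that concrete, here is a toy Python model of the scenario. It is
only a sketch: the dictionary "disk", the two object names and the
is_consistent() rule are all made up for illustration and have nothing to
do with the real XFS on-disk format or recovery code. One committed
transaction moves block 7 from a free list to an inode, only the inode is
written back before the crash, and the free list is still pinned in
memory.

```python
# Toy model only: hypothetical names, not XFS structures or APIs.
ALL_BLOCKS = {5, 6, 7}


def is_consistent(disk):
    """Every block must be either free or owned by the inode, never both."""
    free = set(disk["free_blocks"])
    owned = set(disk["inode_42_blocks"])
    return free.isdisjoint(owned) and (free | owned) == ALL_BLOCKS


def replay(disk, log):
    """Apply every committed transaction in the log on top of the disk state."""
    recovered = dict(disk)
    for txn in log:
        recovered.update(txn)
    return recovered


# On-disk metadata before the transaction: all blocks free.
disk = {"free_blocks": [5, 6, 7], "inode_42_blocks": []}

# The transaction is committed to the log: block 7 moves to inode 42.
log = [{"free_blocks": [5, 6], "inode_42_blocks": [7]}]

# Writeback order is arbitrary. Here only the inode reaches the disk before
# the crash; the free list was still pinned and never got written.
crashed = dict(disk)
crashed["inode_42_blocks"] = log[0]["inode_42_blocks"]

norecovery_view = crashed               # what a norecovery mount would see
recovered_view = replay(crashed, log)   # what a mount sees after log replay

print(is_consistent(norecovery_view))   # False: block 7 is both free and owned
print(is_consistent(recovered_view))    # True
```

In this model the norecovery view has block 7 on the free list and in the
inode at the same time, which is the kind of state that only replaying the
log repairs.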

In short, it's probably safer to assume any filesystem mounted with a
dirty log and norecovery is in fact corrupted as opposed to the other
way around.

Brian

> > Our customers' use often requires nonvolatile memory to be write-protected
> > or not based on the device being installed in a development or deployed
> > system. It is ideal for them to be able to mount their filesystems read-
> > write when possible and read-only when not without having to alter mount
> > options.
> 
> From my example above, I'd like to understand more why/how you have a
> clean log that fails to mount by default on a readonly block device...
> in my testing, no writes get sent to the device when mounting a clean
> log.
> 
> -Eric
>