Re: [PATCH v3] btrfs: fix mount failure due to past and transient device flush error

From: David Sterba <dsterba@suse.cz>
To: fdmanana@kernel.org
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v3] btrfs: fix mount failure due to past and transient device flush error
Date: Thu, 9 Sep 2021 17:43:31 +0200
Message-ID: <20210909154331.GA15306@twin.jikos.cz>
In-Reply-To: <893dad4768973411df7867e4436fe728d989fe1a.1631122173.git.fdmanana@suse.com>

On Wed, Sep 08, 2021 at 07:05:44PM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> When we get an error flushing one device, during a super block commit, we
> record the error in the device structure, in the field 'last_flush_error'.
> This is used to later check if we should error out the super block commit,
> depending on whether the number of flush errors is greater than or equal
> to the maximum tolerated device failures for a raid profile.
> 
> However, if we get a transient device flush error, unmount the filesystem
> and later try to mount it, we can fail the mount because we treat that
> past error as critical and consider the device to be missing. Even if it's
> very likely that the error will happen again, as it's probably due to a
> hardware related problem, there may be cases where the error might not
> happen again. One example is during testing, and a test case like the
> new generic/648 from fstests always triggers this. The test cases
> generic/019 and generic/475 also trigger this scenario, but very
> sporadically.
> 
> When this happens we get an error like this:
> 
>   $ mount /dev/sdc /mnt
>   mount: /mnt wrong fs type, bad option, bad superblock on /dev/sdc, missing codepage or helper program, or other error.
> 
>   $ dmesg
>   (...)
>   [12918.886926] BTRFS warning (device sdc): chunk 13631488 missing 1 devices, max tolerance is 0 for writable mount
>   [12918.888293] BTRFS warning (device sdc): writable mount is not allowed due to too many missing devices
>   [12918.890853] BTRFS error (device sdc): open_ctree failed
> 
> The failure happens because when btrfs_check_rw_degradable() is called at
> mount time, or at remount from RO to RW time, it sees a non-zero value in
> a device's ->last_flush_error attribute, and therefore considers that the
> device is 'missing'.
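
For readers following along, the check described above can be modelled in
user space roughly as below. This is only a sketch and not the kernel code:
struct model_device, count_missing() and model_rw_degradable() are
hypothetical names, and the "missing must not exceed the tolerance" rule is
an assumption read off the dmesg output quoted earlier (missing 1 devices,
max tolerance is 0).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-device state; not the kernel structure. */
struct model_device {
	bool present;          /* device was found when the fs was opened */
	int last_flush_error;  /* non-zero once a flush on this device failed */
};

/* Count the devices that the check treats as missing. */
static int count_missing(const struct model_device *devs, size_t n)
{
	int missing = 0;

	for (size_t i = 0; i < n; i++) {
		/* A recorded flush error counts the same as an absent device. */
		if (!devs[i].present || devs[i].last_flush_error != 0)
			missing++;
	}
	return missing;
}

/* Assumed rule: a writable mount is allowed while missing <= max_tolerated. */
static bool model_rw_degradable(const struct model_device *devs, size_t n,
				int max_tolerated)
{
	return count_missing(devs, n) <= max_tolerated;
}
```

With a single device and a tolerance of 0, one stale non-zero
last_flush_error is enough for this model to refuse the writable mount.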
> 
> Fix this by setting a device's ->last_flush_error to zero when we close a
> device, making sure the error is not seen on the next mount attempt. We
> only need to track flush errors during the current mount, so that we never
> commit a super block if such errors happened.
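
The lifecycle of the fix can be illustrated with a small user-space model.
Again this is a sketch under assumptions, not the patch itself:
struct sketch_device and the sketch_*() helpers are made-up names that only
mirror the idea of clearing the recorded flush error when the device is
closed.

```c
#include <stdbool.h>

/* Hypothetical device state for illustration only. */
struct sketch_device {
	bool is_open;
	int last_flush_error;  /* only meaningful for the current mount */
};

static void sketch_open_device(struct sketch_device *dev)
{
	dev->is_open = true;
}

/* Remember a failed flush so a later super block commit can be refused. */
static void sketch_record_flush_error(struct sketch_device *dev, int err)
{
	dev->last_flush_error = err;
}

/* Closing the device drops the error, so the next mount starts clean. */
static void sketch_close_device(struct sketch_device *dev)
{
	dev->is_open = false;
	dev->last_flush_error = 0;
}

/* The mount-time check treats a device with a recorded error as missing. */
static bool sketch_counts_as_missing(const struct sketch_device *dev)
{
	return dev->last_flush_error != 0;
}
```

In this model the error stays visible for as long as the device is open, and
is gone after a close followed by a reopen.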
> 
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
> 
> V3: Use a different and cleaner approach, by resetting the flush error
>     from a device when we close it, so that it's not seen on the next
>     mount attempt.

Added to misc-next, thanks.