From: David Sterba <dsterba@suse.cz>
To: fdmanana@kernel.org
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v3] btrfs: fix mount failure due to past and transient device flush error
Date: Thu, 9 Sep 2021 17:43:31 +0200
Message-ID: <20210909154331.GA15306@twin.jikos.cz>
In-Reply-To: <893dad4768973411df7867e4436fe728d989fe1a.1631122173.git.fdmanana@suse.com>

On Wed, Sep 08, 2021 at 07:05:44PM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> When we get an error flushing one device, during a super block commit, we
> record the error in the device structure, in the field 'last_flush_error'.
> This is used to later check if we should error out the super block commit,
> depending on whether the number of flush errors is greater than or equal
> to the maximum tolerated device failures for a raid profile.
> 
> However if we get a transient device flush error, unmount the filesystem
> and later try to mount it, we can fail the mount because we treat that
> past error as critical and consider the device missing. Even if it's
> very likely that the error will happen again, as it's probably due to a
> hardware related problem, there may be cases where the error might not
> happen again. One example is during testing, and a test case like the
> new generic/648 from fstests always triggers this. The test cases
> generic/019 and generic/475 also trigger this scenario, but very
> sporadically.
> 
> When this happens we get an error like this:
> 
>   $ mount /dev/sdc /mnt
>   mount: /mnt wrong fs type, bad option, bad superblock on /dev/sdc, missing codepage or helper program, or other error.
> 
>   $ dmesg
>   (...)
>   [12918.886926] BTRFS warning (device sdc): chunk 13631488 missing 1 devices, max tolerance is 0 for writable mount
>   [12918.888293] BTRFS warning (device sdc): writable mount is not allowed due to too many missing devices
>   [12918.890853] BTRFS error (device sdc): open_ctree failed
> 
> The failure happens because when btrfs_check_rw_degradable() is called at
> mount time, or at remount from RO to RW time, it sees a non-zero value in
> a device's ->last_flush_error attribute, and therefore considers that the
> device is 'missing'.
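
The check described above can be pictured with a small, self-contained C
model. This is only a sketch under stated assumptions: model_device,
model_device_missing() and model_check_rw_degradable() are hypothetical
names invented for illustration, and the real btrfs_check_rw_degradable()
walks the chunk mappings and is more involved than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for per-device state (not kernel code). */
struct model_device {
	bool present;          /* device was found when the filesystem was opened */
	int last_flush_error;  /* nonzero if an earlier flush on this device failed */
};

/* In this model a device counts as missing if it is absent or has a
 * recorded flush error, which is the behavior the commit message describes. */
static bool model_device_missing(const struct model_device *dev)
{
	return !dev->present || dev->last_flush_error != 0;
}

/* Returns true if a writable mount would be allowed for one chunk whose
 * stripes live on devs[0..n-1], given the profile's failure tolerance. */
static bool model_check_rw_degradable(const struct model_device *devs,
				      int n, int max_tolerated)
{
	int missing = 0;

	assert(devs != NULL && n >= 0);
	for (int i = 0; i < n; i++) {
		if (model_device_missing(&devs[i]))
			missing++;
	}
	return missing <= max_tolerated;
}
```

With one present device, a tolerance of 0 and a leftover nonzero
last_flush_error, the model refuses the writable mount, which corresponds
to the "missing 1 devices, max tolerance is 0" warning quoted above.
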
> 
> Fix this by setting a device's ->last_flush_error to zero when we close a
> device, making sure the error is not seen on the next mount attempt. We
> only need to track flush errors during the current mount, so that we never
> commit a super block if such errors happened.
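
As a rough illustration of the approach in this paragraph, the following C
sketch resets the recorded flush error when a device is closed. The names
sketch_device, sketch_record_flush_error(), sketch_close_device() and
sketch_counts_as_missing() are hypothetical and only model the idea; they
are not the functions touched by the patch.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for per-device state (not kernel code). */
struct sketch_device {
	bool open;
	int last_flush_error;  /* tracked only for the lifetime of one mount */
};

/* Called when a flush fails during a super block commit. */
static void sketch_record_flush_error(struct sketch_device *dev, int err)
{
	dev->last_flush_error = err;
}

/* Closing the device clears the error, so that a later mount attempt
 * starts from a clean state. */
static void sketch_close_device(struct sketch_device *dev)
{
	dev->open = false;
	dev->last_flush_error = 0;
}

/* What a later mount-time check would conclude from the flush error alone. */
static bool sketch_counts_as_missing(const struct sketch_device *dev)
{
	return dev->last_flush_error != 0;
}
```

In this model, a device that saw a flush error still counts as missing
while it stays open, so the current mount keeps refusing to commit a super
block, but after sketch_close_device() the next check no longer sees it.
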
> 
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
> 
> V3: Use a different and cleaner approach, by resetting the flush error
>     from a device when we close it, so that it's not seen on the next
>     mount attempt.

Added to misc-next, thanks.

Thread overview: 15+ messages
2021-09-06  9:09 [PATCH] btrfs: fix mount failure due to past and transient device flush error fdmanana
2021-09-07 11:26 ` David Sterba
2021-09-07 15:15   ` Filipe Manana
2021-09-07 15:15 ` [PATCH 0/2] btrfs: fix mount/remount failure due to past device flush errors fdmanana
2021-09-07 15:15   ` [PATCH 1/2] btrfs: fix mount failure due to past and transient device flush error fdmanana
2021-09-08 14:19     ` Anand Jain
2021-09-08 14:26       ` Filipe Manana
2021-09-08 16:30         ` David Sterba
2021-09-08 17:32           ` Filipe Manana
2021-09-08 18:05     ` [PATCH v3] " fdmanana
2021-09-09 15:43       ` David Sterba [this message]
2021-09-07 15:15   ` [PATCH 2/2] btrfs: remove the failing device argument from btrfs_check_rw_degradable() fdmanana
2021-09-07 16:05     ` David Sterba
2021-09-08 14:25       ` Anand Jain
2021-09-09 14:19         ` David Sterba
