All of lore.kernel.org
 help / color / mirror / Atom feed
From: Anand Jain <anand.jain@oracle.com>
To: fdmanana@kernel.org
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 1/2] btrfs: fix mount failure due to past and transient device flush error
Date: Wed, 8 Sep 2021 22:19:47 +0800	[thread overview]
Message-ID: <89c736d1-2e8c-b9ef-40a0-298b94fcebde@oracle.com> (raw)
In-Reply-To: <dcf9de78faa6ec5cef443d031a987c87301805b1.1631026981.git.fdmanana@suse.com>

On 07/09/2021 23:15, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> When we get an error flushing one device, during a super block commit, we
> record the error in the device structure, in the field 'last_flush_error'.
> This is used to later check if we should error out the super block commit,
> depending on whether the number of flush errors is greater than or equals
> to the maximum tolerated device failures for a raid profile.


> However if we get a transient device flush error, unmount the filesystem
> and later try to mount it, we can fail the mount because we treat that
> past error as critical and consider the device is missing.

> Even if it's
> very likely that the error will happen again, as it's probably due to a
> hardware related problem, there may be cases where the error might not
> happen again. 

  But is there an impact due to flush error, like storage cache lost few 
block? If so, then the current design is correct. No?

Thanks, Anand

> One example is during testing, and a test case like the
> new generic/648 from fstests always triggers this. The test cases
> generic/019 and generic/475 also trigger this scenario, but very
> sporadically.
> 
> When this happens we get an error like this:
> 
>    $ mount /dev/sdc /mnt
>    mount: /mnt wrong fs type, bad option, bad superblock on /dev/sdc, missing codepage or helper program, or other error.
> 
>    $ dmesg
>    (...)
>    [12918.886926] BTRFS warning (device sdc): chunk 13631488 missing 1 devices, max tolerance is 0 for writable mount
>    [12918.888293] BTRFS warning (device sdc): writable mount is not allowed due to too many missing devices
>    [12918.890853] BTRFS error (device sdc): open_ctree failed
> 
> So fix this by making sure btrfs_check_rw_degradable() does not consider
> flush errors from past mounts when it's being called either on a mount
> context or on a RO to RW remount context, and clears the flush errors
> from the devices. Any path that triggers a super block commit during
> mount/remount must still check for any flush errors and lead to a
> mount/remount failure if any are found - all these paths (replaying log
> trees, convert space cache v1 to v2, etc) all happen after the first
> call to btrfs_check_rw_degradable(), which is the only call that should
> ignore and reset past flush errors from the devices.
> 

What if the flush error is real that the storage cache dropped few 
blocks and, did not make it to the permanent storage.

Thanks, Anand


> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
>   fs/btrfs/disk-io.c |  4 ++--
>   fs/btrfs/super.c   |  2 +-
>   fs/btrfs/volumes.c | 26 +++++++++++++++++++++-----
>   fs/btrfs/volumes.h |  3 ++-
>   4 files changed, 26 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 7d80e5b22d32..6d7d6288f80a 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3564,7 +3564,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
>   		goto fail_sysfs;
>   	}
>   
> -	if (!sb_rdonly(sb) && !btrfs_check_rw_degradable(fs_info, NULL)) {
> +	if (!sb_rdonly(sb) && !btrfs_check_rw_degradable(fs_info, NULL, true)) {
>   		btrfs_warn(fs_info,
>   		"writable mount is not allowed due to too many missing devices");
>   		goto fail_sysfs;
> @@ -4013,7 +4013,7 @@ static blk_status_t wait_dev_flush(struct btrfs_device *device)
>   
>   static int check_barrier_error(struct btrfs_fs_info *fs_info)
>   {
> -	if (!btrfs_check_rw_degradable(fs_info, NULL))
> +	if (!btrfs_check_rw_degradable(fs_info, NULL, false))
>   		return -EIO;
>   	return 0;
>   }
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index a927009f02a2..51519141b12f 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -2017,7 +2017,7 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
>   			goto restore;
>   		}
>   
> -		if (!btrfs_check_rw_degradable(fs_info, NULL)) {
> +		if (!btrfs_check_rw_degradable(fs_info, NULL, true)) {
>   			btrfs_warn(fs_info,
>   		"too many missing devices, writable remount is not allowed");
>   			ret = -EACCES;
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index b81f25eed298..2a5beba98273 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -7367,7 +7367,7 @@ int btrfs_read_sys_array(struct btrfs_fs_info *fs_info)
>    * Return false if any chunk doesn't meet the minimal RW mount requirements.
>    */
>   bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info,
> -					struct btrfs_device *failing_dev)
> +			       struct btrfs_device *failing_dev, bool mounting_fs)
>   {
>   	struct extent_map_tree *map_tree = &fs_info->mapping_tree;
>   	struct extent_map *em;
> @@ -7395,12 +7395,28 @@ bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info,
>   		for (i = 0; i < map->num_stripes; i++) {
>   			struct btrfs_device *dev = map->stripes[i].dev;
>   
> -			if (!dev || !dev->bdev ||
> -			    test_bit(BTRFS_DEV_STATE_MISSING, &dev->dev_state) ||
> -			    dev->last_flush_error)
> +			if (dev && dev->last_flush_error) {
> +				/*
> +				 * If we had a flush error from a previous mount,
> +				 * don't treat it as an error and clear the error
> +				 * status. Such an error may be transient, and
> +				 * just because it happened in a previous mount,
> +				 * it does not mean it will happen again if we
> +				 * mount the fs again. If it turns out the error
> +				 * happens again after mounting, then we will
> +				 * deal with it, abort the running transaction
> +				 * and set the fs state to BTRFS_FS_STATE_ERROR.
> +				 */
> +				if (mounting_fs)
> +					dev->last_flush_error = 0;
> +				else
> +					missing++;
> +			} else if (!dev || !dev->bdev ||
> +			    test_bit(BTRFS_DEV_STATE_MISSING, &dev->dev_state)) {
>   				missing++;
> -			else if (failing_dev && failing_dev == dev)
> +			} else if (failing_dev && failing_dev == dev) {
>   				missing++;
> +			}
>   		}
>   		if (missing > max_tolerated) {
>   			if (!failing_dev)
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 7e8f205978d9..7299aa36f41f 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -575,7 +575,8 @@ void btrfs_commit_device_sizes(struct btrfs_transaction *trans);
>   
>   struct list_head * __attribute_const__ btrfs_get_fs_uuids(void);
>   bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info,
> -					struct btrfs_device *failing_dev);
> +			       struct btrfs_device *failing_dev,
> +			       bool mounting_fs);
>   void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info,
>   			       struct block_device *bdev,
>   			       const char *device_path);
> 


  reply	other threads:[~2021-09-08 14:20 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-06  9:09 [PATCH] btrfs: fix mount failure due to past and transient device flush error fdmanana
2021-09-07 11:26 ` David Sterba
2021-09-07 15:15   ` Filipe Manana
2021-09-07 15:15 ` [PATCH 0/2] btrfs: fix mount/remount failure due to past device flush errors fdmanana
2021-09-07 15:15   ` [PATCH 1/2] btrfs: fix mount failure due to past and transient device flush error fdmanana
2021-09-08 14:19     ` Anand Jain [this message]
2021-09-08 14:26       ` Filipe Manana
2021-09-08 16:30         ` David Sterba
2021-09-08 17:32           ` Filipe Manana
2021-09-08 18:05     ` [PATCH v3] " fdmanana
2021-09-09 15:43       ` David Sterba
2021-09-07 15:15   ` [PATCH 2/2] btrfs: remove the failing device argument from btrfs_check_rw_degradable() fdmanana
2021-09-07 16:05     ` David Sterba
2021-09-08 14:25       ` Anand Jain
2021-09-09 14:19         ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=89c736d1-2e8c-b9ef-40a0-298b94fcebde@oracle.com \
    --to=anand.jain@oracle.com \
    --cc=fdmanana@kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.