linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Filipe Manana <fdmanana@suse.com>,
	David Sterba <dsterba@suse.com>, Sasha Levin <sashal@kernel.org>
Subject: [PATCH 5.10 06/29] btrfs: fix mount failure due to past and transient device flush error
Date: Fri,  8 Oct 2021 13:27:53 +0200	[thread overview]
Message-ID: <20211008112717.141324446@linuxfoundation.org> (raw)
In-Reply-To: <20211008112716.914501436@linuxfoundation.org>

From: Filipe Manana <fdmanana@suse.com>

[ Upstream commit 6b225baababf1e3d41a4250e802cbd193e1343fb ]

When we get an error flushing one device, during a super block commit, we
record the error in the device structure, in the field 'last_flush_error'.
This is used to later check if we should error out the super block commit,
depending on whether the number of flush errors is greater than or equals
to the maximum tolerated device failures for a raid profile.

However if we get a transient device flush error, unmount the filesystem
and later try to mount it, we can fail the mount because we treat that
past error as critical and consider the device is missing. Even if it's
very likely that the error will happen again, as it's probably due to a
hardware related problem, there may be cases where the error might not
happen again. One example is during testing, and a test case like the
new generic/648 from fstests always triggers this. The test cases
generic/019 and generic/475 also trigger this scenario, but very
sporadically.

When this happens we get an error like this:

  $ mount /dev/sdc /mnt
  mount: /mnt wrong fs type, bad option, bad superblock on /dev/sdc, missing codepage or helper program, or other error.

  $ dmesg
  (...)
  [12918.886926] BTRFS warning (device sdc): chunk 13631488 missing 1 devices, max tolerance is 0 for writable mount
  [12918.888293] BTRFS warning (device sdc): writable mount is not allowed due to too many missing devices
  [12918.890853] BTRFS error (device sdc): open_ctree failed

The failure happens because when btrfs_check_rw_degradable() is called at
mount time, or at remount from RO to RW time, is sees a non zero value in
a device's ->last_flush_error attribute, and therefore considers that the
device is 'missing'.

Fix this by setting a device's ->last_flush_error to zero when we close a
device, making sure the error is not seen on the next mount attempt. We
only need to track flush errors during the current mount, so that we never
commit a super block if such errors happened.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/volumes.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d8b8764f5bd1..593e0c6d6b44 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1147,6 +1147,19 @@ static void btrfs_close_one_device(struct btrfs_device *device)
 	atomic_set(&device->dev_stats_ccnt, 0);
 	extent_io_tree_release(&device->alloc_state);
 
+	/*
+	 * Reset the flush error record. We might have a transient flush error
+	 * in this mount, and if so we aborted the current transaction and set
+	 * the fs to an error state, guaranteeing no super blocks can be further
+	 * committed. However that error might be transient and if we unmount the
+	 * filesystem and mount it again, we should allow the mount to succeed
+	 * (btrfs_check_rw_degradable() should not fail) - if after mounting the
+	 * filesystem again we still get flush errors, then we will again abort
+	 * any transaction and set the error state, guaranteeing no commits of
+	 * unsafe super blocks.
+	 */
+	device->last_flush_error = 0;
+
 	/* Verify the device is back in a pristine state  */
 	ASSERT(!test_bit(BTRFS_DEV_STATE_FLUSH_SENT, &device->dev_state));
 	ASSERT(!test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state));
-- 
2.33.0




  parent reply	other threads:[~2021-10-08 11:35 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-08 11:27 [PATCH 5.10 00/29] 5.10.72-rc1 review Greg Kroah-Hartman
2021-10-08 11:27 ` [PATCH 5.10 01/29] spi: rockchip: handle zero length transfers without timing out Greg Kroah-Hartman
2021-10-08 11:27 ` [PATCH 5.10 02/29] platform/x86: touchscreen_dmi: Add info for the Chuwi HiBook (CWI514) tablet Greg Kroah-Hartman
2021-10-08 11:27 ` [PATCH 5.10 03/29] platform/x86: touchscreen_dmi: Update info for the Chuwi Hi10 Plus (CWI527) tablet Greg Kroah-Hartman
2021-10-08 11:27 ` [PATCH 5.10 04/29] nfsd: back channel stuck in SEQ4_STATUS_CB_PATH_DOWN Greg Kroah-Hartman
2021-10-08 11:27 ` [PATCH 5.10 05/29] btrfs: replace BUG_ON() in btrfs_csum_one_bio() with proper error handling Greg Kroah-Hartman
2021-10-08 11:27 ` Greg Kroah-Hartman [this message]
2021-10-08 11:27 ` [PATCH 5.10 07/29] net: mdio: introduce a shutdown method to mdio device drivers Greg Kroah-Hartman
2021-10-08 11:27 ` [PATCH 5.10 08/29] xen-netback: correct success/error reporting for the SKB-with-fraglist case Greg Kroah-Hartman
2021-10-08 11:27 ` [PATCH 5.10 09/29] sparc64: fix pci_iounmap() when CONFIG_PCI is not set Greg Kroah-Hartman
2021-10-08 11:27 ` [PATCH 5.10 10/29] ext2: fix sleeping in atomic bugs on error Greg Kroah-Hartman
2021-10-08 11:27 ` [PATCH 5.10 11/29] scsi: sd: Free scsi_disk device via put_device() Greg Kroah-Hartman
2021-10-08 11:27 ` [PATCH 5.10 12/29] usb: testusb: Fix for showing the connection speed Greg Kroah-Hartman
2021-10-08 11:28 ` [PATCH 5.10 13/29] usb: dwc2: check return value after calling platform_get_resource() Greg Kroah-Hartman
2021-10-08 11:28 ` [PATCH 5.10 14/29] habanalabs/gaudi: fix LBW RR configuration Greg Kroah-Hartman
2021-10-08 11:28 ` [PATCH 5.10 15/29] selftests: be sure to make khdr before other targets Greg Kroah-Hartman
2021-10-08 11:28 ` [PATCH 5.10 16/29] selftests:kvm: fix get_warnings_count() ignoring fscanf() return warn Greg Kroah-Hartman
2021-10-08 11:28 ` [PATCH 5.10 17/29] nvme-fc: update hardware queues before using them Greg Kroah-Hartman
2021-10-08 11:28 ` [PATCH 5.10 18/29] nvme-fc: avoid race between time out and tear down Greg Kroah-Hartman
2021-10-08 11:28 ` [PATCH 5.10 19/29] thermal/drivers/tsens: Fix wrong check for tzd in irq handlers Greg Kroah-Hartman
2021-10-08 11:28 ` [PATCH 5.10 20/29] scsi: ses: Retry failed Send/Receive Diagnostic commands Greg Kroah-Hartman
2021-10-08 11:28 ` [PATCH 5.10 21/29] irqchip/gic: Work around broken Renesas integration Greg Kroah-Hartman
2021-10-08 11:28 ` [PATCH 5.10 22/29] smb3: correct smb3 ACL security descriptor Greg Kroah-Hartman
2021-10-08 11:28 ` [PATCH 5.10 23/29] tools/vm/page-types: remove dependency on opt_file for idle page tracking Greg Kroah-Hartman
2021-10-08 11:28 ` [PATCH 5.10 24/29] selftests: KVM: Align SMCCC call with the spec in steal_time Greg Kroah-Hartman
2021-10-08 11:28 ` [PATCH 5.10 25/29] KVM: do not shrink halt_poll_ns below grow_start Greg Kroah-Hartman
2021-10-08 11:28 ` [PATCH 5.10 26/29] kvm: x86: Add AMD PMU MSRs to msrs_to_save_all[] Greg Kroah-Hartman
2021-10-08 11:28 ` [PATCH 5.10 27/29] KVM: x86: nSVM: restore int_vector in svm_clear_vintr Greg Kroah-Hartman
2021-10-08 11:28 ` [PATCH 5.10 28/29] perf/x86: Reset destroy callback on event init failure Greg Kroah-Hartman
2021-10-08 11:28 ` [PATCH 5.10 29/29] libata: Add ATA_HORKAGE_NO_NCQ_ON_ATI for Samsung 860 and 870 SSD Greg Kroah-Hartman
2021-10-08 14:28 ` [PATCH 5.10 00/29] 5.10.72-rc1 review Fox Chen
2021-10-08 19:40 ` Florian Fainelli
2021-10-08 20:46 ` Shuah Khan
2021-10-08 20:47 ` Pavel Machek
2021-10-08 21:04 ` Guenter Roeck
2021-10-09  4:24 ` Naresh Kamboju

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211008112717.141324446@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=dsterba@suse.com \
    --cc=fdmanana@suse.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sashal@kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).