linux-bcachefs.vger.kernel.org archive mirror
* Metadata rereplication not triggering
@ 2021-10-12  9:07 Chris Webb
  2021-10-12  9:11 ` [PATCH] [ktest] Test simple drive replacement on a replicated fs Chris Webb
  2021-10-12 18:19 ` Metadata rereplication not triggering Kent Overstreet
  0 siblings, 2 replies; 6+ messages in thread
From: Chris Webb @ 2021-10-12  9:07 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcachefs

[Spotted while double-checking c85e27c45512 fixed the rereplicate BUG_ON at
fs/bcachefs/btree_iter.c:2475 the other day.]

If I create a filesystem with --replicas=2, fail a component drive and
replace with a new one, then use bcachefs data rereplicate, metadata
doesn't seem to get copied to the new drive.

bcachefs fs usage suggests that all the data contents of the filesystem do
get correctly rereplicated; it's just the metadata that's missing.

Here's a minimal ktest to reproduce - I'll follow up with a patch to
tests/bcachefs/replication.ktest:

  test_replace_replica()
  {
      bcachefs format --errors=panic --replicas=2 /dev/sd[bc]
      mount -t bcachefs -o degraded /dev/sdb /mnt

      bcachefs device add -f /mnt /dev/sdd
      bcachefs data rereplicate /mnt
      umount /mnt

      mount -t bcachefs -o degraded /dev/sdd /mnt
      umount /mnt
  }

The second mount (of the new replacement drive alone) fails:

  00016 bcachefs (dev-1): btree write error: device removed
  00016 bcachefs (a5f018b0-5a18-40b9-9539-b8d2325a4dc4): insufficient devices online (0) for replicas entry btree: 1/1 [0]
  00016 bcachefs: bch2_fs_open() bch_fs_open err opening /dev/sdd: insufficient devices
  00016 mount: /mnt: wrong fs type, bad option, bad superblock on /dev/sdd, missing codepage or helper program, or other error.

Mounting /dev/sdb solo at this point will work fine, as that does contain the
metadata + data, as does /dev/sdc, albeit for an older version of the fs.

The specific procedure here doesn't seem to matter: you can mount the
filesystem non-degraded, force remove a drive then add a new one, etc. I've
not found a scenario where I'm able to successfully trigger metadata
rereplication.

/sys/fs/bcachefs/*/options/data_replicas and metadata_replicas are both
still 2 at the point the new device is added and following data replication.
(Setting them explicitly 'just in case' doesn't change the behaviour.)
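
As a rough sketch, the check described above could be scripted as follows.
The helper name show_replicas and its optional base-directory argument are
made up for illustration; only the sysfs paths come from the report.

```shell
# Print data_replicas and metadata_replicas for each bcachefs filesystem
# found under the given base directory (default: /sys/fs/bcachefs).
# show_replicas is a hypothetical helper, not part of bcachefs-tools.
show_replicas()
{
    base=${1:-/sys/fs/bcachefs}
    for fs in "$base"/*/; do
        # Skip the unexpanded glob when nothing is mounted.
        [ -d "${fs}options" ] || continue
        for opt in data_replicas metadata_replicas; do
            printf '%s %s=%s\n' "$(basename "$fs")" "$opt" \
                "$(cat "${fs}options/$opt")"
        done
    done
}
```

Calling show_replicas with no argument after the device add, and again after
the rereplicate step, would print both options for each mounted filesystem.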

I'm still learning my way around fs/bcachefs, but there seems to be code to
replicate metadata already in tree - looks like it should be automatic - so I
wonder if it's not getting triggered when the replacement is added any more?

Best wishes,

Chris.


* [PATCH] [ktest] Test simple drive replacement on a replicated fs
  2021-10-12  9:07 Metadata rereplication not triggering Chris Webb
@ 2021-10-12  9:11 ` Chris Webb
  2021-10-12 18:19 ` Metadata rereplication not triggering Kent Overstreet
  1 sibling, 0 replies; 6+ messages in thread
From: Chris Webb @ 2021-10-12  9:11 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcachefs

This currently fails because metadata doesn't get copied to the
replacement device, although the data content of the filesystem
does get replicated during the data rereplicate step.

Signed-off-by: Chris Webb <chris@arachsys.com>
---
 tests/bcachefs/replication.ktest | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/tests/bcachefs/replication.ktest b/tests/bcachefs/replication.ktest
index 7057fa7..569b984 100644
--- a/tests/bcachefs/replication.ktest
+++ b/tests/bcachefs/replication.ktest
@@ -629,3 +629,17 @@ test_replicas_gc()
 
     bcachefs fsck /dev/sd[bcdef]
 }
+
+# Fails
+test_replace_replica()
+{
+    bcachefs format --errors=panic --replicas=2 /dev/sd[bc]
+    mount -t bcachefs -o degraded /dev/sdb /mnt
+
+    bcachefs device add -f /mnt /dev/sdd
+    bcachefs data rereplicate /mnt
+    umount /mnt
+
+    mount -t bcachefs -o degraded /dev/sdd /mnt
+    umount /mnt
+}


* Re: Metadata rereplication not triggering
  2021-10-12  9:07 Metadata rereplication not triggering Chris Webb
  2021-10-12  9:11 ` [PATCH] [ktest] Test simple drive replacement on a replicated fs Chris Webb
@ 2021-10-12 18:19 ` Kent Overstreet
  2021-10-13 16:52   ` Chris Webb
  1 sibling, 1 reply; 6+ messages in thread
From: Kent Overstreet @ 2021-10-12 18:19 UTC (permalink / raw)
  To: Chris Webb; +Cc: linux-bcachefs

On Tue, Oct 12, 2021 at 10:07:46AM +0100, Chris Webb wrote:
> [Spotted while double-checking c85e27c45512 fixed the rereplicate BUG_ON at
> fs/bcachefs/btree_iter.c:2475 the other day.]
> 
> If I create a filesystem with --replicas=2, fail a component drive and
> replace with a new one, then use bcachefs data rereplicate, metadata
> doesn't seem to get copied to the new drive.

It turns out rereplicate_pred() wasn't checking the key types correctly, and it
wasn't rereplicating any of the newer key types - I just pushed a fix. Thanks
for the report and the test!


* Re: Metadata rereplication not triggering
  2021-10-12 18:19 ` Metadata rereplication not triggering Kent Overstreet
@ 2021-10-13 16:52   ` Chris Webb
  2021-10-13 17:29     ` Kent Overstreet
  0 siblings, 1 reply; 6+ messages in thread
From: Chris Webb @ 2021-10-13 16:52 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcachefs

Kent Overstreet <kent.overstreet@gmail.com> writes:

> On Tue, Oct 12, 2021 at 10:07:46AM +0100, Chris Webb wrote:
> > 
> > If I create a filesystem with --replicas=2, fail a component drive and
> > replace with a new one, then use bcachefs data rereplicate, metadata
> > doesn't seem to get copied to the new drive.
> 
> It turns out rereplicate_pred() wasn't checking the key types correctly, and it
> wasn't rereplicating any of the newer key types - I just pushed a fix. Thanks
> for the report and the test!

Hi Kent. Ah, that makes sense, thanks!

I pulled the latest HEADs of bcachefs-tools and bcachefs, including this
patch, but when rerunning the ktest, it still fails.

  00016 bcachefs (dev-1): btree write error: device removed
  00016 bcachefs (f5d59006-9408-4179-859c-16e94b1d9b7a): insufficient devices
  online (0) for replicas entry btree: 1/2 [0 1]
  00016 bcachefs: bch2_fs_open() bch_fs_open err opening /dev/sdd:
  insufficient devices
  00016 mount: /mnt: wrong fs type, bad option, bad superblock on /dev/sdd,
  missing codepage or helper program, or other error.
  00016 
  00016 ========= FAILED replace_replica in 2s

I added a bcachefs fs usage to the ktest just before the first umount to
provide a bit of additional debugging info: looks like sb and journal are
still zero on the newly added device even following the data rereplicate
operation.

Is there something extra I should be adding to the test to ensure the
superblock also gets mirrored (e.g. a pause if it's kicked off
asynchronously), or is there still something not quite right here? (Does the
test now pass for you? I'm guessing this should be reasonably deterministic
and host independent?)

Best wishes,

Chris.


* Re: Metadata rereplication not triggering
  2021-10-13 16:52   ` Chris Webb
@ 2021-10-13 17:29     ` Kent Overstreet
  2021-10-13 19:39       ` Chris Webb
  0 siblings, 1 reply; 6+ messages in thread
From: Kent Overstreet @ 2021-10-13 17:29 UTC (permalink / raw)
  To: Chris Webb; +Cc: linux-bcachefs

On Wed, Oct 13, 2021 at 05:52:40PM +0100, Chris Webb wrote:
> Kent Overstreet <kent.overstreet@gmail.com> writes:
> 
> > On Tue, Oct 12, 2021 at 10:07:46AM +0100, Chris Webb wrote:
> > > 
> > > If I create a filesystem with --replicas=2, fail a component drive and
> > > replace with a new one, then use bcachefs data rereplicate, metadata
> > > doesn't seem to get copied to the new drive.
> > 
> > It turns out rereplicate_pred() wasn't checking the key types correctly, and it
> > wasn't rereplicating any of the newer key types - I just pushed a fix. Thanks
> > for the report and the test!
> 
> Hi Kent. Ah, that makes sense, thanks!
> 
> I pulled the latest HEADs of bcachefs-tools and bcachefs, including this
> patch, but when rerunning the ktest, it still fails.
> 
>   00016 bcachefs (dev-1): btree write error: device removed
>   00016 bcachefs (f5d59006-9408-4179-859c-16e94b1d9b7a): insufficient devices
>   online (0) for replicas entry btree: 1/2 [0 1]
>   00016 bcachefs: bch2_fs_open() bch_fs_open err opening /dev/sdd:
>   insufficient devices
>   00016 mount: /mnt: wrong fs type, bad option, bad superblock on /dev/sdd,
>   missing codepage or helper program, or other error.
>   00016 
>   00016 ========= FAILED replace_replica in 2s
> 
> I added a bcachefs fs usage to the ktest just before the first umount to
> provide a bit of additional debugging info: looks like sb and journal are
> still zero on the newly added device even following the data rereplicate
> operation.

That's a separate (known) bug - adding a new device doesn't account for
sb/journal on the new device correctly. I should revisit that one...

> Is there something extra I should be adding to the test to ensure the
> superblock also gets mirrored (e.g. a pause if it's kicked off
> asynchronously), or is there still something not quite right here? (Does the
> test now pass for you? I'm guessing this should be reasonably deterministic
> and host independent?)

It looks like I must have mixed up some test run output or something - you're
right, it's still failing for me, but it does work if prior to rereplicate we
either remove /dev/sdc or set it to failed - that is, a device being (possibly
momentarily) offline isn't enough for rereplicate to consider a given extent.

I'll update the test - this is also something we should document.


* Re: Metadata rereplication not triggering
  2021-10-13 17:29     ` Kent Overstreet
@ 2021-10-13 19:39       ` Chris Webb
  0 siblings, 0 replies; 6+ messages in thread
From: Chris Webb @ 2021-10-13 19:39 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-bcachefs

Kent Overstreet <kent.overstreet@gmail.com> writes:

> It looks like I must have mixed up some test run output or something - you're
> right, it's still failing for me, but it does work if prior to rereplicate we
> either remove /dev/sdc or set it to failed - that is, a device being (possibly
> momentarily) offline isn't enough for rereplicate to consider a given extent.

Brilliant, that also explains some confusion on my part: I was convinced I'd
seen it work when I manually tested, then the automated test failed and
subsequent manual tests all failed too. I must have removed and added
devices when I first tested, but only used mount -o degraded later,
incorrectly assuming that was equivalent.

Sounds like logical behaviour to me now I know about it. You wouldn't want
to start making extra copies of data and assuming a block device had
permanently failed just because it was briefly inaccessible.

Testing with

-    mount -t bcachefs -o degraded /dev/sdb /mnt
+    mount -t bcachefs /dev/sdb:/dev/sdc /mnt
+    bcachefs device remove -f /dev/sdc

everything works fine, and I now know how to drive it correctly. :)

I see it works just as well using bcachefs device set-state failed instead
of bcachefs device remove, or adding the new drive before removing the old
one.
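
For reference, applying that two-line change to the original test would give
roughly the following. This is only a sketch: it assumes the same ktest
device names (/dev/sdb, /dev/sdc, /dev/sdd) and /mnt mount point as the
original test, and a kernel tree that already includes the
rereplicate_pred() fix. The set-state variant is not shown.

```shell
test_replace_replica()
{
    bcachefs format --errors=panic --replicas=2 /dev/sd[bc]

    # Mount with both members present, then explicitly remove one. A device
    # that is merely offline (mount -o degraded) is not enough for
    # rereplicate to consider its extents.
    mount -t bcachefs /dev/sdb:/dev/sdc /mnt
    bcachefs device remove -f /dev/sdc

    bcachefs device add -f /mnt /dev/sdd
    bcachefs data rereplicate /mnt
    umount /mnt

    # The replacement drive alone should now be mountable.
    mount -t bcachefs -o degraded /dev/sdd /mnt
    umount /mnt
}
```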

(Need to work on retraining my fingers not to keep typing "bcachefs device
remove /mnt /dev/sdc" to match "bcachefs device add /mnt /dev/sdc" though!)

Best wishes,

Chris.

