linux-btrfs.vger.kernel.org archive mirror
* btrfs replace interrupted + corruptes fs
@ 2022-08-10 21:48 Samuel Greiner
  2022-08-11  2:31 ` Anand Jain
  0 siblings, 1 reply; 3+ messages in thread
From: Samuel Greiner @ 2022-08-10 21:48 UTC (permalink / raw)
  To: linux-btrfs

Dear folks,

I think I am in trouble.

I have a btrfs filesystem spanning 4 hard drives, one of which needs to
be replaced.

1. I issued the btrfs replace command, but got a message that the
target drive is mounted (it was not; it did not appear in the mount
output).

2. I rebooted the system in the hope that the replace would then
succeed. The system did not start and reported that it could not mount
the btrfs filesystem because of a missing device.

3. I booted GParted Live to investigate further.

3.1 A mount -o degraded,rescue=usebackuproot,ro failed.
dmesg shows the following errors:

flagging fs with big metadata feature
allowing degraded mounts
trying to use backup root at mount time
disk space caching is enabled
has skinny extents
bdev /dev/sda errs: wr 755, rd 0, flush 0, corrupt 0, gen 0
bdev /dev/sdd1 errs: wr 7601141, rd 3801840, flush 12, corrupt 3755, gen 245
replace devid present without an active replace item
failed to init dev_replace -117
open_ctree failed

4. btrfs check runs through without errors.

-> My guess is that the replace was started even though I got the
message that the target device was mounted. Because of the reboot,
there now seem to be errors in the filesystem, in addition to a replace
that I cannot stop because I can't mount the filesystem.

Right now I have a btrfs check --check-data-csum running in the hope of
getting the errors fixed.

But honestly, I really don't know how to deal with this situation.

Do you have any recommendations?


Thank you very much!
Samuel


Additional info:

I'm on a recent Debian bullseye, but I can't run uname -r because right
now I'm on the GParted (1.4.0-5) live system.

btrfs fi show /dev/sdd1
Label: 'Data' uuid:
     Total devices 4 FS bytes used 6.59 TiB
     devid 1 size 3.65 TiB used 3.39 TiB path /dev/sdd1
     devid 2 size 2.73 TiB used 2.49 TiB path /dev/sdb1
     devid 3 size 5.46 TiB used 2.11 TiB path /dev/sdc1
     devid 1 size 5.46 TiB used 5.21 TiB path /dev/sdd1


* Re: btrfs replace interrupted + corruptes fs
  2022-08-10 21:48 btrfs replace interrupted + corruptes fs Samuel Greiner
@ 2022-08-11  2:31 ` Anand Jain
  2022-08-11  6:52   ` Samuel Greiner
  0 siblings, 1 reply; 3+ messages in thread
From: Anand Jain @ 2022-08-11  2:31 UTC (permalink / raw)
  To: Samuel Greiner, linux-btrfs

On 11/08/2022 05:48, Samuel Greiner wrote:
> Dear folks,
> 
> I think I am in trouble.
>
> I have a btrfs filesystem spanning 4 hard drives, one of which needs to
> be replaced.
>
> 1. I issued the btrfs replace command, but got a message that the
> target drive is mounted (it was not; it did not appear in the mount
> output).
>
> 2. I rebooted the system in the hope that the replace would then
> succeed. The system did not start and reported that it could not mount
> the btrfs filesystem because of a missing device.
>
> 3. I booted GParted Live to investigate further.
>
> 3.1 A mount -o degraded,rescue=usebackuproot,ro failed.
> dmesg shows the following errors:
> 
> flagging fs with big metadata feature
> allowing degraded mounts
> trying to use backup root at mount time
> disk space caching is enabled
> has skinny extents
> bdev /dev/sda errs: wr 755, rd 0, flush 0, corrupt 0, gen 0
> bdev /dev/sdd1 errs: wr 7601141, rd 3801840, flush 12, corrupt 3755, gen 245

> replace devid present without an active replace item

It appears that the replace-target device already got the superblock,
but the metadata update failed, which is good. Could you try physically
removing the replace-target device, rebooting, and mounting with
-o degraded?

And most importantly, before reusing this replace-target device, please
run wipefs -a on it. If it carries a matching fsid and devid=0, it gets
scanned into the kernel, which makes it appear to be mounted.
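
A minimal dry-run sketch of that wipe step, for illustration only. It
assumes the replace target is /dev/sdX (a placeholder, not a device
named in this thread); TARGET, DRY_RUN, run and inspect_and_wipe are
hypothetical names. With DRY_RUN=1 (the default) it only prints the
commands it would run.

```sh
#!/bin/sh
# Sketch only. TARGET is a placeholder for the replace-target device.
TARGET="${TARGET:-/dev/sdX}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands, anything else = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

inspect_and_wipe() {
    dev="$1"
    # List the signatures on the device without erasing anything.
    run wipefs "$dev"
    # Dump the btrfs superblock; compare its fsid and dev_item.devid
    # fields with the filesystem being repaired.
    run btrfs inspect-internal dump-super "$dev"
    # Erase all signatures so the device is no longer recognised
    # as a member of the filesystem.
    run wipefs -a "$dev"
}

inspect_and_wipe "$TARGET"
```

Setting DRY_RUN=0 executes the commands for real; wipefs -a is
destructive, so the device path should be double-checked first.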

HTH

Thanks, Anand



* Re: btrfs replace interrupted + corruptes fs
  2022-08-11  2:31 ` Anand Jain
@ 2022-08-11  6:52   ` Samuel Greiner
  0 siblings, 0 replies; 3+ messages in thread
From: Samuel Greiner @ 2022-08-11  6:52 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs

On 11.08.22 at 04:31, Anand Jain wrote:
> On 11/08/2022 05:48, Samuel Greiner wrote:
>> Dear folks,
>>
>> I think I am in trouble.
>>
>> I have a btrfs filesystem spanning 4 hard drives, one of which needs
>> to be replaced.
>>
>> 1. I issued the btrfs replace command, but got a message that the
>> target drive is mounted (it was not; it did not appear in the mount
>> output).
>>
>> 2. I rebooted the system in the hope that the replace would then
>> succeed. The system did not start and reported that it could not
>> mount the btrfs filesystem because of a missing device.
>>
>> 3. I booted GParted Live to investigate further.
>>
>> 3.1 A mount -o degraded,rescue=usebackuproot,ro failed.
>> dmesg shows the following errors:
>>
>> flagging fs with big metadata feature
>> allowing degraded mounts
>> trying to use backup root at mount time
>> disk space caching is enabled
>> has skinny extents
>> bdev /dev/sda errs: wr 755, rd 0, flush 0, corrupt 0, gen 0
>> bdev /dev/sdd1 errs: wr 7601141, rd 3801840, flush 12, corrupt 3755, gen 245
>
>> replace devid present without an active replace item
> 
> It appears that the replace-target device already got the superblock,
> but the metadata update failed, which is good. Could you try physically
> removing the replace-target device, rebooting, and mounting with
> -o degraded?
>
> And most importantly, before reusing this replace-target device, please
> run wipefs -a on it. If it carries a matching fsid and devid=0, it gets
> scanned into the kernel, which makes it appear to be mounted.
> 
> HTH
> 
> Thanks, Anand
> 

Hi all and hi Anand,

Thank you very much for your advice. I did as you described:

1. wipefs -a on the target device of the replace.
2. Reboot into the live system without the target device plugged in.
3. mount -o degraded /dev/sdx /mnt/
4. btrfs replace cancel /mnt/
     btrfs said that there is no replace running
5. Unmount and check whether the filesystem is mountable without the
degraded option: mount /dev/sdx /mnt/
     works
6. Boot into the production system
     works
7. Start btrfs scrub
     up and running for the next 10 hours
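
Steps 3 to 5 and step 7 above can be sketched as a dry-run script. This
is only an illustration: FS_DEV, MNT, DRY_RUN, run, recover and
start_scrub are hypothetical names, and /dev/sdX stands for any
remaining member device of the filesystem. The wipe and the reboots
(steps 1, 2 and 6) are left out.

```sh
#!/bin/sh
# Sketch only. FS_DEV is a placeholder for a remaining member device.
FS_DEV="${FS_DEV:-/dev/sdX}"
MNT="${MNT:-/mnt}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands, anything else = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps 3 to 5: degraded mount, cancel the replace, normal mount.
recover() {
    dev="$1"; mnt="$2"
    run mount -o degraded "$dev" "$mnt" || return 1
    # May just report that no replace is running; that is not an error
    # for this sketch, so the exit status is ignored.
    run btrfs replace cancel "$mnt"
    run umount "$mnt" || return 1
    # A plain mount succeeding shows the degraded option is not needed.
    run mount "$dev" "$mnt"
}

# Step 7: start a scrub in the background and print its status.
start_scrub() {
    mnt="$1"
    run btrfs scrub start "$mnt" || return 1
    run btrfs scrub status "$mnt"
}

recover "$FS_DEV" "$MNT" && start_scrub "$MNT"
```

With the default DRY_RUN=1 the script prints six commands and changes
nothing.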

Right now everything seems to be all right. Thank you very much!

To replace the device, I will power off the machine, connect the target
device, boot up, start the btrfs replace, and hope that it runs through
seamlessly.
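
The planned replace could look roughly like the following dry-run
sketch. SRC, TARGET, MNT, DRY_RUN, run and replace_device are
hypothetical names; /dev/sdOLD and /dev/sdNEW are placeholders for the
outgoing and the new drive (a devid may be given instead of the source
path). Nothing here was posted in the thread.

```sh
#!/bin/sh
# Sketch only. SRC and TARGET are placeholders, not devices from this thread.
SRC="${SRC:-/dev/sdOLD}"
TARGET="${TARGET:-/dev/sdNEW}"
MNT="${MNT:-/mnt}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands, anything else = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

replace_device() {
    src="$1"; target="$2"; mnt="$3"
    # Start the replace on the mounted filesystem.
    run btrfs replace start "$src" "$target" "$mnt" || return 1
    # Report progress of the running replace.
    run btrfs replace status "$mnt"
}

replace_device "$SRC" "$TARGET" "$MNT"
```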

Thanks again and best regards
Samuel


