* Device Delete Stuck
@ 2020-03-29 14:13 Jason Clara
2020-03-29 16:40 ` Steven Fosdick
2020-03-29 18:55 ` Zygo Blaxell
0 siblings, 2 replies; 5+ messages in thread
From: Jason Clara @ 2020-03-29 14:13 UTC (permalink / raw)
To: linux-btrfs
I posted previously about a device delete causing my whole system to hang. I seem to have gotten past that issue.
It turned out that even though all the scrubs finished without any errors, I still had a problem with some files. By forcing a read of every single file I was able to detect the bad files in dmesg. I'm not sure, though, why scrub didn't detect this.
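The forced read boils down to something like the following helper (a sketch; the directory argument is whatever your pool is mounted on, not something taken from this report):

```shell
# Read every file under the given directory, discarding the data; any
# checksum failure is reported by the kernel and shows up in dmesg.
read_all_files() {
    find "$1" -type f -exec cat {} + > /dev/null
}

# After the run, check the kernel log for checksum errors:
#   dmesg | grep 'csum failed'
```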
BTRFS warning (device sdd1): csum failed root 5 ino 14654354 off 163852288 csum 0
But now when I attempt to delete a device from the array it gets stuck. Normally the log shows a message that some extents were found, followed by another message saying they were relocated.
But for the last few days it has just been repeating the same "found" value without ever relocating anything, and the usage of the device doesn't change at all.
This line has now been repeating for more than 24 hours, and the previous attempt was similar.
[Sun Mar 29 09:59:50 2020] BTRFS info (device sdd1): found 133 extents
Prior to this run I had tried an earlier kernel (5.5.10) with the same results: it starts out finding and then relocating extents, but eventually only the "found" messages repeat. So I upgraded my kernel to see if that would help, and it has not.
System Info
Ubuntu 18.04
btrfs-progs v5.4.1
Linux FileServer 5.5.13-050513-generic #202003251631 SMP Wed Mar 25 16:35:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
DEVICE USAGE
/dev/sdd1, ID: 1
Device size: 2.73TiB
Device slack: 0.00B
Data,RAID6: 188.67GiB
Data,RAID6: 1.68TiB
Data,RAID6: 888.43GiB
Unallocated: 1.00MiB
/dev/sdb1, ID: 2
Device size: 2.73TiB
Device slack: 2.73TiB
Data,RAID6: 188.67GiB
Data,RAID6: 508.82GiB
Data,RAID6: 2.00GiB
Unallocated: -699.50GiB
/dev/sdc1, ID: 3
Device size: 2.73TiB
Device slack: 0.00B
Data,RAID6: 188.67GiB
Data,RAID6: 1.68TiB
Data,RAID6: 888.43GiB
Unallocated: 1.00MiB
/dev/sdi1, ID: 5
Device size: 2.73TiB
Device slack: 1.36TiB
Data,RAID6: 188.67GiB
Data,RAID6: 1.18TiB
Unallocated: 1.00MiB
/dev/sdh1, ID: 6
Device size: 4.55TiB
Device slack: 0.00B
Data,RAID6: 188.67GiB
Data,RAID6: 1.68TiB
Data,RAID6: 1.23TiB
Data,RAID6: 888.43GiB
Data,RAID6: 2.00GiB
Metadata,RAID1: 2.00GiB
Unallocated: 601.01GiB
/dev/sda1, ID: 7
Device size: 7.28TiB
Device slack: 0.00B
Data,RAID6: 188.67GiB
Data,RAID6: 1.68TiB
Data,RAID6: 1.23TiB
Data,RAID6: 888.43GiB
Data,RAID6: 2.00GiB
Metadata,RAID1: 2.00GiB
System,RAID1: 32.00MiB
Unallocated: 3.32TiB
/dev/sdf1, ID: 8
Device size: 7.28TiB
Device slack: 0.00B
Data,RAID6: 188.67GiB
Data,RAID6: 1.68TiB
Data,RAID6: 1.23TiB
Data,RAID6: 888.43GiB
Data,RAID6: 2.00GiB
Metadata,RAID1: 8.00GiB
Unallocated: 3.31TiB
/dev/sdj1, ID: 9
Device size: 7.28TiB
Device slack: 0.00B
Data,RAID6: 188.67GiB
Data,RAID6: 1.68TiB
Data,RAID6: 1.23TiB
Data,RAID6: 888.43GiB
Data,RAID6: 2.00GiB
Metadata,RAID1: 8.00GiB
System,RAID1: 32.00MiB
Unallocated: 3.31TiB
FI USAGE
WARNING: RAID56 detected, not implemented
Overall:
Device size: 33.20TiB
Device allocated: 20.06GiB
Device unallocated: 33.18TiB
Device missing: 0.00B
Used: 19.38GiB
Free (estimated): 0.00B (min: 8.00EiB)
Data ratio: 0.00
Metadata ratio: 2.00
Global reserve: 512.00MiB (used: 0.00B)
Data,RAID6: Size:15.42TiB, Used:15.18TiB (98.44%)
/dev/sdd1 2.73TiB
/dev/sdb1 699.50GiB
/dev/sdc1 2.73TiB
/dev/sdi1 1.36TiB
/dev/sdh1 3.96TiB
/dev/sda1 3.96TiB
/dev/sdf1 3.96TiB
/dev/sdj1 3.96TiB
Metadata,RAID1: Size:10.00GiB, Used:9.69GiB (96.90%)
/dev/sdh1 2.00GiB
/dev/sda1 2.00GiB
/dev/sdf1 8.00GiB
/dev/sdj1 8.00GiB
System,RAID1: Size:32.00MiB, Used:1.19MiB (3.71%)
/dev/sda1 32.00MiB
/dev/sdj1 32.00MiB
Unallocated:
/dev/sdd1 1.00MiB
/dev/sdb1 -699.50GiB
/dev/sdc1 1.00MiB
/dev/sdi1 1.00MiB
/dev/sdh1 601.01GiB
/dev/sda1 3.32TiB
/dev/sdf1 3.31TiB
/dev/sdj1 3.31TiB
FI SHOW
Label: 'Pool1' uuid: 99935e27-4922-4efa-bf76-5787536dd71f
Total devices 8 FS bytes used 15.19TiB
devid 1 size 2.73TiB used 2.73TiB path /dev/sdd1
devid 2 size 0.00B used 699.50GiB path /dev/sdb1
devid 3 size 2.73TiB used 2.73TiB path /dev/sdc1
devid 5 size 1.36TiB used 1.36TiB path /dev/sdi1
devid 6 size 4.55TiB used 3.96TiB path /dev/sdh1
devid 7 size 7.28TiB used 3.96TiB path /dev/sda1
devid 8 size 7.28TiB used 3.97TiB path /dev/sdf1
devid 9 size 7.28TiB used 3.97TiB path /dev/sdj1
FI DF
Data, RAID6: total=15.42TiB, used=15.18TiB
System, RAID1: total=32.00MiB, used=1.19MiB
Metadata, RAID1: total=10.00GiB, used=9.69GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
* Re: Device Delete Stuck
2020-03-29 14:13 Device Delete Stuck Jason Clara
@ 2020-03-29 16:40 ` Steven Fosdick
2020-03-29 18:18 ` Jason Clara
2020-03-29 18:55 ` Zygo Blaxell
1 sibling, 1 reply; 5+ messages in thread
From: Steven Fosdick @ 2020-03-29 16:40 UTC (permalink / raw)
To: Jason Clara; +Cc: Btrfs BTRFS
Jason,
I am not a btrfs developer, but I had the same problem as you. In my
case the problem went away when I used the mount option to clear the
free space cache. From my own experience, whatever goes wrong to
cause the checksum error also corrupts this cache, but that does no
long-term harm: once the cache is cleared on mount, it gets rebuilt.
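For example, a one-off fstab entry for this might look like the line below, using the filesystem UUID from the FI SHOW output in Jason's message (the mount point here is an assumption):

UUID=99935e27-4922-4efa-bf76-5787536dd71f  /mnt/pool  btrfs  defaults,clear_cache  0  0

clear_cache only needs to be in effect for a single mount; once the cache has been rebuilt, the option can be dropped again.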
Steve.
On Sun, 29 Mar 2020 at 15:15, Jason Clara <jason@clarafamily.com> wrote:
> [quoted message snipped]
* Re: Device Delete Stuck
2020-03-29 16:40 ` Steven Fosdick
@ 2020-03-29 18:18 ` Jason Clara
0 siblings, 0 replies; 5+ messages in thread
From: Jason Clara @ 2020-03-29 18:18 UTC (permalink / raw)
To: Steven Fosdick; +Cc: Btrfs BTRFS
Thanks for the suggestion. I added clear_cache to my fstab, rebooted, and waited about 20-30 minutes to make sure everything had settled down.
I did see this in my log, so it appears to have worked: "BTRFS info (device sdd1): force clearing of disk cache"
I attempted the delete again, and it did remove more data, but it looks like it is stuck again.
Here is the dmesg output from when I started the delete. The last line, "found 3 extents", has been repeating for the last 20 minutes or so.
[Sun Mar 29 13:42:06 2020] BTRFS info (device sdd1): relocating block group 145441210499072 flags data|raid6
[Sun Mar 29 13:43:17 2020] BTRFS info (device sdd1): found 3010 extents
[Sun Mar 29 13:43:23 2020] BTRFS info (device sdd1): found 3010 extents
[Sun Mar 29 13:43:25 2020] BTRFS info (device sdd1): relocating block group 145437989273600 flags data|raid6
[Sun Mar 29 13:44:14 2020] BTRFS info (device sdd1): found 972 extents
[Sun Mar 29 13:44:21 2020] BTRFS info (device sdd1): found 950 extents
[Sun Mar 29 13:44:31 2020] BTRFS info (device sdd1): relocating block group 120428453429248 flags data|raid6
[Sun Mar 29 13:45:23 2020] BTRFS info (device sdd1): found 3884 extents
[Sun Mar 29 13:45:49 2020] BTRFS info (device sdd1): found 3883 extents
[Sun Mar 29 13:46:14 2020] BTRFS info (device sdd1): relocating block group 132181611511808 flags data|raid6
[Sun Mar 29 13:46:19 2020] BTRFS info (device sdd1): found 60 extents
[Sun Mar 29 13:46:21 2020] BTRFS info (device sdd1): found 60 extents
[Sun Mar 29 13:46:23 2020] BTRFS info (device sdd1): relocating block group 132153520160768 flags data|raid6
[Sun Mar 29 13:46:33 2020] BTRFS info (device sdd1): found 42 extents
[Sun Mar 29 13:46:35 2020] BTRFS info (device sdd1): found 42 extents
[Sun Mar 29 13:46:37 2020] BTRFS info (device sdd1): relocating block group 120433822138368 flags data|raid6
[Sun Mar 29 13:47:37 2020] BTRFS info (device sdd1): found 3831 extents
[Sun Mar 29 13:47:59 2020] BTRFS info (device sdd1): found 3831 extents
[Sun Mar 29 13:48:15 2020] BTRFS info (device sdd1): relocating block group 132175346270208 flags data|raid6
[Sun Mar 29 13:48:19 2020] BTRFS info (device sdd1): found 29 extents
[Sun Mar 29 13:48:21 2020] BTRFS info (device sdd1): found 29 extents
[Sun Mar 29 13:48:23 2020] BTRFS info (device sdd1): found 29 extents
[Sun Mar 29 13:48:25 2020] BTRFS info (device sdd1): relocating block group 120439190847488 flags data|raid6
[Sun Mar 29 13:49:12 2020] BTRFS info (device sdd1): relocating block group 132182843588608 flags data|raid6
[Sun Mar 29 13:49:16 2020] BTRFS info (device sdd1): found 3 extents
[Sun Mar 29 13:49:17 2020] BTRFS info (device sdd1): found 3 extents
[Sun Mar 29 13:49:18 2020] BTRFS info (device sdd1): found 3 extents
[Sun Mar 29 13:49:18 2020] BTRFS info (device sdd1): found 3 extents
[Sun Mar 29 13:49:19 2020] BTRFS info (device sdd1): found 3 extents
[Sun Mar 29 13:49:19 2020] BTRFS info (device sdd1): found 3 extents
[Sun Mar 29 13:49:20 2020] BTRFS info (device sdd1): found 3 extents
[Sun Mar 29 13:49:20 2020] BTRFS info (device sdd1): found 3 extents
Updated FI USAGE
WARNING: RAID56 detected, not implemented
Overall:
Device size: 33.20TiB
Device allocated: 20.06GiB
Device unallocated: 33.18TiB
Device missing: 0.00B
Used: 19.38GiB
Free (estimated): 0.00B (min: 8.00EiB)
Data ratio: 0.00
Metadata ratio: 2.00
Global reserve: 512.00MiB (used: 144.00KiB)
Data,RAID6: Size:15.42TiB, Used:15.18TiB (98.47%)
/dev/sdd1 2.73TiB
/dev/sdb1 695.21GiB
/dev/sdc1 2.73TiB
/dev/sdi1 1.36TiB
/dev/sdh1 3.96TiB
/dev/sda1 3.96TiB
/dev/sdf1 3.96TiB
/dev/sdj1 3.96TiB
Metadata,RAID1: Size:10.00GiB, Used:9.69GiB (96.89%)
/dev/sdh1 2.00GiB
/dev/sda1 2.00GiB
/dev/sdf1 8.00GiB
/dev/sdj1 8.00GiB
System,RAID1: Size:32.00MiB, Used:1.19MiB (3.71%)
/dev/sda1 32.00MiB
/dev/sdj1 32.00MiB
Unallocated:
/dev/sdd1 1.00MiB
/dev/sdb1 -695.21GiB
/dev/sdc1 1.00MiB
/dev/sdi1 1.00MiB
/dev/sdh1 601.01GiB
/dev/sda1 3.32TiB
/dev/sdf1 3.31TiB
/dev/sdj1 3.31TiB
> On Mar 29, 2020, at 12:40 PM, Steven Fosdick <stevenfosdick@gmail.com> wrote:
>
> Jason,
>
> I am not a btrfs developer but I had he same problem as you. In my
> case the problem went away when I used the mount option to clear the
> free space cache. From my own experience, whatever is going wrong
> that causes the checksum error also corrupts this cache but that does
> no long term harm as, once it is cleared on mount, it gets rebuilt.
>
> Steve.
>
> On Sun, 29 Mar 2020 at 15:15, Jason Clara <jason@clarafamily.com> wrote:
>> [quoted original message snipped]
* Re: Device Delete Stuck
2020-03-29 14:13 Device Delete Stuck Jason Clara
2020-03-29 16:40 ` Steven Fosdick
@ 2020-03-29 18:55 ` Zygo Blaxell
2020-03-29 19:24 ` Jason Clara
1 sibling, 1 reply; 5+ messages in thread
From: Zygo Blaxell @ 2020-03-29 18:55 UTC (permalink / raw)
To: Jason Clara; +Cc: linux-btrfs
On Sun, Mar 29, 2020 at 10:13:05AM -0400, Jason Clara wrote:
> I had a previous post about when trying to do a device delete that
> it would cause my whole system to hang. I seem to have got past
> that issue.
>
> For that, it seems like even though all the SCRUBs finished without
> any errors I still had a problem with some files. By forcing a read
> of every single file I was able to detect the bad files in DMESG.
> Not sure though why SCRUB didn’t detect this. BTRFS warning (device
> sdd1): csum failed root 5 ino 14654354 off 163852288 csum 0
That sounds like it could be the raid5/6 bug I reported
https://www.spinics.net/lists/linux-btrfs/msg94594.html
To trigger that bug you need pre-existing corruption on the disk.
You can work around it by:
1. Read every file, e.g. 'find -type f -exec cat {} + >/dev/null'
This avoids dmesg ratelimiting which will hide some errors.
2. If there are read errors in step 1, remove any files that
failed to read.
3. Run full scrub to fix parity or inject new errors.
4. Repeat until there are no errors at step 1.
The bug will introduce new errors in a small fraction (<0.1%) of corrupted
raid stripes as you do this. Each pass through the loop will remove
existing errors, but may add a few more new errors at the same time.
The rate of removal is much faster than the rate of addition, so the
loop will eventually terminate at zero errors. You'll be able to use
the filesystem normally again after that.
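One pass of the read/scrub loop above could be scripted roughly as follows. This is only a sketch: MNT and the log path are placeholders, and the scrub and file-removal steps are left to the operator.

```shell
#!/bin/sh
# Sketch of one pass of the read/scrub loop described above.
MNT=${MNT:-/mnt/pool}        # assumed mount point, adjust as needed

# Count "csum failed" warnings in a saved kernel log excerpt.
count_csum_errors() {
    grep -c 'csum failed' "$1"
}

one_pass() {
    dmesg --clear                                  # start from a clean ring buffer
    # Step 1: force a read of every file; failures land in the kernel log.
    find "$MNT" -type f -exec cat {} + > /dev/null 2>&1
    dmesg > /tmp/btrfs-read.log
    count_csum_errors /tmp/btrfs-read.log          # 0 means the loop can stop
    # Steps 2-3 (removing the files that failed, running a full scrub)
    # happen here before the next pass.
}
```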
This bug is not a regression--there has not been a kernel release with
working btrfs raid5/6 yet. All releases from 4.15 to 5.5.3 fail my test
case, and versions before 4.15 have worse bugs. At the moment, btrfs
raid5/6 should only be used by developers who intend to test, debug,
and fix btrfs raid5/6.
> But now when I attempt to delete a device from the array it seems to
> get stuck. Normally it will show in the log that it has found some
> extents and then another message saying they were relocated.
>
> But for the last few days it has just been repeating the same found
> value and never relocating anything, and the usage of the device
> doesn’t change at all.
>
> This line has now been repeating for more then 24 hours, and the
> previous attempt was similar. [Sun Mar 29 09:59:50 2020] BTRFS info
> (device sdd1): found 133 extents
Kernels starting with 5.1 have a known regression where block group
relocation gets stuck in loops. Everything in the block group gets
relocated except for shared data backref items, then the relocation can't
seem to move those and no further progress is made. This has not been
fixed yet.
> Prior to this run I had tried an earlier kernel (5.5.10) with the same
> results: it starts out finding and then relocating extents, but
> eventually only the "found" messages repeat. So I upgraded my kernel to
> see if that would help, and it has not.
Use kernel 4.19 for device deletes or other big relocation operations.
(5.0 and 4.20 are OK too, but 4.19 is still maintained and has fixes
for non-btrfs issues).
> [quoted system info and usage output snipped]
* Re: Device Delete Stuck
2020-03-29 18:55 ` Zygo Blaxell
@ 2020-03-29 19:24 ` Jason Clara
0 siblings, 0 replies; 5+ messages in thread
From: Jason Clara @ 2020-03-29 19:24 UTC (permalink / raw)
To: Zygo Blaxell; +Cc: linux-btrfs
Thanks, I will give it a try. Your step 1 is actually what I used to detect the errors the first time, when the delete would cause the system to hang completely. I then deleted all the bad files and restored them from backup. I did do a scrub after that, but didn't repeat step 1 again.
I will try your suggestion and repeat the steps until I see no errors.
Also, I understand the state of RAID 5/6. All important data on this pool is backed up daily to another RAID1 pool. I am actually trying to reduce the size of this pool so I can add a drive to the RAID1 pool.
It was previously a RAID1 pool that I converted to RAID6, and since then I have not been able to remove that device.
> On Mar 29, 2020, at 2:55 PM, Zygo Blaxell <ce3g8jdj@umail.furryterror.org> wrote:
>
> On Sun, Mar 29, 2020 at 10:13:05AM -0400, Jason Clara wrote:
>> I had a previous post about when trying to do a device delete that
>> it would cause my whole system to hang. I seem to have got past
>> that issue.
>>
>> For that, it seems like even though all the SCRUBs finished without
>> any errors I still had a problem with some files. By forcing a read
>> of every single file I was able to detect the bad files in DMESG.
>> Not sure though why SCRUB didn’t detect this. BTRFS warning (device
>> sdd1): csum failed root 5 ino 14654354 off 163852288 csum 0
>
> That sounds like it could be the raid5/6 bug I reported
>
> https://www.spinics.net/lists/linux-btrfs/msg94594.html
>
> To trigger that bug you need pre-existing corruption on the disk.
>
> You can work around by:
>
> 1. Read every file, e.g. 'find -type f -exec cat {} + >/dev/null'
> This avoids dmesg ratelimiting which will hide some errors.
>
> 2. If there are read errors in step 1, remove any that have
> failures.
>
> 3. Run full scrub to fix parity or inject new errors.
>
> 4. Repeat until there are no errors at step 1.
>
> The bug will introduce new errors in a small fraction (<0.1%) of corrupted
> raid stripes as you do this. Each pass through the loop will remove
> existing errors, but may add a few more new errors at the same time.
> The rate of removal is much faster than the rate of addition, so the
> loop will eventually terminate at zero errors. You'll be able to use
> the filesystem normally again after that.
>
> This bug is not a regression--there has not been a kernel release with
> working btrfs raid5/6 yet. All releases from 4.15 to 5.5.3 fail my test
> case, and versions before 4.15 have worse bugs. At the moment, btrfs
> raid5/6 should only be used by developers who intend to test, debug,
> and fix btrfs raid5/6.
>
>> But now when I attempt to delete a device from the array it seems to
>> get stuck. Normally it will show in the log that it has found some
>> extents and then another message saying they were relocated.
>>
>> But for the last few days it has just been repeating the same found
>> value and never relocating anything, and the usage of the device
>> doesn’t change at all.
>>
>> This line has now been repeating for more then 24 hours, and the
>> previous attempt was similar. [Sun Mar 29 09:59:50 2020] BTRFS info
>> (device sdd1): found 133 extents
>
> Kernels starting with 5.1 have a known regression where block group
> relocation gets stuck in loops. Everything in the block group gets
> relocated except for shared data backref items, then the relocation can't
> seem to move those and no further progress is made. This has not been
> fixed yet.
>
>> Prior to this run I had tried an earlier kernel (5.5.10) with the same
>> results: it starts out finding and then relocating extents, but
>> eventually only the "found" messages repeat. So I upgraded my kernel to
>> see if that would help, and it has not.
>
> Use kernel 4.19 for device deletes or other big relocation operations.
> (5.0 and 4.20 are OK too, but 4.19 is still maintained and has fixes
> for non-btrfs issues).
>
>> [quoted system info and usage output snipped]
end of thread, other threads:[~2020-03-29 19:25 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-29 14:13 Device Delete Stuck Jason Clara
2020-03-29 16:40 ` Steven Fosdick
2020-03-29 18:18 ` Jason Clara
2020-03-29 18:55 ` Zygo Blaxell
2020-03-29 19:24 ` Jason Clara