* Endless mount and backpointer mismatch
From: Pepie 34 @ 2020-01-27 21:20 UTC (permalink / raw)
To: linux-btrfs
Dear BTRFS community,
I have had a RAID 1 setup on two LUKS-encrypted drives for 4 years; it
serves as a btrbk backup target for another computer.
There are a lot of read-only snapshots on it.
I mistakenly launched a balance on it, which was extremely slow, and
tried to cancel it.
After two days of cancelling without result, I decided to power off the
computer.
After the reboot, even with the skip_balance mount option, the mount
never finishes: there is no error in the kernel log and the volume never
mounts.
What I have done so far:
- mounted the volume with the ro option (fast to mount, data OK)
- scrubbed in ro mode, no errors found
- ran btrfs check
In the extent check there are plenty of errors like this:
=>
ref mismatch on [9404816285696 32768] extent item 6, found 5
incorrect local backref count on 9404816285696 parent 5712684302336
owner 0 offset 0 found 0 wanted 1 back 0x55f371ee1ad0
backref disk bytenr does not match extent record, bytenr=9404816285696,
ref bytenr=0
backpointer mismatch on [9404816285696 32768]
<=
No errors in other checks, though checking "quota groups" is very slow.
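The read-only checks listed above could be scripted roughly as follows.
This is a minimal sketch, not the exact commands that were run: DEV and
MNT are placeholder names that do not come from this report, and the
run helper only prints each command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: DEV and MNT are hypothetical placeholders.
DEV="${DEV:-/dev/mapper/backup_a}"   # one opened LUKS member of the raid1
MNT="${MNT:-/mnt/backup}"            # mount point

# Print each command instead of running it, unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

readonly_checks() {
    run mount -o ro "$DEV" "$MNT"        # read-only mount
    run btrfs scrub start -B -r "$MNT"   # -B: stay in foreground, -r: read-only scrub
    run umount "$MNT"                    # btrfs check expects an unmounted filesystem
    run btrfs check --readonly "$DEV"    # report problems only, repair nothing
}

readonly_checks
```

With the default dry run this only prints the four commands; running it
for real needs root and the actual device and mount point.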
What should I do? btrfs check --repair?
btrfs check --init-extent-tree?
btrfs check --clear-space-cache?
Will the "init extent tree" option break btrfs receive with old snapshot
parents?
Best regards,
Pepie34
* Re: Endless mount and backpointer mismatch
From: Chris Murphy @ 2020-01-27 21:48 UTC (permalink / raw)
To: Pepie 34; +Cc: Btrfs BTRFS
On Mon, Jan 27, 2020 at 2:20 PM Pepie 34 <pepie34@gmail.com> wrote:
> What should I do? btrfs check --repair?
> btrfs check --init-extent-tree?
> btrfs check --clear-space-cache?
None of the above.
Which kernel and btrfs-progs versions are you running? Newer kernels
should perform better with quotas enabled and many snapshots, although I
think that combination is still not advised for performance reasons. An
older kernel might have a known bug related to the behavior you're
experiencing, but we need to know the versions.
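The two versions asked for can be read with uname and btrfs --version.
The sketch below also includes a small comparison helper; version_ge is
a hypothetical name, it assumes sort -V (GNU coreutils) is available,
and the threshold is arbitrary, not a known fix version.

```shell
#!/bin/sh
# version_ge A B: succeed when version A is greater than or equal to B.
# Assumes `sort -V` (version sort, GNU coreutils) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# The two versions asked for above.
uname -r                              # running kernel release
if command -v btrfs >/dev/null 2>&1; then
    btrfs --version                   # btrfs-progs version, if installed
fi

# Illustrative comparison only; the threshold is arbitrary.
THRESHOLD="5.0"
KERNEL="$(uname -r | cut -d- -f1)"    # e.g. "4.19.0" from "4.19.0-6-amd64"
if version_ge "$KERNEL" "$THRESHOLD"; then
    echo "kernel $KERNEL is at least $THRESHOLD"
else
    echo "kernel $KERNEL is older than $THRESHOLD"
fi
```

On Debian, uname -r prints the ABI name rather than the upstream point
release, so uname -a or the package version is needed to see something
like 4.19.67.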
--
Chris Murphy
* Re: Endless mount and backpointer mismatch
From: Pepie 34 @ 2020-01-27 21:59 UTC (permalink / raw)
To: linux-btrfs
On 27/01/2020 at 22:48, Chris Murphy wrote:
> None of the above.
>
> What kernel version and btrfs-progs? Newer kernels should have better
> performance with quota enabled and many snapshots, even though I think
> that combination is still not advised for performance reasons. Older
> kernel might have a known bug related to the behavior you're
> experiencing, but we need to know the versions.
Kernel : 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11)
x86_64 GNU/Linux
btrfs-progs: 4.20.1-2
I'm on Debian buster.
Note that I don't really care about performance right now; I mainly want
to mount my volume r/w.
With the same raid1 / LUKS / quota setup, it was fast to mount and use
before the failed balance (mounting took about 2 s).
Now it does not mount at all, except with ro.
Best regards,
Pepie 34
* Re: Endless mount and backpointer mismatch
From: Qu Wenruo @ 2020-01-28 1:23 UTC (permalink / raw)
To: Pepie 34, linux-btrfs
On 2020/1/28 5:20 AM, Pepie 34 wrote:
> Dear BTRFS community,
>
> I have had a RAID 1 setup on two LUKS-encrypted drives for 4 years; it
> serves as a btrbk backup target for another computer.
> There are a lot of read-only snapshots on it.
>
> I mistakenly launched a balance on it, which was extremely slow, and
> tried to cancel it.
> After two days of cancelling without result, I decided to power off the
> computer.
>
> After the reboot, even with the skip_balance mount option, the mount
> never finishes: there is no error in the kernel log and the volume never
> mounts.
Is there anything like "relocating block group XXXX flags XXXX" ?
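One way to look for such messages is to filter the kernel log. A minimal
sketch; relocation_lines is a hypothetical helper name, and reading the
log with dmesg usually needs root.

```shell
#!/bin/sh
# Filter kernel log text on stdin for btrfs relocation progress lines.
relocation_lines() {
    grep -i 'relocating block group' || true   # no match is not an error here
}

# Typical use:
#   dmesg | relocation_lines
```

An empty result means no such line was found in the log text given.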
>
> What I have done so far:
> - mounted the volume with the ro option (fast to mount, data OK)
> - scrubbed in ro mode, no errors found
So the data are all OK.
We just need a way to cancel the balance.
> - ran btrfs check
> In the extent check there are plenty of errors like this:
> =>
> ref mismatch on [9404816285696 32768] extent item 6, found 5
>
> incorrect local backref count on 9404816285696 parent 5712684302336
> owner 0 offset 0 found 0 wanted 1 back 0x55f371ee1ad0
> backref disk bytenr does not match extent record, bytenr=9404816285696,
> ref bytenr=0
> backpointer mismatch on [9404816285696 32768]
> <=
It could be caused by the half-balanced fs.
We need to re-check after the balance is cancelled.
> No errors in other checks, though checking "quota groups" is very slow.
That's caused by the nature of qgroup.
>
> What should I do? btrfs check --repair?
> btrfs check --init-extent-tree?
> btrfs check --clear-space-cache?
None of these options should affect data, but none of them is
recommended, since the problem is with the balance.
Have you tried mounting the fs with ro,skip_balance and then remounting
it rw?
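The suggestion above amounts to two mount calls. A minimal sketch,
assuming placeholder names DEV and MNT (neither comes from this thread);
the run helper only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: DEV and MNT are hypothetical placeholders.
DEV="${DEV:-/dev/mapper/backup_a}"
MNT="${MNT:-/mnt/backup}"

# Print each command instead of running it, unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

ro_then_rw() {
    run mount -o ro,skip_balance "$DEV" "$MNT"   # read-only, do not resume the balance
    run mount -o remount,rw "$MNT"               # then switch the same mount to read-write
}

ro_then_rw
```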
Thanks,
Qu
* Re: Endless mount and backpointer mismatch
From: Pepie 34 @ 2020-01-28 17:32 UTC (permalink / raw)
To: linux-btrfs
On 28/01/2020 at 02:23, Qu Wenruo wrote:
> Is there anything like "relocating block group XXXX flags XXXX" ?
No, but there are other messages; see below.
> Have you tried to mount the fs with RO,skip_balance, then remount it rw?
I mounted it ro,skip_balance, then remounted it rw.
The rw remount has now been running for 12 hours.
The kernel log has messages about tasks blocked for more than 120
seconds.
Some samples:
[43621.876315] INFO: task btrfs-transacti:21846 blocked for more than 120 seconds.
[43621.876325] Not tainted 4.19.0-6-amd64 #1 Debian 4.19.67-2+deb10u2
[43621.876327] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[43621.876331] btrfs-transacti D 0 21846 2 0x80000000
[43621.876334] Call Trace:
[43621.876345] ? __schedule+0x2a2/0x870
[43621.876347] schedule+0x28/0x80
[43621.876394] btrfs_commit_transaction+0x75f/0x880 [btrfs]
[43621.876399] ? finish_wait+0x80/0x80
[43621.876419] transaction_kthread+0x147/0x180 [btrfs]
[43621.876440] ? btrfs_cleanup_transaction+0x530/0x530 [btrfs]
[43621.876443] kthread+0x112/0x130
[43621.876445] ? kthread_bind+0x30/0x30
[43621.876447] ret_from_fork+0x22/0x40
[44346.867777] INFO: task mount:21595 blocked for more than 120 seconds.
[44346.867788] Not tainted 4.19.0-6-amd64 #1 Debian 4.19.67-2+deb10u2
[44346.867791] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[44346.867795] mount D 0 21595 21594 0x00000000
[44346.867797] Call Trace:
[44346.867809] ? __schedule+0x2a2/0x870
[44346.867812] ? __wake_up_common+0x7a/0x190
[44346.867814] schedule+0x28/0x80
[44346.867859] wait_current_trans+0xc3/0xf0 [btrfs]
[44346.867863] ? finish_wait+0x80/0x80
[44346.867884] start_transaction+0x317/0x3e0 [btrfs]
[44346.867908] merge_reloc_root+0xf5/0x560 [btrfs]
[44346.867933] merge_reloc_roots+0xda/0x1f0 [btrfs]
[44346.867957] btrfs_recover_relocation+0x42d/0x490 [btrfs]
[44346.867978] open_ctree+0x1860/0x1bf0 [btrfs]
[44346.867995] btrfs_mount_root+0x682/0x740 [btrfs]
[44346.867999] ? cpumask_next+0x16/0x20
[44346.868002] ? pcpu_alloc+0x321/0x640
[44346.868005] mount_fs+0x3e/0x145
[44346.868008] vfs_kern_mount.part.36+0x54/0x120
[44346.868024] btrfs_mount+0x16f/0x860 [btrfs]
[44346.868027] ? path_lookupat.isra.48+0xa3/0x220
[44346.868028] ? legitimize_path.isra.41+0x2d/0x60
[44346.868030] ? cpumask_next+0x16/0x20
[44346.868031] ? pcpu_alloc+0x321/0x640
[44346.868032] ? mount_fs+0x3e/0x145
[44346.868034] mount_fs+0x3e/0x145
[44346.868035] vfs_kern_mount.part.36+0x54/0x120
[44346.868037] do_mount+0x20e/0xcc0
[44346.868039] ? _cond_resched+0x15/0x30
[44346.868041] ? kmem_cache_alloc_trace+0x155/0x1d0
[44346.868043] ksys_mount+0xb6/0xd0
[44346.868044] __x64_sys_mount+0x21/0x30
[44346.868047] do_syscall_64+0x53/0x110
[44346.868050] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[44346.868052] RIP: 0033:0x7ff50cb41fea
[44346.868060] Code: Bad RIP value.
[44346.868061] RSP: 002b:00007ffd2257b2e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[44346.868063] RAX: ffffffffffffffda RBX: 000055cc47409a40 RCX: 00007ff50cb41fea
[44346.868064] RDX: 000055cc4740be00 RSI: 000055cc47409c50 RDI: 000055cc4740aa50
[44346.868065] RBP: 00007ff50ce961c4 R08: 000055cc47409c70 R09: 000055cc474119e0
[44346.868065] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[44346.868066] R13: 0000000000000000 R14: 000055cc4740aa50 R15: 000055cc4740be00
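Hung-task reports like the two above can be pulled out of a long kernel
log with a short filter. A minimal sketch; blocked_tasks is a
hypothetical helper name, and it assumes each report header sits on one
unwrapped line, as dmesg prints it.

```shell
#!/bin/sh
# Read kernel log text on stdin and print "task:pid seconds" for every
# hung-task report header found.
blocked_tasks() {
    sed -n 's/.*INFO: task \([^ ]*\) blocked for more than \([0-9]*\).*/\1 \2/p'
}

# Typical use (reading the kernel log usually needs root):
#   dmesg | blocked_tasks
```

In the second trace above, the mount task is waiting under
btrfs_recover_relocation and merge_reloc_roots, which suggests the mount
is stuck recovering the interrupted relocation.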
Besides shutting down the computer, is there a proper way to stop the
mount?
Best regards,
Pepie 34
* Re: Endless mount and backpointer mismatch
From: Pepie 34 @ 2020-02-05 12:50 UTC (permalink / raw)
To: linux-btrfs
Hi,
For your information, I have updated my kernel to 5.4.8-1~bpo10+1 and
btrfs-progs to 5.2.1-1~bpo10+1 (from buster backports).
With those, I could mount rw with skip_balance and then cancel the
balance.
After that, btrfs scrub and btrfs check reported no errors, except for an
inconsistency in the space cache, which I corrected with clear_cache.
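The sequence described above could look roughly like this. It is a
sketch under assumptions rather than the exact commands used: DEV and
MNT are placeholders, clear_cache is read here as the mount option of
that name, and the run helper only prints each command unless DRY_RUN=0
is set.

```shell
#!/bin/sh
# Sketch only: DEV and MNT are hypothetical placeholders.
DEV="${DEV:-/dev/mapper/backup_a}"
MNT="${MNT:-/mnt/backup}"

# Print each command instead of running it, unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

recover_after_upgrade() {
    run mount -o skip_balance "$DEV" "$MNT"   # rw mount, interrupted balance stays paused
    run btrfs balance cancel "$MNT"           # cancel the paused balance
    run btrfs scrub start -B "$MNT"           # verify checksums, in the foreground
    run umount "$MNT"                         # btrfs check expects an unmounted filesystem
    run btrfs check "$DEV"                    # read-only check by default
    run mount -o clear_cache "$DEV" "$MNT"    # rebuild the space cache on this mount
}

recover_after_upgrade
```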
By the way, do you advise using space_cache on an encrypted device?
Best regards,
Pepie 34