linux-btrfs.vger.kernel.org archive mirror
* failed to read block groups: Operation not permitted
From: Johannes Hirte @ 2020-10-06  9:09 UTC
  To: linux-btrfs

I recently encountered filesystem damage on a VM. During normal
operation, the filesystem was remounted ro suddenly. Dmesg showed me
some errors about parent transid verify failed. I've forced off the VM
and tried to mount the image on the host, but failed with:

[  340.702391] BTRFS info (device loop0p1): disk space caching is enabled
[  340.702393] BTRFS info (device loop0p1): has skinny extents
[  341.815890] BTRFS error (device loop0p1): parent transid verify failed on 152064327680 wanted 323984 found 323888
[  341.831183] BTRFS error (device loop0p1): parent transid verify failed on 152064327680 wanted 323984 found 323888
[  341.831194] BTRFS error (device loop0p1): failed to read block groups: -5
[  341.851954] BTRFS error (device loop0p1): open_ctree failed

A btrfs check resulted in:

btrfs check /dev/loop0p1
Opening filesystem to check...
parent transid verify failed on 152064327680 wanted 323984 found 323888
parent transid verify failed on 152064327680 wanted 323984 found 323888
parent transid verify failed on 152064327680 wanted 323984 found 323888
Ignoring transid failure
leaf parent key incorrect 152064327680
ERROR: failed to read block groups: Operation not permitted
ERROR: cannot open file system

The host is running libvirt with kvm, btrfs with RAID1. The VMs are raw
images, with btrfs too. I've switched this VM from io=native to
io=io_uring, and suspect that this caused the damage. All machines are
running kernel 5.8.13.

I was able to recover most of the damaged filesystem with btrfs recover.
Is there anything I can do to repair it? And why does the damage happen
at all with io_uring?

-- 
Regards,
  Johannes Hirte



* Re: failed to read block groups: Operation not permitted
From: Qu Wenruo @ 2020-10-06 12:06 UTC
  To: Johannes Hirte, linux-btrfs


On 2020/10/6 5:09 PM, Johannes Hirte wrote:
> I recently encountered filesystem damage on a VM. During normal
> operation, the filesystem was remounted ro suddenly. Dmesg showed me
> some errors about parent transid verify failed. I've forced off the VM
> and tried to mount the image on the host, but failed with:
> 
> [  340.702391] BTRFS info (device loop0p1): disk space caching is enabled
> [  340.702393] BTRFS info (device loop0p1): has skinny extents
> [  341.815890] BTRFS error (device loop0p1): parent transid verify failed on 152064327680 wanted 323984 found 323888
> [  341.831183] BTRFS error (device loop0p1): parent transid verify failed on 152064327680 wanted 323984 found 323888

Your extent tree is corrupted. Metadata CoW is broken.

I don't believe only the extent tree got corrupted; other parts of your
fs may be corrupted as well.
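
For reference, the "wanted" value in these messages is the generation the
parent node recorded for the block, and "found" is the generation in the
block actually read from disk. A minimal Python sketch, assuming only the
log format shown in this thread (the regex and the helper name
parse_transid_failure are illustrative, not part of btrfs-progs), extracts
both and reports how far the on-disk block lags behind:

```python
import re

# Matches lines such as:
#   parent transid verify failed on 152064327680 wanted 323984 found 323888
# The pattern is an assumption based on the log lines quoted in this thread.
TRANSID_RE = re.compile(
    r"parent transid verify failed on (?P<bytenr>\d+) "
    r"wanted (?P<wanted>\d+) found (?P<found>\d+)"
)


def parse_transid_failure(line):
    """Return (bytenr, wanted, found, gap), or None if the line does not match."""
    m = TRANSID_RE.search(line)
    if m is None:
        return None
    bytenr = int(m.group("bytenr"))
    wanted = int(m.group("wanted"))
    found = int(m.group("found"))
    # gap > 0 means the block on disk is older than the parent expects.
    return bytenr, wanted, found, wanted - found


line = ("[  341.815890] BTRFS error (device loop0p1): parent transid verify "
        "failed on 152064327680 wanted 323984 found 323888")
print(parse_transid_failure(line))
# -> (152064327680, 323984, 323888, 96)
```

Here the block at 152064327680 is 96 generations older than expected,
which would be consistent with a metadata write that never reached the
disk, though the log alone cannot prove that.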

> [  341.831194] BTRFS error (device loop0p1): failed to read block groups: -5
> [  341.851954] BTRFS error (device loop0p1): open_ctree failed
> 
> A btrfs check resulted in:
> 
> btrfs check /dev/loop0p1
> Opening filesystem to check...
> parent transid verify failed on 152064327680 wanted 323984 found 323888
> parent transid verify failed on 152064327680 wanted 323984 found 323888
> parent transid verify failed on 152064327680 wanted 323984 found 323888
> Ignoring transid failure
> leaf parent key incorrect 152064327680
> ERROR: failed to read block groups: Operation not permitted
> ERROR: cannot open file system
> 
> The host is running libvirt with kvm, btrfs with RAID1. The VMs are raw
> images, with btrfs too. I've switched this VM from io=native to
> io=io_uring, and suspect that this caused the damage. All machines are
> running kernel 5.8.13.

I'm not sure about the io_uring setup. IIRC as long as you're not using
cache=unsafe, it should be safe.

Does io_uring ignore the flush?

Thanks,
Qu
> 
> I was able to recover most of the damaged filesystem with btrfs recover.
> Is there anything I can do to repair it? And why does the damage happen
> at all with io_uring?
> 




* Re: failed to read block groups: Operation not permitted
From: Johannes Hirte @ 2020-10-08  9:24 UTC
  To: Qu Wenruo; +Cc: linux-btrfs, Jens Axboe

On 2020 Oct 06, Qu Wenruo wrote:
> I'm not sure about the io_uring setup. IIRC as long as you're not using
> cache=unsafe, it should be safe.
> 
> Does io_uring ignore the flush?


Putting someone with more knowledge into cc.

For another VM, I've found several errors in the log of the host machine:

BTRFS warning (device sda1): direct IO failed ino 5988432 rw 1,2131969 sector 0x123ab840 len 32768 err no 10
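
To make the warning above easier to read, here is a minimal Python
sketch that splits it into fields and converts the sector number into a
byte offset. It assumes the message format exactly as logged here and
that "sector" is counted in the kernel's 512-byte units; the helper name
parse_dio_failure is made up for illustration, and the numeric "err no"
value is passed through without interpretation:

```python
import re

# Pattern is an assumption based on the single warning quoted in this thread.
DIO_RE = re.compile(
    r"direct IO failed ino (?P<ino>\d+) rw (?P<rw>[\d,]+) "
    r"sector (?P<sector>0x[0-9a-fA-F]+) len (?P<length>\d+) "
    r"err no (?P<err>\d+)"
)

SECTOR_SIZE = 512  # assumption: kernel block-layer sector unit


def parse_dio_failure(line):
    """Return a dict of the fields in a 'direct IO failed' warning, or None."""
    m = DIO_RE.search(line)
    if m is None:
        return None
    sector = int(m.group("sector"), 16)
    return {
        "ino": int(m.group("ino")),
        "rw": m.group("rw"),                  # kept as the raw logged string
        "sector": sector,
        "byte_offset": sector * SECTOR_SIZE,  # start of the failed range
        "length": int(m.group("length")),     # length of the failed I/O in bytes
        "err": int(m.group("err")),           # raw status code, not decoded here
    }


line = ("BTRFS warning (device sda1): direct IO failed ino 5988432 "
        "rw 1,2131969 sector 0x123ab840 len 32768 err no 10")
info = parse_dio_failure(line)
print(info["ino"], hex(info["sector"]), info["byte_offset"], info["length"])
```

The inode number can then be mapped back to a path on the host with
something like "btrfs inspect-internal inode-resolve", to confirm which
VM image the failed write belonged to.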


The VM was remounted ro too, like the first one. But in this case the
filesystem was ok after a check. For the first VM with the heavily
damaged filesystem there aren't any log entries.


-- 
Regards,
  Johannes Hirte



* Re: failed to read block groups: Operation not permitted
From: Johannes Hirte @ 2020-10-08  9:35 UTC
  To: Qu Wenruo; +Cc: linux-btrfs, Jens Axboe

On 2020 Oct 08, Johannes Hirte wrote:
> Putting someone with more knowledge into cc.
> 
> For another VM, I've found several errors in the log of the host machine:
> 
> BTRFS warning (device sda1): direct IO failed ino 5988432 rw 1,2131969 sector 0x123ab840 len 32768 err no 10
> 
> 
> The VM was remounted ro too, like the first one. But in this case the
> filesystem was ok after a check. 

I have to correct this. There are several csum errors on this VM like
this:

BTRFS warning (device vda1): checksum/header error at logical 9660727296 on dev /dev/vda1, physical 6488064: metadata leaf (level 0) in tree 257

-- 
Regards,
  Johannes Hirte

