linux-btrfs.vger.kernel.org archive mirror
* False alert: read time tree block corruption
@ 2019-12-04 11:04 Christian Höppner
  2019-12-04 11:32 ` Nikolay Borisov
  0 siblings, 1 reply; 4+ messages in thread
From: Christian Höppner @ 2019-12-04 11:04 UTC
  To: linux-btrfs

Hello,

I'm writing because the kernel wiki page relating to this error[1] says to
write here first.

I'm (was) running Arch Linux, kernel 5.4.1, btrfs-progs 5.3.1

Yesterday during usage, the root file system remounted read-only. I was
dumb enough to react by rebooting the machine, when I was greeted by the
following error:

[  25.634530] BTRFS critical (device nvme0n1p2): corrupt leaf: block=810145234944...
[  25.634793] BTRFS error (device nvme0n1p2): block=810145234944 read time tree block corruption detected
[  25.634961] BTRFS error (device nvme0n1p2): in __btrfs_free_extent:3080: errno=-5 IO failure
[  25.635042] BTRFS error (device nvme0n1p2): in btrfs_run_delayed_refs:2188: errno=-5 IO failure
[  34.653440] systemd-journald[483]: Failed to rotate /var/log/journal/8f7037b10bbd4f25aadd3d19105ef920/system.journal

After booting to live media, I checked SMART, badblocks, `btrfs check
--readonly` and `btrfs scrub`. All came back clean. I conclude that this
is a false positive, and have downgraded the kernel to 5.3.13 as a
workaround.

How can I provide more information to help?

[1]: https://btrfs.wiki.kernel.org/index.php/Tree-checker#How_to_handle_such_error


* Re: False alert: read time tree block corruption
  2019-12-04 11:04 False alert: read time tree block corruption Christian Höppner
@ 2019-12-04 11:32 ` Nikolay Borisov
  2019-12-05  2:50   ` Zygo Blaxell
  2019-12-05 11:44   ` Christian Höppner
  0 siblings, 2 replies; 4+ messages in thread
From: Nikolay Borisov @ 2019-12-04 11:32 UTC
  To: Christian Höppner, linux-btrfs



On 4.12.19 at 13:04, Christian Höppner wrote:
> Hello,
> 
> I'm writing because the kernel wiki page relating to this error[1] says to
> write here first.
> 
> I'm (was) running Arch Linux, kernel 5.4.1, btrfs-progs 5.3.1
> 
> Yesterday during usage, the root file system remounted read-only. I was
> dumb enough to react by rebooting the machine, when I was greeted by the
> following error:
> 
> [  25.634530] BTRFS critical (device nvme0n1p2): corrupt leaf: block=810145234944...

How come you omitted exactly the most useful error, the one that could
have pointed at the problem? If the data is intact on disk and the leaf
checker triggered, this means you likely have faulty RAM.

> [  25.634793] BTRFS error (device nvme0n1p2): block=810145234944 read time tree block corruption detected
> [  25.634961] BTRFS error (device nvme0n1p2): in __btrfs_free_extent:3080: errno=-5 IO failure
> [  25.635042] BTRFS error (device nvme0n1p2): in btrfs_run_delayed_refs:2188: errno=-5 IO failure
> [  34.653440] systemd-journald[483]: Failed to rotate /var/log/journal/8f7037b10bbd4f25aadd3d19105ef920/system.journal
> 
> After booting to live media, I checked SMART, badblocks, `btrfs check
> --readonly` and `btrfs scrub`. All came back clean. I conclude that this
> is a false positive, and have downgraded the kernel to 5.3.13 as a
> workaround.
> 
> How can I provide more information to help?
> 
> [1]: https://btrfs.wiki.kernel.org/index.php/Tree-checker#How_to_handle_such_error
> 


* Re: False alert: read time tree block corruption
  2019-12-04 11:32 ` Nikolay Borisov
@ 2019-12-05  2:50   ` Zygo Blaxell
  2019-12-05 11:44   ` Christian Höppner
  1 sibling, 0 replies; 4+ messages in thread
From: Zygo Blaxell @ 2019-12-05  2:50 UTC
  To: Nikolay Borisov; +Cc: Christian Höppner, linux-btrfs


On Wed, Dec 04, 2019 at 01:32:59PM +0200, Nikolay Borisov wrote:
> 
> 
> On 4.12.19 at 13:04, Christian Höppner wrote:
> > Hello,
> > 
> > I'm writing because the kernel wiki page relating to this error[1] says to
> > write here first.
> > 
> > I'm (was) running Arch Linux, kernel 5.4.1, btrfs-progs 5.3.1
> > 
> > Yesterday during usage, the root file system remounted read-only. I was
> > dumb enough to react by rebooting the machine, when I was greeted by the
> > following error:
> > 
> > [  25.634530] BTRFS critical (device nvme0n1p2): corrupt leaf: block=810145234944...
> 
> How come you omitted exactly the most useful error, the one that could
> have pointed at the problem? If the data is intact on disk and the leaf
> checker triggered, this means you likely have faulty RAM.

Yesterday on IRC there was a similar case where the metadata in the extent
tree had nonsense generation values, but the rest of the filesystem
was fine.  It was very specific:  only the generation fields in several
extent items (sometimes even consecutive ones!).  Bad RAM is usually much
more chaotic:  different fields are corrupted, and some or all of them
will cause a more visible failure than mere mismatched transid.

	https://pastebin.com/raw/GemSDdin

Also it turned out that the filesystem was made in 2014.  Maybe there was
an old kernel bug that was putting garbage in extent generation numbers,
and this is the last remnant of it.

If such a bug was known in 2014, it might explain why btrfs doesn't seem
to detect it today.  btrfs check, read, and delete all said nothing
about the mismatched gen field.  I'd expect at least check and delete
to notice the gen field mismatch.  After all, they are already inspecting
or manipulating the extent item and the extent data reference, so
performing the check at the same time would add no significant cost.
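
To make the kind of check being discussed concrete, here is a minimal
sketch in Python.  It is not kernel or btrfs-progs code: the ExtentItem
type, the suspicious_generations() helper, and the sample values are all
hypothetical.  It assumes the valid range is (0, current filesystem
generation + 1], which matches the "expect (0 N]" wording of the
tree-checker message but should be treated as an assumption here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExtentItem:
    """Hypothetical, simplified stand-in for an extent item."""
    bytenr: int
    length: int
    generation: int


def suspicious_generations(items: List[ExtentItem],
                           fs_generation: int) -> List[ExtentItem]:
    """Return items whose generation lies outside (0, fs_generation + 1].

    The upper bound is an assumption made for this sketch.
    """
    upper = fs_generation + 1
    return [it for it in items if not (0 < it.generation <= upper)]


# Made-up sample data, for illustration only.
items = [
    ExtentItem(bytenr=1048576, length=4096, generation=120),
    ExtentItem(bytenr=2097152, length=8192, generation=94071693158288),
    ExtentItem(bytenr=3145728, length=4096, generation=0),
]

bad = suspicious_generations(items, fs_generation=500)
for it in bad:
    print(f"bytenr={it.bytenr} len={it.length} generation={it.generation}")
```

A pass like this only needs fields that are already loaded while walking
the extent items, which is the point made above about the check being
cheap.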

> > [  25.634793] BTRFS error (device nvme0n1p2): block=810145234944 read time tree block corruption detected
> > [  25.634961] BTRFS error (device nvme0n1p2): in __btrfs_free_extent:3080: errno=-5 IO failure
> > [  25.635042] BTRFS error (device nvme0n1p2): in btrfs_run_delayed_refs:2188: errno=-5 IO failure
> > [  34.653440] systemd-journald[483]: Failed to rotate /var/log/journal/8f7037b10bbd4f25aadd3d19105ef920/system.journal
> > 
> > After booting to live media, I checked SMART, badblocks, `btrfs check
> > --readonly` and `btrfs scrub`. All came back clean. I conclude that this
> > is a false positive, and have downgraded the kernel to 5.3.13 as a
> > workaround.
> > 
> > How can I provide more information to help?
> > 
> > [1]: https://btrfs.wiki.kernel.org/index.php/Tree-checker#How_to_handle_such_error
> > 



* Re: False alert: read time tree block corruption
  2019-12-04 11:32 ` Nikolay Borisov
  2019-12-05  2:50   ` Zygo Blaxell
@ 2019-12-05 11:44   ` Christian Höppner
  1 sibling, 0 replies; 4+ messages in thread
From: Christian Höppner @ 2019-12-05 11:44 UTC
  To: Nikolay Borisov, linux-btrfs

On Wed Dec 4, 2019 at 1:32 PM, Nikolay Borisov wrote:
> How come you omitted exactly the most useful error, the one that could
> have pointed at the problem?

My bad. Here's the full text:

:: running early hook [udev]
Starting version 244-1-arch
:: running hook [udev]
:: Triggering uevents...
[     4.474941] hid-generic 003:0D8C:0005.0001: No inputs registered, leaving
:: performing fsck on '/dev/nvme0n1p2'
:: mounting '/dev/nvme0n1p2' on real root
[     6.153174] BTRFS critical (device nvme0n1p2): corrupt leaf: block=209407475712 slot=110 extent bytenr=224368013312 len=262144 invalid generation, have 94071693158288 expect (0 3890273]
[     6.153252] BTRFS error (device nvme0n1p2): block=209407475712 read time tree corruption detected
[     6.153421] BTRFS critical (device nvme0n1p2): corrupt leaf: block=209407475712 slot=110 extent bytenr=224368013312 len=262144 invalid generation, have 94071693158288 expect (0 3890273]
[     6.153462] BTRFS error (device nvme0n1p2): block=209407475712 read time tree corruption detected
[     6.153495] BTRFS error (device nvme0n1p2): failed to read block groups: -5
[     6.230015] BTRFS error (device nvme0n1p2): open_ctree failed
mount: /new_root: wrong fs type, bad option, bad superblock on /dev/nvme0n1p2, missing codepage or helper program, or other error.
You are being dropped into an emergency shell.
sh: can't access tty: job control turned off
[rootfs ]#
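
For anyone who wants to pull the numbers out of a "corrupt leaf" line
like the one above, here is a small Python sketch.  The regular
expression and the parse_corrupt_leaf() helper are hypothetical and are
written against the message exactly as transcribed in this mail; the
kernel's real format may differ in punctuation, so the comma after the 0
is treated as optional.

```python
import re

# The line as transcribed above (timestamp dropped).
LINE = (
    "BTRFS critical (device nvme0n1p2): corrupt leaf: block=209407475712 "
    "slot=110 extent bytenr=224368013312 len=262144 invalid generation, "
    "have 94071693158288 expect (0 3890273]"
)

# Hypothetical pattern; the optional comma covers both "(0 N]" and "(0, N]".
PATTERN = re.compile(
    r"corrupt leaf: block=(?P<block>\d+) slot=(?P<slot>\d+) "
    r"extent bytenr=(?P<bytenr>\d+) len=(?P<len>\d+) invalid generation, "
    r"have (?P<have>\d+) expect \(0,? (?P<upper>\d+)\]"
)


def parse_corrupt_leaf(line):
    """Return the numeric fields of the message as ints, or None if no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return {key: int(value) for key, value in m.groupdict().items()}


info = parse_corrupt_leaf(LINE)
# The message states the accepted range as (0, upper].
in_range = 0 < info["have"] <= info["upper"]

print(info)
print("generation in range:", in_range)
print("generation as hex:", hex(info["have"]))
```

Printing the value in hex as well can make it easier to judge whether
the bad generation looks like a flipped bit or like unrelated data.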


> If the data is intact on disk and the leaf
> checker triggered, this means you likely have faulty RAM.

The data on disk seems fine. System boots with kernel 5.3.13, `btrfs
scrub` and `btrfs check --readonly` report no errors, nor have there
been any further issues during normal usage.

I'll run memtest overnight and report back.

