From: Oliver Freyermuth <o.freyermuth@googlemail.com>
To: Hugo Mills <hugo@carfax.org.uk>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: btrfs recovery
Date: Thu, 26 Jan 2017 12:01:29 +0100
Message-ID: <9c38e493-e4aa-a718-c6a8-d400bcff0df8@googlemail.com>
In-Reply-To: <24f6cfb2-d008-af12-ad94-4a4da1be1ee2@googlemail.com>

>    It's on line 248 of the paste:
> 
> 246.   key (5547032576 EXTENT_ITEM 204800) block 596426752 (36403) gen 20441
> 247.   key (5561905152 EXTENT_ITEM 184320) block 596443136 (36404) gen 20441
> 248.   key (15606380089319694336 UNKNOWN.76 303104) block 596459520 (36405) gen 20441
> 249.   key (5726711808 EXTENT_ITEM 524288) block 596475904 (36406) gen 20441
> 250.   key (5820571648 EXTENT_ITEM 524288) block 350322688 (21382) gen 20427
> 
>    I was wrong in my assumption: this isn't a simple bitflip. It looks
> like a small random write of data over the item key. That's not to say
> that bad hardware isn't the culprit -- it's worth checking anyway --
> but it could also be a bug in... well, almost anything.
> 
>    It's not corruption on the disk, because that would be caught by
> the checksum mechanism. This data was corrupted in RAM, before it was
> checksummed and written to disk. That could have happened as a result
> of some rogue piece of kernel code writing to an incorrect address, or
> as a result of some _other_ memory corruption affecting an address
> which is then used to write something to.
In the past, I used the nvidia binary blob on that machine, which would of course be a potential culprit. For a few months now, however, the machine has been using nouveau and the kernel is not tainted.

In case somebody encounters something similar in the future, a few more details:
The only not-so-common kernel code running on that machine was zram, which I have unloaded just now. Apart from that, the machine contains only very common hardware with in-tree modules (Realtek ethernet card, Intel chipset + CPU, VIA USB3 controller).
Special kernel options are "iommu=soft zswap.enabled=1". I am running Gentoo, with no kernel patches apart from those by Gentoo upstream (i.e. I use sys-kernel/gentoo-sources-4.9.0).

I'm also running 'memtester 12G' right now, which at least tests 2/3 of the memory. I'll leave that running for a day or so, but of course it will not provide a clear answer... 

> 
>    Looking at the data, I think this should be manually fixable, with
> sufficient effort (and a hex editor).
> 
> Looking at the item value:
> 
> >>> hex(15606380089319694336)
> '0xd89500014da12000'
> 
> Compared to the preceding key's value:
> 
> >>> hex(5561905152)
> '0x14b83f000'
> 
> It looks like it's just the top couple of bytes in this field that are
> affected, so those (d8, 95) can be zeroed. The second field should
> clearly be EXTENT_ITEM, which is 0xa8. The offset field (the third
> one) looks OK to me -- the bottom byte is 0.
> 
>    We can probably talk you through fixing this by hand with a decent
> hex editor. I've done it before...
> 
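The arithmetic in the quoted analysis can be checked with a short Python sketch. It uses only the numbers from the key listing and assumes, for an EXTENT_ITEM key, that the first field is the extent start and the third field is its length. All variable names are illustrative.

```python
# Values copied from the quoted key listing (lines 247-249 of the paste).
CORRUPT_OBJECTID = 15606380089319694336  # line 248, first field
CORRUPT_TYPE = 76                        # printed as UNKNOWN.76
LENGTH = 303104                          # line 248, third field
PREV_START, PREV_LENGTH = 5561905152, 184320  # line 247
NEXT_START = 5726711808                  # line 249
EXTENT_ITEM = 0xA8                       # expected key type

# Zero the top two bytes (d8, 95) of the 64-bit first field.
fixed_objectid = CORRUPT_OBJECTID & 0x0000FFFFFFFFFFFF


def bits_flipped(a, b):
    """Count the bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


# Bits that differ between the corrupted and the proposed key.
flipped = (bits_flipped(CORRUPT_OBJECTID, fixed_objectid)
           + bits_flipped(CORRUPT_TYPE, EXTENT_ITEM))

# The repaired extent should sit between its neighbours without overlap
# (assumption: the third key field is the extent length).
in_order = (PREV_START + PREV_LENGTH <= fixed_objectid
            and fixed_objectid + LENGTH <= NEXT_START)

print(hex(fixed_objectid), fixed_objectid)  # 0x14da12000 5597372416
print("bits flipped:", flipped)             # bits flipped: 12
print("fits between neighbours:", in_order) # fits between neighbours: True
```

Twelve differing bits spread over the top two bytes of the first field and the type byte are consistent with a small overwrite rather than a single bitflip, and the repaired value falls in order between the neighbouring extents.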
That would be nice! Would it be fine to do this via the mailing list?
The instructions could be helpful for future reference, and "real" IRC is not accessible from my current location.

Do you have suggestions for a decent hex editor for this job? Until now, I have mainly been using emacs, classic hexedit (http://rigaux.org/hexedit.html), or okteta (beware, it's graphical!), but of course these were made for files of a few MiB and are not so well suited to a block device.

The first thing to do would then probably just be to jump to the offset where 0xd89500014da12000 is written (can I get that via inspect-internal, or do I have to search for it?), fix that to read 
0x00a800014da12000
(if I understood correctly) and then probably adapt a checksum? 
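A minimal sketch of the search and checksum steps follows, to be tried on a copy or image rather than the live device. It rests on several assumptions about the on-disk format that should be verified first: a node entry is stored little-endian as a 64-bit objectid, a one-byte type, a 64-bit offset, a 64-bit block pointer and a 64-bit generation, and the metadata block checksum is a CRC-32C over everything after a 32-byte checksum field. Under that layout the type is its own byte rather than part of the 64-bit value, so the repaired first field would be 0x000000014da12000 with 0xa8 stored next to it. The helper names (`key_ptr_bytes`, `find_all`, `with_new_checksum`) are hypothetical.

```python
import struct

CSUM_FIELD_SIZE = 32  # assumed size of the checksum field at the block start


def key_ptr_bytes(objectid, key_type, offset, blockptr, generation):
    """Pack one node entry (assumed layout: u64, u8, u64, u64, u64; little-endian)."""
    return struct.pack("<QBQQQ", objectid, key_type, offset, blockptr, generation)


def find_all(haystack, needle):
    """Return every byte offset at which needle occurs in haystack."""
    hits = []
    pos = haystack.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = haystack.find(needle, pos + 1)
    return hits


def crc32c(data):
    """Bitwise CRC-32C (Castagnoli polynomial), slow but dependency-free."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def with_new_checksum(block):
    """Return a copy of block whose first 4 bytes hold the recomputed CRC-32C."""
    csum = struct.pack("<I", crc32c(block[CSUM_FIELD_SIZE:]))
    return csum + block[4:]


# Entry as shown on line 248 of the paste, and the proposed repair.
bad = key_ptr_bytes(15606380089319694336, 76, 303104, 596459520, 20441)
good = key_ptr_bytes(0x14DA12000, 0xA8, 303104, 596459520, 20441)

# Synthetic stand-in for a metadata block; a real one would be read from an image.
block = bytes(200) + bad + bytes(100)
print(find_all(block, bad))   # [200]
print(bad[:9].hex())          # 0020a14d010095d84c
print(good[:9].hex())         # 0020a14d01000000a8
patched = with_new_checksum(block.replace(bad, good))
```

Including the block pointer and generation in the search pattern makes it 33 bytes long, which should keep false matches unlikely. Only bytes 6 to 8 of the entry differ between `bad` and `good`. On a real device the data would have to be read in chunks with some overlap, and the pattern may occur more than once if the metadata is stored in duplicate.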

>    Bear in mind that if it is unreliable hardware, then continued use
> of the FS in read-write operation is likely to cause additional
> damage.
Of course. 
I would then, in any case, after the filesystem is up again, clean up, do a fresh external backup, scratch the FS and recreate it. I think it is already over 2 years old, so it has survived several generations of kernels. 

Oliver
