* Re: Fwd: Re: BUG: unable to handle kernel paging request at ffff9fb75f827100
       [not found]                 ` <8DB99A3B-6238-497D-A70F-8834CC014DCF@gmail.com>
@ 2018-02-28  8:36                   ` Qu Wenruo
       [not found]                     ` <1519833022.3714.122.camel@scientia.net>
  0 siblings, 1 reply; 10+ messages in thread
From: Qu Wenruo @ 2018-02-28  8:36 UTC (permalink / raw)
  To: Christoph Anton Mitterer, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 2843 bytes --]

Hi Christoph,

Since I'm still digging into the unexpected corruption (although without
much progress yet), would you please describe how the corruption happened?

In my current investigation, using the newer dm-log-writes tool, btrfs is
indeed bullet-proof (unlike my original assumption).

But the free space cache file (v1 free space cache) is not CoW-protected,
so it's vulnerable to power loss.

So my current assumption is that there were at least 2 power losses
during the problem.

The 1st power loss corrupted the free space cache in a way not detected
by its checksum, and btrfs used the corrupted free space cache to allocate
tree blocks.

Then the 2nd power loss happened. Since newly allocated tree blocks can
overwrite existing tree blocks, this breaks btrfs metadata CoW and leads
to the final corruption.

Would you please provide some detailed info about the corruption?

Thanks,
Qu

On 2018年02月23日 03:21, Christoph Anton Mitterer wrote:
> Am 22. Februar 2018 04:57:53 MEZ schrieb Qu Wenruo <quwenruo.btrfs@gmx.com>:
>>
>>
>> On 2018年02月22日 10:56, Christoph Anton Mitterer wrote:
>>> Just one last for today... I did a quick run with the byte nr from
>> the last mail... See screenshot
>>>
>>> It still gives these mapping errors... But does seem to write
>> files...
>>
>> From your previous picture, it seems that FS_TREE is your primary
>> subvolume, and 257 would be your snapshot.
>>
>> And for that block which can't be mapped, it seems to be a corruption
>> and it's really too large.
>>
>> So ignoring it wouldn't be a problem.
>>
>> And just keep btrfs-restore running to see what it salvages?
>>
>>>
>>> But these mapping errors... Wtf?! 
>>>
>>>
>>> Thanks and until tomorrow.
>>>
>>> Chris
>>>
>>> Oh and in my panic (I still fear that my main data fs, which is on
>> other hard disks could be affected by that strange bug, too, and have
>> no idea how to verify they are not) I forgot: you are from China,
>> aren't you? So a blessed happy new year. :-)
>>
>> Happy new year too.
>>
>> Thanks,
>> Qu
>>
>>>
> 
> Hey
> 
> Have you written more after the mail below?  Because my normal email account ran full and I cannot retrieve that right now on my computer.
> 
> Anyway... I tried now the restore and it seems to give back some data (haven't looked at it yet)... I also made a dd copy of the whole fs image to another freshly crafted btrfs fs (as an image file).
> 
> That seemed to work well, but when I diffed that image against the original, new csum errors were found on that file. (see attached image)
> 
> Could that be a pointer to some hardware defect? Perhaps the memory? Though I did do an extensive memtest86+ a while ago.
> 
> And that could be the reason for the corruption in the first place...
> 
> Thanks,
> Chris.
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 520 bytes --]


* Re: spurious full btrfs corruption
       [not found]                     ` <1519833022.3714.122.camel@scientia.net>
@ 2018-03-01  1:25                       ` Qu Wenruo
  2018-03-06  0:57                         ` Christoph Anton Mitterer
  0 siblings, 1 reply; 10+ messages in thread
From: Qu Wenruo @ 2018-03-01  1:25 UTC (permalink / raw)
  To: Christoph Anton Mitterer, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 18692 bytes --]



On 2018年02月28日 23:50, Christoph Anton Mitterer wrote:
> Hey Qu
> 
> Thanks for still looking into this.
> I'm still in the recovery process (and there are other troubles at the
> university where I work, so everything will take me some time), but I
> have made a dd image of the broken fs, before I put a backup on the
> SSD, so that still exists in case we need to do further debugging.
> 
> To thoroughly describe what has happened, let me go a bit back.
> 
> - Until last ~ September, I was using some Fujitsu E782, for at least
>   4 years, with no signs of data corruption.

That's pretty good.

> - For my personal data, I have one[0] Seagate 8 TB SMR HDD, which I
>   back up (send/receive) onto two further such HDDs (all these are
>   btrfs), and (rsync) onto one further HDD with ext4.
>   These files have all their SHA512 sums attached as XATTRs, which I
>   regularly test. So I think I can be pretty sure, that there was never
>   a case of silent data corruption and the RAM on the E782 is fine.

Good backup practice, it couldn't be better.

> - In October I got a new notebook from the university... brand new
>   Fujitsu U757 in basically the best possible configuration.
>   I ran memtest86+ in its normal (non-SMP) mode for roughly a day,
>   with no errors.
>   In SMP mode (which is considered experimental, I think) it crashes
>   reproducibly at the same position. Many people seem to see this
>   (with exactly the same test and address range where it freezes), so
>   I considered it a bug in memtest86+ SMP mode, which it likely is.
>   A patch[1] didn't help me.

Normally I won't blame memory unless strange behavior happens, from
unexpected freeze to strange kernel panic.

But when this happens, a lot of things can go wrong.

> - Unfortunately from the beginning on that notebook showed many further
>   issues.
>   - CPU overheating[2]
>   - boot freezes, when the initramfs tool of Debian isn't configured
> to 
>     blindly add all modules to the initramfs[3].
>   - spurious freezes, which I couldn't really debug any further since
>     there is no serial port...

Netconsole would help here, especially when U757 has an RJ45.
As long as you have another system which is able to run nc, it should
catch any kernel message, and help us to analyse if it's really a memory
corruption.
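
For example, something along these lines (the interface name, IP
addresses and MAC below are only placeholders for your actual setup):

# on the U757: send all kernel messages to 192.168.0.2, UDP port 6666
modprobe netconsole netconsole=6665@192.168.0.1/enp0s31f6,6666@192.168.0.2/aa:bb:cc:dd:ee:ff

# on the receiving machine: just listen for the UDP stream and log it
nc -u -l 6666 | tee netconsole.log   # some netcat flavours want: nc -u -l -p 6666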

> in those cases neither Magic-SysRq nor
>     even the NumLock LED and so on worked anymore.
>     These freezes caused me some troubles with dpkg[4].
>     The issue I describe there could also shed some light on the whole
>     situation, since it resulted from the freezes.
> - The dealer replaced the thermal paste on the CPU and when the CPU
>   overheating and the freezes didn't go away, they sent the notebook
>   for one week to Fujitsu in Germany, who allegedly thoroughly tested
>   it with Windows, and found no errors.

That's unfortunately very common for consumer electronics, as few people
and corporations really care about Linux users on consumer laptops.

And since there are problems with the system (either hardware or
software), I already see a much higher possibility of hard resets.

> 
> - The notebooks SSD is a Samsung SSD 850 PRO 1TB, the same which I
>   already used with the old notebook.
>   A long SMART check after the corruption, brought no errors.

I'm also using that SSD (with a smaller capacity), so the SSD is a less
likely culprit.

> 
> 
> - Just before the corruption on the btrfs happened, I decided it's time
>   for a backup of the notebook's SSD (what an irony, I know), so I made
>   a snapshot of my one and only subvol, removed any non-precious data
>   from that snapshot, made another ro-snapshot of that and removed the
>   rw snapshot.
> - The kernel was some 4.14.
> 
> - More or less after that, I saw the "BUG: unable to handle kernel
>   paging request at ffff9fb75f827100" which I reported here.
>   I'm not sure whether this had to do with btrfs at all, and even if
>   whether it was the fs on the SSD, or another one on an external HDD

It could be btrfs, and it would block the btrfs module from continuing,
which is effectively a hard reset.

>   I've had mounted at that time.
>   sync/umount/remount,rw/shutdown all didn't work, and I had to power
>   off the node.
> - After that things went on basically as I described in my previous
>   mails to the list already.
>   - There were some csum errors.
>   - Checking these files with debsums (Debian stores MD5s of the
>     package's files) found no errors.
>   - A scrub brought no errors.
>   - Shortly after the scrub, further csum errors as well as:
>     BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096
>   - Then I booted from a rescue USB stick with kernel/btrfs-progs 4.12.
>   - fsck in normal/lowmem mode were okay except:
>     Couldn't find free space inode 1
>   - I cleared the v1 free space cache
>   - a scrub failed with "ret=-1, errno=5 (Input/output error)"
>   - Things like these in the kernel log:
>     Feb 21 17:43:09 heisenberg kernel: BTRFS warning (device dm-0): checksum error at logical 16401395712 on dev /dev/mapper/system, sector 32033976, root 257, inode 42609350, offset 6754201600, length 4096, links 1 (path: var/lib/libvirt/images/subsurface.qcow2)
>     Feb 21 17:43:09 heisenberg kernel: BTRFS error (device dm-0): bdev /dev/mapper/system errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
>     Feb 21 17:46:57 heisenberg kernel: BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 16384
>   - ... (see the mails in the list archive, and what I sent
>     you off list... since I could only take screenshots from then on)
> 
> - Off list you told me what to try (btrfs check with different roots,
>   and how to use btrfs restore, and how to find the right block for
>   that).
>   I didn't try --repair or so.
> - All the rescue works I made from that USB stick with
>   kernel/btrfs-progs 4.12.
>   Using the very block and FS_TREE ROOT_ITEM (or whatever it is called
>   ^^) that I wrote in our off list mails.
> - Since I didn't think of a possible memory defect on the new notebook
>   yet, I still used the new one for this work.
> - To an external disk I did
>   - btrfs restore with options -x -S -i (plus the -f <num>, answering
>     all questions whether to go on because it's looping too long over a
>     file with no)
>   - the same to another dir, with option -m in addition and answering
>     these questions with yes.
> - Then I made a dd image of the broken fs,... and after that I diff'ed
>   the image with the original.
>   There I got another btrfs csum error, i.e. on the (freshly created)
>   btrfs on some external HDD, to which I dumped all the files during my
>   rescue efforts. So the file with the csum error was the image I just
>   created with dd.
> - Just for trying, I made another dd/diff round (on the new notebook),
>   and this time it worked.
> - Still, alerted, I put the SSD into my OLD notebook and continued
>   with everything that follows from there; I also upgraded
>   kernel/btrfs-progs to 4.15.
> 
> - I tried to repeat the two btrfs-restores... but interestingly, with
>   4.15, that didn't work and I got these block mapping errors.
>   This is IMO really strange...
>   I could later try to do it again with the OLD notebook but with 4.12.
> 
> - Then, I diffed what came out of the two different btrfs-restore runs.
>   Except for the files where I answered the question n, they were
>   equal, that is at least in terms of actual data (didn't check
>   metadata)
>   (here however, I used the data from the restores that I made still on
>   the NEW notebook with 4.12 kernel/progs)
> - Parts of the first two restores (still made on the new notebook) gave
>   csum errors as well (and the diff aborted with an IO error).
>   I made a scrub on the external HDD and removed all the broken files
>   (which were anyway uninteresting).
> 
> 
> I'd guess that the csum errors on the fresh btrfs on the external HDD,
> are some hint, that there could be simply an issue with the memory of
> the new notebook,... that just happens so rarely (only on a few blocks
> in 1TB copied), that it didn't strike too often.
> Maybe (pure speculation) this can be a reason for the freezes as well?

Normally I won't blame memory, but if even a newly created btrfs, without
any power loss, still reports csum errors, then memory may be the problem.

> 
> 
> 
> - Then (OLD notebook, 4.15 kernel/progs) I created a new btrfs on the
>   SSD, extracted a backup from last September (the backup happens to be
>   on these Seagate 8TB HDDs I mentioned before... they were tar.gz'ed
>   and that file had SHA512 XATTRs which still verified).
>   Afterwards I upgraded everything to the current state.
> - Now (and still ongoing) merging the data from the btrfs restore into
>   the "new" system,... which includes diffing or manually inspecting
>   files which have changed or are new since the backup.
>   This is obviously impossible for the multi-GB qcow2 VM images, which
>   appeared above in some checksum error at logical 16401395712...
> 
> - During that merging/checking... I didn't check any of the files
>   under package management, i.e. /usr /bin /sbin /lib*
>   I checked everything from /root /etc/ /home and I'm still in the
>   process of checking the precious parts from /var
>   And here's something interesting again for you developers:
> 
> - So far, in the data I checked (which as I've said, excludes a lot,..
>   especially the QEMU images)
>   I found only a few cases where the data I got from btrfs restore was
>   really bad.
>   Namely, two MP3 files, which were equal to their backup counterparts,
>   but just up to some offset... and the rest of the files was just
>   missing.

Offset? Is that offset aligned to 4K?
Or some strange offset?

> 
> - I cannot tell whether files from after the backup was made, may be
>   completely missing from the btrfs-restore... and I just don't
>   remember them...
> 
> - Especially recovering the VM images will take up some longer time...
>   (I think I cannot really trust what came out from the btrfs restore
>   here, since these already brought csum errs before)
> 
> 
> Two further possible issues / interesting things happened during the
> works:
> 1) btrfs-rescue-boot-usb-err.log
>    That was during the rescue operations from the OLD notebook and 4.15
>    kernel/progs already(!).
>    dm-0 is the SSD with the broken btrfs
>    dm-1 is the external HDD to which I wrote the images/btrfs-restore
>    data earlier
>    The csum errors on dm-1 are, as said, possibly from bad memory on
>    the new notebook, which I used to write the image/restore-data
>    in the first stage... and this was IIRC simply the time when I had
>    noticed that already and ran a scrub.
>    But what about that:
>    Feb 23 15:48:11 gss-rescue kernel: BTRFS warning (device dm-1): Skipping commit of aborted transaction.
>    Feb 23 15:48:11 gss-rescue kernel: ------------[ cut here ]------------
>    Feb 23 15:48:11 gss-rescue kernel: BTRFS: Transaction aborted (error -28)
>    ...
>    ?

No space left?
Pretty strange.

Would you please try to restore the fs on another system with good memory?

This -28 (ENOSPC) seems to show that the extent tree of the new btrfs is
corrupted.

> 2) btrfs-check.weird
>    This is on the freshly created FS on the SSD, after populating it
>    with loads of data from the backup.
>    fscks from 4.15 USB stick with normal and lowmem modes...
>    They show no error, but when you compare the byte numbers,... some
> of 
>    them differ!!! What the f***?
>    I.e. all but:
>    found 213620989952 bytes used, no error found
>    total csum bytes: 207507896
>    total extent tree bytes: 41713664
>    differ.
>    Same fs, no mounts/etc. in between, fscks directly ran after each
>    other.
>    How can this be?

Lowmem mode and original mode use different ways to iterate over all
extents.
For now please ignore it, but I'll dig into this to try to keep them the
same.

> 
> 
> 
> Now on to your questions:
> 
> 
> On Wed, 2018-02-28 at 16:36 +0800, Qu Wenruo wrote:
>> So my current assumption is, there are at least 2 power loss happens
>> during the problem.
>>
>> The 1st power loss caused free space cache corrupted but not detected
>> by
>> its checksum, and btrfs used the corrupted free space cache to
>> allocate
>> tree blocks.
>>
>> And then 2nd power loss happened. Since new allocated tree blocks can
>> overwrite existing tree blocks, it breaks metadata CoW of btrfs, and
>> leads the final corruption.
>>
>> Would you please provide some detailed info about the corruption?
> 
> 
> As for your questions about power loss... well there was no "classic"
> blackout power loss, but just the (many) occasions of freezes much
> earlier (described in the very beginning, at the problems with the new
> notebook). These happened over the last months... I usually made a
> fsck/scrub/full-debsums check... but never found an error.

The point here is that we need to pay extra attention to any fsck report
about free space cache corruption.
Since free space cache corruption (it only happens for v1) is not a big
problem, fsck will only report it but doesn't count it as an error.

I would recommend to either use the v2 space cache or *NEVER* use the v1
space cache.
It won't cause any functional change, just a little slowdown.
But it rules out the only weak point against power loss.
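
For example (the device and mount point here are only placeholders; v2
needs a not-too-old kernel):

# the first mount with space_cache=v2 builds the free space tree,
# later mounts keep using it automatically
mount -o space_cache=v2 /dev/mapper/system /mnt

# or simply don't use any space cache at all
mount -o nospace_cache /dev/mapper/system /mnt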

> So my conclusion was that btrfs must simply be rock-solid ;-) (perhaps I
> should say non-raid56-btrfs?! :-P)...
> 
> Apart from these,... the cases of power loss by me having to put the
> system off (as shutdown/sync/etc. didn't work)... are described
> above... and all following events as thoroughly as I remembered them.

That's detailed enough, and that can be considered a power loss (for btrfs).

> 
> 
> 
> I do remember that in the past I've seen errors a few times with respect
> to the free space cache while the system ran... e.g.
> kern.log.4.xz:Jan 24 05:49:32 heisenberg kernel: [  120.203741] BTRFS warning (device dm-0): block group 22569549824 has wrong amount of free space
> kern.log.4.xz:Jan 24 05:49:32 heisenberg kernel: [  120.204484] BTRFS warning (device dm-0): failed to load free space cache for block group 22569549824, rebuilding it now
> but AFAIU these are considered to be "harmless"?

Yep, when the kernel outputs such an error, it's harmless.

But if the kernel doesn't output such an error after a power loss, it
could be a problem.
If the kernel just follows the corrupted space cache, it would break
meta/data CoW, and btrfs is no longer bulletproof.

And to make things even more scary, nobody knows if such a thing happens.
If there is no error message after a power loss, it could be that the
block group was untouched in the previous transaction, or it could be
damaged.

So I'm working to try to reproduce a case where the v1 space cache is
corrupted and the kernel can be led to use it.

On the other hand, btrfs check does a pretty good check on the v1 space
cache, so after a power loss, I would recommend to run a btrfs check
before mounting the fs.

And the v2 space cache follows metadata CoW, so we don't even need to
worry about corruption; it's just impossible (unless there's a code bug).
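
So in practice, after a power loss, something like this (device name
again only as an example) would be enough:

# btrfs check is read-only by default, so this is safe on the unmounted fs
btrfs check /dev/mapper/system

# if it complains about the v1 free space cache, just drop the cache;
# it will be rebuilt on the next mount
btrfs check --clear-space-cache v1 /dev/mapper/system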

Thanks,
Qu

> 
> Another case was:
> https://www.spinics.net/lists/linux-btrfs/msg61706.html
> But that was on one of the copies of the big 8TB HDDs with my private
> data... and not on the fs that broke now.
> 
> 
> 
> 
> From my PoV the following question remain:
> a) Obviously whether my new notebook is broken and how I verify this
>    ;-)
>    (and whether Qu still works for Fujitsu and has some contact for me
>    who really deal with these problems without saying "it's Linux'
>    fault" or who can make Fujitsu send me a new one :D)
> 
> 
> b) Whether my six 8TB-HDDs with btrfs, which I used (and wrote to) with
>    the new notebook may have also already some silent corruptions in
>    it.. and whether I can check them for that?
>    I'd have done the following:
>    - fsck (normal & lowmem) them all
>        and clear the v1 free space cache just to be sure
>    - verify my own SHA512 sums in the XATTRs
>    - do I full scrub
>    - perhaps do a stat on each file, to make sure that all metadata is
>      really read?
> 
>    Is there anything else I can do to verify everything is fine?
> 
> c) From what I can tell,.. btrfs restore seemed to have recovered a
>    lot... (though as I've said for much that it recovered I haven't
>    checked whether it's really fine or just garbage in terms of data)
>    This and also the fact that when the corruption finally appeared
>    (though I have no idea whether it built up already far longer), not
>    much was written to disk, makes me guess that most data was actually
>    still there... and just stuff in the meta-data (e.g. that generation
>    discrepancies and that block mapping errors) caused the troubles.
>    Is there anything one could do to make btrfs more robust against
>    these things?
>    E.g. on SSD metadata defaults to single, right? Would it have helped
>    if this was not the case?
> 
>    As far as I understand, btrfs by design with the CoW (if it has no
>    coding errors) shouldn't be able to lead to such corruptions.. but
>    rather just to either see old or new data.
>    Of course it can't solve possible memory errors... but it should
>    perhaps be able to notice them (isn't everything checksummed?) and
>    only write if these match?
>    But maybe my thinking is too simple minded here ;-)
> 
> 
> btw: I just remember, that during the btrfs-restore... the tool already
> complained that it cannot recover some few files (but those were not
> really important to me).... still it's a surprise that it worked for so
> many files... but couldn't recover some.
> 
> 
> 
> Thanks for your help, and do not hesitate to ask if you need more
> information.
> Chris.
> 
> 
> 
> [0] In the meantime the data grow furthere and thus I had to split it
>     on two HDDs, each having 2 btrfs copies and one on ext4.
> [1] http://forum.canardpc.com/threads/115443-PATCH-false-errors-in-test-7-with-SMP
> [2] http://lkml.iu.edu/hypermail/linux/kernel/1710.3/03429.html
> [3] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=878829
> [4] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=888234
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 520 bytes --]


* Re: spurious full btrfs corruption
  2018-03-01  1:25                       ` spurious full btrfs corruption Qu Wenruo
@ 2018-03-06  0:57                         ` Christoph Anton Mitterer
  2018-03-06  1:50                           ` Qu Wenruo
  2018-03-07  3:09                           ` Duncan
  0 siblings, 2 replies; 10+ messages in thread
From: Christoph Anton Mitterer @ 2018-03-06  0:57 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Qu Wenruo

Hey Qu.

On Thu, 2018-03-01 at 09:25 +0800, Qu Wenruo wrote:
> > - For my personal data, I have one[0] Seagate 8 TB SMR HDD, which I
> >   backup (send/receive) on two further such HDDs (all these are
> >   btrfs), and (rsync) on one further with ext4.
> >   These files have all their SHA512 sums attached as XATTRs, which
> > I
> >   regularly test. So I think I can be pretty sure, that there was
> > never
> >   a case of silent data corruption and the RAM on the E782 is fine.
> 
> Good backup practice can't be even better.

Well, I would still want to add some tape- and/or optical-based solution...
But having this depends a bit on having a good way to do incremental
backups, i.e. I wouldn't want to write full copies of everything to
tape/BluRay over and over again, but just the actually added data and
records of metadata changes.
The former (adding just the added files) is rather easy, but it's harder
to also record any changes in metadata (moved/renamed/deleted files,
changes in file dates, permissions, XATTRs etc.).
Also I would always want to back up complete files, so not just changes
to a file, even if just one byte changed of a 4 GiB file... and I don't
want to have files split across media.

send/receive sounds like a candidate for this (except it works only on
changes, not full files), but I would prefer to have everything in a
standard format like tar, which one can rather easily recover manually
if there are failures in the backups.
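
(Maybe GNU tar's listed-incremental mode comes closest to what I mean: it
stores changed files as complete files and records the directory state in
a snapshot file. Just a sketch, the paths are made up:)

# level 0: full backup; /backup/data.snar records the metadata state
tar --create --listed-incremental=/backup/data.snar -f /backup/data-0.tar /data

# later runs with the same snapshot file add only files that are new or
# changed since then, but always as whole files
tar --create --listed-incremental=/backup/data.snar -f /backup/data-1.tar /data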


Another missing piece is a tool which (on my manual request) adds hash
sums to the files as XATTRs, and which can verify them.
Actually I wrote such a tool already, but as a shell script, and it simply
forks so often that it became extremely slow on millions of small files.
I often found it very useful to have that kind of checksumming in
addition to the kind of checksumming e.g. btrfs does, which is not at
the level of whole files.
So if something goes wrong like now, I can not only verify whether
single extents are valid, but also the chain of them that comprises a
file... and that for exactly the point where I declared "now, as it is,
the file is valid", rather than automatically on any write, as it would
be done with file-system-level checksumming.
In the current case,... for many files where I had such whole-file
csums, verifying whether what btrfs-restore gave me was valid or not
was very easy because of them.
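
(For illustration, the core of such a tool is really just something like
the following; the xattr name user.sha512 is only an example:)

# record the checksum once, at the point where the file is known to be good
sum=$(sha512sum "$f" | cut -d' ' -f1)
setfattr -n user.sha512 -v "$sum" "$f"

# verify later
if [ "$(sha512sum "$f" | cut -d' ' -f1)" = "$(getfattr --only-values -n user.sha512 "$f")" ]; then
    echo "OK: $f"
else
    echo "MISMATCH: $f"
fi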


> Normally I won't blame memory unless strange behavior happens, from
> unexpected freeze to strange kernel panic.
Me neither... I think bad RAM happens rather rarely these days.... but
my case may actually be one.


> Netconsole would help here, especially when U757 has an RJ45.
> As long as you have another system which is able to run nc, it should
> catch any kernel message, and help us to analyse if it's really a
> memory
> corruption.
Ah thanks... I wasn't even aware of that ^^
I'll have a look at it when I start inspecting the U757 again in the
next weeks.


> > - The notebooks SSD is a Samsung SSD 850 PRO 1TB, the same which I
> >   already used with the old notebook.
> >   A long SMART check after the corruption, brought no errors.
> 
> Also using that SSD with smaller capacity, it's less possible for the
> SSD.
Sorry, what do you mean? :)


> Normally I won't blame memory, but even newly created btrfs, without
> any
> powerloss, it still reports csum error, then it maybe the problem.
That was also my idea...
I may mix up things, but I think I even found a csum error later on the
rescue USB stick (which is also btrfs)... would need to double check
that, though.

> > - So far, in the data I checked (which as I've said, excludes a
> > lot,..
> >   especially the QEMU images)
> >   I found only few cases, where the data I got from btrfs restore
> > was
> >   really bad.
> >   Namely, two MP3 files. Which were equal to their backup
> > counterparts,
> >   but just up to some offset... and the rest of the files were just
> >   missing.
> 
> Offset? Is that offset aligned to 4K?
> Or some strange offset?

These were the two files:
-rw-r--r-- 1 calestyo calestyo   90112 Feb 22 16:46 'Lady In The Water/05.mp3'
-rw-r--r-- 1 calestyo calestyo 4892407 Feb 27 23:28 '/home/calestyo/share/music/Lady In The Water/05.mp3'


-rw-r--r-- 1 calestyo calestyo 1904640 Feb 22 16:47 'The Hunt For Red October [Intrada]/21.mp3'
-rw-r--r-- 1 calestyo calestyo 2968128 Feb 27 23:28 '/home/calestyo/share/music/The Hunt For Red October [Intrada]/21.mp3'

with the former (smaller one) being the corrupted one (i.e. the one
returned by btrfs-restore).

Both are (in terms of filesize) multiples of 4096... what does that
mean now?


> > - Especially recovering the VM images will take up some longer
> > time...
> >   (I think I cannot really trust what came out from the btrfs restore
> >   here, since these already brought csum errs before)

In the meantime I had a look at the remaining files that I got from the
btrfs-restore (I haven't run it again so far from the OLD notebook, so
only the results from the NEW notebook here):

The remaining ones were multi-GB qcow2 images for some qemu VMs.
I think I had none of these files open (i.e. VMs running) while in the
final corruption phase... but at least I'm sure that not *all* of them
were running.

However, all the qcow2 files from the restore are more or less garbage.
During the btrfs-restore it already complained about them, saying it would
loop too often on them and asking whether I want to continue or not (I
chose n, and on another full run I chose y).

Some still contain a partition table, some partitions even filesystems
(btrfs again)... but I cannot mount them.

The following is some output of several commands... but these
filesystems are not that important for me... it would be nice if one
could recover parts of them,... but nothing you'd need to waste time on.
But perhaps it helps to improve btrfs-restore(?),... so here we go:



root@heisenberg:/mnt/restore/x# l
total 368M
drwxr-xr-x 1 root root 212 Mar  6 01:52 .
drwxr-xr-x 1 root root  76 Mar  6 01:52 ..
-rw------- 1 root root 41G Feb 15 17:48 SilverFast.qcow2
-rw------- 1 root root 27G Feb 15 16:18 Windows.qcow2
-rw------- 1 root root 11G Feb 21 02:05 klenze.scientia.net_Debian-amd64-unstable.qcow2
-rw------- 1 root root 13G Feb 17 22:27 mldonkey.qcow2
-rw------- 1 root root 11G Nov 16 01:40 subsurface.qcow2
root@heisenberg:/mnt/restore/x# qemu-nbd -f qcow2 --connect=/dev/nbd0 klenze.scientia.net_Debian-amd64-unstable.qcow2 
root@heisenberg:/mnt/restore/x# blkid /dev/nbd0*
/dev/nbd0: PTUUID="a4944b03-ae24-49ce-81ef-9ef2cf4a0111" PTTYPE="gpt"
/dev/nbd0p1: PARTLABEL="BIOS boot partition" PARTUUID="c493388e-6f04-4499-838e-1b80669f6d63"
/dev/nbd0p2: LABEL="system" UUID="e4c30bb5-61cf-40aa-ba50-d296fe45d72a" UUID_SUB="0e258f8d-5472-408c-8d8e-193bbee53d9a" TYPE="btrfs" PARTLABEL="Linux filesystem" PARTUUID="cd6a8d28-2259-4b0c-869f-267e7f6fa5fa"
root@heisenberg:/mnt/restore/x# mount -r /dev/nbd0p2 /opt/
mount: /opt: wrong fs type, bad option, bad superblock on /dev/nbd0p2, missing codepage or helper program, or other error.
root@heisenberg:/mnt/restore/x# 

kernel says:
Mar 06 01:53:14 heisenberg kernel: Alternate GPT is invalid, using primary GPT.
Mar 06 01:53:14 heisenberg kernel:  nbd0: p1 p2
Mar 06 01:53:44 heisenberg kernel: BTRFS info (device nbd0p2): disk space caching is enabled
Mar 06 01:53:44 heisenberg kernel: BTRFS info (device nbd0p2): has skinny extents
Mar 06 01:53:44 heisenberg kernel: BTRFS error (device nbd0p2): bad tree block start 0 12142526464
Mar 06 01:53:44 heisenberg kernel: BTRFS error (device nbd0p2): bad tree block start 0 12142526464
Mar 06 01:53:44 heisenberg kernel: BTRFS error (device nbd0p2): failed to read chunk root
Mar 06 01:53:44 heisenberg kernel: BTRFS error (device nbd0p2): open_ctree failed



root@heisenberg:/mnt/restore/x# btrfs check /dev/nbd0p2
checksum verify failed on 12142526464 found E4E3BDB6 wanted 00000000
checksum verify failed on 12142526464 found E4E3BDB6 wanted 00000000
checksum verify failed on 12142526464 found E4E3BDB6 wanted 00000000
checksum verify failed on 12142526464 found E4E3BDB6 wanted 00000000
bytenr mismatch, want=12142526464, have=0
ERROR: cannot read chunk root
ERROR: cannot open file system

(kernel says nothing)


root@heisenberg:/mnt/restore/x# btrfs-find-root /dev/nbd0p2
WARNING: cannot read chunk root, continue anyway
Superblock thinks the generation is 572957
Superblock thinks the level is 0

(again, nothing from the kernel log)



root@heisenberg:/mnt/restore/x# btrfs inspect-internal dump-tree /dev/nbd0p2
btrfs-progs v4.15.1
checksum verify failed on 12142526464 found E4E3BDB6 wanted 00000000
checksum verify failed on 12142526464 found E4E3BDB6 wanted 00000000
checksum verify failed on 12142526464 found E4E3BDB6 wanted 00000000
checksum verify failed on 12142526464 found E4E3BDB6 wanted 00000000
bytenr mismatch, want=12142526464, have=0
ERROR: cannot read chunk root
ERROR: unable to open /dev/nbd0p2



root@heisenberg:/mnt/restore/x# btrfs inspect-internal dump-super /dev/nbd0p2
superblock: bytenr=65536, device=/dev/nbd0p2
---------------------------------------------------------
csum_type		0 (crc32c)
csum_size		4
csum			0x36145f6d [match]
bytenr			65536
flags			0x1
			( WRITTEN )
magic			_BHRfS_M [match]
fsid			e4c30bb5-61cf-40aa-ba50-d296fe45d72a
label			system
generation		572957
root			316702720
sys_array_size		129
chunk_root_generation	524318
root_level		0
chunk_root		12142526464
chunk_root_level	0
log_root		0
log_root_transid	0
log_root_level		0
total_bytes		20401074176
bytes_used		6371258368
sectorsize		4096
nodesize		16384
leafsize (deprecated)		16384
stripesize		4096
root_dir		6
num_devices		1
compat_flags		0x0
compat_ro_flags		0x0
incompat_flags		0x161
			( MIXED_BACKREF |
			  BIG_METADATA |
			  EXTENDED_IREF |
			  SKINNY_METADATA )
cache_generation	572957
uuid_tree_generation	33
dev_item.uuid		0e258f8d-5472-408c-8d8e-193bbee53d9a
dev_item.fsid		e4c30bb5-61cf-40aa-ba50-d296fe45d72a [match]
dev_item.type		0
dev_item.total_bytes	20401074176
dev_item.bytes_used	11081351168
dev_item.io_align	4096
dev_item.io_width	4096
dev_item.sector_size	4096
dev_item.devid		1
dev_item.dev_group	0
dev_item.seek_speed	0
dev_item.bandwidth	0
dev_item.generation	0
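
(Not something I have tried on this image, but noting it for the record:
since it's only the chunk root that can't be read, btrfs-progs has a
rescue mode that scans the whole device and tries to rebuild the chunk
tree:)

btrfs rescue chunk-recover -v /dev/nbd0p2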


> > Two further possible issues / interesting things happened during
> > the
> > works:
> > 1) btrfs-rescue-boot-usb-err.log
> >    That was during the rescue operations from the OLD notebook and
> > 4.15
> >    kernel/progs already(!).
> >    dm-0 is the SSD with the broken btrfs
> >    dm-1 is the external HDD to which I wrote the images/btrfs-
> > restore
> >    data earlier
> >    The csum errors on dm-1 are, as said, possibly from bad memory
> > on
> >    the new notebook, which I used to write the image/restore-data
> >    in the first stage... and this was IIRC simply the time when I
> > had
> >    noticed that already and ran a scrub.
> >    But what about that:
> >    Feb 23 15:48:11 gss-rescue kernel: BTRFS warning (device dm-1):
> > Skipping commit of aborted transaction.
> >    Feb 23 15:48:11 gss-rescue kernel: ------------[ cut here ]-----
> > -------
> >    Feb 23 15:48:11 gss-rescue kernel: BTRFS: Transaction aborted
> > (error -28)
> >    ...
> >    ?
> 
> No space left?
> Pretty strange.

If dm-1 should be the one with no space left,... then probably not, as
it's another 8TB device that should have many TBs left.


> Would you please try to restore the fs on another system with good
> memory?

Which one? The originally broken fs from the SSD?
And what should I try to find out here?


> This -28 (ENOSPC) seems to show that the extent tree of the new btrfs
> is
> corrupted.

"new" here is dm-1, right? Which is the fresh btrfs I've created on
some 8TB HDD for my recovery works.
While that FS shows me:
[26017.690417] BTRFS info (device dm-2): disk space caching is enabled
[26017.690421] BTRFS info (device dm-2): has skinny extents
[26017.798959] BTRFS info (device dm-2): bdev /dev/mapper/data-a4 errs:
wr 0, rd 0, flush 0, corrupt 130, gen 0
on mounting (I think the 130 corruptions are simply from the time when
I still used it for btrfs-restore with the NEW notebook with possibly
bad RAM)... I continued to use it in the meantime (for more recovery
work) and actually wrote many TB to it... so far, there seems to be no
further corruption on it.
If there was some extent tree corruption... then it's nothing I would
notice now.

An fsck of it seems fine:
# btrfs check /dev/mapper/restore 
Checking filesystem on /dev/mapper/restore
UUID: 62eb62e0-775b-4523-b218-1410b90c03c9
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
found 2502273781760 bytes used, no error found
total csum bytes: 2438116164
total tree bytes: 5030854656
total fs tree bytes: 2168242176
total extent tree bytes: 286375936
btree space waste bytes: 453818165
file data blocks allocated: 2877953581056
 referenced 2877907415040
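
(Side note: those wr/rd/flush/corrupt/gen counters from the mount message
can also be queried directly, and reset once their cause is understood;
the path is simply wherever the fs is mounted:)

btrfs device stats /mnt/restore       # show the per-device error counters
btrfs device stats -z /mnt/restore    # show and then reset them to zero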


> > 2) btrfs-check.weird
> >    This is on the freshly created FS on the SSD, after populating
> > it
> >    with loads of data from the backup.
> >    fscks from 4.15 USB stick with normal and lowmem modes...
> >    The show no error, but when you compare the byte numbers,...
> > some
> > of 
> >    them differ!!! What the f***?
> >    I.e. all but:
> >    found 213620989952 bytes used, no error found
> >    total csum bytes: 207507896
> >    total extent tree bytes: 41713664
> >    differ.
> >    Same fs, no mounts/etc. in between, fscks directly ran after
> > each
> >    other.
> >    How can this be?
> 
> Lowmem mode and original mode do different ways to iterate all
> extents.
> For now please ignore it, but I'll dig into this to try to keep them
> same.

Okay... just tell me if you need me to try something new out in that
area.


> The point here is, we need to pay extra attention about any fsck
> report
> about free space cache corruption.
> Since free space cache corruption (only happens for v1) is not a big
> problem, fsck will only report but doesn't account it as error.
Why is it not a big problem?



> I would recommend to use either v2 space cache or *NEVER* use v1
> space
> cache.
> It won't cause any functional chance, just a little slower.
> But it rules out the only weak point against power loss.
This comes as a surprise... wasn't it always said that v2 space cache
is still unstable?

And shouldn't that then become either default (using v2)... or a
default of not using v1 at least?



> > I do remember that in the past I've seen few times errors with
> > respect
> > to the free space cache during the system ran... e.g.
> > kern.log.4.xz:Jan 24 05:49:32 heisenberg kernel: [  120.203741]
> > BTRFS warning (device dm-0): block group 22569549824 has wrong
> > amount of free space
> > kern.log.4.xz:Jan 24 05:49:32 heisenberg kernel: [  120.204484]
> > BTRFS warning (device dm-0): failed to load free space cache for
> > block group 22569549824, rebuilding it now
> > but AFAIU these are considered to be "harmless"?
> 
> Yep, when kernel outputs such error, it's harmless.

Well, I have seen such messages also in cases where there was no power
loss/crash/etc.
(see the mails I wrote you off list in the last days).


> But if kernel doesn't output such error after powerloss, it could be
> a
> problem.
> If kernel just follows the corrupted space cache, it would break
> meta/data CoW, and btrfs is no longer bulletproof.
Okay... sounds scary... as I probably had "many" cases of crashes,
where I at least didn't notice these messages (OTOH, I didn't really
look for them).


> And to make things even more scary, nobody knows if such thing
> happens.
> If no error message after power loss, it could be that block group is
> untouched in previous transaction, or it could be damaged.
Wouldn't it be reasonable that, when an fs is mounted that was not
properly unmounted (I assume there is some flag that shows this?),
any such possibly corrupted caches/trees/etc. are simply invalidated as
a safety measure?


> So I'm working to try to reproduce a case where v1 space cache is
> corrupted and could lead to kernel to use them.
Well, even if you manage to do that and rule out a few cases of such
corruption by fixing bugs, it all still sounds pretty fragile.


Had you seen that from my mail "Re: BUG: unable to handle kernel paging
request at ffff9fb75f827100" from "Wed, 21 Feb 2018 17:42:01 +0100":
checking extents
checking free space cache
Couldn't find free space inode 1
checking fs roots
checking csums
checking root refs
Checking filesystem on /dev/mapper/system
UUID: b6050e38-716a-40c3-a8df-fcf1dd7e655d
found 676124835840 bytes used, no error found
total csum bytes: 657522064
total tree bytes: 2546106368
total fs tree bytes: 1496350720
total extent tree bytes: 182255616
btree space waste bytes: 594036536
file data blocks allocated: 5032601706496
 referenced 670040977408

That was a fsck of the corrupted fs on the SSD (from the USB stick with
I think with 4.12 kernel/progs)
Especially that it was inode 1 seems like a win in the lottery...
"Couldn't find free space inode 1" 
so couldn't that also point to something?


[obsolete because of below] The v1 space caches aren't
checksummed/CoWed, right? Wouldn't it make sense to rule out using any
broken cache?


> On the other hand, btrfs check does pretty good check on v1 space
> cache,
> so after power loss, I would recommend to do a btrfs check before
> mounting the fs.
And I assume using --clear-space-cache v1 to simply reset the cache...?


> And v2 space cache follows metadata CoW so we don't even need to
> bother
> any corruption, it's just impossible (unless code bug)
Ah... okay ^^ Then why isn't it the default, or at least why isn't the v1
space cache disabled by default for everyone?
Even if my case of corruption here on the SSD may have been caused by bad
memory and had nothing to do with the space cache,... this still sounds
like an area where many bad things could happen.


* Re: spurious full btrfs corruption
  2018-03-06  0:57                         ` Christoph Anton Mitterer
@ 2018-03-06  1:50                           ` Qu Wenruo
  2018-03-08 14:38                             ` Christoph Anton Mitterer
  2018-03-07  3:09                           ` Duncan
  1 sibling, 1 reply; 10+ messages in thread
From: Qu Wenruo @ 2018-03-06  1:50 UTC (permalink / raw)
  To: Christoph Anton Mitterer, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 22205 bytes --]



On 2018年03月06日 08:57, Christoph Anton Mitterer wrote:
> Hey Qu.
> 
> On Thu, 2018-03-01 at 09:25 +0800, Qu Wenruo wrote:
>>> - For my personal data, I have one[0] Seagate 8 TB SMR HDD, which I
>>>   backup (send/receive) on two further such HDDs (all these are
>>>   btrfs), and (rsync) on one further with ext4.
>>>   These files have all their SHA512 sums attached as XATTRs, which
>>> I
>>>   regularly test. So I think I can be pretty sure, that there was
>>> never
>>>   a case of silent data corruption and the RAM on the E782 is fine.
>>
>> Good backup practice can't be even better.
> 
> Well I still would want to add something tape and/or optical based
> solution...
> But having this depends a bit on having a good way to do incremental
> backups, i.e. I wouldn't want to write full copies of everything to
> tape/BluRay over and over again, but just the actually added data and
> records of metadata changes.
> The former (adding just added files is rather easy), but still
> recording any changes in metadata (moved/renamed/deleted files, changes
> in file dates, permissions, XATTRS etc.).
> Also I would always want to backup complete files, so not just changes
> to a file, even if just one byte changed of a 4 GiB file... and not
> want to have files split over mediums.
> 
> send/receive sounds like a candidate for this (except it works only on
> changes, not full files), but I would prefer to have everything in a
> standard format like tar which one can rather easily recover manually
> if there are failures in the backups.
> 
> 
> Another missing piece is a tool which (at my manual order) adds hash
> sums to the files, and which can verify them
> Actually I wrote such a tool already, but as shell script and it simply
> forks so often, that it became extremely slow at millions of small
> files.
> I often found it so useful to have that kind of checksumming in
> addition to the kind of checksumming e.g. btrfs does which is not at
> the level of whole files.
> So if something goes wrong like now, I cannot only verify whether
> single extents are valid, but also the chain of them that comprises a
> file.. and that just for the point where I defined "now, as it is, the
> file is valid",.. and automatically on any writes, as it would be done
> at file system level checksumming.
> In the current case,... for many files where I had such whole-file-
> csums, verifying whether what btrfs-restore gave me was valid or not,
> was very easy because of them.
> 
> 
>> Normally I won't blame memory unless strange behavior happens, from
>> unexpected freeze to strange kernel panic.
> Me neither... I think bad RAM happens rather rarely these days.... but
> my case may actually be one.
> 
> 
>> Netconsole would help here, especially when U757 has an RJ45.
>> As long as you have another system which is able to run nc, it should
>> catch any kernel message, and help us to analyse if it's really a
>> memory
>> corruption.
> Ah thanks... I wasn't even aware of that ^^
> I'll have a look at it when I start inspecting the U757 again in the
> next weeks.
> 
> 
>>> - The notebooks SSD is a Samsung SSD 850 PRO 1TB, the same which I
>>>   already used with the old notebook.
>>>   A long SMART check after the corruption, brought no errors.
>>
>> Also using that SSD with smaller capacity, it's less possible for the
>> SSD.
> Sorry, what do you mean? :)

I'm using the same SSD (with smaller size).
So unless some strange thing happened, I won't blame the SSD.

> 
> 
>> Normally I won't blame memory, but even newly created btrfs, without
>> any
>> powerloss, it still reports csum error, then it maybe the problem.
> That was also my idea...
> I may mix up things, but I think I even found a csum error later on the
> rescue USB stick (which is also btrfs)... would need to double check
> that, though.
> 
>>> - So far, in the data I checked (which as I've said, excludes a
>>> lot,..
>>>   especially the QEMU images)
>>>   I found only few cases, where the data I got from btrfs restore
>>> was
>>>   really bad.
>>>   Namely, two MP3 files. Which were equal to their backup
>>> counterparts,
>>>   but just up to some offset... and the rest of the files were just
>>>   missing.
>>
>> Offset? Is that offset aligned to 4K?
>> Or some strange offset?
> 
> These were the two files:
> -rw-r--r-- 1 calestyo calestyo   90112 Feb 22 16:46 'Lady In The Water/05.mp3'
> -rw-r--r-- 1 calestyo calestyo 4892407 Feb 27 23:28 '/home/calestyo/share/music/Lady In The Water/05.mp3'
> 
> 
> -rw-r--r-- 1 calestyo calestyo 1904640 Feb 22 16:47 'The Hunt For Red October [Intrada]/21.mp3'
> -rw-r--r-- 1 calestyo calestyo 2968128 Feb 27 23:28 '/home/calestyo/share/music/The Hunt For Red October [Intrada]/21.mp3'
> 
> with the former (smaller one) being the corrupted one (i.e. the one
> returned by btrfs-restore).
> 
> Both are (in terms of filesize) multiples of 4096... what does that
> mean now?

That means we lost either some file extents or some inode items.

Btrfs-restore only found the EXTENT_DATA items, which contain the pointers
to the real data and the inode number.
But no INODE_ITEM was found, which records the real inode size, so restore
can only use EXTENT_DATA to rebuild as much data as possible.
That's why all the recovered files are aligned to 4K (e.g. the recovered
90112 bytes are exactly 22 * 4096, i.e. only the first 22 data blocks of
the 4892407-byte original).

So some metadata is also corrupted.

> 
> 
>>> - Especially recovering the VM images will take up some longer
>>> time...
>>>   (I think I cannot really trust what came out from the btrfs restore
>>>   here, since these already brought csum errs before)
> 
> In the meantime I had a look of the remaining files that I got from the
> btrfs-restore (haven't run it again so far, from the OLD notebook, so
> only the results from the NEW notebook here:):
> 
> The remaining ones were multi-GB qcow2 images for some qemu VMs.
> I think I had non of these files open (i.e. VMs running) while in the
> final corruption phase... but at least I'm sure that not *all* of them
> were running.
> 
> However, all the qcow2 files from the restore are more or less garbage.
> During the btrfs-restore it already complained on them, that it would
> loop too often on them and whether I want to continue or not (I choose
> n and on another full run I choose y).
> 
> Some still contain a partition table, some partitions even filesystems
> (btrfs again)... but I cannot mount them.

I think the same problem happens on them too.

Some data is lost while some are good.
Anyway, they would be garbage.

> 
> The following is some output of several commands... but these
> filesystems are not that important for me... it would be nice if one
> could recover parts of it,... but nothing you'd need to waste time for.
> But perhaps it helps to improve btrfs-restore(?),... so here we go:
> 
> 
> 
> root@heisenberg:/mnt/restore/x# l
> total 368M
> drwxr-xr-x 1 root root 212 Mar  6 01:52 .
> drwxr-xr-x 1 root root  76 Mar  6 01:52 ..
> -rw------- 1 root root 41G Feb 15 17:48 SilverFast.qcow2
> -rw------- 1 root root 27G Feb 15 16:18 Windows.qcow2
> -rw------- 1 root root 11G Feb 21 02:05 klenze.scientia.net_Debian-amd64-unstable.qcow2
> -rw------- 1 root root 13G Feb 17 22:27 mldonkey.qcow2
> -rw------- 1 root root 11G Nov 16 01:40 subsurface.qcow2
> root@heisenberg:/mnt/restore/x# qemu-nbd -f qcow2 --connect=/dev/nbd0 klenze.scientia.net_Debian-amd64-unstable.qcow2 
> root@heisenberg:/mnt/restore/x# blkid /dev/nbd0*
> /dev/nbd0: PTUUID="a4944b03-ae24-49ce-81ef-9ef2cf4a0111" PTTYPE="gpt"
> /dev/nbd0p1: PARTLABEL="BIOS boot partition" PARTUUID="c493388e-6f04-4499-838e-1b80669f6d63"
> /dev/nbd0p2: LABEL="system" UUID="e4c30bb5-61cf-40aa-ba50-d296fe45d72a" UUID_SUB="0e258f8d-5472-408c-8d8e-193bbee53d9a" TYPE="btrfs" PARTLABEL="Linux filesystem" PARTUUID="cd6a8d28-2259-4b0c-869f-267e7f6fa5fa"
> root@heisenberg:/mnt/restore/x# mount -r /dev/nbd0p2 /opt/
> mount: /opt: wrong fs type, bad option, bad superblock on /dev/nbd0p2, missing codepage or helper program, or other error.
> root@heisenberg:/mnt/restore/x# 
> 
> kernel says:
> Mar 06 01:53:14 heisenberg kernel: Alternate GPT is invalid, using primary GPT.
> Mar 06 01:53:14 heisenberg kernel:  nbd0: p1 p2
> Mar 06 01:53:44 heisenberg kernel: BTRFS info (device nbd0p2): disk space caching is enabled
> Mar 06 01:53:44 heisenberg kernel: BTRFS info (device nbd0p2): has skinny extents
> Mar 06 01:53:44 heisenberg kernel: BTRFS error (device nbd0p2): bad tree block start 0 12142526464
> Mar 06 01:53:44 heisenberg kernel: BTRFS error (device nbd0p2): bad tree block start 0 12142526464
> Mar 06 01:53:44 heisenberg kernel: BTRFS error (device nbd0p2): failed to read chunk root
> Mar 06 01:53:44 heisenberg kernel: BTRFS error (device nbd0p2): open_ctree failed
> 
> 
> 
> root@heisenberg:/mnt/restore/x# btrfs check /dev/nbd0p2
> checksum verify failed on 12142526464 found E4E3BDB6 wanted 00000000
> checksum verify failed on 12142526464 found E4E3BDB6 wanted 00000000
> checksum verify failed on 12142526464 found E4E3BDB6 wanted 00000000
> checksum verify failed on 12142526464 found E4E3BDB6 wanted 00000000
> bytenr mismatch, want=12142526464, have=0
> ERROR: cannot read chunk root
> ERROR: cannot open file system
> 
> (kernel says nothing)
> 
> 
> root@heisenberg:/mnt/restore/x# btrfs-find-root /dev/nbd0p2
> WARNING: cannot read chunk root, continue anyway
> Superblock thinks the generation is 572957
> Superblock thinks the level is 0
> 
> (again, nothing from the kernel log)
> 
> 
> 
> root@heisenberg:/mnt/restore/x# btrfs inspect-internal dump-tree /dev/nbd0p2
> btrfs-progs v4.15.1
> checksum verify failed on 12142526464 found E4E3BDB6 wanted 00000000
> checksum verify failed on 12142526464 found E4E3BDB6 wanted 00000000
> checksum verify failed on 12142526464 found E4E3BDB6 wanted 00000000
> checksum verify failed on 12142526464 found E4E3BDB6 wanted 00000000
> bytenr mismatch, want=12142526464, have=0
> ERROR: cannot read chunk root
> ERROR: unable to open /dev/nbd0p2
> 
> 
> 
> root@heisenberg:/mnt/restore/x# btrfs inspect-internal dump-super /dev/nbd0p2
> superblock: bytenr=65536, device=/dev/nbd0p2
> ---------------------------------------------------------
> csum_type		0 (crc32c)
> csum_size		4
> csum			0x36145f6d [match]
> bytenr			65536
> flags			0x1
> 			( WRITTEN )
> magic			_BHRfS_M [match]
> fsid			e4c30bb5-61cf-40aa-ba50-d296fe45d72a
> label			system
> generation		572957
> root			316702720
> sys_array_size		129
> chunk_root_generation	524318
> root_level		0
> chunk_root		12142526464
> chunk_root_level	0
> log_root		0
> log_root_transid	0
> log_root_level		0
> total_bytes		20401074176
> bytes_used		6371258368
> sectorsize		4096
> nodesize		16384
> leafsize (deprecated)		16384
> stripesize		4096
> root_dir		6
> num_devices		1
> compat_flags		0x0
> compat_ro_flags		0x0
> incompat_flags		0x161
> 			( MIXED_BACKREF |
> 			  BIG_METADATA |
> 			  EXTENDED_IREF |
> 			  SKINNY_METADATA )
> cache_generation	572957
> uuid_tree_generation	33
> dev_item.uuid		0e258f8d-5472-408c-8d8e-193bbee53d9a
> dev_item.fsid		e4c30bb5-61cf-40aa-ba50-d296fe45d72a [match]
> dev_item.type		0
> dev_item.total_bytes	20401074176
> dev_item.bytes_used	11081351168
> dev_item.io_align	4096
> dev_item.io_width	4096
> dev_item.sector_size	4096
> dev_item.devid		1
> dev_item.dev_group	0
> dev_item.seek_speed	0
> dev_item.bandwidth	0
> dev_item.generation	0
> 
> 
>>> Two further possible issues / interesting things happened during
>>> the
>>> works:
>>> 1) btrfs-rescue-boot-usb-err.log
>>>    That was during the rescue operations from the OLD notebook and
>>> 4.15
>>>    kernel/progs already(!).
>>>    dm-0 is the SSD with the broken btrfs
>>>    dm-1 is the external HDD to which I wrote the images/btrfs-
>>> restore
>>>    data earlier
>>>    The csum errors on dm-1 are, as said, possibly from bad memory
>>> on
>>>    the new notebook, which I used to write the image/restore-data
>>>    in the first stage... and this was IIRC simply the time when I
>>> had
>>>    noticed that already and ran a scrub.
>>>    But what about that:
>>>    Feb 23 15:48:11 gss-rescue kernel: BTRFS warning (device dm-1):
>>> Skipping commit of aborted transaction.
>>>    Feb 23 15:48:11 gss-rescue kernel: ------------[ cut here ]-----
>>> -------
>>>    Feb 23 15:48:11 gss-rescue kernel: BTRFS: Transaction aborted
>>> (error -28)
>>>    ...
>>>    ?
>>
>> No space left?
>> Pretty strange.
> 
> If dm-1 should be the one with no space left,... then probably not, as
> it's another 8TB device that should have many TBs left.
> 
> 
>> Would you please try to restore the fs on another system with good
>> memory?
> 
> Which one? The originally broken fs from the SSD?

Yep.

> And what should I try to find out here?

During the restore, check whether the csum errors happen again on the
newly created destination btrfs.
(And I recommend using the mount options nospace_cache,notreelog on the
destination fs.)
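
I.e. something like (device and mount point just as an example):

# neither the v1 space cache nor the log tree will then be used on the
# destination fs
mount -o nospace_cache,notreelog /dev/mapper/restore /mnt/restore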

> 
> 
>> This -28 (ENOSPC) seems to show that the extent tree of the new btrfs
>> is
>> corrupted.
> 
> "new" here is dm-1, right? Which is the fresh btrfs I've created on
> some 8TB HDD for my recovery works.
> While that FS shows me:
> [26017.690417] BTRFS info (device dm-2): disk space caching is enabled
> [26017.690421] BTRFS info (device dm-2): has skinny extents
> [26017.798959] BTRFS info (device dm-2): bdev /dev/mapper/data-a4 errs:
> wr 0, rd 0, flush 0, corrupt 130, gen 0
> on mounting (I think the 130 corruptions are simply from the time when
> I still used it for btrfs-restore with the NEW notebook with possibly
> bad RAM)... I continued to use it in the meantime (for more recovery
> works) and wrote actually many TB to it... so far, there seem to be no
> further corruption on it.
> If there was some extent tree corruption... than nothing I would notice
> now.
> 
> An fsck of it seems fine:
> # btrfs check /dev/mapper/restore 
> Checking filesystem on /dev/mapper/restore
> UUID: 62eb62e0-775b-4523-b218-1410b90c03c9
> checking extents
> checking free space cache
> checking fs roots
> checking csums
> checking root refs
> found 2502273781760 bytes used, no error found
> total csum bytes: 2438116164
> total tree bytes: 5030854656
> total fs tree bytes: 2168242176
> total extent tree bytes: 286375936
> btree space waste bytes: 453818165
> file data blocks allocated: 2877953581056
>  referenced 2877907415040

At least the metadata is in good shape.

If scrub reports no errors, that would be perfect.

> 
>>> 2) btrfs-check.weird
>>>    This is on the freshly created FS on the SSD, after populating
>>> it
>>>    with loads of data from the backup.
>>>    fscks from 4.15 USB stick with normal and lowmem modes...
>>>    The show no error, but when you compare the byte numbers,...
>>> some
>>> of 
>>>    them differ!!! What the f***?
>>>    I.e. all but:
>>>    found 213620989952 bytes used, no error found
>>>    total csum bytes: 207507896
>>>    total extent tree bytes: 41713664
>>>    differ.
>>>    Same fs, no mounts/etc. in between, fscks directly ran after
>>> each
>>>    other.
>>>    How can this be?
>>
>> Lowmem mode and original mode do different ways to iterate all
>> extents.
>> For now please ignore it, but I'll dig into this to try to keep them
>> same.
> 
> Okay... just tell me if you need me to try something new out in that
> area.
> 
> 
>> The point here is, we need to pay extra attention about any fsck
>> report
>> about free space cache corruption.
>> Since free space cache corruption (only happens for v1) is not a big
>> problem, fsck will only report but doesn't account it as error.
> Why is it not a big problem?

Because we have the "nospace_cache" mount option as our best friend.

> 
> 
> 
>> I would recommend to use either v2 space cache or *NEVER* use v1
>> space
>> cache.
>> It won't cause any functional chance, just a little slower.
>> But it rules out the only weak point against power loss.
> This comes as a surprise... wasn't it always said that v2 space cache
> is still unstable?

But v1 also has its problems.
In fact I have already found a situation where btrfs can corrupt its v1
space cache, just by running fsstress -n 200.

Although kernel and btrfs-progs can both detect the corruption, that's
already the last line of defense. The corrupted cache passes both the
generation and checksum checks; the only check which catches it is the
free space size, and even that can be bypassed if the operations are
crafted carefully enough.

> 
> And shouldn't that then become either default (using v2)... or a
> default of not using v1 at least?

Because we still don't have strong enough evidence or a test case to
prove it.
But I think that will change soon.

Anyway, it's just an optimization, and most of us can live without it.

> 
> 
> 
>>> I do remember that in the past I've seen few times errors with
>>> respect
>>> to the free space cache during the system ran... e.g.
>>> kern.log.4.xz:Jan 24 05:49:32 heisenberg kernel: [  120.203741]
>>> BTRFS warning (device dm-0): block group 22569549824 has wrong
>>> amount of free space
>>> kern.log.4.xz:Jan 24 05:49:32 heisenberg kernel: [  120.204484]
>>> BTRFS warning (device dm-0): failed to load free space cache for
>>> block group 22569549824, rebuilding it now
>>> but AFAIU these are considered to be "harmless"?
>>
>> Yep, when kernel outputs such error, it's harmless.
> 
> Well I have seen such also in case there was no power loss/crash/etc.
> (see the mails I wrote you off list in the last days).
> 
> 
>> But if kernel doesn't output such error after powerloss, it could be
>> a
>> problem.
>> If kernel just follows the corrupted space cache, it would break
>> meta/data CoW, and btrfs is no longer bulletproof.
> Okay... sounds scary... as I probably had "many" cases of crashes,
> where I at least didn't notice these messages (OTOH, I didn't really
> look for them).
> 
> 
>> And to make things even more scary, nobody knows if such thing
>> happens.
>> If no error message after power loss, it could be that block group is
>> untouched in previous transaction, or it could be damaged.
> Wouldn't it be reasonable, that when a fs is mounted that was not
> properly unmounted (I assume there is some flag that shows this?),...
> any such possible corrupted caches/trees/etc. are simply invalidated as
> a safety measure?

Unfortunately, btrfs doesn't have such a flag to indicate a dirty umount.
We could use the log root to detect it, but it isn't always present, and
one can even use notreelog to disable the log tree completely.

> 
> 
>> So I'm working to try to reproduce a case where v1 space cache is
>> corrupted and could lead to kernel to use them.
> Well even if you manage to do and rule out a few cases of such
> corruptions by fixing bugs, it still sounds all pretty fragile.
> 
> 
> Had you seen that from my mail "Re: BUG: unable to handle kernel paging
> request at ffff9fb75f827100" from "Wed, 21 Feb 2018 17:42:01 +0100":
> checking extents
> checking free space cache
> Couldn't find free space inode 1
> checking fs roots
> checking csums
> checking root refs
> Checking filesystem on /dev/mapper/system
> UUID: b6050e38-716a-40c3-a8df-fcf1dd7e655d
> found 676124835840 bytes used, no error found
> total csum bytes: 657522064
> total tree bytes: 2546106368
> total fs tree bytes: 1496350720
> total extent tree bytes: 182255616
> btree space waste bytes: 594036536
> file data blocks allocated: 5032601706496
>  referenced 670040977408
> 
> That was a fsck of the corrupted fs on the SSD (from the USB stick with
> I think with 4.12 kernel/progs)
> Especially that it was inode 1 seems like a win in the lottery...
> "Couldn't find free space inode 1" 
> so couldn't that also point to something?

Hard to say. It could be a new block group chunk with no cache.
It's possible, so I won't worry about that too much.

> 
> 
> [obsolete because of below] The v1 space caches aren't
> checksumed/CoWed, right? Wouldn't that make sense to rule out using any
> broken cache?

The v1 cache is neither CoWed nor checksummed by the normal routines.

But it has its own checksums, a little like the XATTR method:
the checksum of each page (including the first page itself) is stored
in the first page.

And it also has its own generation.
So normally that should be enough to detect such corruption.

However, there are still cases where the v1 cache gets CoWed. I'm still
digging into the reason, but under certain chunk layouts it can lead to
corruption.

> 
> 
>> On the other hand, btrfs check does pretty good check on v1 space
>> cache,
>> so after power loss, I would recommend to do a btrfs check before
>> mounting the fs.
> And I assume using --clear-space-cache v1 to simply reset the cache...?

That's also valid.
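
For example, run against the unmounted device (the angle brackets mark
a placeholder):

# btrfs check --clear-space-cache v1 <device>

The v1 cache is then rebuilt from scratch on subsequent cached mounts.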

> 
> 
>> And v2 space cache follows metadata CoW so we don't even need to
>> bother
>> any corruption, it's just impossible (unless code bug)
> Ah... okay ^^ Then why isn't it default, or at least v1 space cache
> disabled per default for anyone?

Because v1 has worked well so far, the corruption I'm chasing is really
hard to reproduce, and v1 is just the most likely suspect right now.

I'm not saying v2 is perfect either: while digging through the v1 code,
I'm also wondering how v2 handles the case where the v2 cache lives in a
metadata chunk which itself has a cache.

So I recommend disabling the space cache entirely for now.
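
A minimal sketch of making that persistent via fstab (UUID and mount
point are placeholders):

UUID=<fs-uuid>  /data  btrfs  defaults,nospace_cache  0  0

The option has to be given on every mount, since plain nospace_cache is
not a persistent on-disk setting.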

Thanks,
Qu

> Even if my case of corruptions here on the SSD may/would have been
> caused by bad memory and nothing to do with space cache,... this sounds
> still like an area where many bad things could happen> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 520 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: spurious full btrfs corruption
  2018-03-06  0:57                         ` Christoph Anton Mitterer
  2018-03-06  1:50                           ` Qu Wenruo
@ 2018-03-07  3:09                           ` Duncan
  1 sibling, 0 replies; 10+ messages in thread
From: Duncan @ 2018-03-07  3:09 UTC (permalink / raw)
  To: linux-btrfs

Christoph Anton Mitterer posted on Tue, 06 Mar 2018 01:57:58 +0100 as
excerpted:

> In the meantime I had a look of the remaining files that I got from the
> btrfs-restore (haven't run it again so far, from the OLD notebook, so
> only the results from the NEW notebook here:):
> 
> The remaining ones were multi-GB qcow2 images for some qemu VMs.
> I think I had non of these files open (i.e. VMs running) while in the
> final corruption phase... but at least I'm sure that not *all* of them
> were running.
> 
> However, all the qcow2 files from the restore are more or less garbage.
> During the btrfs-restore it already complained on them, that it would
> loop too often on them and whether I want to continue or not (I choose n
> and on another full run I choose y).
> 
> Some still contain a partition table, some partitions even filesystems
> (btrfs again)... but I cannot mount them.

Just a note on format choices FWIW, nothing at all to do with your 
current problem...

As my own use-case doesn't involve VMs I'm /far/ from an expert here, but 
if I'm screwing things up I'm sure someone will correct me and I'll learn 
something too, but it does /sound/ reasonable, so assuming I'm 
remembering correctly from a discussion here...

Tip: Btrfs and qcow2 are both copy-on-write/COW (it's in the qcow2 name, 
after all), and doing multiple layers of COW is both inefficient and a 
good candidate to test for corner-case bugs that wouldn't show up in 
more normal use-cases.  Assuming bug-free it /should/ work properly, of 
course, but equally of course, bug-free isn't an entirely realistic 
assumption. =8^0

... And you're putting btrfs on qcow2 on btrfs... THREE layers of COW!

The recommendation was thus to pick what layer you wish to COW at, and 
use something that's not COW-based at the other layers.  Apparently, qemu 
has raw-format as a choice as well as qcow2, and that was recommended as 
preferred for use with btrfs (and IIRC what the recommender was using 
himself).
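
For anyone wanting to try that, a rough sketch with qemu-img (file names 
are placeholders):

# qemu-img convert -O raw vm.qcow2 vm.raw
# qemu-img create -f raw newvm.raw 20G

The first converts an existing image to raw, the second creates a new 
raw image.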

But of course that still leaves cow-based btrfs on both the top and the 
bottom layers.  I suppose which of those is best left as btrfs, while 
making the other, say, ext4 as the most widely used and hopefully safest 
general purpose non-COW alternative, depends on the use-case.

Of course keeping btrfs at both levels but nocowing the image files on 
the host btrfs is a possibility as well, but nocow on btrfs has enough 
limits and caveats that I consider it a second-class "really should have 
used a different filesystem for this but didn't want to bother setting up 
a dedicated one" choice, and as such, don't consider it a viable option 
here.
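
For completeness, the usual way to nocow the images is to set the 
attribute on the still-empty images directory before creating them, 
e.g. (path purely illustrative):

# chattr +C /var/lib/libvirt/images

The attribute is only reliably inherited by files created there 
afterwards, and nocow also disables data checksumming for those files.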

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: spurious full btrfs corruption
  2018-03-06  1:50                           ` Qu Wenruo
@ 2018-03-08 14:38                             ` Christoph Anton Mitterer
  2018-03-08 23:48                               ` Qu Wenruo
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Anton Mitterer @ 2018-03-08 14:38 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Hey.


On Tue, 2018-03-06 at 09:50 +0800, Qu Wenruo wrote:
> > These were the two files:
> > -rw-r--r-- 1 calestyo calestyo   90112 Feb 22 16:46 'Lady In The
> > Water/05.mp3'
> > -rw-r--r-- 1 calestyo calestyo 4892407 Feb 27 23:28
> > '/home/calestyo/share/music/Lady In The Water/05.mp3'
> > 
> > 
> > -rw-r--r-- 1 calestyo calestyo 1904640 Feb 22 16:47 'The Hunt For
> > Red October [Intrada]/21.mp3'
> > -rw-r--r-- 1 calestyo calestyo 2968128 Feb 27 23:28
> > '/home/calestyo/share/music/The Hunt For Red October
> > [Intrada]/21.mp3'
> > 
> > with the former (smaller one) being the corrupted one (i.e. the one
> > returned by btrfs-restore).
> > 
> > Both are (in terms of filesize) multiples of 4096... what does that
> > mean now?
> 
> That means either we lost some file extents or inode items.
> 
> Btrfs-restore only found EXTENT_DATA, which contains the pointer to
> the
> real data, and inode number.
> But no INODE_ITEM is found, which records the real inode size, so it
> can
> only use EXTENT_DATA to rebuild as much data as possible.
> That why all recovered one is aligned to 4K.
> 
> So some metadata is also corrupted.

But that can also happen to just some files?
Anyway... still strange that it hit just those two (which weren't
touched for long).


> > However, all the qcow2 files from the restore are more or less
> > garbage.
> > During the btrfs-restore it already complained on them, that it
> > would
> > loop too often on them and whether I want to continue or not (I
> > choose
> > n and on another full run I choose y).
> > 
> > Some still contain a partition table, some partitions even
> > filesystems
> > (btrfs again)... but I cannot mount them.
> 
> I think the same problem happens on them too.
> 
> Some data is lost while some are good.
> Anyway, they would be garbage.

Again, still strange... that so many files (of those that I really
checked) were fully okay,... while those 4 were all broken.

When it only uses EXTENT_DATA, would that mean that it basically breaks
at every border where the file is split up into multiple extents (which
is of course likely for the (CoWed) images that I had)?



> > 
> > > Would you please try to restore the fs on another system with
> > > good
> > > memory?
> > 
> > Which one? The originally broken fs from the SSD?
> 
> Yep.
> 
> > And what should I try to find out here?
> 
> During restore, if the csum error happens again on the newly created
> destination btrfs.
> (And I recommend use mount option nospace_cache,notreelog on the
> destination fs)

So an update on this (everything on the OLD notebook with likely good
memory):

I booted again from the USB stick (with 4.15 kernel/progs), then
luksOpened+losetup+luksOpened (yes, two layers of dm-crypt: first the
external restore HDD, then the image file of the SSD, which again
contained dm-crypt/LUKS, inside of which was the broken btrfs).

As I've mentioned before... btrfs-restore (and the other tools for
trying to find the bytenr) immediately fail here.
They print some "block mapping error" and produce no output.

This worked on my first rescue attempt (where I had 4.12 kernel/progs).

Since I had no 4.12 kernel/progs at hand anymore, I went to an even
older rescue stick, which has 4.7 kernel/progs (if I'm not wrong).
There it worked again (on the same image file).

So something changed after 4.14, which makes the tools no longer able
to restore at least what they could restore at 4.14.


=> Some bug recently introduced in btrfs-progs?




I then finished the dump (from the OLD notebook/good RAM) with 4.7
kernel/progs,... to the very same external HDD I had used before.

And afterwards I:
diff -qr --no-dereference restoreFromNEWnotebook/ restoreFromOLDnotebook/

=> No differences were found, except one further file that was in the
new restoreFromOLDnotebook. It could be that this was a file which I
had deleted from the old restore because of csum errors, but I don't
really remember (actually I thought I remembered that there were a few
which I deleted).

Since all other files were equal (that is at least in terms of file
contents and symlink targets - I didn't compare the metadata like
permissions, dates and owners)... the qcow2 images are garbage as well.

=> No csum errors were recorded in the kernel log during the diff, and
since both the (remaining) restore results from the NEW notebook and
the ones just made on the OLD one were read for the diff,... I'd guess
that no further corruption happened in the recent btrfs-restore.





On to the next working site:

> > > This -28 (ENOSPC) seems to show that the extent tree of the new
> > > btrfs
> > > is
> > > corrupted.
> > 
> > "new" here is dm-1, right? Which is the fresh btrfs I've created on
> > some 8TB HDD for my recovery works.
> > While that FS shows me:
> > [26017.690417] BTRFS info (device dm-2): disk space caching is
> > enabled
> > [26017.690421] BTRFS info (device dm-2): has skinny extents
> > [26017.798959] BTRFS info (device dm-2): bdev /dev/mapper/data-a4
> > errs:
> > wr 0, rd 0, flush 0, corrupt 130, gen 0
> > on mounting (I think the 130 corruptions are simply from the time
> > when
> > I still used it for btrfs-restore with the NEW notebook with
> > possibly
> > bad RAM)... I continued to use it in the meantime (for more
> > recovery
> > works) and wrote actually many TB to it... so far, there seem to be
> > no
> > further corruption on it.
> > If there was some extent tree corruption... than nothing I would
> > notice
> > now.
> > 
> > An fsck of it seems fine:
> > # btrfs check /dev/mapper/restore 
> > Checking filesystem on /dev/mapper/restore
> > UUID: 62eb62e0-775b-4523-b218-1410b90c03c9
> > checking extents
> > checking free space cache
> > checking fs roots
> > checking csums
> > checking root refs
> > found 2502273781760 bytes used, no error found
> > total csum bytes: 2438116164
> > total tree bytes: 5030854656
> > total fs tree bytes: 2168242176
> > total extent tree bytes: 286375936
> > btree space waste bytes: 453818165
> > file data blocks allocated: 2877953581056
> >  referenced 2877907415040
> 
> At least metadata is in good shape.
> 
> If scrub reports no error, it would be perfect.

In the meantime I had written the btrfs-restore made under kernel 4.7,
as mentioned above, to that disk.
At the same time I was trying to continue where I had stopped last time
when the SSD fs broke - doing backups of that.

So I had the fs mounted as the /-fs and mounted again in /mnt (where a
snapshot made from a rescue USB stick was already waiting), and started
to tar.xz it.

Then it happened that I wanted to do the diff of the two btrfs-
restores (as mentioned above) and accidentally mounted the external
HDD on /mnt again.

Normally that shouldn't be a problem, but I reflexively hit Ctrl-C
during the mount (which is of course useless).
Afterwards I ran umount /mnt... it said it couldn't do so, yet only
the first mount at /mnt was shown,... so maybe I was fast enough.

I'm telling this boring story for two reasons:
- First, I remembered that something very similar happened when the
first SSD fs was corrupted,.. only that back then I got this paging bug
as described in my very first mail and couldn't unmount / cleanly shut
down anymore.
So maybe that had something to do with the whole story? Could there be
some bug when mounts are stacked (I know it's unlikely... but who
knows).

- This time (don't think this was the case back then when the SSD fs
got corrupted), I got a:
Mar 07 19:58:10 heisenberg kernel: BTRFS info (device dm-1): disk space caching is enabled
Mar 07 19:58:10 heisenberg kernel: BTRFS info (device dm-1): has skinny extents
Mar 07 19:58:10 heisenberg kernel: BTRFS info (device dm-1): bdev /dev/mapper/data-a4 errs: wr 0, rd 0, flush 0, corrupt 130, gen 0
=> so I'd say it was in fact mounted (even though the umount claimed
   differently)

Mar 07 19:58:20 heisenberg kernel: BTRFS error (device dm-1): open_ctree failed
=> wtf? What does this mean? Anything to do with free space caches (I
   haven't disabled that yet)

Mar 07 19:59:07 heisenberg kernel: BTRFS info (device dm-1): disk space caching is enabled
Mar 07 19:59:07 heisenberg kernel: BTRFS info (device dm-1): has skinny extents
Mar 07 19:59:07 heisenberg kernel: BTRFS info (device dm-1): bdev /dev/mapper/data-a4 errs: wr 0, rd 0, flush 0, corrupt 130, gen 0
Mar 07 19:59:31 heisenberg kernel: BTRFS info (device dm-1): disk space caching is enabled
=> here I mounted it again at another dir...

dm-1 here is the external HDD (and the 130 corrupt are likely from the
first btrfs-restore that I made while still on the NEW notebook with
the possible bad RAM).


After that I did a fsck of the 8TB HDD / dm-1 ... and, as you had asked
above anyway, a scrub of it.
Neither of them showed any errors.... (so it's still strange why it got
that open_ctree error)


> > > Since free space cache corruption (only happens for v1) is not a
> > > big
> > > problem, fsck will only report but doesn't account it as error.
> > 
> > Why is it not a big problem?
> 
> Because we have "nospace_cache" mount option as our best friend.

Which leaves the question when one could enable the cache again... if
no clear error is found right now... :(


> > This comes as a surprise... wasn't it always said that v2 space
> > cache
> > is still unstable?
> 
> But v1 also has its problem.
> In fact I have already found a situation btrfs could corrupt its v1
> space cache, just using fsstress -n 200.
> 
> Although kernel and btrfs-progs can both detect the corruption,
> that's
> already the last defending line. The corrupted cache passes both
> generation and checksum check, the only check which catches it is
> free
> space size, but that can be bypassed if we craft the operation
> carefully
> enough.

Well, then back to:
it should be disabled by default in an update to the stable kernels
until the issues are found and fixed...


> > And shouldn't that then become either default (using v2)... or a
> > default of not using v1 at least?
> 
> Because we still don't have a strong enough evidence or test case to
> prove it.
> But I think it would change soon.
> 
> Anyway, it's just an optimization, and most of us can live without
> it.

Perhaps it would still be better to proactively disable it, if it's
already suspected to be buggy and you have that situation with fsstress
-n 200,... instead of waiting for people to run into corruptions.

(And/or possibly a good time to push for v2... if that's anyway the
future to go) :)



> > Wouldn't it be reasonable, that when a fs is mounted that was not
> > properly unmounted (I assume there is some flag that shows
> > this?),...
> > any such possible corrupted caches/trees/etc. are simply
> > invalidated as
> > a safety measure?
> 
> Unfortunately, btrfs doesn't have such flag to indicate dirty umount.
> We could use log_root to find it, but that's not always the case, and
> one can even use notreelog to disable log tree completely.

Likely I'm just thinking too naively... but wouldn't that be easy to
add? If the kernel or any tool that writes to the fs (things like --
repair) opens the fs in a mode where any changes (including internal
ones) can be made... flag the fs as "dirty"... if that operation
succeeds/ends (e.g. via umount)... flag it as clean.

I'm not saying, create a journal ;-) ... but such a plain flag could
then be used to decide whether caches like the freespace cache should
rather be discarded.


> However there are still cases where V1 cache get CoWed, still digging
> the reason, but under certain chunk layout, it could leads to
> corruption.

Okay... would be nice if you CC me in case you find anything in the
future... especially when it's safe again to enable the caches.


Thanks,
Chris. :-)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: spurious full btrfs corruption
  2018-03-08 14:38                             ` Christoph Anton Mitterer
@ 2018-03-08 23:48                               ` Qu Wenruo
  2018-03-16  0:03                                 ` Christoph Anton Mitterer
  0 siblings, 1 reply; 10+ messages in thread
From: Qu Wenruo @ 2018-03-08 23:48 UTC (permalink / raw)
  To: Christoph Anton Mitterer, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 13950 bytes --]



On 2018年03月08日 22:38, Christoph Anton Mitterer wrote:
> Hey.
> 
> 
> On Tue, 2018-03-06 at 09:50 +0800, Qu Wenruo wrote:
>>> These were the two files:
>>> -rw-r--r-- 1 calestyo calestyo   90112 Feb 22 16:46 'Lady In The
>>> Water/05.mp3'
>>> -rw-r--r-- 1 calestyo calestyo 4892407 Feb 27 23:28
>>> '/home/calestyo/share/music/Lady In The Water/05.mp3'
>>>
>>>
>>> -rw-r--r-- 1 calestyo calestyo 1904640 Feb 22 16:47 'The Hunt For
>>> Red October [Intrada]/21.mp3'
>>> -rw-r--r-- 1 calestyo calestyo 2968128 Feb 27 23:28
>>> '/home/calestyo/share/music/The Hunt For Red October
>>> [Intrada]/21.mp3'
>>>
>>> with the former (smaller one) being the corrupted one (i.e. the one
>>> returned by btrfs-restore).
>>>
>>> Both are (in terms of filesize) multiples of 4096... what does that
>>> mean now?
>>
>> That means either we lost some file extents or inode items.
>>
>> Btrfs-restore only found EXTENT_DATA, which contains the pointer to
>> the
>> real data, and inode number.
>> But no INODE_ITEM is found, which records the real inode size, so it
>> can
>> only use EXTENT_DATA to rebuild as much data as possible.
>> That why all recovered one is aligned to 4K.
>>
>> So some metadata is also corrupted.
> 
> But that can also happen to just some files?

Yep, one corrupted tree leaf can lead to corruption in several files.

> Anyway... still strange that it hit just those two (which weren't
> touched for long).
> 
> 
>>> However, all the qcow2 files from the restore are more or less
>>> garbage.
>>> During the btrfs-restore it already complained on them, that it
>>> would
>>> loop too often on them and whether I want to continue or not (I
>>> choose
>>> n and on another full run I choose y).
>>>
>>> Some still contain a partition table, some partitions even
>>> filesystems
>>> (btrfs again)... but I cannot mount them.
>>
>> I think the same problem happens on them too.
>>
>> Some data is lost while some are good.
>> Anyway, they would be garbage.
> 
> Again, still strange... that so many files (of those that I really
> checked) were fully okay,... while those 4 were all broken.

One leaf containing some extent data is corrupted here.

> 
> When it only uses EXTENT_DATA, would that mean that it basically breaks
> on every border where the file is split up into multiple extents (which
> is of course likely for the (CoWed) images that I had.

This depends on the leaf boundaries.

But normally one corrupted leaf can only lead to one or two corrupted
files. For 4 corrupted files, we have at least 2 corrupted leaves.


> 
> 
> 
>>>
>>>> Would you please try to restore the fs on another system with
>>>> good
>>>> memory?
>>>
>>> Which one? The originally broken fs from the SSD?
>>
>> Yep.
>>
>>> And what should I try to find out here?
>>
>> During restore, if the csum error happens again on the newly created
>> destination btrfs.
>> (And I recommend use mount option nospace_cache,notreelog on the
>> destination fs)
> 
> So an update on this (everything on the OLD notebook with likely good
> memory):
> 
> I booted again from USBstick (with 4.15 kernel/progs),
> luksOpened+losetup+luksOpened (yes two dm-crypt, the one from the
> external restore HDD, then the image file of the SSD which again
> contained dmcrypt+LUKS, of which one was the broken btrfs).
> 
> As I've mentioned before... btrfs-restore (and the other tools for
> trying to find the bytenr) immediately fail here.
> They bring some "block mapping error" and produce no output.
> 
> This worked on my first rescue attempt (where I had 4.12 kernel/progs).
> 
> Since I had no 4.12 kernel/progs at hand anymore, I went to an even
> older rescue stick, wich has 4.7 kernel/progs (if I'm not wrong).
> There it worked again (on the same image file).
> 
> So something changed after 4.14, which makes the tools no longer being
> able to restore at least that what they could restore at 4.14.

This seems to be a regression.
But I'm not sure if it's the kernel to blame or the btrfs-progs.

> 
> 
> => Some bug recently introduced in btrfs-progs?

Is the "block mapping error" message from kernel or btrfs-progs?

> 
> 
> 
> 
> I finished the dump then (from OLD notebook/good RAM) with 4.7
> kernel/progs,... to the very same external HDD I've used before.
> 
> And afterwards I:
> diff -qr --no-dereference restoreFromNEWnotebook/ restoreFromOLDnotebook/
> 
> => No differences were found, except one further file that was in the
> new restoreFromOLDnotebook. Could be that this was a file wich I
> deleted on the old restore because of csum errors, but I don't really
> remember (actually I thought to remember that there were a few which I
> deleted).
> 
> Since all other files were equal (that is at least in terms of file
> contents and symlink targets - I didn't compare the metadata like
> permissions, dates and owners)... the qcow2 images are garbage as well.
> 
> => No csum errors were recorded in the kernel log during the diff, and
> since both, the (remaining) restore results from the NEW notebook and
> the ones just made on the OLD one were read because of the diff,... I'd
> guess that no further corruption happened in the recent btrfs-restore.
> 
> 
> 
> 
> 
> On to the next working site:
> 
>>>> This -28 (ENOSPC) seems to show that the extent tree of the new
>>>> btrfs
>>>> is
>>>> corrupted.
>>>
>>> "new" here is dm-1, right? Which is the fresh btrfs I've created on
>>> some 8TB HDD for my recovery works.
>>> While that FS shows me:
>>> [26017.690417] BTRFS info (device dm-2): disk space caching is
>>> enabled
>>> [26017.690421] BTRFS info (device dm-2): has skinny extents
>>> [26017.798959] BTRFS info (device dm-2): bdev /dev/mapper/data-a4
>>> errs:
>>> wr 0, rd 0, flush 0, corrupt 130, gen 0
>>> on mounting (I think the 130 corruptions are simply from the time
>>> when
>>> I still used it for btrfs-restore with the NEW notebook with
>>> possibly
>>> bad RAM)... I continued to use it in the meantime (for more
>>> recovery
>>> works) and wrote actually many TB to it... so far, there seem to be
>>> no
>>> further corruption on it.
>>> If there was some extent tree corruption... than nothing I would
>>> notice
>>> now.
>>>
>>> An fsck of it seems fine:
>>> # btrfs check /dev/mapper/restore 
>>> Checking filesystem on /dev/mapper/restore
>>> UUID: 62eb62e0-775b-4523-b218-1410b90c03c9
>>> checking extents
>>> checking free space cache
>>> checking fs roots
>>> checking csums
>>> checking root refs
>>> found 2502273781760 bytes used, no error found
>>> total csum bytes: 2438116164
>>> total tree bytes: 5030854656
>>> total fs tree bytes: 2168242176
>>> total extent tree bytes: 286375936
>>> btree space waste bytes: 453818165
>>> file data blocks allocated: 2877953581056
>>>  referenced 2877907415040
>>
>> At least metadata is in good shape.
>>
>> If scrub reports no error, it would be perfect.
> 
> In the meantime I had written the btrfs-restore under kernel 4.7 as
> mentioned above to that disk.
> At the same time I was trying to continue where I stopped last time
> when the SSD fs broke - doing backups of that.
> 
> So I had the fs mounted as /-fs and mounted it again in /mnt (where a
> snapshot made from a rescue USB stick) was already waiting, and started
> to tar.xz it.
> 
> It happened then, that I wanted to do the diff of the two btrfs-
> restores (as mentioned above) and I accidentally mounted the external
> HDD on /mnt again.
> 
> Shouldn't be a problem normally, but automatically I did Ctrl-C during
> the mount (which is of course useless).
> Afterwards I umount /mnt... where it said it couldn't do so,
> nevertheless only the first mount at /mnt was shown,... so maybe I was
> fast enough.
> 
> I'm telling this boring story because of two reasons:
> - First I remembered that something very similar happened when the
> first SSD fs was corrupted,.. only that I then got this paging bug as
> described in my very first mail and couldn't unmount / cleanly shut
> down anymore.
> So maybe that had to do something with the whole story? Could there be
> some bug when mounts are stacked (I know it's unlikely... but who
> knows).

When a kernel module (btrfs in this case) hits a kernel BUG, it stalls
the whole module (if not the whole kernel).

So later operations, including umount/mount, won't work properly.

> 
> - This time (don't think this was the case back then when the SSD fs
> got corrupted), I got a:
> Mar 07 19:58:10 heisenberg kernel: BTRFS info (device dm-1): disk space caching is enabled
> Mar 07 19:58:10 heisenberg kernel: BTRFS info (device dm-1): has skinny extents
> Mar 07 19:58:10 heisenberg kernel: BTRFS info (device dm-1): bdev /dev/mapper/data-a4 errs: wr 0, rd 0, flush 0, corrupt 130, gen 0
> => so I'd say it was in fact mounted (even though the umount claimed
>    differently)

Something went wrong during the mount.
Normally log replay, but it could be some other check.

> 
> Mar 07 19:58:20 heisenberg kernel: BTRFS error (device dm-1): open_ctree failed
> => wtf? What does this mean? Anything to do with free space caches (I
>    haven't disabled that yet)

Maybe log replay, or some other work that must be redone at mount time,
failed, so the kernel just refused to mount the fs.

> 
> Mar 07 19:59:07 heisenberg kernel: BTRFS info (device dm-1): disk space caching is enabled
> Mar 07 19:59:07 heisenberg kernel: BTRFS info (device dm-1): has skinny extents
> Mar 07 19:59:07 heisenberg kernel: BTRFS info (device dm-1): bdev /dev/mapper/data-a4 errs: wr 0, rd 0, flush 0, corrupt 130, gen 0
> Mar 07 19:59:31 heisenberg kernel: BTRFS info (device dm-1): disk space caching is enabled
> => here I mounted it again at another dir...

And strangely this time it works...

> 
> dm-1 here is the external HDD (and the 130 corrupt are likely from the
> first btrfs-restore that I made while still on the NEW notebook with
> the possible bad RAM).
> 
> 
> After that I did a fsck of the 8TB HDD / dm-1 ... and as you've anyway
> asked me above, a scrub of it.
> Neither of both showed any errors.... (so still strange why it got that
> open_ctree error)

I'm surprised the corruption just disappeared...

> 
> 
>>>> Since free space cache corruption (only happens for v1) is not a
>>>> big
>>>> problem, fsck will only report but doesn't account it as error.
>>>
>>> Why is it not a big problem?
>>
>> Because we have "nospace_cache" mount option as our best friend.
> 
> Which leaves the question when one could enable the cache again... if
> no clear error is found right now... :(

Fortunately (or unfortunately), no obvious problem with the v1 space
cache has been found yet.

The difference in free space is caused by a race, and it's ensured that
the free space cache can only record less than or equal to the space in
the block group item.

And in that case, the kernel will always detect the problem and discard
the cache.

> 
> 
>>> This comes as a surprise... wasn't it always said that v2 space
>>> cache
>>> is still unstable?
>>
>> But v1 also has its problem.
>> In fact I have already found a situation btrfs could corrupt its v1
>> space cache, just using fsstress -n 200.
>>
>> Although kernel and btrfs-progs can both detect the corruption,
>> that's
>> already the last defending line. The corrupted cache passes both
>> generation and checksum check, the only check which catches it is
>> free
>> space size, but that can be bypassed if we craft the operation
>> carefully
>> enough.
> 
> Well then back to:
> it should be disabled per default in an update to stable kernels until
> the issues are found and fixed...

At least it's still not proven v1 cache is the cause.

> 
> 
>>> And shouldn't that then become either default (using v2)... or a
>>> default of not using v1 at least?
>>
>> Because we still don't have a strong enough evidence or test case to
>> prove it.
>> But I think it would change soon.
>>
>> Anyway, it's just an optimization, and most of us can live without
>> it.
> 
> Perhaps still better to proactively disable it, if it's already
> suspected to be buggy and you have that situation with fsstress -n
> 200,... instead of waiting for people getting corruptions.
> 
> (And/or possibly a good time to push for v2... if that's anyway the
> future to go) :)
> 
> 
> 
>>> Wouldn't it be reasonable, that when a fs is mounted that was not
>>> properly unmounted (I assume there is some flag that shows
>>> this?),...
>>> any such possible corrupted caches/trees/etc. are simply
>>> invalidated as
>>> a safety measure?
>>
>> Unfortunately, btrfs doesn't have such flag to indicate dirty umount.
>> We could use log_root to find it, but that's not always the case, and
>> one can even use notreelog to disable log tree completely.
> 
> Likely I'm just thinking to naively... but wouldn't that be easy to
> add?

Normally it's because we don't need it. Metadata CoW is still pretty
safe so far.

Thanks,
Qu

> If the kernel or any tool that writes to the fs (things like --
> repair) opens the fs in a mode where any changes (including internal
> ones) can be made... flag the fs to be "dirty"... if that operation
> succeesds/ends (e.g. via umount)... flag it to be clean.
> 
> I'm not saying, create a journal ;-) ... but such a plain flag could
> then be used to decide whether caches like the freespace cache should
> be rather discarded.
> 
> 
>> However there are still cases where V1 cache get CoWed, still digging
>> the reason, but under certain chunk layout, it could leads to
>> corruption.
> 
> Okay... would be nice if you CC me in case you find anything in the
> future... especially when it's safe again to enable the caches.
> 
> 
> Thanks,
> Chris. :-)
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 520 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: spurious full btrfs corruption
  2018-03-08 23:48                               ` Qu Wenruo
@ 2018-03-16  0:03                                 ` Christoph Anton Mitterer
  2018-03-21 22:03                                   ` Christoph Anton Mitterer
  2018-03-26 14:32                                   ` Christoph Anton Mitterer
  0 siblings, 2 replies; 10+ messages in thread
From: Christoph Anton Mitterer @ 2018-03-16  0:03 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Hey.

Found some time to move on with this:


First, I think from my side (i.e. restoring as much as possible) I'm
basically done now, so everything left over here is about looking for
possible bugs/etc.

From my side I have no indication that my corruptions were actually a
bug in btrfs... the new notebook used to be unstable for some time and
it might just be that.
Also, that second occurrence of csum errors (when I made an image from
the broken fs to the external HDD) kind of hints that it may be a memory
issue (though I haven't found time to run memtest86+ yet).

So let's just suppose that the btrfs code is as rock-solid as its raid56
is ;-P and assume the issues were caused by some unlucky memory
corruption that just happened to hit the wrong (important) metadata.




The issue that newer btrfs-progs/kernel don't restore anything at all
from my corrupted fs:

On Fri, 2018-03-09 at 07:48 +0800, Qu Wenruo wrote:
> > So something changed after 4.14, which makes the tools no longer
> > being
> > able to restore at least that what they could restore at 4.14.
> 
> This seems to be a regression.
> But I'm not sure if it's the kernel to blame or the btrfs-progs.
> 
> > 
> > 
> > => Some bug recently introduced in btrfs-progs?
> 
> Is the "block mapping error" message from kernel or btrfs-progs?

All messages below are from the progs unless noted otherwise.
/dev/mapper/restore is the image from the broken SSD fs.
Everything below was done on the OLD laptop (which probably has no
memory or other issues) under kernel 4.15.4 and progs 4.15.1.

# btrfs-find-root /dev/mapper/restore 
Couldn't map the block 4503658729209856
No mapping for 4503658729209856-4503658729226240
Couldn't map the block 4503658729209856
Superblock thinks the generation is 2083143
Superblock thinks the level is 1
Found tree root at 58572800 gen 2083143 level 1
Well block 27820032(gen: 2083133 level: 1) seems good, but generation/level doesn't match, want gen: 2083143 level: 1
Well block 25526272(gen: 2083132 level: 1) seems good, but generation/level doesn't match, want gen: 2083143 level: 1
Well block 21807104(gen: 2083131 level: 1) seems good, but generation/level doesn't match, want gen: 2083143 level: 1
Well block 11829248(gen: 2083130 level: 1) seems good, but generation/level doesn't match, want gen: 2083143 level: 1
Well block 8716288(gen: 2083129 level: 1) seems good, but generation/level doesn't match, want gen: 2083143 level: 1
Well block 6209536(gen: 2083128 level: 1) seems good, but generation/level doesn't match, want gen: 2083143 level: 1




# btrfs-debug-tree -b 27820032 /dev/mapper/restore 
btrfs-progs v4.15.1
Couldn't map the block 4503658729209856
No mapping for 4503658729209856-4503658729226240
Couldn't map the block 4503658729209856
bytenr mismatch, want=4503658729209856, have=0
node 27820032 level 1 items 2 free 491 generation 2083133 owner 1
fs uuid b6050e38-716a-40c3-a8df-fcf1dd7e655d
chunk uuid ae6b0cc6-bbc5-4131-b3f3-41b748f5a775
	key (EXTENT_TREE ROOT_ITEM 0) block 27836416 (1699) gen 2083133
	key (1853 INODE_ITEM 0) block 28000256 (1709) gen 2083133

=> I *think* (but am not 100% sure - I would need to double-check if
it's important for you to know) that the older progs/kernel showed me
much more here




# btrfs-debug-tree /dev/mapper/restore 
btrfs-progs v4.15.1
Couldn't map the block 4503658729209856
No mapping for 4503658729209856-4503658729226240
Couldn't map the block 4503658729209856
bytenr mismatch, want=4503658729209856, have=0
ERROR: unable to open /dev/mapper/restore

=> same here: I *think* (but am not 100% sure - I would need to
double-check if it's important for you to know) that the older
progs/kernel showed me much more here




# btrfs-debug-tree -b 27836416 /dev/mapper/restore 
btrfs-progs v4.15.1
Couldn't map the block 4503658729209856
No mapping for 4503658729209856-4503658729226240
Couldn't map the block 4503658729209856
bytenr mismatch, want=4503658729209856, have=0
leaf 27836416 items 63 free space 6131 generation 2083133 owner 1
leaf 27836416 flags 0x1(WRITTEN) backref revision 1
fs uuid b6050e38-716a-40c3-a8df-fcf1dd7e655d
chunk uuid ae6b0cc6-bbc5-4131-b3f3-41b748f5a775
	item 0 key (EXTENT_TREE ROOT_ITEM 0) itemoff 15844 itemsize 439
		generation 2083133 root_dirid 0 bytenr 27328512 level 2 refs 1
		lastsnap 0 byte_limit 0 bytes_used 182190080 flags 0x0(none)
		uuid 00000000-0000-0000-0000-000000000000
		drop key (0 UNKNOWN.0 0) level 0
	item 1 key (DEV_TREE ROOT_ITEM 0) itemoff 15405 itemsize 439
		generation 2083129 root_dirid 0 bytenr 9502720 level 1 refs 1
		lastsnap 0 byte_limit 0 bytes_used 114688 flags 0x0(none)
		uuid 00000000-0000-0000-0000-000000000000
		drop key (0 UNKNOWN.0 0) level 0
	item 2 key (FS_TREE INODE_REF 6) itemoff 15388 itemsize 17
		index 0 namelen 7 name: default
	item 3 key (FS_TREE ROOT_ITEM 0) itemoff 14949 itemsize 439
		generation 2081091 root_dirid 256 bytenr 474185728 level 0 refs 1
		lastsnap 0 byte_limit 0 bytes_used 16384 flags 0x0(none)
		uuid 00000000-0000-0000-0000-000000000000
		ctransid 2081091 otransid 0 stransid 0 rtransid 0
		ctime 1519222863.366476716 (2018-02-21 15:21:03)
		drop key (0 UNKNOWN.0 0) level 0
	item 4 key (FS_TREE ROOT_REF 257) itemoff 14927 itemsize 22
		root ref key dirid 256 sequence 2 name root
	item 5 key (FS_TREE ROOT_REF 1830) itemoff 14866 itemsize 61
		root ref key dirid 256 sequence 5 name heisenberg.scientia.net_system_2018-02-21_1
	item 6 key (ROOT_TREE_DIR INODE_ITEM 0) itemoff 14706 itemsize 160
		generation 3 transid 0 size 0 nbytes 16384
		block group 0 mode 40755 links 1 uid 0 gid 0 rdev 0
		sequence 0 flags 0x0(none)
		atime 1446913329.0 (2015-11-07 17:22:09)
		ctime 1446913329.0 (2015-11-07 17:22:09)
		mtime 1446913329.0 (2015-11-07 17:22:09)
		otime 0.0 (1970-01-01 01:00:00)
	item 7 key (ROOT_TREE_DIR INODE_REF 6) itemoff 14694 itemsize 12
		index 0 namelen 2 name: ..
	item 8 key (ROOT_TREE_DIR DIR_ITEM 2378154706) itemoff 14657 itemsize 37
		location key (FS_TREE ROOT_ITEM -1) type DIR
		transid 0 data_len 0 name_len 7
		name: default
	item 9 key (CSUM_TREE ROOT_ITEM 0) itemoff 14218 itemsize 439
		generation 2083133 root_dirid 0 bytenr 27197440 level 2 refs 1
		lastsnap 0 byte_limit 0 bytes_used 866926592 flags 0x0(none)
		uuid 00000000-0000-0000-0000-000000000000
		drop key (0 UNKNOWN.0 0) level 0
	item 10 key (UUID_TREE ROOT_ITEM 0) itemoff 13779 itemsize 439
		generation 2080566 root_dirid 0 bytenr 505818398720 level 0 refs 1
		lastsnap 0 byte_limit 0 bytes_used 16384 flags 0x0(none)
		uuid 85e68ea6-09cd-3b45-b01a-e57bcc5684ba
		drop key (0 UNKNOWN.0 0) level 0
	item 11 key (257 ROOT_ITEM 0) itemoff 13340 itemsize 439
		generation 2083133 root_dirid 256 bytenr 27000832 level 2 refs 1
		lastsnap 2080523 byte_limit 0 bytes_used 1288060928 flags 0x0(none)
		uuid c37238d5-ac17-ee45-a790-b4d1538f46fc
		ctransid 2083133 otransid 8 stransid 0 rtransid 0
		ctime 1519231525.124530719 (2018-02-21 17:45:25)
		otime 1446913423.498012409 (2015-11-07 17:23:43)
		drop key (0 UNKNOWN.0 0) level 0
	item 12 key (257 ROOT_BACKREF 5) itemoff 13318 itemsize 22
		root backref key dirid 256 sequence 2 name root
	item 13 key (979 INODE_ITEM 0) itemoff 13158 itemsize 160
		generation 2081024 transid 2081024 size 262144 nbytes 3230924800
		block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
		sequence 12325 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
		atime 0.0 (1970-01-01 01:00:00)
		ctime 1519191053.828440563 (2018-02-21 06:30:53)
		mtime 0.0 (1970-01-01 01:00:00)
		otime 0.0 (1970-01-01 01:00:00)
	item 14 key (979 EXTENT_DATA 0) itemoff 13105 itemsize 53
		generation 2081024 type 1 (regular)
		extent data disk byte 73967251456 nr 262144
		extent data offset 0 nr 262144 ram 262144
		extent compression 0 (none)
	item 15 key (980 INODE_ITEM 0) itemoff 12945 itemsize 160
		generation 2081024 transid 2081024 size 262144 nbytes 3137601536
		block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
		sequence 11969 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
		atime 0.0 (1970-01-01 01:00:00)
		ctime 1519191053.828440563 (2018-02-21 06:30:53)
		mtime 0.0 (1970-01-01 01:00:00)
		otime 0.0 (1970-01-01 01:00:00)
	item 16 key (980 EXTENT_DATA 0) itemoff 12892 itemsize 53
		generation 2081024 type 1 (regular)
		extent data disk byte 73977970688 nr 262144
		extent data offset 0 nr 262144 ram 262144
		extent compression 0 (none)
	item 17 key (1830 ROOT_ITEM 2080565) itemoff 12453 itemsize 439
		generation 2080565 root_dirid 256 bytenr 505812746240 level 2 refs 1
		lastsnap 2080565 byte_limit 0 bytes_used 1021968384 flags 0x1(RDONLY)
		uuid 5247e0c0-7a79-434e-880b-d2c7941e6767
		parent_uuid 04b7ff5b-31c9-4a41-ba14-7c91076f6da6
		ctransid 2080561 otransid 2080565 stransid 0 rtransid 0
		ctime 1519174487.202858099 (2018-02-21 01:54:47)
		otime 1519174628.538451830 (2018-02-21 01:57:08)
		drop key (0 UNKNOWN.0 0) level 0
	item 18 key (1830 ROOT_BACKREF 5) itemoff 12392 itemsize 61
		root backref key dirid 256 sequence 5 name heisenberg.scientia.net_system_2018-02-21_1
	item 19 key (1831 INODE_ITEM 0) itemoff 12232 itemsize 160
		generation 2083133 transid 2083133 size 262144 nbytes 15990784
		block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
		sequence 61 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
		atime 0.0 (1970-01-01 01:00:00)
		ctime 1519231527.115322244 (2018-02-21 17:45:27)
		mtime 0.0 (1970-01-01 01:00:00)
		otime 0.0 (1970-01-01 01:00:00)
	item 20 key (1831 EXTENT_DATA 0) itemoff 12179 itemsize 53
		generation 2083133 type 1 (regular)
		extent data disk byte 33299181568 nr 262144
		extent data offset 0 nr 262144 ram 262144
		extent compression 0 (none)
	item 21 key (1832 INODE_ITEM 0) itemoff 12019 itemsize 160
		generation 2083127 transid 2083127 size 262144 nbytes 8650752
		block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
		sequence 33 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
		atime 0.0 (1970-01-01 01:00:00)
		ctime 1519231352.51151791 (2018-02-21 17:42:32)
		mtime 0.0 (1970-01-01 01:00:00)
		otime 0.0 (1970-01-01 01:00:00)
	item 22 key (1832 EXTENT_DATA 0) itemoff 11966 itemsize 53
		generation 2083127 type 1 (regular)
		extent data disk byte 48053231616 nr 262144
		extent data offset 0 nr 262144 ram 262144
		extent compression 0 (none)
	item 23 key (1833 INODE_ITEM 0) itemoff 11806 itemsize 160
		generation 2083117 transid 2083117 size 262144 nbytes 4456448
		block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
		sequence 17 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
		atime 0.0 (1970-01-01 01:00:00)
		ctime 1519231019.230780046 (2018-02-21 17:36:59)
		mtime 0.0 (1970-01-01 01:00:00)
		otime 0.0 (1970-01-01 01:00:00)
	item 24 key (1833 EXTENT_DATA 0) itemoff 11753 itemsize 53
		generation 2083117 type 1 (regular)
		extent data disk byte 46261501952 nr 262144
		extent data offset 0 nr 262144 ram 262144
		extent compression 0 (none)
	item 25 key (1834 INODE_ITEM 0) itemoff 11593 itemsize 160
		generation 2083120 transid 2083120 size 262144 nbytes 2621440
		block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
		sequence 10 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
		atime 0.0 (1970-01-01 01:00:00)
		ctime 1519231097.45708714 (2018-02-21 17:38:17)
		mtime 0.0 (1970-01-01 01:00:00)
		otime 0.0 (1970-01-01 01:00:00)
	item 26 key (1834 EXTENT_DATA 0) itemoff 11540 itemsize 53
		generation 2083120 type 1 (regular)
		extent data disk byte 33299738624 nr 262144
		extent data offset 0 nr 262144 ram 262144
		extent compression 0 (none)
	item 27 key (1835 INODE_ITEM 0) itemoff 11380 itemsize 160
		generation 2083121 transid 2083121 size 262144 nbytes 3407872
		block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
		sequence 13 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
		atime 0.0 (1970-01-01 01:00:00)
		ctime 1519231133.956124450 (2018-02-21 17:38:53)
		mtime 0.0 (1970-01-01 01:00:00)
		otime 0.0 (1970-01-01 01:00:00)
	item 28 key (1835 EXTENT_DATA 0) itemoff 11327 itemsize 53
		generation 2083121 type 1 (regular)
		extent data disk byte 48017752064 nr 262144
		extent data offset 0 nr 262144 ram 262144
		extent compression 0 (none)
	item 29 key (1836 INODE_ITEM 0) itemoff 11167 itemsize 160
		generation 2083128 transid 2083128 size 262144 nbytes 6553600
		block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
		sequence 25 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
		atime 0.0 (1970-01-01 01:00:00)
		ctime 1519231382.739882872 (2018-02-21 17:43:02)
		mtime 0.0 (1970-01-01 01:00:00)
		otime 0.0 (1970-01-01 01:00:00)
	item 30 key (1836 EXTENT_DATA 0) itemoff 11114 itemsize 53
		generation 2083128 type 1 (regular)
		extent data disk byte 43620921344 nr 262144
		extent data offset 0 nr 262144 ram 262144
		extent compression 0 (none)
	item 31 key (1837 INODE_ITEM 0) itemoff 10954 itemsize 160
		generation 2083129 transid 2083129 size 262144 nbytes 9437184
		block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
		sequence 36 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
		atime 0.0 (1970-01-01 01:00:00)
		ctime 1519231390.216060972 (2018-02-21 17:43:10)
		mtime 0.0 (1970-01-01 01:00:00)
		otime 0.0 (1970-01-01 01:00:00)
	item 32 key (1837 EXTENT_DATA 0) itemoff 10901 itemsize 53
		generation 2083129 type 1 (regular)
		extent data disk byte 43256619008 nr 262144
		extent data offset 0 nr 262144 ram 262144
		extent compression 0 (none)
	item 33 key (1838 INODE_ITEM 0) itemoff 10741 itemsize 160
		generation 2083133 transid 2083133 size 262144 nbytes 8126464
		block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
		sequence 31 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
		atime 0.0 (1970-01-01 01:00:00)
		ctime 1519231527.115322244 (2018-02-21 17:45:27)
		mtime 0.0 (1970-01-01 01:00:00)
		otime 0.0 (1970-01-01 01:00:00)
	item 34 key (1838 EXTENT_DATA 0) itemoff 10688 itemsize 53
		generation 2083133 type 1 (regular)
		extent data disk byte 44043571200 nr 262144
		extent data offset 0 nr 262144 ram 262144
		extent compression 0 (none)
	item 35 key (1839 INODE_ITEM 0) itemoff 10528 itemsize 160
		generation 2083132 transid 2083132 size 262144 nbytes 5242880
		block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
		sequence 20 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
		atime 0.0 (1970-01-01 01:00:00)
		ctime 1519231491.286468713 (2018-02-21 17:44:51)
		mtime 0.0 (1970-01-01 01:00:00)
		otime 0.0 (1970-01-01 01:00:00)
	item 36 key (1839 EXTENT_DATA 0) itemoff 10475 itemsize 53
		generation 2083132 type 1 (regular)
		extent data disk byte 44647149568 nr 262144
		extent data offset 0 nr 262144 ram 262144
		extent compression 0 (none)
	item 37 key (1840 INODE_ITEM 0) itemoff 10315 itemsize 160
		generation 2083133 transid 2083133 size 262144 nbytes 7602176
		block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
		sequence 29 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
		atime 0.0 (1970-01-01 01:00:00)
		ctime 1519231527.115322244 (2018-02-21 17:45:27)
		mtime 0.0 (1970-01-01 01:00:00)
		otime 0.0 (1970-01-01 01:00:00)
	item 38 key (1840 EXTENT_DATA 0) itemoff 10262 itemsize 53
		generation 2083133 type 1 (regular)
		extent data disk byte 44176515072 nr 262144
		extent data offset 0 nr 262144 ram 262144
		extent compression 0 (none)
	item 39 key (1841 INODE_ITEM 0) itemoff 10102 itemsize 160
		generation 2083132 transid 2083132 size 262144 nbytes 4456448
		block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
		sequence 17 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
		atime 0.0 (1970-01-01 01:00:00)
		ctime 1519231491.290468809 (2018-02-21 17:44:51)
		mtime 0.0 (1970-01-01 01:00:00)
		otime 0.0 (1970-01-01 01:00:00)
	item 40 key (1841 EXTENT_DATA 0) itemoff 10049 itemsize 53
		generation 2083132 type 1 (regular)
		extent data disk byte 47911272448 nr 262144
		extent data offset 0 nr 262144 ram 262144
		extent compression 0 (none)
	item 41 key (1842 INODE_ITEM 0) itemoff 9889 itemsize 160
		generation 2083130 transid 2083130 size 262144 nbytes 2883584
		block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
		sequence 11 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
		atime 0.0 (1970-01-01 01:00:00)
		ctime 1519231423.700858661 (2018-02-21 17:43:43)
		mtime 0.0 (1970-01-01 01:00:00)
		otime 0.0 (1970-01-01 01:00:00)
	item 42 key (1842 EXTENT_DATA 0) itemoff 9836 itemsize 53
		generation 2083130 type 1 (regular)
		extent data disk byte 46249439232 nr 262144
		extent data offset 0 nr 262144 ram 262144
		extent compression 0 (none)
	item 43 key (1843 INODE_ITEM 0) itemoff 9676 itemsize 160
		generation 2083127 transid 2083127 size 262144 nbytes 1310720
		block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
		sequence 5 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
		atime 0.0 (1970-01-01 01:00:00)
		ctime 1519231352.35151410 (2018-02-21 17:42:32)
		mtime 0.0 (1970-01-01 01:00:00)
		otime 0.0 (1970-01-01 01:00:00)
	item 44 key (1843 EXTENT_DATA 0) itemoff 9623 itemsize 53
		generation 2083127 type 1 (regular)
		extent data disk byte 47916822528 nr 262144
		extent data offset 0 nr 262144 ram 262144
		extent compression 0 (none)
	item 45 key (1844 INODE_ITEM 0) itemoff 9463 itemsize 160
		generation 2083130 transid 2083130 size 262144 nbytes 2883584
		block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
		sequence 11 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
		atime 0.0 (1970-01-01 01:00:00)
		ctime 1519231423.700858661 (2018-02-21 17:43:43)
		mtime 0.0 (1970-01-01 01:00:00)
		otime 0.0 (1970-01-01 01:00:00)
	item 46 key (1844 EXTENT_DATA 0) itemoff 9410 itemsize 53
		generation 2083130 type 1 (regular)
		extent data disk byte 47890669568 nr 262144
		extent data offset 0 nr 262144 ram 262144
		extent compression 0 (none)
	item 47 key (1845 INODE_ITEM 0) itemoff 9250 itemsize 160
		generation 2083133 transid 2083133 size 262144 nbytes 15990784
		block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
		sequence 61 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
		atime 0.0 (1970-01-01 01:00:00)
		ctime 1519231527.119322339 (2018-02-21 17:45:27)
		mtime 0.0 (1970-01-01 01:00:00)
		otime 0.0 (1970-01-01 01:00:00)
	item 48 key (1845 EXTENT_DATA 0) itemoff 9197 itemsize 53
		generation 2083133 type 1 (regular)
		extent data disk byte 44771708928 nr 262144
		extent data offset 0 nr 262144 ram 262144
		extent compression 0 (none)
	item 49 key (1846 INODE_ITEM 0) itemoff 9037 itemsize 160
		generation 2083133 transid 2083133 size 262144 nbytes 15990784
		block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
		sequence 61 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
		atime 0.0 (1970-01-01 01:00:00)
		ctime 1519231527.119322339 (2018-02-21 17:45:27)
		mtime 0.0 (1970-01-01 01:00:00)
		otime 0.0 (1970-01-01 01:00:00)
	item 50 key (1846 EXTENT_DATA 0) itemoff 8984 itemsize 53
		generation 2083133 type 1 (regular)
		extent data disk byte 46255108096 nr 262144
		extent data offset 0 nr 262144 ram 262144
		extent compression 0 (none)
	item 51 key (1847 INODE_ITEM 0) itemoff 8824 itemsize 160
		generation 2083132 transid 2083132 size 262144 nbytes 5242880
		block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
		sequence 20 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
		atime 0.0 (1970-01-01 01:00:00)
		ctime 1519231491.290468809 (2018-02-21 17:44:51)
		mtime 0.0 (1970-01-01 01:00:00)
		otime 0.0 (1970-01-01 01:00:00)
	item 52 key (1847 EXTENT_DATA 0) itemoff 8771 itemsize 53
		generation 2083132 type 1 (regular)
		extent data disk byte 48023654400 nr 262144
		extent data offset 0 nr 262144 ram 262144
		extent compression 0 (none)
	item 53 key (1848 INODE_ITEM 0) itemoff 8611 itemsize 160
		generation 2083127 transid 2083127 size 262144 nbytes 3145728
		block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
		sequence 12 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
		atime 0.0 (1970-01-01 01:00:00)
		ctime 1519231352.51151791 (2018-02-21 17:42:32)
		mtime 0.0 (1970-01-01 01:00:00)
		otime 0.0 (1970-01-01 01:00:00)
	item 54 key (1848 EXTENT_DATA 0) itemoff 8558 itemsize 53
		generation 2083127 type 1 (regular)
		extent data disk byte 48054853632 nr 262144
		extent data offset 0 nr 262144 ram 262144
		extent compression 0 (none)
	item 55 key (1849 INODE_ITEM 0) itemoff 8398 itemsize 160
		generation 2083132 transid 2083132 size 262144 nbytes 5505024
		block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
		sequence 21 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
		atime 0.0 (1970-01-01 01:00:00)
		ctime 1519231491.286468713 (2018-02-21 17:44:51)
		mtime 0.0 (1970-01-01 01:00:00)
		otime 0.0 (1970-01-01 01:00:00)
	item 56 key (1849 EXTENT_DATA 0) itemoff 8345 itemsize 53
		generation 2083132 type 1 (regular)
		extent data disk byte 44981190656 nr 262144
		extent data offset 0 nr 262144 ram 262144
		extent compression 0 (none)
	item 57 key (1850 INODE_ITEM 0) itemoff 8185 itemsize 160
		generation 2083129 transid 2083129 size 262144 nbytes 4718592
		block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
		sequence 18 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
		atime 0.0 (1970-01-01 01:00:00)
		ctime 1519231390.216060972 (2018-02-21 17:43:10)
		mtime 0.0 (1970-01-01 01:00:00)
		otime 0.0 (1970-01-01 01:00:00)
	item 58 key (1850 EXTENT_DATA 0) itemoff 8132 itemsize 53
		generation 2083129 type 1 (regular)
		extent data disk byte 43665002496 nr 262144
		extent data offset 0 nr 262144 ram 262144
		extent compression 0 (none)
	item 59 key (1851 INODE_ITEM 0) itemoff 7972 itemsize 160
		generation 2083128 transid 2083128 size 262144 nbytes 12058624
		block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
		sequence 46 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
		atime 0.0 (1970-01-01 01:00:00)
		ctime 1519231382.743882966 (2018-02-21 17:43:02)
		mtime 0.0 (1970-01-01 01:00:00)
		otime 0.0 (1970-01-01 01:00:00)
	item 60 key (1851 EXTENT_DATA 0) itemoff 7919 itemsize 53
		generation 2083128 type 1 (regular)
		extent data disk byte 48013914112 nr 262144
		extent data offset 0 nr 262144 ram 262144
		extent compression 0 (none)
	item 61 key (1852 INODE_ITEM 0) itemoff 7759 itemsize 160
		generation 2083130 transid 2083130 size 262144 nbytes 4456448
		block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
		sequence 17 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
		atime 0.0 (1970-01-01 01:00:00)
		ctime 1519231423.704858756 (2018-02-21 17:43:43)
		mtime 0.0 (1970-01-01 01:00:00)
		otime 0.0 (1970-01-01 01:00:00)
	item 62 key (1852 EXTENT_DATA 0) itemoff 7706 itemsize 53
		generation 2083130 type 1 (regular)
		extent data disk byte 48013127680 nr 262144
		extent data offset 0 nr 262144 ram 262144
		extent compression 0 (none)

=> I think at least the "magic" bytenr of 474185728 is the same as what
I got with the older progs/kernel.
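
(If it helps to cross-check what the different versions actually see
there, I can also dump the block at that bytenr directly — just a
sketch, assuming dump-tree's -b option and its old btrfs-debug-tree
equivalent; /dev/mapper/restore is the same device as below:)

with the newer progs:
# btrfs inspect-internal dump-tree -b 474185728 /dev/mapper/restore

with the older ones:
# btrfs-debug-tree -b 474185728 /dev/mapper/restore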



# btrfs restore -f 474185728 /dev/mapper/restore tmp/
Couldn't map the block 4503658729209856
No mapping for 4503658729209856-4503658729226240
Couldn't map the block 4503658729209856
bytenr mismatch, want=4503658729209856, have=0
Could not open root, trying backup super
Couldn't map the block 4503658729209856
No mapping for 4503658729209856-4503658729226240
Couldn't map the block 4503658729209856
bytenr mismatch, want=4503658729209856, have=0
Could not open root, trying backup super
Couldn't map the block 4503658729209856
No mapping for 4503658729209856-4503658729226240
Couldn't map the block 4503658729209856
bytenr mismatch, want=4503658729209856, have=0
Could not open root, trying backup super

=> tmp/ remains empty... so here is the major difference between 4.15
and the older ones
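
(Side note, in case it's relevant: if my arithmetic is right, the
unmappable bytenr looks like a plausible address with one high bit
flipped:

  4503658729209856 - 2^52 = 4503658729209856 - 4503599627370496
                          = 59101839360   (~55 GiB, 16k-aligned)

i.e. a perfectly sane-looking logical address plus bit 52 set — which
would again fit the bad-RAM suspicion.)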


No kernel messages at all during the whole procedure.
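
(In case it's useful for comparing further, this is what I'd try next
with the newer progs — only a sketch from the manpage, assuming the
-l/-D/-t/-u options of btrfs restore behave as documented; tmp/ and the
device are the same as above, the bytenr in angle brackets is a
placeholder:)

list the tree roots restore can still find:
# btrfs restore -l /dev/mapper/restore

dry-run against one of the listed tree roots, or against the first
backup superblock:
# btrfs restore -D -t <bytenr from the list> -f 474185728 /dev/mapper/restore tmp/
# btrfs restore -D -u 1 -f 474185728 /dev/mapper/restore tmp/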




> And strangely this time it works...
> 
> > 
> > dm-1 here is the external HDD (and the 130 corrupt are likely from
> > the
> > first btrfs-restore that I made while still on the NEW notebook
> > with
> > the possible bad RAM).
> > 
> > 
> > After that I did a fsck of the 8TB HDD / dm-1 ... and, as you'd
> > anyway asked me to above, a scrub of it.
> > Neither of them showed any errors... (so it's still strange why it
> > got that open_ctree error)
> 
> I'm surprised the corruption just disappeared...

Anything more on this from your side (I mean the spurious "open_ctree
failed" error and its even more unexplained vanishing)?
I haven't seen it again since then and wouldn't know what else I could
do about it here... so I'd just forget about it.



> Fortunately (or unfortunately), no obvious problem with v1 space
> cache
> found yet.

I assume there's still nothing new here? Can one thus basically use
the v1 space cache again, or would you still rather keep it disabled
until further investigation has been done?
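
(Until then I'll simply keep it off — roughly like this, a sketch
assuming the clear-space-cache option of current btrfs-progs and the
nospace_cache mount option; device and mountpoint are placeholders:)

one-time removal of the existing v1 cache, with the fs unmounted:
# btrfs check --clear-space-cache v1 /dev/mapper/data

then mount without (re)building it:
# mount -o nospace_cache /dev/mapper/data /mnt/data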



Cheers&thx,
Chris.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: spurious full btrfs corruption
  2018-03-16  0:03                                 ` Christoph Anton Mitterer
@ 2018-03-21 22:03                                   ` Christoph Anton Mitterer
  2018-03-26 14:32                                   ` Christoph Anton Mitterer
  1 sibling, 0 replies; 10+ messages in thread
From: Christoph Anton Mitterer @ 2018-03-21 22:03 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Just a small addition to this:

On Fri, 2018-03-16 at 01:03 +0100, Christoph Anton Mitterer wrote:
> The issue that newer btrfs-progs/kernel don't restore anything at all
> from my corrupted fs:

4.13.3 already seems to be buggy...

4.7.3 works, but interestingly btrfs-find-super seems to hang on it
forever at 100% CPU with apparently no disk I/O (it works in later
versions, where it finishes in a few seconds).
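
(Since there's now a known-good and a known-bad version, the commit
that broke restore could probably be narrowed down by bisecting
btrfs-progs between the two — a rough sketch, assuming the usual
kernel.org repo and tag names, and that a dry-run restore on the image
is a good enough test; for the find-super spin, attaching gdb to the
process and taking a backtrace with 'bt' should show where it loops:)

# git clone https://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git
# cd btrfs-progs
# git bisect start v4.13.3 v4.7.3

then at each step:
# ./autogen.sh && ./configure && make
# ./btrfs restore -D -f 474185728 /dev/mapper/restore tmp/
# git bisect bad     (or "git bisect good", depending on whether the
                      mapping errors show up again)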


Cheers,
Chris.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: spurious full btrfs corruption
  2018-03-16  0:03                                 ` Christoph Anton Mitterer
  2018-03-21 22:03                                   ` Christoph Anton Mitterer
@ 2018-03-26 14:32                                   ` Christoph Anton Mitterer
  1 sibling, 0 replies; 10+ messages in thread
From: Christoph Anton Mitterer @ 2018-03-26 14:32 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

Hey Qu.

Some update on the corruption issue on my Fujitsu notebook:


Finally got around to running some memtest on it... and a few seconds
after it started I already got this:
https://paste.pics/1ff8b13b94f31082bc7410acfb1c6693

So plenty of bad memory...

I'd say it's probably not so unlikely that *this* was the actual cause
of the btrfs metadata corruption.

It would fit perfectly with the symptom I saw shortly before the fs
was completely destroyed:
the spurious csum errors on reads that went away when I read the file
again.
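
(What I'll probably do now on the other filesystems that this machine
wrote to — a rough sketch; note that this can only catch what is
actually checksummed, so nodatacow/nodatasum files are not covered, and
/mnt/data is just a placeholder:)

# btrfs scrub start -B /mnt/data
# btrfs device stats /mnt/data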



I'd guess you also found no further issues with the v1 space cache
and/or the tree log in the meantime?
So it's probably safe to turn them on again?



We (i.e. you, plus me testing fixes) can still look into the issue
that newer btrfs-progs no longer recover anything from the broken fs,
while older ones do.
I can keep the image around, so there's no need to hurry on your side.
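
(Should the full dd image ever get in the way, I could also send a
metadata-only dump instead — a sketch, assuming btrfs-image copes with
the corruption at all; -s sanitizes file names, the output name is
arbitrary:)

# btrfs-image -c9 -t4 -s /dev/mapper/restore broken-fs.metadump

which you could then unpack again with:
# btrfs-image -r broken-fs.metadump restored-metadata.img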



Cheers,
Chris.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-03-26 14:32 UTC | newest]

Thread overview: 10+ messages
     [not found] <26306A4D-2D8E-4661-B89E-9F050FD184D5@scientia.net>
     [not found] ` <BF03A3DF-684D-47FE-A9AD-320256F64763@scientia.net>
     [not found]   ` <b6b4ed0e-9569-158d-19ad-82c885bab4ec@gmx.com>
     [not found]     ` <BB977A40-847C-4611-A468-9BF1137CE711@scientia.net>
     [not found]       ` <d2a14d69-dfa9-8794-6375-03d1b209632f@gmx.com>
     [not found]         ` <68697875-6E77-49C4-B54E-0FADB94700DA@scientia.net>
     [not found]           ` <99ee9b31-a38a-d479-5b1d-30e9c942d577@gmx.com>
     [not found]             ` <A19E863A-F83C-4630-9675-4309CECB318E@scientia.net>
     [not found]               ` <1b24e69e-2c1e-71af-fb1d-9d32f72cc78c@gmx.com>
     [not found]                 ` <8DB99A3B-6238-497D-A70F-8834CC014DCF@gmail.com>
2018-02-28  8:36                   ` Fwd: Re: BUG: unable to handle kernel paging request at ffff9fb75f827100 Qu Wenruo
     [not found]                     ` <1519833022.3714.122.camel@scientia.net>
2018-03-01  1:25                       ` spurious full btrfs corruption Qu Wenruo
2018-03-06  0:57                         ` Christoph Anton Mitterer
2018-03-06  1:50                           ` Qu Wenruo
2018-03-08 14:38                             ` Christoph Anton Mitterer
2018-03-08 23:48                               ` Qu Wenruo
2018-03-16  0:03                                 ` Christoph Anton Mitterer
2018-03-21 22:03                                   ` Christoph Anton Mitterer
2018-03-26 14:32                                   ` Christoph Anton Mitterer
2018-03-07  3:09                           ` Duncan
