On 2018/12/30 8:48 AM, Tomáš Metelka wrote:
> Ok, I've got it :-(
>
> But just a few questions: I've tried (with btrfs-progs v4.19.1) to
> recover files through btrfs restore -s -m -S -v -i ... and the
> following events occurred:
>
> 1) Just 1 "hard" error:
> ERROR: cannot map block logical 117058830336 length 1073741824: -2
> Error copying data for /mnt/...
> (a file whose absence really doesn't pain me :-))

This means one data extent can't be recovered due to a missing chunk
mapping. Not impossible for a heavily damaged fs, but nothing serious.

>
> 2) For 24 files I got the "too much loops" warning (I mean this:
> "if (loops >= 0 && loops++ >= 1024) { ..."). I always answered yes,
> but I'm afraid these files are corrupted (at least 2 of them seem
> corrupted).
>
> How bad is this?

Not sure, but I don't think restore is robust enough for such a case.
Maybe a false alert.

> Does the error mentioned in #1 mean that it's the
> only file which is totally lost?

Not even totally lost, as it's just one file extent; maybe the other
parts are OK.

Thanks,
Qu

> I can live without those 24 + 1 files,
> so if #1 and #2 were the only errors then I could say the recovery
> was successful ... but I'm afraid things aren't that easy :-)
>
> Thanks
> M.
>
>
>   Tomáš Metelka
>   Business & IT Analyst
>
>   Tel: +420 728 627 252
>   Email: tomas.metelka@metaliza.cz
>
>
>
> On 24. 12. 18 15:19, Qu Wenruo wrote:
>>
>>
>> On 2018/12/24 9:52 PM, Tomáš Metelka wrote:
>>> On 24. 12. 18 14:02, Qu Wenruo wrote:
>>>> btrfs check --readonly output please.
>>>>
>>>> btrfs check --readonly is always the most reliable and detailed
>>>> output for any possible recovery.
>>>
>>> This is very weird because it prints only:
>>> ERROR: cannot open file system
>>
>> A new place to enhance ;)
>>
>>>
>>> I've also tried "btrfs check -r 75152310272" but it only says:
>>> parent transid verify failed on 75152310272 wanted 2488742 found 2488741
>>> parent transid verify failed on 75152310272 wanted 2488742 found 2488741
>>> Ignoring transid failure
>>> ERROR: cannot open file system
>>>
>>> I tried that because:
>>>      backup 3:
>>>   backup_tree_root:    75152310272    gen: 2488741    level: 1
>>>
>>>> Also the kernel message for the mount failure could help.
>>>
>>> Sorry, my fault, I should have started from this point:
>>>
>>> Dec 23 21:59:07 tisc5 kernel: [10319.442615] BTRFS: device fsid
>>> be557007-42c9-4079-be16-568997e94cd9 devid 1 transid 2488742 /dev/loop0
>>> Dec 23 22:00:49 tisc5 kernel: [10421.167028] BTRFS info (device loop0):
>>> disk space caching is enabled
>>> Dec 23 22:00:49 tisc5 kernel: [10421.167034] BTRFS info (device loop0):
>>> has skinny extents
>>> Dec 23 22:00:50 tisc5 kernel: [10421.807564] BTRFS critical (device
>>> loop0): corrupt node: root=1 block=75150311424 slot=245, invalid NULL
>>> node pointer

>> This explains the problem.
>>
>> Your root tree has one node pointer which is not correct.
>> A pointer should never point to 0.
>>
>> This is pretty weird, at least a corruption pattern I have never seen.
>>
>> Since your tree root got corrupted, there isn't much we can do but
>> try to use older tree roots.
>>
>> You could try all the backup roots, starting from the newest backup
>> (with the highest generation), and check each backup root bytenr using:
>> # btrfs check -r <bytenr>
>>
>> to see which one gets the fewest errors, but normally the chance is
>> near 0.
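(For reference, a rough sketch of the backup-root procedure described
above. It assumes the backup root bytenrs are read off dump-super by
hand and that the least-broken root is then handed to restore via its
-t option; only the 75152310272 bytenr actually appears in this thread,
the destination path /mnt/recovery is just a placeholder.)

# list the four backup roots and their generations
btrfs inspect-internal dump-super -f /dev/loop0 | \
    grep -E 'backup [0-9]|backup_tree_root'

# try each backup_tree_root bytenr read-only and compare the errors
btrfs check -r 75152310272 /dev/loop0

# whichever root reports the fewest errors can then be used for extraction
btrfs restore -t 75152310272 -s -m -S -v -i /dev/loop0 /mnt/recovery

In this thread, backup 3 (gen 2488741) was the newest surviving backup,
which is why 75152310272 was the bytenr tried first.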
>>
>>> Dec 23 22:00:50 tisc5 kernel: [10421.807653] BTRFS error (device loop0):
>>> failed to read block groups: -5
>>> Dec 23 22:00:50 tisc5 kernel: [10421.877001] BTRFS error (device loop0):
>>> open_ctree failed
>>>
>>>
>>> So I tried to do:
>>> 1) btrfs inspect-internal dump-super (with the snippet posted above)
>>> 2) btrfs inspect-internal dump-tree -b 75150311424
>>>
>>> And it showed (header + snippet for items 243-248):
>>> node 75150311424 level 1 items 249 free 244 generation 2488741 owner 2
>>> fs uuid be557007-42c9-4079-be16-568997e94cd9
>>> chunk uuid dbe69c7e-2d50-4001-af31-148c5475b48b
>>> ...
>>>    key (14799519744 EXTENT_ITEM 4096) block 233423224832 (14247023)
>>> gen 2484894
>>>    key (14811271168 EXTENT_ITEM 135168) block 656310272 (40058)
>>> gen 2488049
>>
>>
>>>    key (1505328190277054464 UNKNOWN.4 366981796979539968) block 0 (0)
>>> gen 0
>>>    key (0 UNKNOWN.0 1419267647995904) block 6468220747776 (394788864)
>>> gen 7786775707648
>>
>> Pretty obviously, these two items are garbage.
>> Something corrupted the memory at runtime, and we don't have a
>> runtime check against such corruption yet.
>>
>> So IMHO the problem is that some kernel code, either btrfs or some
>> other part, corrupted the memory.
>> Btrfs then failed to detect it and wrote it back to disk, and the
>> kernel only caught the problem when it later read the tree block back
>> from disk.
>>
>> I could add such a check for nodes, but normally it would need
>> CONFIG_BTRFS_FS_CHECK_INTEGRITY, so it makes no sense for normal users.
>>
>>>    key (12884901888 EXTENT_ITEM 24576) block 816693248 (49847)
>>> gen 2484931
>>>    key (14902849536 EXTENT_ITEM 131072) block 75135844352 (4585928)
>>> gen 2488739
>>>
>>>
>>> I looked at those numbers for quite a while (also in hex), trying to
>>> figure out what had happened (bit flips (it was on an SSD), byte
>>> shifts (I also suspected a bad CPU ... because it died 2 months after
>>> that)) and tried to guess "correct" values for those items ... but no
>>> idea :-(
>>
>> I'm not so sure; unless you're super lucky (or rather unlucky in this
>> case), such corruption would normally get caught by csum first.
>>
>>>
>>> So this is why I asked about that log_root and whether there is a
>>> chance to "log-replay things" :-)
>>
>> For your case, this is definitely not related to log replay.
>>
>> Thanks,
>> Qu
>>
>>>
>>>
>>> Thanks
>>> M.
>>
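(To make the "invalid NULL node pointer" point above concrete, here is a
minimal standalone sketch of the kind of per-slot sanity check being
discussed. The struct and function below are simplified stand-ins written
for illustration, not the actual fs/btrfs definitions or the kernel's
tree-checker code.)

/* Standalone illustration only: simplified stand-ins, not fs/btrfs code. */
#include <stdint.h>
#include <stdio.h>

/* Simplified view of one key pointer slot in an internal (non-leaf) node. */
struct node_ptr {
	uint64_t blockptr;	/* byte number of the child tree block */
	uint64_t generation;	/* transid that last wrote the child */
};

/*
 * Return 0 if every child pointer looks sane, -1 on the first bad slot.
 * total_bytes is the filesystem size, sb_generation the current
 * superblock generation.
 */
static int check_node_ptrs(const struct node_ptr *ptrs, unsigned int nritems,
			   uint64_t total_bytes, uint64_t sb_generation)
{
	for (unsigned int slot = 0; slot < nritems; slot++) {
		/* A child block pointer of 0 can never be valid. */
		if (ptrs[slot].blockptr == 0 ||
		    ptrs[slot].blockptr >= total_bytes) {
			fprintf(stderr,
				"corrupt node: slot %u, invalid block pointer %llu\n",
				slot, (unsigned long long)ptrs[slot].blockptr);
			return -1;
		}
		/* A generation newer than the superblock's is also garbage. */
		if (ptrs[slot].generation > sb_generation) {
			fprintf(stderr,
				"corrupt node: slot %u, invalid generation %llu\n",
				slot, (unsigned long long)ptrs[slot].generation);
			return -1;
		}
	}
	return 0;
}

Run against the dump-tree output quoted above, the slot-245 entry
(block 0, gen 0) would trip the zero-pointer test, and the following
slot (gen 7786775707648, far above transid 2488742) would trip the
generation test.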