Re: Please help. Repair probably bitflip damage and suspected bug

* Re: Please help. Repair probably bitflip damage and suspected bug
@ 2017-06-19 10:43 Jesse
  2017-06-21  3:55 ` Chris Murphy
  0 siblings, 1 reply; 4+ messages in thread
From: Jesse @ 2017-06-19 10:43 UTC (permalink / raw)
  To: Jesse; +Cc: linux-btrfs

I just noticed a series of seemingly btrfs related call traces that
for the first time, did not lock up the system.

I have uploaded dmesg to https://paste.ee/p/An8Qy

Anyone able to help advise on these?

Thanks

Jesse

On 19 June 2017 at 17:19, Jesse <btrfs_mail_list@mymail.isbest.biz> wrote:
> Further to the above message reporting problems, I have been able to
> capture a call trace under the main system rather than live media.
>
> Note this occurred in rsync from btrfs to a separate drive running xfs
> on a local filesystem (both sata drives). So I presume that btrfs is
> only reading the drive at the time of crash, unless rsync is also
> doing some sort of disc caching of the files to btrfs as it is the OS
> filesystem.
>
> The destination drive directories being copied to in this case were
> empty, so I was making a copy of the data off of the btrfs drive (due
> to the btrfs tree errors and problems reported in the post I am here
> replying to).
>
> I am suspecting that there is a direct correlation to using rsync
> while (or subsequent to) touching areas of the btrfs tree that have
> corruption which results in a complete system lockup/crash.
>
> I have also noted that when these crashes while running rsync occur,
> the prior x files (eg: 10 files) show in the rsync log as being
> synced, however, show on the destination drive with filesize of zero.
>
> The trace (/var/log/messages | grep btrfs) I have uploaded to
> https://paste.ee/p/nRcj0
>
> The important part of which is:
>
> Jun 18 23:43:24 Orion vmunix: [38084.183174] BTRFS info (device sda2):
> no csum found for inode 12497 start 0
> Jun 18 23:43:24 Orion vmunix: [38084.183195] BTRFS info (device sda2):
> no csum found for inode 12497 start 0
> Jun 18 23:43:24 Orion vmunix: [38084.183209] BTRFS info (device sda2):
> no csum found for inode 12497 start 0
> Jun 18 23:43:24 Orion vmunix: [38084.183222] BTRFS info (device sda2):
> no csum found for inode 12497 start 0
> Jun 18 23:43:24 Orion vmunix: [38084.217552] BTRFS info (device sda2):
> csum failed ino 12497 extent 1700305813504 csum 1405070872 wanted 0
> mirror 0
> Jun 18 23:43:24 Orion vmunix: [38084.217626] BTRFS info (device sda2):
> no csum found for inode 12497 start 0
> Jun 18 23:43:24 Orion vmunix: [38084.217643] BTRFS info (device sda2):
> no csum found for inode 12497 start 0
> Jun 18 23:43:24 Orion vmunix: [38084.217657] BTRFS info (device sda2):
> no csum found for inode 12497 start 0
> Jun 18 23:43:24 Orion vmunix: [38084.217669] BTRFS info (device sda2):
> no csum found for inode 12497 start 0
> Jun 18 23:43:24 Orion vmunix:  auth_rpcgss nfs_acl nfs lockd grace
> sunrpc fscache zfs(POE) zunicode(POE) zcommon(POE) znvpair(POE)
> spl(OE) zavl(POE) btrfs xor raid6_pq dm_mirror dm_region_hash dm_log
> hid_generic usbhid hid uas usb_storage radeon i2c_algo_bit ttm
> drm_kms_helper drm r8169 ahci mii libahci wmi
> Jun 18 23:43:24 Orion vmunix: [38084.220604] Workqueue: btrfs-endio
> btrfs_endio_helper [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.220812] RIP:
> 0010:[<ffffffffc048804a>]  [<ffffffffc048804a>]
> __btrfs_map_block+0x32a/0x1180 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.222459]  [<ffffffffc0452c00>] ?
> __btrfs_lookup_bio_sums.isra.8+0x3e0/0x540 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.222632]  [<ffffffffc048cf3d>]
> btrfs_map_bio+0x7d/0x2b0 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.222781]  [<ffffffffc04aaf64>]
> btrfs_submit_compressed_read+0x484/0x4e0 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.222948]  [<ffffffffc04651e1>]
> btrfs_submit_bio_hook+0x1c1/0x1d0 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.223198]  [<ffffffffc047fd90>] ?
> btrfs_create_repair_bio+0xf0/0x110 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.223360]  [<ffffffffc047fec7>]
> bio_readpage_error+0x117/0x180 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.223514]  [<ffffffffc04802f0>] ?
> clean_io_failure+0x1b0/0x1b0 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.223667]  [<ffffffffc04806ae>]
> end_bio_extent_readpage+0x3be/0x3f0 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.223996]  [<ffffffffc0457838>]
> end_workqueue_fn+0x48/0x60 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.224145]  [<ffffffffc0490de2>]
> normal_work_helper+0x82/0x210 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.224297]  [<ffffffffc0491042>]
> btrfs_endio_helper+0x12/0x20 [btrfs]
> Jun 18 23:43:24 Orion vmunix:  auth_rpcgss nfs_acl nfs lockd grace
> sunrpc fscache zfs(POE) zunicode(POE) zcommon(POE) znvpair(POE)
> spl(OE) zavl(POE) btrfs xor raid6_pq dm_mirror dm_region_hash dm_log
> hid_generic usbhid hid uas usb_storage radeon i2c_algo_bit ttm
> drm_kms_helper drm r8169 ahci mii libahci wmi
> Jun 18 23:43:24 Orion vmunix: [38084.330053]  [<ffffffffc048804a>] ?
> __btrfs_map_block+0x32a/0x1180 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.330106]  [<ffffffffc0487fec>] ?
> __btrfs_map_block+0x2cc/0x1180 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.330154]  [<ffffffffc0452c00>] ?
> __btrfs_lookup_bio_sums.isra.8+0x3e0/0x540 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.330205]  [<ffffffffc048cf3d>]
> btrfs_map_bio+0x7d/0x2b0 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.330257]  [<ffffffffc04aaf64>]
> btrfs_submit_compressed_read+0x484/0x4e0 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.330304]  [<ffffffffc04651e1>]
> btrfs_submit_bio_hook+0x1c1/0x1d0 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.330361]  [<ffffffffc047fd90>] ?
> btrfs_create_repair_bio+0xf0/0x110 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.330412]  [<ffffffffc047fec7>]
> bio_readpage_error+0x117/0x180 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.330462]  [<ffffffffc04802f0>] ?
> clean_io_failure+0x1b0/0x1b0 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.330513]  [<ffffffffc04806ae>]
> end_bio_extent_readpage+0x3be/0x3f0 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.330568]  [<ffffffffc0457838>]
> end_workqueue_fn+0x48/0x60 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.330618]  [<ffffffffc0490de2>]
> normal_work_helper+0x82/0x210 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.330668]  [<ffffffffc0491042>]
> btrfs_endio_helper+0x12/0x20 [btrfs]
> Jun 18 23:43:24 Orion vmunix:  auth_rpcgss nfs_acl nfs lockd grace
> sunrpc fscache zfs(POE) zunicode(POE) zcommon(POE) znvpair(POE)
> spl(OE) zavl(POE) btrfs xor raid6_pq dm_mirror dm_region_hash dm_log
> hid_generic usbhid hid uas usb_storage radeon i2c_algo_bit ttm
> drm_kms_helper drm r8169 ahci mii libahci wmi
> Jun 18 23:43:24 Orion vmunix: [38084.331102]  [<ffffffffc048804a>] ?
> __btrfs_map_block+0x32a/0x1180 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.331152]  [<ffffffffc0487fec>] ?
> __btrfs_map_block+0x2cc/0x1180 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.331202]  [<ffffffffc0452c00>] ?
> __btrfs_lookup_bio_sums.isra.8+0x3e0/0x540 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.331255]  [<ffffffffc048cf3d>]
> btrfs_map_bio+0x7d/0x2b0 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.331310]  [<ffffffffc04aaf64>]
> btrfs_submit_compressed_read+0x484/0x4e0 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.331360]  [<ffffffffc04651e1>]
> btrfs_submit_bio_hook+0x1c1/0x1d0 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.331423]  [<ffffffffc047fd90>] ?
> btrfs_create_repair_bio+0xf0/0x110 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.331477]  [<ffffffffc047fec7>]
> bio_readpage_error+0x117/0x180 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.331530]  [<ffffffffc04802f0>] ?
> clean_io_failure+0x1b0/0x1b0 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.331585]  [<ffffffffc04806ae>]
> end_bio_extent_readpage+0x3be/0x3f0 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.331649]  [<ffffffffc0457838>]
> end_workqueue_fn+0x48/0x60 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.331703]  [<ffffffffc0490de2>]
> normal_work_helper+0x82/0x210 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.331757]  [<ffffffffc0491042>]
> btrfs_endio_helper+0x12/0x20 [btrfs]
> Jun 19 07:29:22 Orion vmunix: [    3.107113] Btrfs loaded
> Jun 19 07:29:22 Orion vmunix: [    3.665536] BTRFS: device label
> btrfs1 devid 2 transid 1086759 /dev/sdb2
> Jun 19 07:29:22 Orion vmunix: [    3.665811] BTRFS: device label
> btrfs1 devid 1 transid 1086759 /dev/sda2
> Jun 19 07:29:22 Orion vmunix: [    8.673689] BTRFS info (device sda2):
> disk space caching is enabled
> Jun 19 07:29:22 Orion vmunix: [   28.190962] BTRFS info (device sda2):
> enabling auto defrag
> Jun 19 07:29:22 Orion vmunix: [   28.191039] BTRFS info (device sda2):
> disk space caching is enabled
>
> I notice that this page
> https://btrfs.wiki.kernel.org/index.php/Gotchas mentions "Files with a
> lot of random writes can become heavily fragmented (10000+ extents)
> causing thrashing on HDDs and excessive multi-second spikes" and as
> such I am wondering if this is related to the crashing. AFAIK rsync
> should be creating the temp file in the destination drive (xfs),
> unless there is some part of rsync that I am not understanding that
> would be writing to the file system drive (btrfs)  which is also in
> the case the source hdd (btrfs).
>
> Can someone please help with these btrfs problems.
>
> Thankyou
>
>
>
>> My Linux Mint system is starting up and usable, however, I am unable
>> to complete any scrub as they abort before finished. There are various
>> inode errors in dmesg. Badblocks (readonly) finds no errors. checking
>> extents gives bad block 5123372711936 on both /dev/sda2 and /dev/sda2.
>> A btrfscheck (readonly) results in a 306MB text file of output of root
>> xxx inode errors.
>> There are two drives 3TB each in RAID 1 for sda2/sdb2 for which
>> partition 2 is nearly the entire drive.
>>
>> Currently I am now using a Manjaro Live Boot with btrfs tools
>> btrfs-progs v4.10.1 in an attempt to recover/repair what seems to be
>> bitflip
>> (The original Linux Mint System has btrfs-progs v4.5.3)
>>
>> When doing a scrub on '/', the status of /dev/sdb2 aborts always at ~
>> 383GiB with 0 errors. Whereas the /dev/sda2 and thus the '/' scrub
>> aborts at more diverse values starting at 537.90GiB with 0 errors.
>>
>> btrfs inspect-internal dump-tree -b 5123372711936 has one item
>> evidently out of order:
>> 2551224532992 -> 2551253647360 -> 2551251468288
>>
>> I am currently attempting to copy files off the system while in
>> Manjaro using rsync prior to attempting whatever the knowlegable
>> people here recommend. It has resulting in two files not being able to
>> be read so far, however, a lot of messages in dmesg for btrfs errors
>> https://ptpb.pw/L9Z9
>>
>> Pastebins from original machine:
>> System specs as on original Linux Mint system: https://ptpb.pw/dFz3
>> dmesg btrfs grep from prior to errors starting until scrub attempts:
>> https://ptpb.pw/rTzs
>>
>> Pastebins from subsequent live boot with newer btrfs tools 4.10:
>> LiveBoot Repair (Manjaro Arch) specs: https://ptpb.pw/ikMM
>> Scrub failing/aborting at same place on /dev/sdb: https://ptpb.pw/-vcP
>> badblock_extent_btrfscheck_5123372711936: https://ptpb.pw/T1rD
>> 'btrfs inspect-internal dump-tree -b 5123372711936 /dev/sda2':
>> https://ptpb.pw/zcyI
>> 'btrfs inspect-internal dump-tree -b 5123372711936 /dev/sdb2':
>> https://ptpb.pw/zcyI
>> dmesg on Manjaro attempting to rsync recover files: https://ptpb.pw/L9Z9
>>
>> I have also just noticed that btrfs causes a complete system crash
>> when copying a certain file(s).
>>
>> RTP: [<ffffffffa055b122>] btrfs_check_repairable+0xf2/0x100 [btrfs]
>> BUG: unable to handle kernel paging request at 000000000dealc93
>> IP: [<ffffffff810c3e0b>] __wake_up _common+0x2b/0x80
>> I have taken a photo of the last part of this. As it is in the live
>> DVD boot distro during repair, it did not save the trace to a file
>> that would be accessible later.
>> The photo is at https://pasteboard.co/MTP2OGMK.jpg
>>
>> Could someone please advise the steps to repair this.
>>
>> Thankyou
>>
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 4+ messages in thread