* Re: Please help. Repair probably bitflip damage and suspected bug
@ 2017-06-19 10:43 Jesse
  2017-06-21  3:55 ` Chris Murphy
  0 siblings, 1 reply; 4+ messages in thread
From: Jesse @ 2017-06-19 10:43 UTC (permalink / raw)
  To: Jesse; +Cc: linux-btrfs

I just noticed a series of seemingly btrfs-related call traces that,
for the first time, did not lock up the system.

I have uploaded dmesg to https://paste.ee/p/An8Qy

Anyone able to help advise on these?

Thanks

Jesse


On 19 June 2017 at 17:19, Jesse <btrfs_mail_list@mymail.isbest.biz> wrote:
> Further to the above message reporting problems, I have been able to
> capture a call trace under the main system rather than live media.
>
> Note this occurred during an rsync from btrfs to a separate drive
> running XFS on a local filesystem (both SATA drives). So I presume
> that btrfs was only being read at the time of the crash, unless rsync
> also does some sort of disk caching of the files to btrfs, since it
> is the OS filesystem.
>
> The destination drive directories being copied to in this case were
> empty, so I was making a copy of the data off of the btrfs drive (due
> to the btrfs tree errors and problems reported in the post I am here
> replying to).
>
> I suspect there is a direct correlation between running rsync while
> (or after) touching areas of the btrfs tree that have corruption and
> the complete system lockup/crash.
>
> I have also noted that when one of these crashes occurs while rsync
> is running, the last several files (e.g. 10 files) show in the rsync
> log as synced, yet appear on the destination drive with a file size
> of zero.
>
> The trace (/var/log/messages | grep btrfs) I have uploaded to
> https://paste.ee/p/nRcj0
>
> The important part of which is:
>
> Jun 18 23:43:24 Orion vmunix: [38084.183174] BTRFS info (device sda2):
> no csum found for inode 12497 start 0
> Jun 18 23:43:24 Orion vmunix: [38084.183195] BTRFS info (device sda2):
> no csum found for inode 12497 start 0
> Jun 18 23:43:24 Orion vmunix: [38084.183209] BTRFS info (device sda2):
> no csum found for inode 12497 start 0
> Jun 18 23:43:24 Orion vmunix: [38084.183222] BTRFS info (device sda2):
> no csum found for inode 12497 start 0
> Jun 18 23:43:24 Orion vmunix: [38084.217552] BTRFS info (device sda2):
> csum failed ino 12497 extent 1700305813504 csum 1405070872 wanted 0
> mirror 0
> Jun 18 23:43:24 Orion vmunix: [38084.217626] BTRFS info (device sda2):
> no csum found for inode 12497 start 0
> Jun 18 23:43:24 Orion vmunix: [38084.217643] BTRFS info (device sda2):
> no csum found for inode 12497 start 0
> Jun 18 23:43:24 Orion vmunix: [38084.217657] BTRFS info (device sda2):
> no csum found for inode 12497 start 0
> Jun 18 23:43:24 Orion vmunix: [38084.217669] BTRFS info (device sda2):
> no csum found for inode 12497 start 0
> Jun 18 23:43:24 Orion vmunix:  auth_rpcgss nfs_acl nfs lockd grace
> sunrpc fscache zfs(POE) zunicode(POE) zcommon(POE) znvpair(POE)
> spl(OE) zavl(POE) btrfs xor raid6_pq dm_mirror dm_region_hash dm_log
> hid_generic usbhid hid uas usb_storage radeon i2c_algo_bit ttm
> drm_kms_helper drm r8169 ahci mii libahci wmi
> Jun 18 23:43:24 Orion vmunix: [38084.220604] Workqueue: btrfs-endio
> btrfs_endio_helper [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.220812] RIP:
> 0010:[<ffffffffc048804a>]  [<ffffffffc048804a>]
> __btrfs_map_block+0x32a/0x1180 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.222459]  [<ffffffffc0452c00>] ?
> __btrfs_lookup_bio_sums.isra.8+0x3e0/0x540 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.222632]  [<ffffffffc048cf3d>]
> btrfs_map_bio+0x7d/0x2b0 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.222781]  [<ffffffffc04aaf64>]
> btrfs_submit_compressed_read+0x484/0x4e0 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.222948]  [<ffffffffc04651e1>]
> btrfs_submit_bio_hook+0x1c1/0x1d0 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.223198]  [<ffffffffc047fd90>] ?
> btrfs_create_repair_bio+0xf0/0x110 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.223360]  [<ffffffffc047fec7>]
> bio_readpage_error+0x117/0x180 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.223514]  [<ffffffffc04802f0>] ?
> clean_io_failure+0x1b0/0x1b0 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.223667]  [<ffffffffc04806ae>]
> end_bio_extent_readpage+0x3be/0x3f0 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.223996]  [<ffffffffc0457838>]
> end_workqueue_fn+0x48/0x60 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.224145]  [<ffffffffc0490de2>]
> normal_work_helper+0x82/0x210 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.224297]  [<ffffffffc0491042>]
> btrfs_endio_helper+0x12/0x20 [btrfs]
> Jun 18 23:43:24 Orion vmunix:  auth_rpcgss nfs_acl nfs lockd grace
> sunrpc fscache zfs(POE) zunicode(POE) zcommon(POE) znvpair(POE)
> spl(OE) zavl(POE) btrfs xor raid6_pq dm_mirror dm_region_hash dm_log
> hid_generic usbhid hid uas usb_storage radeon i2c_algo_bit ttm
> drm_kms_helper drm r8169 ahci mii libahci wmi
> Jun 18 23:43:24 Orion vmunix: [38084.330053]  [<ffffffffc048804a>] ?
> __btrfs_map_block+0x32a/0x1180 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.330106]  [<ffffffffc0487fec>] ?
> __btrfs_map_block+0x2cc/0x1180 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.330154]  [<ffffffffc0452c00>] ?
> __btrfs_lookup_bio_sums.isra.8+0x3e0/0x540 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.330205]  [<ffffffffc048cf3d>]
> btrfs_map_bio+0x7d/0x2b0 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.330257]  [<ffffffffc04aaf64>]
> btrfs_submit_compressed_read+0x484/0x4e0 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.330304]  [<ffffffffc04651e1>]
> btrfs_submit_bio_hook+0x1c1/0x1d0 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.330361]  [<ffffffffc047fd90>] ?
> btrfs_create_repair_bio+0xf0/0x110 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.330412]  [<ffffffffc047fec7>]
> bio_readpage_error+0x117/0x180 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.330462]  [<ffffffffc04802f0>] ?
> clean_io_failure+0x1b0/0x1b0 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.330513]  [<ffffffffc04806ae>]
> end_bio_extent_readpage+0x3be/0x3f0 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.330568]  [<ffffffffc0457838>]
> end_workqueue_fn+0x48/0x60 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.330618]  [<ffffffffc0490de2>]
> normal_work_helper+0x82/0x210 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.330668]  [<ffffffffc0491042>]
> btrfs_endio_helper+0x12/0x20 [btrfs]
> Jun 18 23:43:24 Orion vmunix:  auth_rpcgss nfs_acl nfs lockd grace
> sunrpc fscache zfs(POE) zunicode(POE) zcommon(POE) znvpair(POE)
> spl(OE) zavl(POE) btrfs xor raid6_pq dm_mirror dm_region_hash dm_log
> hid_generic usbhid hid uas usb_storage radeon i2c_algo_bit ttm
> drm_kms_helper drm r8169 ahci mii libahci wmi
> Jun 18 23:43:24 Orion vmunix: [38084.331102]  [<ffffffffc048804a>] ?
> __btrfs_map_block+0x32a/0x1180 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.331152]  [<ffffffffc0487fec>] ?
> __btrfs_map_block+0x2cc/0x1180 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.331202]  [<ffffffffc0452c00>] ?
> __btrfs_lookup_bio_sums.isra.8+0x3e0/0x540 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.331255]  [<ffffffffc048cf3d>]
> btrfs_map_bio+0x7d/0x2b0 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.331310]  [<ffffffffc04aaf64>]
> btrfs_submit_compressed_read+0x484/0x4e0 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.331360]  [<ffffffffc04651e1>]
> btrfs_submit_bio_hook+0x1c1/0x1d0 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.331423]  [<ffffffffc047fd90>] ?
> btrfs_create_repair_bio+0xf0/0x110 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.331477]  [<ffffffffc047fec7>]
> bio_readpage_error+0x117/0x180 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.331530]  [<ffffffffc04802f0>] ?
> clean_io_failure+0x1b0/0x1b0 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.331585]  [<ffffffffc04806ae>]
> end_bio_extent_readpage+0x3be/0x3f0 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.331649]  [<ffffffffc0457838>]
> end_workqueue_fn+0x48/0x60 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.331703]  [<ffffffffc0490de2>]
> normal_work_helper+0x82/0x210 [btrfs]
> Jun 18 23:43:24 Orion vmunix: [38084.331757]  [<ffffffffc0491042>]
> btrfs_endio_helper+0x12/0x20 [btrfs]
> Jun 19 07:29:22 Orion vmunix: [    3.107113] Btrfs loaded
> Jun 19 07:29:22 Orion vmunix: [    3.665536] BTRFS: device label
> btrfs1 devid 2 transid 1086759 /dev/sdb2
> Jun 19 07:29:22 Orion vmunix: [    3.665811] BTRFS: device label
> btrfs1 devid 1 transid 1086759 /dev/sda2
> Jun 19 07:29:22 Orion vmunix: [    8.673689] BTRFS info (device sda2):
> disk space caching is enabled
> Jun 19 07:29:22 Orion vmunix: [   28.190962] BTRFS info (device sda2):
> enabling auto defrag
> Jun 19 07:29:22 Orion vmunix: [   28.191039] BTRFS info (device sda2):
> disk space caching is enabled
>
> I notice that the page
> https://btrfs.wiki.kernel.org/index.php/Gotchas mentions "Files with a
> lot of random writes can become heavily fragmented (10000+ extents)
> causing thrashing on HDDs and excessive multi-second spikes", and I am
> wondering if this is related to the crashing. AFAIK rsync should be
> creating its temporary file on the destination drive (xfs), unless
> there is some part of rsync that I am not understanding that would be
> writing to the OS filesystem drive (btrfs), which in this case is also
> the source HDD.
>
> Can someone please help with these btrfs problems?
>
> Thank you
>
>
>
>> My Linux Mint system starts up and is usable; however, I am unable to
>> complete any scrub, as they abort before finishing. There are various
>> inode errors in dmesg. Badblocks (read-only) finds no errors. Checking
>> extents gives bad block 5123372711936 on both /dev/sda2 and /dev/sdb2.
>> A btrfs check (read-only) results in a 306 MB text file of root xxx
>> inode errors.
>> There are two 3 TB drives in RAID 1 (sda2/sdb2), for which partition 2
>> is nearly the entire drive.
>>
>> I am currently using a Manjaro live boot with btrfs-progs v4.10.1 in
>> an attempt to recover/repair what seems to be bitflip damage.
>> (The original Linux Mint system has btrfs-progs v4.5.3.)
>>
>> When doing a scrub on '/', the status for /dev/sdb2 always aborts at
>> ~383 GiB with 0 errors, whereas /dev/sda2 (and thus the '/' scrub)
>> aborts at more varied values, starting at 537.90 GiB, with 0 errors.
>>
>> btrfs inspect-internal dump-tree -b 5123372711936 has one item
>> evidently out of order:
>> 2551224532992 -> 2551253647360 -> 2551251468288
>>
>> I am currently attempting to copy files off the system from within
>> Manjaro using rsync, prior to attempting whatever the knowledgeable
>> people here recommend. So far this has resulted in two files that
>> could not be read, plus a lot of btrfs error messages in dmesg:
>> https://ptpb.pw/L9Z9
>>
>> Pastebins from original machine:
>> System specs as on original Linux Mint system: https://ptpb.pw/dFz3
>> dmesg btrfs grep from prior to errors starting until scrub attempts:
>> https://ptpb.pw/rTzs
>>
>> Pastebins from subsequent live boot with newer btrfs tools 4.10:
>> LiveBoot Repair (Manjaro Arch) specs: https://ptpb.pw/ikMM
>> Scrub failing/aborting at same place on /dev/sdb: https://ptpb.pw/-vcP
>> badblock_extent_btrfscheck_5123372711936: https://ptpb.pw/T1rD
>> 'btrfs inspect-internal dump-tree -b 5123372711936 /dev/sda2':
>> https://ptpb.pw/zcyI
>> 'btrfs inspect-internal dump-tree -b 5123372711936 /dev/sdb2':
>> https://ptpb.pw/zcyI
>> dmesg on Manjaro attempting to rsync recover files: https://ptpb.pw/L9Z9
>>
>> I have also just noticed that btrfs causes a complete system crash
>> when copying certain file(s).
>>
>> RIP: [<ffffffffa055b122>] btrfs_check_repairable+0xf2/0x100 [btrfs]
>> BUG: unable to handle kernel paging request at 000000000dea1c93
>> IP: [<ffffffff810c3e0b>] __wake_up_common+0x2b/0x80
>> I have taken a photo of the last part of this. As this happened in
>> the live DVD boot distro during repair, the trace was not saved to a
>> file that would be accessible later.
>> The photo is at https://pasteboard.co/MTP2OGMK.jpg
>>
>> Could someone please advise on the steps to repair this?
>>
>> Thank you
>>
>>
>>


* Re: Please help. Repair probably bitflip damage and suspected bug
  2017-06-19 10:43 Please help. Repair probably bitflip damage and suspected bug Jesse
@ 2017-06-21  3:55 ` Chris Murphy
  0 siblings, 0 replies; 4+ messages in thread
From: Chris Murphy @ 2017-06-21  3:55 UTC (permalink / raw)
  To: Jesse; +Cc: linux-btrfs

[Sun Jun 18 04:02:43 2017] BTRFS critical (device sdb2): corrupt node,
bad key order: block=5123372711936, root=1, slot=82


From the archives, most likely it's bad RAM. I see this system also
uses an XFS v4 filesystem; if it had been made as XFS v5 with metadata
checksums, you'd probably eventually run into a similar problem there,
caught by the metadata checksum errors. It will just fail faster with
Btrfs because it checksums everything.
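
A quick way to check which format an existing XFS filesystem uses is
the crc flag reported by xfs_info (the mount point below is just a
placeholder); crc=1 means v5 metadata with checksums, crc=0 means the
older v4 format:

# assumes the XFS destination is mounted at /mnt/backup (placeholder)
xfs_info /mnt/backup | grep crc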


Chris Murphy


* Re: Please help. Repair probably bitflip damage and suspected bug
@ 2017-06-19  9:19 Jesse
  0 siblings, 0 replies; 4+ messages in thread
From: Jesse @ 2017-06-19  9:19 UTC (permalink / raw)
  To: Jesse; +Cc: linux-btrfs

Further to the above message reporting problems, I have been able to
capture a call trace under the main system rather than live media.

Note this occurred during an rsync from btrfs to a separate drive
running XFS on a local filesystem (both SATA drives). So I presume
that btrfs was only being read at the time of the crash, unless rsync
also does some sort of disk caching of the files to btrfs, since it is
the OS filesystem.

The destination drive directories being copied to in this case were
empty, so I was making a copy of the data off of the btrfs drive (due
to the btrfs tree errors and problems reported in the post I am here
replying to).

I suspect there is a direct correlation between running rsync while
(or after) touching areas of the btrfs tree that have corruption and
the complete system lockup/crash.

I have also noted that when one of these crashes occurs while rsync is
running, the last several files (e.g. 10 files) show in the rsync log
as synced, yet appear on the destination drive with a file size of
zero.
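
To double-check what actually arrived intact after such a crash, a
checksum-based dry run should list the files whose contents differ,
without copying anything (paths below are placeholders, and reading
the affected files may of course trigger the same csum errors again):

# -n dry run, -c compare full checksums instead of size/mtime,
# -r recurse, -i itemize the differences found
rsync -ncri /mnt/btrfs/data/ /mnt/xfs/backup/data/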

The trace (/var/log/messages | grep btrfs) I have uploaded to
https://paste.ee/p/nRcj0

The important part of which is:

Jun 18 23:43:24 Orion vmunix: [38084.183174] BTRFS info (device sda2):
no csum found for inode 12497 start 0
Jun 18 23:43:24 Orion vmunix: [38084.183195] BTRFS info (device sda2):
no csum found for inode 12497 start 0
Jun 18 23:43:24 Orion vmunix: [38084.183209] BTRFS info (device sda2):
no csum found for inode 12497 start 0
Jun 18 23:43:24 Orion vmunix: [38084.183222] BTRFS info (device sda2):
no csum found for inode 12497 start 0
Jun 18 23:43:24 Orion vmunix: [38084.217552] BTRFS info (device sda2):
csum failed ino 12497 extent 1700305813504 csum 1405070872 wanted 0
mirror 0
Jun 18 23:43:24 Orion vmunix: [38084.217626] BTRFS info (device sda2):
no csum found for inode 12497 start 0
Jun 18 23:43:24 Orion vmunix: [38084.217643] BTRFS info (device sda2):
no csum found for inode 12497 start 0
Jun 18 23:43:24 Orion vmunix: [38084.217657] BTRFS info (device sda2):
no csum found for inode 12497 start 0
Jun 18 23:43:24 Orion vmunix: [38084.217669] BTRFS info (device sda2):
no csum found for inode 12497 start 0
Jun 18 23:43:24 Orion vmunix:  auth_rpcgss nfs_acl nfs lockd grace
sunrpc fscache zfs(POE) zunicode(POE) zcommon(POE) znvpair(POE)
spl(OE) zavl(POE) btrfs xor raid6_pq dm_mirror dm_region_hash dm_log
hid_generic usbhid hid uas usb_storage radeon i2c_algo_bit ttm
drm_kms_helper drm r8169 ahci mii libahci wmi
Jun 18 23:43:24 Orion vmunix: [38084.220604] Workqueue: btrfs-endio
btrfs_endio_helper [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.220812] RIP:
0010:[<ffffffffc048804a>]  [<ffffffffc048804a>]
__btrfs_map_block+0x32a/0x1180 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.222459]  [<ffffffffc0452c00>] ?
__btrfs_lookup_bio_sums.isra.8+0x3e0/0x540 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.222632]  [<ffffffffc048cf3d>]
btrfs_map_bio+0x7d/0x2b0 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.222781]  [<ffffffffc04aaf64>]
btrfs_submit_compressed_read+0x484/0x4e0 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.222948]  [<ffffffffc04651e1>]
btrfs_submit_bio_hook+0x1c1/0x1d0 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.223198]  [<ffffffffc047fd90>] ?
btrfs_create_repair_bio+0xf0/0x110 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.223360]  [<ffffffffc047fec7>]
bio_readpage_error+0x117/0x180 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.223514]  [<ffffffffc04802f0>] ?
clean_io_failure+0x1b0/0x1b0 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.223667]  [<ffffffffc04806ae>]
end_bio_extent_readpage+0x3be/0x3f0 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.223996]  [<ffffffffc0457838>]
end_workqueue_fn+0x48/0x60 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.224145]  [<ffffffffc0490de2>]
normal_work_helper+0x82/0x210 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.224297]  [<ffffffffc0491042>]
btrfs_endio_helper+0x12/0x20 [btrfs]
Jun 18 23:43:24 Orion vmunix:  auth_rpcgss nfs_acl nfs lockd grace
sunrpc fscache zfs(POE) zunicode(POE) zcommon(POE) znvpair(POE)
spl(OE) zavl(POE) btrfs xor raid6_pq dm_mirror dm_region_hash dm_log
hid_generic usbhid hid uas usb_storage radeon i2c_algo_bit ttm
drm_kms_helper drm r8169 ahci mii libahci wmi
Jun 18 23:43:24 Orion vmunix: [38084.330053]  [<ffffffffc048804a>] ?
__btrfs_map_block+0x32a/0x1180 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.330106]  [<ffffffffc0487fec>] ?
__btrfs_map_block+0x2cc/0x1180 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.330154]  [<ffffffffc0452c00>] ?
__btrfs_lookup_bio_sums.isra.8+0x3e0/0x540 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.330205]  [<ffffffffc048cf3d>]
btrfs_map_bio+0x7d/0x2b0 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.330257]  [<ffffffffc04aaf64>]
btrfs_submit_compressed_read+0x484/0x4e0 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.330304]  [<ffffffffc04651e1>]
btrfs_submit_bio_hook+0x1c1/0x1d0 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.330361]  [<ffffffffc047fd90>] ?
btrfs_create_repair_bio+0xf0/0x110 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.330412]  [<ffffffffc047fec7>]
bio_readpage_error+0x117/0x180 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.330462]  [<ffffffffc04802f0>] ?
clean_io_failure+0x1b0/0x1b0 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.330513]  [<ffffffffc04806ae>]
end_bio_extent_readpage+0x3be/0x3f0 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.330568]  [<ffffffffc0457838>]
end_workqueue_fn+0x48/0x60 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.330618]  [<ffffffffc0490de2>]
normal_work_helper+0x82/0x210 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.330668]  [<ffffffffc0491042>]
btrfs_endio_helper+0x12/0x20 [btrfs]
Jun 18 23:43:24 Orion vmunix:  auth_rpcgss nfs_acl nfs lockd grace
sunrpc fscache zfs(POE) zunicode(POE) zcommon(POE) znvpair(POE)
spl(OE) zavl(POE) btrfs xor raid6_pq dm_mirror dm_region_hash dm_log
hid_generic usbhid hid uas usb_storage radeon i2c_algo_bit ttm
drm_kms_helper drm r8169 ahci mii libahci wmi
Jun 18 23:43:24 Orion vmunix: [38084.331102]  [<ffffffffc048804a>] ?
__btrfs_map_block+0x32a/0x1180 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.331152]  [<ffffffffc0487fec>] ?
__btrfs_map_block+0x2cc/0x1180 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.331202]  [<ffffffffc0452c00>] ?
__btrfs_lookup_bio_sums.isra.8+0x3e0/0x540 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.331255]  [<ffffffffc048cf3d>]
btrfs_map_bio+0x7d/0x2b0 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.331310]  [<ffffffffc04aaf64>]
btrfs_submit_compressed_read+0x484/0x4e0 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.331360]  [<ffffffffc04651e1>]
btrfs_submit_bio_hook+0x1c1/0x1d0 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.331423]  [<ffffffffc047fd90>] ?
btrfs_create_repair_bio+0xf0/0x110 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.331477]  [<ffffffffc047fec7>]
bio_readpage_error+0x117/0x180 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.331530]  [<ffffffffc04802f0>] ?
clean_io_failure+0x1b0/0x1b0 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.331585]  [<ffffffffc04806ae>]
end_bio_extent_readpage+0x3be/0x3f0 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.331649]  [<ffffffffc0457838>]
end_workqueue_fn+0x48/0x60 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.331703]  [<ffffffffc0490de2>]
normal_work_helper+0x82/0x210 [btrfs]
Jun 18 23:43:24 Orion vmunix: [38084.331757]  [<ffffffffc0491042>]
btrfs_endio_helper+0x12/0x20 [btrfs]
Jun 19 07:29:22 Orion vmunix: [    3.107113] Btrfs loaded
Jun 19 07:29:22 Orion vmunix: [    3.665536] BTRFS: device label
btrfs1 devid 2 transid 1086759 /dev/sdb2
Jun 19 07:29:22 Orion vmunix: [    3.665811] BTRFS: device label
btrfs1 devid 1 transid 1086759 /dev/sda2
Jun 19 07:29:22 Orion vmunix: [    8.673689] BTRFS info (device sda2):
disk space caching is enabled
Jun 19 07:29:22 Orion vmunix: [   28.190962] BTRFS info (device sda2):
enabling auto defrag
Jun 19 07:29:22 Orion vmunix: [   28.191039] BTRFS info (device sda2):
disk space caching is enabled

I notice that the page
https://btrfs.wiki.kernel.org/index.php/Gotchas mentions "Files with a
lot of random writes can become heavily fragmented (10000+ extents)
causing thrashing on HDDs and excessive multi-second spikes", and I am
wondering if this is related to the crashing. AFAIK rsync should be
creating its temporary file on the destination drive (xfs), unless
there is some part of rsync that I am not understanding that would be
writing to the OS filesystem drive (btrfs), which in this case is also
the source HDD.
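
As far as I understand, rsync builds its temporary copy in the
destination directory by default, so it should only be writing to the
xfs drive; to be certain, the temporary directory can be pinned to the
destination filesystem explicitly, something like this (paths are
placeholders):

# stage rsync temp files on the destination (xfs) drive so the
# source btrfs filesystem is only ever read
mkdir -p /mnt/xfs/.rsync-tmp
rsync -aHAX --temp-dir=/mnt/xfs/.rsync-tmp \
      /mnt/btrfs/data/ /mnt/xfs/backup/data/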

Can someone please help with these btrfs problems?

Thank you



> My Linux Mint system starts up and is usable; however, I am unable to
> complete any scrub, as they abort before finishing. There are various
> inode errors in dmesg. Badblocks (read-only) finds no errors. Checking
> extents gives bad block 5123372711936 on both /dev/sda2 and /dev/sdb2.
> A btrfs check (read-only) results in a 306 MB text file of root xxx
> inode errors.
> There are two 3 TB drives in RAID 1 (sda2/sdb2), for which partition 2
> is nearly the entire drive.
>
> I am currently using a Manjaro live boot with btrfs-progs v4.10.1 in
> an attempt to recover/repair what seems to be bitflip damage.
> (The original Linux Mint system has btrfs-progs v4.5.3.)
>
> When doing a scrub on '/', the status for /dev/sdb2 always aborts at
> ~383 GiB with 0 errors, whereas /dev/sda2 (and thus the '/' scrub)
> aborts at more varied values, starting at 537.90 GiB, with 0 errors.
>
> btrfs inspect-internal dump-tree -b 5123372711936 has one item
> evidently out of order:
> 2551224532992 -> 2551253647360 -> 2551251468288
>
> I am currently attempting to copy files off the system from within
> Manjaro using rsync, prior to attempting whatever the knowledgeable
> people here recommend. So far this has resulted in two files that
> could not be read, plus a lot of btrfs error messages in dmesg:
> https://ptpb.pw/L9Z9
>
> Pastebins from original machine:
> System specs as on original Linux Mint system: https://ptpb.pw/dFz3
> dmesg btrfs grep from prior to errors starting until scrub attempts:
> https://ptpb.pw/rTzs
>
> Pastebins from subsequent live boot with newer btrfs tools 4.10:
> LiveBoot Repair (Manjaro Arch) specs: https://ptpb.pw/ikMM
> Scrub failing/aborting at same place on /dev/sdb: https://ptpb.pw/-vcP
> badblock_extent_btrfscheck_5123372711936: https://ptpb.pw/T1rD
> 'btrfs inspect-internal dump-tree -b 5123372711936 /dev/sda2':
> https://ptpb.pw/zcyI
> 'btrfs inspect-internal dump-tree -b 5123372711936 /dev/sdb2':
> https://ptpb.pw/zcyI
> dmesg on Manjaro attempting to rsync recover files: https://ptpb.pw/L9Z9
>
> I have also just noticed that btrfs causes a complete system crash
> when copying certain file(s).
>
> RIP: [<ffffffffa055b122>] btrfs_check_repairable+0xf2/0x100 [btrfs]
> BUG: unable to handle kernel paging request at 000000000dea1c93
> IP: [<ffffffff810c3e0b>] __wake_up_common+0x2b/0x80
> I have taken a photo of the last part of this. As this happened in
> the live DVD boot distro during repair, the trace was not saved to a
> file that would be accessible later.
> The photo is at https://pasteboard.co/MTP2OGMK.jpg
>
> Could someone please advise on the steps to repair this?
>
> Thank you
>
>
>


* Please help. Repair probably bitflip damage and suspected bug
@ 2017-06-17 23:06 Jesse
  0 siblings, 0 replies; 4+ messages in thread
From: Jesse @ 2017-06-17 23:06 UTC (permalink / raw)
  To: linux-btrfs

My Linux Mint system starts up and is usable; however, I am unable to
complete any scrub, as they abort before finishing. There are various
inode errors in dmesg. Badblocks (read-only) finds no errors. Checking
extents gives bad block 5123372711936 on both /dev/sda2 and /dev/sdb2.
A btrfs check (read-only) results in a 306 MB text file of root xxx
inode errors.
There are two 3 TB drives in RAID 1 (sda2/sdb2), for which partition 2
is nearly the entire drive.
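
For reference, the read-only checks mentioned above were roughly along
these lines (exact flags may have differed; nothing here writes to the
disks):

# raw read-only surface scan of the whole device
badblocks -sv /dev/sda
# read-only btrfs metadata check on one of the mirrors
btrfs check --readonly /dev/sda2
# scrub the mounted filesystem in the foreground with per-device stats
btrfs scrub start -Bd /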

I am currently using a Manjaro live boot with btrfs-progs v4.10.1 in
an attempt to recover/repair what seems to be bitflip damage.
(The original Linux Mint system has btrfs-progs v4.5.3.)

When doing a scrub on '/', the status for /dev/sdb2 always aborts at
~383 GiB with 0 errors, whereas /dev/sda2 (and thus the '/' scrub)
aborts at more varied values, starting at 537.90 GiB, with 0 errors.

btrfs inspect-internal dump-tree -b 5123372711936 has one item
evidently out of order:
2551224532992 -> 2551253647360 -> 2551251468288
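
The ordering problem is easiest to see by pulling out just the key
lines from that dump, e.g.:

# list the keys in the suspect node so the out-of-order entry stands out
btrfs inspect-internal dump-tree -b 5123372711936 /dev/sda2 | grep -n 'key ('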

I am currently attempting to copy files off the system from within
Manjaro using rsync, prior to attempting whatever the knowledgeable
people here recommend. So far this has resulted in two files that
could not be read, plus a lot of btrfs error messages in dmesg:
https://ptpb.pw/L9Z9

Pastebins from original machine:
System specs as on original Linux Mint system: https://ptpb.pw/dFz3
dmesg btrfs grep from prior to errors starting until scrub attempts:
https://ptpb.pw/rTzs

Pastebins from subsequent live boot with newer btrfs tools 4.10:
LiveBoot Repair (Manjaro Arch) specs: https://ptpb.pw/ikMM
Scrub failing/aborting at same place on /dev/sdb: https://ptpb.pw/-vcP
badblock_extent_btrfscheck_5123372711936: https://ptpb.pw/T1rD
'btrfs inspect-internal dump-tree -b 5123372711936 /dev/sda2':
https://ptpb.pw/zcyI
'btrfs inspect-internal dump-tree -b 5123372711936 /dev/sdb2':
https://ptpb.pw/zcyI
dmesg on Manjaro attempting to rsync recover files: https://ptpb.pw/L9Z9

I have also just noticed that btrfs causes a complete system crash
when copying certain file(s).

RIP: [<ffffffffa055b122>] btrfs_check_repairable+0xf2/0x100 [btrfs]
BUG: unable to handle kernel paging request at 000000000dea1c93
IP: [<ffffffff810c3e0b>] __wake_up_common+0x2b/0x80
I have taken a photo of the last part of this. As this happened in the
live DVD boot distro during repair, the trace was not saved to a file
that would be accessible later.
The photo is at https://pasteboard.co/MTP2OGMK.jpg

Could someone please advise on the steps to repair this?

Thank you

Jesse


