* btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair
From: Niccolò Belli @ 2016-05-04 23:21 UTC
To: linux-btrfs

I really need your help: this is the second time btrfs has eaten my data in a couple of days, and I can't use my laptop until I find the culprit.

This is the mail I sent a couple of days ago: https://www.spinics.net/lists/linux-btrfs/msg54754.html

I previously thought the culprit was a bug in kernel 4.6-rc, but I was wrong. I reinstalled the whole system (Arch Linux) from scratch, and after just two days I lost some of my data again. Once again btrfs check --repair got stuck in an infinite loop and I can't repair my fs. The system has always been shut down properly, except for a single time when I had to forcibly power it off just after boot because I didn't see any signal on the screen.

First the obvious things:
- memory is ok (https://drive.google.com/open?id=0Bwe9Wtc-5xF1VnJ0SE9fT1FZMTg)
- disk is ok (https://drive.google.com/open?id=0Bwe9Wtc-5xF1NGRhd2daVDRJVGc)
- tlp has SATA_LINKPWR_ON_BAT=max_performance (https://drive.google.com/open?id=0Bwe9Wtc-5xF1dFAwUE5ETVpNWGM)
- rootfs mount options: rw,noatime,compress=lzo,ssd,discard,space_cache,autodefrag,subvolid=257,subvol=/@
- Command line: BOOT_IMAGE=/@/boot/vmlinuz-linux root=UUID=4fc2278e-f6e8-4a21-8876-cabbf885bb2e rw rootflags=subvol=@ cryptdevice=/dev/disk/by-uuid/c7c8f501-507c-4bd2-a80a-8c7360651f02:cryptroot:allow-discards quiet
- scrub didn't find any error:

$ sudo btrfs scrub status /
scrub status for 4fc2278e-f6e8-4a21-8876-cabbf885bb2e
        scrub started at Thu May 5 00:57:30 2016 and finished after 00:00:45
        total bytes scrubbed: 22.26GiB with 0 errors

I have the whole rootfs encrypted, including boot.
I followed these steps: https://wiki.archlinux.org/index.php/Dm-crypt/Encrypting_an_entire_system#Btrfs_subvolumes_with_swap

Disk is a SAMSUNG SSD PM851 M.2 2280 256GB (Firmware Version: EXT25D0Q).
Laptop is a Dell XPS 13 9343 QHD+.
Distro is Arch Linux, kernel version is 4.5.1. btrfs-progs is 4.5.2.

Two days after the previous data loss I finished reinstalling my distro from scratch, then decided to do a full backup from a snapshot using tar. This is what I got while trying to back up my data:

tar: usr/share/kig/icons/hicolor/32x32/actions/test.png: Read error at byte 0, while reading 810 bytes: Input/output error
tar: usr/share/kig/icons/hicolor/32x32/actions/circlebpd.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/pointOnLine.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/bezierN.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/convexhull.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/centerofcurvature.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/en.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/circlebps.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/directrix.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/beziercurves.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/segment_midpoint.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/distance.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/circlebcl.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/conicb5p.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/kig_polygon.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/conicasymptotes.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/pointxy.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/attacher.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/coniclineintersection.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/vectorsum.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/rbezier4.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/ellipsebffp.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/angle.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/kig_text.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/vectordifference.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/segmentaxis.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/radicalline.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/polygonsides.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/projection.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/inversion.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/bezier4.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/equilateralhyperbolab4p.png: Cannot stat: Stale file handle
tar: usr/share/kig/icons/hicolor/32x32/actions/areaCircle.png: Cannot stat: Stale file handle
tar: var/lib/samba/private/msg.sock/666: socket ignored
tar: Exiting with failure status due to previous errors

[ 3057.008185] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
[ 3057.008195] BTRFS error (device dm-0): error loading props for ino 183988 (root 505): -5
[ 3057.008417] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
[ 3057.008631] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
[ 3057.009165] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
[ 3057.009389] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
[ 3057.009734] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
[ 3057.009960] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
[ 3057.010664] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
[ 3057.010888] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
[ 3057.011201] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
[ 3331.795474] verify_parent_transid: 57 callbacks suppressed
[ 3331.795480] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
[ 3331.795776] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283

I made a copy of
/dev/mapper/cryptroot with dd on an external drive and ran btrfs check on it (btrfs-progs 4.5.2): https://drive.google.com/open?id=0Bwe9Wtc-5xF1SjJacXpMMU5mems (37MB)

Then I tried to run btrfs check --repair on it, but once again it got stuck in an infinite loop like this one (https://www.spinics.net/lists/linux-btrfs/msg54146.html); after an hour of looping and several hundred MB of logs I had to kill it. Here is the log, truncated to 30MB: https://drive.google.com/open?id=0Bwe9Wtc-5xF1SmRuVUlfeGRES3M

They are probably not needed, but here is snapper -c @ list: https://drive.google.com/open?id=0Bwe9Wtc-5xF1N0llOFpfVXVwNVk
and btrfs subvolume list -p /: https://drive.google.com/open?id=0Bwe9Wtc-5xF1andCdWZzeV9VbDg

This is the link to the whole gdrive directory with all the logs: https://drive.google.com/open?id=0Bwe9Wtc-5xF1UFltcXhtRmt4YjA

I really don't know what the problem may be; maybe discard? I can't bear the thought of switching back to ext4 and losing snapshots, transactions, compression, incremental send/receive backups etc. I would really love to be able to do something to fix it, but I don't have the slightest idea what the problem is. Hopefully someone here will be smarter than me and find the problem, otherwise I will have to switch to ext4 because I need my laptop to work.

Thanks,
Niccolò
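One thing that can be checked from the dmesg lines alone is how the "wanted" and "found" transaction ids relate. The sketch below is only an arithmetic aid; the helper name xor_hex is made up for illustration and is not part of btrfs-progs. It prints the bits in which the two numbers differ.

```sh
# Print, in hex, the bits that differ between two non-negative integers
# that fit in a signed 64-bit value.
xor_hex() {
    printf '%x\n' $(( $1 ^ $2 ))
}

# "wanted" and "found" copied from the "parent transid verify failed" lines.
xor_hex 3458764513820541211 283   # prints 3000000000000000
```

The two values differ only in bits 60 and 61: the wanted id is 283 with two high bits set, which cannot be a real generation number on a filesystem that is a few days old. The log by itself does not say where those two bits were flipped.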
* Re: btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair
From: Chris Murphy @ 2016-05-05 1:07 UTC
To: Niccolò Belli; +Cc: Btrfs BTRFS

On Wed, May 4, 2016 at 5:21 PM, Niccolò Belli <darkbasic@linuxsystems.it> wrote:

> rw,noatime,compress=lzo,ssd,discard,space_cache,autodefrag,subvolid=257,subvol=/@

I suggest using defaults for starters. The only thing in that list that needs to be there is either subvolid or subvol, not both. Add in the non-default options once you've proven the defaults are working, and add them one at a time.

> I have the whole rootfs encrypted, including boot. I followed these steps:
> https://wiki.archlinux.org/index.php/Dm-crypt/Encrypting_an_entire_system#Btrfs_subvolumes_with_swap
>
> Disk is a SAMSUNG SSD PM851 M.2 2280 256GB (Firmware Version: EXT25D0Q).

The firmware is old if I understand the naming scheme used by Dell. It says EXT49D0Q is current.

http://www.dell.com/support/home/al/en/aldhs1/Drivers/DriversDetails?driverId=0NXHH

If you need to update, you may be best off doing a whole device trim, which is easiest done with mkfs.btrfs pointed at the whole device. I wouldn't trust any data on the drive after a firmware update, so I'd start over entirely from scratch: new partition map, new everything. So the way to do this is:

mkfs.btrfs /dev/sda
wipefs -a /dev/sda

That way the btrfs magic is removed, and now you can partition it, set up dmcrypt, etc. I advise using all defaults for everything for now, otherwise it's anyone's guess what you're running into.

Off topic, but at least gmail users see your posts go to spam because your domain is configured to disallow relaying. Most mail services ignore this request by the domain, but Google honors it, so no amount of training will keep your email out of spam. This is what's in your emails that's causing the problem:

dmarc=fail (p=QUARANTINE dis=NONE) header.from=linuxsystems.it

http://webmasters.stackexchange.com/questions/76765/sent-emails-pass-spf-and-dkim-but-fail-dmarc-when-received-by-gmail
http://www.pcworld.com/article/2141120/yahoo-email-antispoofing-policy-breaks-mailing-lists.html

-- 
Chris Murphy
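The "start from defaults, then add options one at a time" advice can be sketched as a list of option strings to try in successive test rounds. This is only an illustration: the function name option_rounds and the order in which the extras are re-added are assumptions, not something prescribed in the thread.

```sh
# Print one fstab option string per test round: first plain "defaults",
# then the same string with one more extra option appended each time.
# Usage: option_rounds "opt1,opt2,..."
option_rounds() {
    extras=$1
    current="defaults"
    echo "$current"
    # Split the comma-separated list without relying on bash arrays.
    old_ifs=$IFS
    IFS=,
    for opt in $extras; do
        current="$current,$opt"
        echo "$current"
    done
    IFS=$old_ifs
}

# The non-default options from the original report, in an arbitrary order.
option_rounds "noatime,compress=lzo,discard,autodefrag"
```

Each printed line is a candidate for the options field in fstab; moving to the next line only after the previous one has run cleanly for a while keeps a single variable changing per round.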
* Re: btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair
From: Niccolò Belli @ 2016-05-05 10:36 UTC
To: Btrfs BTRFS; +Cc: Chris Murphy, Qu Wenruo

On Thursday 5 May 2016 03:07:37 CEST, Chris Murphy wrote:
> I suggest using defaults for starters. The only thing in that list
> that needs to be there is either subvolid or subvol, not both. Add in
> the non-default options once you've proven the defaults are working,
> and add them one at a time.

Yes, I read your previous suggestion and I already dropped subvolid, but since the problem had already happened I left it in the mail for completeness. Anyway, the culprit here is genfstab, and that's probably what a beginner is going to use when installing a distro: https://wiki.archlinux.org/index.php/beginners'_guide#fstab

>> Disk is a SAMSUNG SSD PM851 M.2 2280 256GB (Firmware Version: EXT25D0Q).
>
> The firmware is old if I understand the naming scheme used by Dell. It
> says EXT49D0Q is current.
>
> http://www.dell.com/support/home/al/en/aldhs1/Drivers/DriversDetails?driverId=0NXHH

According to this (http://forum.notebookreview.com/threads/2015-xps-13-ssd-fw-problem-with-m-2-samsung-pm851.770501/) the firmware you linked is for the mSATA version of the drive, not the M.2 one. EXT25D0Q seems to be the very latest one for my drive.

> I advise using all defaults for everything for
> now, otherwise it's anyone's guess what you're running into.

On Thursday 5 May 2016 06:12:28 CEST, Qu Wenruo wrote:
> Would it be OK for you to test your btrfs on a plain ssd,
> without encryption?
> And just as Chris Murphy said, reducing mount options is also a
> pretty good debugging starting point.

Ok, I will remove dmcrypt, discard, compress=lzo, autodefrag and see what happens.
>> I made a copy of /dev/mapper/cryptroot with dd on an external drive and
>> I ran btrfs check on it (btrfs-progs 4.5.2):
>> https://drive.google.com/open?id=0Bwe9Wtc-5xF1SjJacXpMMU5mems (37MB)
>
> Checked, but seems the output is truncated?

No, I didn't truncate the btrfs check output, because it wasn't endless. I only truncated the repair output.

I also have something new to report. Do you remember when I said that my screen was black and so I had to forcibly power off the system? Something similar happened today, and since in the meantime I had enabled the magic sysrq keys I was able to recover this from the logs:

mag 05 11:55:51 arch-laptop kdeinit5[960]: Registering "org.kde.StatusNotifierItem-1060-1/StatusNotifierItem" to system tray
mag 05 11:55:51 arch-laptop obexd[1098]: OBEX daemon 5.39
mag 05 11:55:51 arch-laptop dbus-daemon[920]: Successfully activated service 'org.bluez.obex'
mag 05 11:55:51 arch-laptop systemd[898]: Started Bluetooth OBEX service.
mag 05 11:55:51 arch-laptop korgac[1044]: log_kidentitymanagement: IdentityManager: There was no default identity. Marking first one as default.
mag 05 11:55:51 arch-laptop kernel: BUG: unable to handle kernel paging request at 0000000000017d11 mag 05 11:55:51 arch-laptop kernel: IP: [<ffffffff81194f9f>] anon_vma_interval_tree_insert+0x3f/0x90 mag 05 11:55:51 arch-laptop kernel: PGD 0 mag 05 11:55:51 arch-laptop kernel: Oops: 0000 [#1] PREEMPT SMP mag 05 11:55:51 arch-laptop kernel: Modules linked in: rfcomm(+) visor bnep uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core videodev media btusb btrtl btbcm btintel cdc_ether bluetooth usbnet r8152 crc16 mii joydev mousedev nvr mag 05 11:55:51 arch-laptop kernel: mei_me syscopyarea sysfillrect snd sysimgblt fb_sys_fops i2c_algo_bit shpchp soundcore mei wmi thermal fan intel_hid sparse_keymap int3403_thermal video processor_thermal_device dw_dmac snd_soc_sst_acpi snd_soc_sst_m mag 05 11:55:51 arch-laptop kernel: lrw gf128mul glue_helper ablk_helper cryptd ahci libahci libata scsi_mod xhci_pci rtsx_pci mag 05 11:55:51 arch-laptop kernel: Bluetooth: RFCOMM TTY layer initialized mag 05 11:55:51 arch-laptop kernel: Bluetooth: RFCOMM socket layer initialized mag 05 11:55:51 arch-laptop kernel: Bluetooth: RFCOMM ver 1.11 mag 05 11:55:51 arch-laptop kernel: xhci_hcd mag 05 11:55:51 arch-laptop kernel: i8042 serio sdhci_acpi sdhci led_class mmc_core pl2303 mos7720 usbserial parport hid_generic usbhid hid usbcore usb_common mag 05 11:55:51 arch-laptop kernel: CPU: 0 PID: 351 Comm: systemd-udevd Not tainted 4.5.1-1-ARCH #1 mag 05 11:55:51 arch-laptop kernel: Hardware name: Dell Inc. 
XPS 13 9343/0F5KF3, BIOS A07 11/11/2015 mag 05 11:55:51 arch-laptop kernel: task: ffff88021347d580 ti: ffff880211f8c000 task.ti: ffff880211f8c000 mag 05 11:55:51 arch-laptop kernel: RIP: 0010:[<ffffffff81194f9f>] [<ffffffff81194f9f>] anon_vma_interval_tree_insert+0x3f/0x90 mag 05 11:55:51 arch-laptop kernel: RSP: 0018:ffff880211f8fd68 EFLAGS: 00010206 mag 05 11:55:51 arch-laptop kernel: RAX: ffff8800da2f4820 RBX: ffff8800bb59ce40 RCX: ffff8800da2f4830 mag 05 11:55:51 arch-laptop kernel: RDX: ffff8800da2f4828 RSI: ffff8800374404a0 RDI: ffff8800c58dfa40 mag 05 11:55:51 arch-laptop kernel: RBP: ffff880211f8fdb8 R08: 0000000000017c79 R09: 00000007f55e2059 mag 05 11:55:51 arch-laptop kernel: R10: 00000007f55e2053 R11: ffff8800c58dfa40 R12: ffff880037440460 mag 05 11:55:51 arch-laptop kernel: R13: ffff8800d9e27100 R14: ffff8800c58dfa40 R15: ffff880037440460 mag 05 11:55:51 arch-laptop kernel: FS: 00007f55e20537c0(0000) GS:ffff88021e400000(0000) knlGS:0000000000000000 mag 05 11:55:51 arch-laptop kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 mag 05 11:55:51 arch-laptop kernel: CR2: 0000000000017d11 CR3: 0000000211cd5000 CR4: 00000000003406f0 mag 05 11:55:51 arch-laptop kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 mag 05 11:55:51 arch-laptop kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 mag 05 11:55:51 arch-laptop kernel: Stack: mag 05 11:55:51 arch-laptop kernel: ffffffff811a90c8 0000000000000246 ffff880212d00900 ffff8800bb59ceb8 mag 05 11:55:51 arch-laptop kernel: ffff880212d00978 ffff8800bb59ce40 ffff880212d00900 0000000000000007 mag 05 11:55:51 arch-laptop kernel: 00007f55e2053a90 ffff8800d991e1c0 ffff880211f8fdf0 ffffffff811a9232 mag 05 11:55:51 arch-laptop kernel: Call Trace: mag 05 11:55:51 arch-laptop kernel: [<ffffffff811a90c8>] ? 
anon_vma_clone+0xc8/0x200 mag 05 11:55:51 arch-laptop kernel: [<ffffffff811a9232>] anon_vma_fork+0x32/0x140 mag 05 11:55:51 arch-laptop kernel: [<ffffffff8107742d>] copy_process.part.8+0xcdd/0x1890 mag 05 11:55:51 arch-laptop kernel: [<ffffffff8107819f>] _do_fork+0xcf/0x3c0 mag 05 11:55:51 arch-laptop kernel: [<ffffffff81078539>] SyS_clone+0x19/0x20 mag 05 11:55:51 arch-laptop kernel: [<ffffffff815ad6ae>] entry_SYSCALL_64_fastpath+0x12/0x6d mag 05 11:55:51 arch-laptop kernel: Code: 01 4c 8b 91 98 00 00 00 31 c9 48 c1 e8 0c 4d 8d 4c 02 ff eb 24 4c 3b 48 18 76 04 4c 89 48 18 4c 8b 40 e0 48 8d 48 10 48 8d 50 08 <4d> 3b 90 98 00 00 00 48 0f 42 d1 48 89 c1 48 8b 02 48 85 c0 75 mag 05 11:55:51 arch-laptop kernel: RIP [<ffffffff81194f9f>] anon_vma_interval_tree_insert+0x3f/0x90 mag 05 11:55:52 arch-laptop kernel: RSP <ffff880211f8fd68> mag 05 11:55:52 arch-laptop kernel: CR2: 0000000000017d11 mag 05 11:55:52 arch-laptop kernel: ---[ end trace 6a392d6afbffe7f5 ]--- [...] mag 05 11:55:52 arch-laptop dbus[584]: [system] Activating via systemd: service name='org.freedesktop.ColorManager' unit='colord.service' mag 05 11:55:52 arch-laptop kernel: BTRFS critical (device dm-0): unable to find logical 2330894282579755008 len 4096 mag 05 11:55:52 arch-laptop kernel: ------------[ cut here ]------------ mag 05 11:55:52 arch-laptop kernel: kernel BUG at fs/btrfs/inode.c:1828! 
mag 05 11:55:52 arch-laptop kernel: invalid opcode: 0000 [#2] PREEMPT SMP mag 05 11:55:52 arch-laptop kernel: Modules linked in: rfcomm visor bnep uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core videodev media btusb btrtl btbcm btintel cdc_ether bluetooth usbnet r8152 crc16 mii joydev mousedev nvram mag 05 11:55:52 arch-laptop kernel: mei_me syscopyarea sysfillrect snd sysimgblt fb_sys_fops i2c_algo_bit shpchp soundcore mei wmi thermal fan intel_hid sparse_keymap int3403_thermal video processor_thermal_device dw_dmac snd_soc_sst_acpi snd_soc_sst_m mag 05 11:55:52 arch-laptop kernel: lrw gf128mul glue_helper ablk_helper cryptd ahci libahci libata scsi_mod xhci_pci rtsx_pci xhci_hcd i8042 serio sdhci_acpi sdhci led_class mmc_core pl2303 mos7720 usbserial parport hid_generic usbhid hid usbcore usb_ mag 05 11:55:52 arch-laptop kernel: CPU: 3 PID: 1028 Comm: plasmashell Tainted: G D 4.5.1-1-ARCH #1 mag 05 11:55:52 arch-laptop kernel: Hardware name: Dell Inc. XPS 13 9343/0F5KF3, BIOS A07 11/11/2015 mag 05 11:55:52 arch-laptop kernel: task: ffff8800d9e2aac0 ti: ffff8801f5900000 task.ti: ffff8801f5900000 mag 05 11:55:52 arch-laptop kernel: RIP: 0010:[<ffffffffa02ddabb>] [<ffffffffa02ddabb>] btrfs_merge_bio_hook+0x8b/0xa0 [btrfs] mag 05 11:55:52 arch-laptop kernel: RSP: 0018:ffff8801f5903938 EFLAGS: 00010282 mag 05 11:55:52 arch-laptop kernel: RAX: 00000000ffffffea RBX: 0000000000001000 RCX: 0000000000000051 mag 05 11:55:52 arch-laptop kernel: RDX: 0000000000000000 RSI: ffff88021e58db38 RDI: 0000000000000000 mag 05 11:55:52 arch-laptop kernel: RBP: ffff8801f5903958 R08: 0000000000070aad R09: 0000000000000368 mag 05 11:55:52 arch-laptop kernel: R10: 00102c80000d13e8 R11: 0000000000000368 R12: 0000000000001000 mag 05 11:55:52 arch-laptop kernel: R13: ffff8801e205ee28 R14: 0000000000000000 R15: ffffea000788d580 mag 05 11:55:52 arch-laptop kernel: FS: 00007fe8e688a800(0000) GS:ffff88021e580000(0000) knlGS:0000000000000000 mag 05 11:55:52 arch-laptop 
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 mag 05 11:55:52 arch-laptop kernel: CR2: 00007fe8d14b5cbc CR3: 00000000bf57f000 CR4: 00000000003406e0 mag 05 11:55:52 arch-laptop kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 mag 05 11:55:52 arch-laptop kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 mag 05 11:55:52 arch-laptop kernel: Stack: mag 05 11:55:52 arch-laptop kernel: 0000000000001000 0000000095d6c394 0000000000001000 ffff8801f5903bc0 mag 05 11:55:52 arch-laptop kernel: ffff8801f59039b0 ffffffffa02fbd03 0000000000000000 00102c80000d13e8 mag 05 11:55:52 arch-laptop kernel: 0000002000000000 ffff8800da874040 0000000000000000 ffffea000788d580 mag 05 11:55:52 arch-laptop kernel: Call Trace: mag 05 11:55:52 arch-laptop kernel: [<ffffffffa02fbd03>] submit_extent_page+0xc3/0x230 [btrfs] mag 05 11:55:52 arch-laptop kernel: [<ffffffffa02fd02a>] __do_readpage+0x3aa/0x990 [btrfs] mag 05 11:55:52 arch-laptop kernel: [<ffffffffa02fb450>] ? btrfs_create_repair_bio+0x100/0x100 [btrfs] mag 05 11:55:52 arch-laptop kernel: [<ffffffffa02d0cf0>] ? free_root_pointers+0x70/0x70 [btrfs] mag 05 11:55:52 arch-laptop kernel: [<ffffffffa02fd6f6>] __extent_read_full_page+0xe6/0x100 [btrfs] mag 05 11:55:52 arch-laptop kernel: [<ffffffffa02d0cf0>] ? free_root_pointers+0x70/0x70 [btrfs] mag 05 11:55:52 arch-laptop kernel: [<ffffffffa02ff489>] read_extent_buffer_pages+0x179/0x330 [btrfs] mag 05 11:55:52 arch-laptop kernel: [<ffffffffa02d0cf0>] ? 
free_root_pointers+0x70/0x70 [btrfs] mag 05 11:55:52 arch-laptop kernel: [<ffffffffa02d26fc>] btree_read_extent_buffer_pages.constprop.19+0xac/0x110 [btrfs] mag 05 11:55:52 arch-laptop kernel: [<ffffffffa02d2cfd>] read_tree_block+0x3d/0x70 [btrfs] mag 05 11:55:52 arch-laptop kernel: [<ffffffffa02b1b49>] read_block_for_search.isra.14+0x139/0x330 [btrfs] mag 05 11:55:52 arch-laptop kernel: [<ffffffffa02b72e5>] btrfs_next_old_leaf+0x245/0x420 [btrfs] mag 05 11:55:52 arch-laptop kernel: [<ffffffffa02b74d0>] btrfs_next_leaf+0x10/0x20 [btrfs] mag 05 11:55:52 arch-laptop kernel: [<ffffffffa02dc564>] btrfs_real_readdir+0x144/0x5f0 [btrfs] mag 05 11:55:52 arch-laptop kernel: [<ffffffff81200492>] iterate_dir+0x92/0x120 mag 05 11:55:52 arch-laptop kernel: [<ffffffff81200939>] SyS_getdents+0x99/0x110 mag 05 11:55:52 arch-laptop kernel: [<ffffffff812005f0>] ? fillonedir+0xd0/0xd0 mag 05 11:55:52 arch-laptop kernel: [<ffffffff815ad6ae>] entry_SYSCALL_64_fastpath+0x12/0x6d mag 05 11:55:52 arch-laptop kernel: Code: 8b 80 38 fe ff ff 4c 89 65 e0 48 8b 80 f0 01 00 00 48 89 c7 e8 77 ac 02 00 85 c0 78 0e 31 c0 4c 01 e3 48 3b 5d e0 0f 97 c0 eb 9a <0f> 0b e8 5e b1 d9 e0 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 mag 05 11:55:52 arch-laptop kernel: RIP [<ffffffffa02ddabb>] btrfs_merge_bio_hook+0x8b/0xa0 [btrfs] mag 05 11:55:52 arch-laptop kernel: RSP <ffff8801f5903938> mag 05 11:55:52 arch-laptop kernel: ---[ end trace 6a392d6afbffe7f6 ]--- On giovedì 5 maggio 2016 03:07:37 CEST, Chris Murphy wrote: > Off topic, but at least gmail users see your posts go to spam > dmarc=fail (p=QUARANTINE dis=NONE) header.from=linuxsystems.it Thanks for reporting, I changed my dmarc DNS entry from quarantine to none. I previously used reject and I hoped that quarantine was enough of a middle ground to survive spam filters, but it seems I will have to get rid of dmarc altogether. Thanks, Niccolò ^ permalink raw reply [flat|nested] 25+ messages in thread
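The logical address in the "unable to find logical 2330894282579755008 len 4096" line can be taken apart the same way as any other 64-bit value. The sketch below is an arithmetic aid only; split_hex is a hypothetical helper name, and the 16/48-bit split is chosen purely for readability.

```sh
# Split a non-negative 64-bit value into its top 16 bits and low 48 bits,
# both printed in hex.
split_hex() {
    printf 'high16=%x low48=%x\n' $(( $1 >> 48 )) $(( $1 & 0xFFFFFFFFFFFF ))
}

# Address copied from the "BTRFS critical" line in the log above.
split_hex 2330894282579755008   # prints high16=2059 low48=1a27c000
```

The low 48 bits (0x1a27c000, roughly 418 MiB and 4096-aligned) are of the same order of magnitude as the 528089088 address in the earlier transid errors, while the top 16 bits put the full value around 2 EiB, far beyond anything a 256GB drive would plausibly have allocated. That is consistent with a pointer whose upper bits were damaged, though the log alone does not show where the damage happened.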
* Re: btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair
From: Omar Sandoval @ 2016-05-05 17:48 UTC
To: Niccolò Belli; +Cc: Btrfs BTRFS, Chris Murphy, Qu Wenruo

On Thu, May 05, 2016 at 12:36:52PM +0200, Niccolò Belli wrote:
> On Thursday 5 May 2016 03:07:37 CEST, Chris Murphy wrote:
> > I suggest using defaults for starters. The only thing in that list
> > that needs to be there is either subvolid or subvol, not both. Add in
> > the non-default options once you've proven the defaults are working,
> > and add them one at a time.
>
> Yes I read your previous suggestion and I already dropped subvolid, but
> since the problem already happened I left it in the mail for completeness.
> Anyway the culprit here is genfstab and that's probably what a beginner is
> going to use when installing a distro:
> https://wiki.archlinux.org/index.php/beginners'_guide#fstab
>

The redundant subvolid doesn't hurt; the kernel will just check that it matches the passed subvol (see [1]). genfstab probably just pulls the options out of /proc/mounts or /proc/self/mountinfo, and since we show both, that's how it gets into fstab. If it were actually a problem, there would be a clear message in dmesg.

1: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=bb289b7be62db84b9630ce00367444c810cada2c

-- 
Omar
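To see which of subvol and subvolid a given option string carries, the two values can be pulled out with standard tools. This is a minimal sketch; mount_opt is a made-up helper name, and the option string is the one quoted earlier in the thread rather than output read from a live system.

```sh
# Print the value of KEY from a comma-separated mount option string,
# or nothing if the key is absent.
# Usage: mount_opt "opt1,key=value,..." key
mount_opt() {
    printf '%s\n' "$1" | tr ',' '\n' | sed -n "s/^$2=//p"
}

opts='rw,noatime,compress=lzo,ssd,discard,space_cache,autodefrag,subvolid=257,subvol=/@'
mount_opt "$opts" subvolid   # prints 257
mount_opt "$opts" subvol     # prints /@
```

On a running system the same function could be fed the options column for the mount point from /proc/mounts; whether both keys point at the same subvolume is what the kernel check described above verifies.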
* Re: btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair
From: Niccolò Belli @ 2016-05-06 11:38 UTC
To: Btrfs BTRFS; +Cc: Chris Murphy, Qu Wenruo, Omar Sandoval

I formatted the partition and copied the content of my previous rootfs to it. There is no dmcrypt now and mount options are defaults, except for noatime.

After a single boot I got the very same problem as before (fs corrupted and an infinite loop when doing btrfs check --repair). I wanted to replicate the results, so I tried once again, and since then I have only experienced minor corruption, correctly resolved by repair.

But during a pacman upgrade, which triggered snapper pre-post snapshots, the system hung and I found this in the logs:

mag 06 10:31:15 arch-laptop plasmashell[873]: requesting unexisting screen 2
mag 06 10:31:18 arch-laptop dbus[418]: [system] Activating service name='org.opensuse.Snapper' (using servicehelper)
mag 06 10:31:18 arch-laptop dbus[418]: [system] Successfully activated service 'org.opensuse.Snapper'
mag 06 10:31:20 arch-laptop kernel: ------------[ cut here ]------------
mag 06 10:31:20 arch-laptop kernel: kernel BUG at fs/btrfs/ctree.h:2693!

Still no major corruption found since my second attempt.

Niccolò
* Re: btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair 2016-05-06 11:38 ` Niccolò Belli @ 2016-05-07 15:45 ` Niccolò Belli 2016-05-07 15:58 ` Clemens Eisserer 2016-05-07 23:35 ` Chris Murphy 0 siblings, 2 replies; 25+ messages in thread From: Niccolò Belli @ 2016-05-07 15:45 UTC (permalink / raw) To: Btrfs BTRFS; +Cc: Chris Murphy, Qu Wenruo, Omar Sandoval btrfs + dmcrypt + compress=lzo + autodefrag = corruption at first boot So discard is not the culprit. Will try to remove compress=lzo and autodefrag and see if it still happens. [ 748.224346] BTRFS error (device dm-0): memmove bogus src_offset 5431 move len 4294962894 len 16384 [ 748.226206] ------------[ cut here ]------------ [ 748.227831] kernel BUG at fs/btrfs/extent_io.c:5723! [ 748.229498] invalid opcode: 0000 [#1] PREEMPT SMP [ 748.231161] Modules linked in: ext4 mbcache jbd2 nls_iso8859_1 nls_cp437 vfat fat snd_hda_codec_hdmi dell_laptop dcdbas dell_wmi iTCO_wdt iTCO_vendor_support intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel arc4 kvm irqbypass psmouse serio_raw pcspkr elan_i2c snd_soc_ssm4567 snd_soc_rt286 snd_soc_rl6347a snd_soc_core i2c_hid iwlmvm snd_compress snd_pcm_dmaengine ac97_bus mac80211 uvcvideo videobuf2_vmalloc btusb videobuf2_memops cdc_ether btrtl usbnet iwlwifi btbcm videobuf2_v4l2 btintel intel_pch_thermal videobuf2_core i2c_i801 videodev r8152 rtsx_pci_ms cfg80211 bluetooth visor media mii memstick joydev evdev mousedev input_leds rfkill mac_hid crc16 i915 fan thermal wmi dw_dmac int3403_thermal video dw_dmac_core drm_kms_helper snd_soc_sst_acpi i2c_designware_platform snd_soc_sst_match [ 748.237203] snd_hda_intel 8250_dw i2c_designware_core gpio_lynxpoint spi_pxa2xx_platform drm int3402_thermal snd_hda_codec battery tpm_crb intel_hid snd_hda_core sparse_keymap fjes snd_hwdep int3400_thermal acpi_thermal_rel tpm_tis snd_pcm intel_gtt tpm acpi_als syscopyarea sysfillrect snd_timer sysimgblt 
fb_sys_fops mei_me i2c_algo_bit processor_thermal_device kfifo_buf processor snd industrialio acpi_pad ac int340x_thermal_zone mei intel_soc_dts_iosf button lpc_ich soundcore shpchp sch_fq_codel ip_tables x_tables btrfs xor raid6_pq jitterentropy_rng sha256_ssse3 sha256_generic hmac drbg ansi_cprng algif_skcipher af_alg uas usb_storage dm_crypt dm_mod sd_mod rtsx_pci_sdmmc atkbd libps2 crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper [ 748.244176] ablk_helper cryptd ahci libahci libata scsi_mod xhci_pci rtsx_pci xhci_hcd i8042 serio sdhci_acpi sdhci led_class mmc_core pl2303 mos7720 usbserial parport hid_generic usbhid hid usbcore usb_common [ 748.246662] CPU: 0 PID: 2316 Comm: pacman Not tainted 4.5.1-1-ARCH #1 [ 748.249123] Hardware name: Dell Inc. XPS 13 9343/0F5KF3, BIOS A07 11/11/2015 [ 748.251576] task: ffff8800d9d98e40 ti: ffff8800cec10000 task.ti: ffff8800cec10000 [ 748.254064] RIP: 0010:[<ffffffffa0300bac>] [<ffffffffa0300bac>] memmove_extent_buffer+0x10c/0x110 [btrfs] [ 748.256600] RSP: 0018:ffff8800cec13c18 EFLAGS: 00010246 [ 748.259120] RAX: 0000000000000000 RBX: ffff88020c01ba40 RCX: 0000000000000056 [ 748.261631] RDX: 0000000000000000 RSI: ffff88021e40db38 RDI: ffff88021e40db38 [ 748.264166] RBP: ffff8800cec13c48 R08: 0000000000000000 R09: 000000000000033b [ 748.266716] R10: 0000000000000000 R11: 000000000000033b R12: 00000000ffffeece [ 748.269267] R13: 0000000100000405 R14: 00000001000004c9 R15: ffff88020c01ba40 [ 748.271818] FS: 00007f14d4271740(0000) GS:ffff88021e400000(0000) knlGS:0000000000000000 [ 748.274392] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 748.276987] CR2: 0000000001630008 CR3: 00000000cffc8000 CR4: 00000000003406f0 [ 748.279603] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 748.282220] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 748.284815] Stack: [ 748.287422] 00000000e3438cd2 ffff88020c01ba40 00000000000000c4 
000000000000002a [ 748.290082] 000000000000006b 00000000000003a0 ffff8800cec13ce8 ffffffffa02b612c [ 748.292754] ffffffffa02b433d ffff8800da9ca820 0000002800000000 ffff8800daa78bd0 [ 748.295441] Call Trace: [ 748.298104] [<ffffffffa02b612c>] btrfs_del_items+0x33c/0x4a0 [btrfs] [ 748.300827] [<ffffffffa02b433d>] ? btrfs_search_slot+0x90d/0x990 [btrfs] [ 748.303564] [<ffffffffa02f3d9c>] ? btrfs_get_token_8+0x6c/0x130 [btrfs] [ 748.306311] [<ffffffffa02e5ca9>] btrfs_truncate_inode_items+0x649/0xd20 [btrfs] [ 748.309071] [<ffffffffa0330b5e>] ? btrfs_delayed_inode_release_metadata.isra.1+0x4e/0xf0 [btrfs] [ 748.311860] [<ffffffffa02e7315>] btrfs_evict_inode+0x485/0x5d0 [btrfs] [ 748.314627] [<ffffffff81207e55>] evict+0xc5/0x190 [ 748.317412] [<ffffffff81208689>] iput+0x1d9/0x260 [ 748.320199] [<ffffffff811fd689>] do_unlinkat+0x199/0x2d0 [ 748.322988] [<ffffffff811fdf66>] SyS_unlink+0x16/0x20 [ 748.325781] [<ffffffff815ad6ae>] entry_SYSCALL_64_fastpath+0x12/0x6d [ 748.328584] Code: 41 5e 41 5f 5d c3 48 8b 7f 18 48 89 f2 48 c7 c6 40 44 36 a0 e8 06 90 fa ff 0f 0b 48 8b 7f 18 48 c7 c6 08 44 36 a0 e8 f4 8f fa ff <0f> 0b 66 90 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 89 fb [ 748.331558] RIP [<ffffffffa0300bac>] memmove_extent_buffer+0x10c/0x110 [btrfs] [ 748.334473] RSP <ffff8800cec13c18> [ 748.356077] ---[ end trace 9bfb28800ab52273 ]--- [ 748.359042] note: pacman[2316] exited with preempt_count 2 ^ permalink raw reply [flat|nested] 25+ messages in thread
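A note on the numbers in the BUG line above: the reported "move len 4294962894" sits just below 2^32, which is what a small negative value looks like once it is printed as an unsigned integer. The Python sketch below is illustrative only and not part of the original thread; the helper name `as_signed32` is hypothetical, and the reading that a 32-bit length calculation went negative is an assumption, not something the log proves.

```python
def as_signed32(value: int) -> int:
    """Reinterpret the low 32 bits of an integer as a signed two's-complement value."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


move_len = 4294962894      # "move len" from the BTRFS error line
r12 = 0x00000000FFFFEECE   # R12 from the register dump in the same oops
eb_len = 16384             # "len" from the same line

print(as_signed32(move_len))  # -4402
print(r12 == move_len)        # True: R12 holds the same bogus length
print(move_len > eb_len)      # True: far larger than the 16384-byte buffer
```

Read this way, the kernel was asked to move roughly -4402 bytes inside a 16384-byte buffer, which matches the kind of "negative item length" that bad metadata would produce.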
* Re: btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair 2016-05-07 15:45 ` Niccolò Belli @ 2016-05-07 15:58 ` Clemens Eisserer 2016-05-07 16:11 ` Niccolò Belli 2016-05-07 23:35 ` Chris Murphy 1 sibling, 1 reply; 25+ messages in thread From: Clemens Eisserer @ 2016-05-07 15:58 UTC (permalink / raw) To: Niccolò Belli, linux-btrfs Hi Niccolo, > btrfs + dmcrypt + compress=lzo + autodefrag = corruption at first boot Just to be curious - couldn't it be a hardware issue? I use almost the same setup (compress-force=lzo instead of compress-force=lzo) on my laptop for 2-3 years and haven't experienced any issues since ~kernel-3.14 or so. Br, Clemens Eisserer 2016-05-07 17:45 GMT+02:00 Niccolò Belli <darkbasic@linuxsystems.it>: > btrfs + dmcrypt + compress=lzo + autodefrag = corruption at first boot > So discard is not the culprit. Will try to remove compress=lzo and > autodefrag and see if it still happens. > > [...] ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair 2016-05-07 15:58 ` Clemens Eisserer @ 2016-05-07 16:11 ` Niccolò Belli 2016-05-08 18:27 ` Patrik Lundquist 2016-05-09 11:52 ` Austin S. Hemmelgarn 0 siblings, 2 replies; 25+ messages in thread From: Niccolò Belli @ 2016-05-07 16:11 UTC (permalink / raw) To: linux-btrfs; +Cc: Clemens Eisserer On 2016-05-07 17:58, Clemens Eisserer wrote: > Hi Niccolo, > >> btrfs + dmcrypt + compress=lzo + autodefrag = corruption at first boot > > Just to be curious - couldn't it be a hardware issue? I use almost the > same setup (compress-force=lzo instead of compress-force=lzo) on my > laptop for 2-3 years and haven't experienced any issues since > ~kernel-3.14 or so. > > Br, Clemens Eisserer Hi, What kind of hardware issue? I ran a full memtest86 check, a full smartmontools extended check and even a badblocks -wsv. If this really is a hardware issue that we can identify, I would be more than happy, because Dell will replace my laptop and this nightmare will finally be over. I'm open to suggestions. Niccolò ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair 2016-05-07 16:11 ` Niccolò Belli @ 2016-05-08 18:27 ` Patrik Lundquist 2016-05-09 11:52 ` Austin S. Hemmelgarn 1 sibling, 0 replies; 25+ messages in thread From: Patrik Lundquist @ 2016-05-08 18:27 UTC (permalink / raw) To: Niccolò Belli; +Cc: linux-btrfs On 7 May 2016 at 18:11, Niccolò Belli <darkbasic@linuxsystems.it> wrote: > Which kind of hardware issue? I did a full memtest86 check, a full smartmontools extended check and even a badblocks -wsv. > If this is really an hardware issue that we can identify I would be more than happy because Dell will replace my laptop and this nightmare will be finally over. I'm open to suggestions. Well, your hardware differs from a lot of successful installations. Are you using any power management tweaks? ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair 2016-05-07 16:11 ` Niccolò Belli 2016-05-08 18:27 ` Patrik Lundquist @ 2016-05-09 11:52 ` Austin S. Hemmelgarn 2016-05-09 14:53 ` Niccolò Belli 1 sibling, 1 reply; 25+ messages in thread From: Austin S. Hemmelgarn @ 2016-05-09 11:52 UTC (permalink / raw) To: Niccolò Belli, linux-btrfs; +Cc: Clemens Eisserer

On 2016-05-07 12:11, Niccolò Belli wrote:
> On 2016-05-07 17:58, Clemens Eisserer wrote:
>> Hi Niccolo,
>>
>>> btrfs + dmcrypt + compress=lzo + autodefrag = corruption at first boot
>>
>> Just to be curious - couldn't it be a hardware issue? I use almost the
>> same setup (compress-force=lzo instead of compress-force=lzo) on my
>> laptop for 2-3 years and haven't experienced any issues since
>> ~kernel-3.14 or so.
>>
>> Br, Clemens Eisserer
>
> Hi,
> Which kind of hardware issue? I did a full memtest86 check, a full
> smartmontools extended check and even a badblocks -wsv.
> If this is really an hardware issue that we can identify I would be more
> than happy because Dell will replace my laptop and this nightmare will
> be finally over. I'm open to suggestions.

First, some general advice:

1. It is entirely possible to have bad RAM that still passes memtest86 consistently, and in fact, most of the time this will be the case (if you're seeing anything other than the bit-fade test in memtest86 fail, then your system probably won't boot fully). Memtest doesn't replicate typical usage patterns very well. My usual testing for RAM involves not just memtest, but also booting into a LiveCD (usually SystemRescueCD), pulling down a copy of the kernel source, and then running as many concurrent kernel builds as cores, each with as many make jobs as cores (so if you've got a quad core CPU (or a dual core with hyperthreading), it would be running 4 builds with -j4 passed to make). GCC seems to have memory usage patterns that reliably trigger memory errors that aren't caught by memtest, so this generally gives good results. Secondarily, if it's a big system and I am not pressed for time, I do a quick Gentoo install with Xen, and then spin up twice as many Xen VMs as cores and run memtest in those concurrently (this seems to catch things a bit more reliably than just a plain memtest).

2. On a similar note, badblocks doesn't replicate filesystem-like access patterns; it just runs sequentially through the entire disk. This isn't as likely to give bad results, but it's still important to know. In particular, try running it over a dmcrypt volume a couple of times (preferably with a different key each time; pulling keys from /dev/urandom works well for this), as that will result in writing different data. For what it's worth, when I'm doing initial testing of new disks, I always use ddrescue to copy /dev/zero over the whole disk, then do it twice through dmcrypt with different keys, copying from the disk to /dev/null after each pass. This gives random data on disk as a starting point (which is good if you're going to use dmcrypt), and usually triggers reallocation of any bad sectors as early as possible. If I have time and access to an existing system I can connect the disk to, I often do testing with fio as well.

Now, to slightly more specific advice:

1. If you have an eSATA port, try plugging your hard disk in there and see if things work. If that works but having the hard drive plugged in internally doesn't, then the issue is probably either that specific SATA port (in which case your chip-set is bad and you should get a new system), or the SATA connector itself (or the wiring, but that's not as likely when it's traces on a PCB). Normally I'd suggest just swapping cables and SATA ports, but that's not really possible with a laptop.

2. If you have access to a reasonably large flash drive, or to a USB to SATA adapter, try that as well. If it works on that but not internally (or on an eSATA port), you've probably got a bad SATA controller, and should get a new system.

3. Try things without dmcrypt. Adding extra layers makes it harder to determine what is actually wrong. If it works without dmcrypt, try using different parameters for the encryption (different ciphers is what I would try first). If it works reliably without dmcrypt, then it's either a bug in dmcrypt (which I don't think is very likely), or it's a bad interaction between dmcrypt and BTRFS. If it works with some encryption parameters but not others, then that will help narrow down where the issue is.

^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair 2016-05-09 11:52 ` Austin S. Hemmelgarn @ 2016-05-09 14:53 ` Niccolò Belli 2016-05-09 16:29 ` Zygo Blaxell ` (2 more replies) 0 siblings, 3 replies; 25+ messages in thread From: Niccolò Belli @ 2016-05-09 14:53 UTC (permalink / raw) To: linux-btrfs Cc: Clemens Eisserer, Austin S. Hemmelgarn, Patrik Lundquist, Chris Murphy, Qu Wenruo, Omar Sandoval

On Sunday 8 May 2016 20:27:55 CEST, Patrik Lundquist wrote:
> Are you using any power management tweaks?

Yes, as stated in my very first post I use TLP with SATA_LINKPWR_ON_BAT=max_performance, but I managed to reproduce the bug even without TLP. Also, in the past week I've always been on AC.

On Monday 9 May 2016 13:52:16 CEST, Austin S. Hemmelgarn wrote:
> Memtest doesn't replicate typical usage patterns very well. My
> usual testing for RAM involves not just memtest, but also
> booting into a LiveCD (usually SystemRescueCD), pulling down a
> copy of the kernel source, and then running as many concurrent
> kernel builds as cores, each with as many make jobs as cores (so
> if you've got a quad core CPU (or a dual core with
> hyperthreading), it would be running 4 builds with -j4 passed to
> make). GCC seems to have memory usage patterns that reliably
> trigger memory errors that aren't caught by memtest, so this
> generally gives good results.

Building a kernel with 4 concurrent threads is not an issue for my system; in fact I compile a lot and I have never had any issue.

On Monday 9 May 2016 13:52:16 CEST, Austin S. Hemmelgarn wrote:
> On a similar note, badblocks doesn't replicate filesystem like
> access patterns, it just runs sequentially through the entire
> disk. This isn't as likely to give bad results, but it's still
> important to know. In particular, try running it over a dmcrypt
> volume a couple of times (preferably with a different key each
> time, pulling keys from /dev/urandom works well for this), as
> that will result in writing different data. For what it's
> worth, when I'm doing initial testing of new disks, I always use
> ddrescue to copy /dev/zero over the whole disk, then do it twice
> through dmcrypt with different keys, copying from the disk to
> /dev/null after each pass. This gives random data on disk as a
> starting point (which is good if you're going to use dmcrypt),
> and usually triggers reallocation of any bad sectors as early as
> possible.

While trying to find a common denominator for my issue I made lots of backups of /dev/mapper/cryptroot and restored them into /dev/mapper/cryptroot dozens of times (triggering a 150GB+ random data write every time), without any issue (after restoring the backup I always check the partition with btrfs check). So the disk doesn't seem to be the culprit.

On Monday 9 May 2016 13:52:16 CEST, Austin S. Hemmelgarn wrote:
> 1. If you have an eSATA port, try plugging your hard disk in
> there and see if things work. If that works but having the hard
> drive plugged in internally doesn't, then the issue is probably
> either that specific SATA port (in which case your chip-set is
> bad and you should get a new system), or the SATA connector
> itself (or the wiring, but that's not as likely when it's traces
> on a PCB). Normally I'd suggest just swapping cables and SATA
> ports, but that's not really possible with a laptop.
> 2. If you have access to a reasonably large flash drive, or to
> a USB to SATA adapter, try that as well, if it works on that but
> not internally (or on an eSATA port), you've probably got a bad
> SATA controller, and should get a new system.

My laptop doesn't have an eSATA port and my only big enough external drive is currently used for daily backups, since I fear data loss.

On Monday 9 May 2016 13:52:16 CEST, Austin S. Hemmelgarn wrote:
> 3. Try things without dmcrypt. Adding extra layers makes it
> harder to determine what is actually wrong. If it works without
> dmcrypt, try using different parameters for the encryption
> (different ciphers is what I would try first). If it works
> reliably without dmcrypt, then it's either a bug in dmcrypt
> (which I don't think is very likely), or it's bad interaction
> between dmcrypt and BTRFS. If it works with some encryption
> parameters but not others, then that will help narrow down where
> the issue is.

On Sunday 8 May 2016 01:35:16 CEST, Chris Murphy wrote:
> You're making the troubleshooting unnecessarily difficult by
> continuing to use non-default options. *shrug*
>
> Every single layer you add complicates the setup and troubleshooting.
> Of course all of it should work together, many people do. But you're
> the one having the problem so in order to demonstrate whether this is
> a software bug or hardware problem, you need to test it with the most
> basic setup possible --> btrfs on plain partitions and default mount
> options.

I will try to recap because you obviously missed my previous e-mail: I managed to replicate the irrecoverable corruption bug even with default options and no dmcrypt at all. It was somewhat more difficult to replicate with default options, so I started to play with different combinations to find out whether something increased the chances of getting corruption. I have the feeling that "autodefrag" increases the chance of corruption, but I'm not 100% sure about it. Anyway, triggering a reinstall of all packages with "pacaur -S $(pacman -Qe)" gives a high chance of irrecoverable corruption. That command simply extracts the tarballs from the cache and overwrites the already installed files. It doesn't write lots of data (after reinstallation my system is still quite small, just a few GBs) but it seems to be enough to displease the filesystem.

To avoid losing my data, every time I power on or reboot my laptop I first boot from an external drive and run btrfs check on /dev/mapper/cryptroot. If it's still sane I back up /dev/mapper/cryptroot to an external SSD with dd; otherwise I restore the previous copy from the SSD into /dev/mapper/cryptroot. I cannot keep up such an annoying workflow for long, so I really hope someone will manage to track the bug down soon.

Thanks for your help, I really appreciate it.

Niccolò

^ permalink raw reply [flat|nested] 25+ messages in thread
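The check-then-backup-or-restore routine described in the message above can be sketched as a small Python helper. This is an illustration under stated assumptions, not the script actually used in the thread: the function names and the image path `/mnt/backup/cryptroot.img` are hypothetical, the device must not be mounted, and it assumes that `btrfs check` exits with a non-zero status when it finds problems, which may not hold for every btrfs-progs version. The example at the bottom uses a stand-in runner, so nothing touches a real disk.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], int]


def run_command(cmd: List[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(cmd).returncode


def check_then_sync(device: str, image: str, run: Runner = run_command) -> str:
    """Back up `device` to `image` if `btrfs check` passes, else restore it."""
    if run(["btrfs", "check", device]) == 0:
        # Filesystem metadata looks sane: refresh the backup image.
        run(["dd", f"if={device}", f"of={image}", "bs=4M"])
        return "backed up"
    # Check failed: overwrite the device with the last good image.
    run(["dd", f"if={image}", f"of={device}", "bs=4M"])
    return "restored"


# Dry run with a stand-in runner that only records the commands.
issued = []


def record_only(cmd: List[str]) -> int:
    issued.append(cmd)
    return 0  # pretend every command succeeded


outcome = check_then_sync(
    "/dev/mapper/cryptroot", "/mnt/backup/cryptroot.img", run=record_only
)
print(outcome)
for cmd in issued:
    print(" ".join(cmd))
```

Passing the runner as a parameter keeps the destructive `dd` calls out of any test or dry run; calling the function with the default runner would execute the commands for real.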
* Re: btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair 2016-05-09 14:53 ` Niccolò Belli @ 2016-05-09 16:29 ` Zygo Blaxell 2016-05-09 18:21 ` Austin S. Hemmelgarn 2016-05-12 14:35 ` Niccolò Belli 2016-05-09 19:23 ` Lionel Bouton 2016-05-09 21:30 ` Chris Murphy 2 siblings, 2 replies; 25+ messages in thread From: Zygo Blaxell @ 2016-05-09 16:29 UTC (permalink / raw) To: Niccolò Belli Cc: linux-btrfs, Clemens Eisserer, Austin S. Hemmelgarn, Patrik Lundquist, Chris Murphy, Qu Wenruo, Omar Sandoval [-- Attachment #1: Type: text/plain, Size: 3690 bytes --] On Mon, May 09, 2016 at 04:53:13PM +0200, Niccolò Belli wrote: > While trying to find a common denominator for my issue I did lots of backups > of /dev/mapper/cryptroot and I restored them into /dev/mapper/cryptroot > dozens of times (triggering a 150GB+ random data write every time), without > any issue (after restoring the backup I alwyas check the parition with btrfs > check). So disk doesn't seem to be the culprit. Did you also check the data matches the backup? btrfs check will only look at the metadata, which is 0.1% of what you've copied. From what you've written, there should be a lot of errors in the data too. If you have incorrect data but btrfs scrub finds no incorrect checksums, then your storage layer is probably fine and we have to look at CPU, host RAM, and software as possible culprits. The logs you've posted so far indicate that bad metadata (e.g. negative item lengths, nonsense transids in metadata references but sane transids in the referred pages) is getting into otherwise valid and well-formed btrfs metadata pages. Since these pages are protected by checksums, the corruption can't be originating in the storage layer--if it was, the pages should be rejected as they are read from disk, before btrfs even looks at them, and the insane transid should be the "found" one not the "expected" one. 
That suggests there is either RAM corruption happening _after_ the data is read from disk (i.e. while the pages are cached in RAM), or a severe software bug in the kernel you're running. Try different kernel versions (e.g. 4.4.9 or 4.1.23) in case whoever maintains your kernel had a bad day and merged a patch they should not have. Try a minimal configuration with as few drivers as possible loaded, especially GPU drivers and anything from the staging subdirectory--when these drivers have bugs, they ruin everything. Try memtest86+ which has a few more/different tests than memtest86. I have encountered RAM modules that pass memtest86 but fail memtest86+ and vice versa. Try memtester, a memory tester that runs as a Linux process, so it can detect corruption caused when device drivers spray data randomly into RAM, or when the CPU thermal controls are influenced by Linux (an overheating CPU-to-RAM bridge can really ruin your day, and some of the dumber laptop designs rely on the OS for thermal management). Try running more than one memory testing process, in case there is a bug in your hardware that affects interactions between multiple cores (memtest is single-threaded). You can run memtest86 inside a kvm (e.g. kvm -m 3072 -kernel /boot/memtest86.bin) to detect these kinds of issues. Kernel compiles are a bad way to test RAM. I've successfully built kernels on hosts with known RAM failures. The kernels don't always work properly, but it's quite rare to see a build fail outright. > [...]I have the feeling that "autodefrag" enhances the > chances to get corruption, but I'm not 100% sure about it. Anyway, > triggering a whole packages reinstall with "pacaur -S $(pacman -Qe)", giving > high chances to get irrecoverable corruption. When running such command it > simply extracts the tarballs from the cache and overwrites the already > installed files. 
It doesn't write lots of data (after reinstallation my > system is still quite small, just a few GBs) but it seems to be enough to > displease the filesystem. pacman probably does a lot of fsync() which will do a lot of metadata tree updates. autodefrag triples the I/O load for fragmented files and most of that extra load is metadata tree writes. Both will make the symptoms of your problem worse. [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
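Since btrfs check only looks at metadata, one way to act on the "does the data match the backup?" question above is a plain block-by-block comparison of the restored device against the image it was restored from. The Python sketch below is illustrative and not from the thread; `first_mismatch` is a hypothetical helper, and the comparison only makes sense when the device is unmounted and has not been written to since the restore. The demonstration uses small temporary files in place of a real device and image.

```python
import os
import tempfile


def first_mismatch(path_a, path_b, block_size=1 << 20):
    """Return the offset of the first differing block, or None if identical."""
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            block_a = a.read(block_size)
            block_b = b.read(block_size)
            if block_a != block_b:
                return offset  # also triggers when one file is shorter
            if not block_a:
                return None    # both files ended without a difference
            offset += len(block_a)


# Demonstration on temporary files standing in for device and backup image.
data = bytes(range(256)) * 48          # 12288 bytes, i.e. three 4096-byte blocks
damaged_data = bytearray(data)
damaged_data[5000] ^= 0xFF             # flip one byte inside the second block

with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.img")
    copy = os.path.join(tmp, "copy.img")
    damaged = os.path.join(tmp, "damaged.img")
    for path, payload in ((original, data), (copy, data), (damaged, damaged_data)):
        with open(path, "wb") as f:
            f.write(payload)
    same = first_mismatch(original, copy, block_size=4096)
    diff = first_mismatch(original, damaged, block_size=4096)

print(same, diff)  # None 4096
```

A mismatch found this way while btrfs scrub reports no checksum errors would point away from the storage layer, in line with the reasoning in the message above.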
* Re: btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair 2016-05-09 16:29 ` Zygo Blaxell @ 2016-05-09 18:21 ` Austin S. Hemmelgarn 2016-05-09 19:18 ` Duncan 2016-05-12 14:35 ` Niccolò Belli 1 sibling, 1 reply; 25+ messages in thread From: Austin S. Hemmelgarn @ 2016-05-09 18:21 UTC (permalink / raw) To: Zygo Blaxell, Niccolò Belli Cc: linux-btrfs, Clemens Eisserer, Patrik Lundquist, Chris Murphy, Qu Wenruo, Omar Sandoval On 2016-05-09 12:29, Zygo Blaxell wrote: > On Mon, May 09, 2016 at 04:53:13PM +0200, Niccolò Belli wrote: >> While trying to find a common denominator for my issue I did lots of backups >> of /dev/mapper/cryptroot and I restored them into /dev/mapper/cryptroot >> dozens of times (triggering a 150GB+ random data write every time), without >> any issue (after restoring the backup I alwyas check the parition with btrfs >> check). So disk doesn't seem to be the culprit. > > Did you also check the data matches the backup? btrfs check will only > look at the metadata, which is 0.1% of what you've copied. From what > you've written, there should be a lot of errors in the data too. If you > have incorrect data but btrfs scrub finds no incorrect checksums, then > your storage layer is probably fine and we have to look at CPU, host RAM, > and software as possible culprits. This is a good point. > > The logs you've posted so far indicate that bad metadata (e.g. negative > item lengths, nonsense transids in metadata references but sane transids > in the referred pages) is getting into otherwise valid and well-formed > btrfs metadata pages. Since these pages are protected by checksums, > the corruption can't be originating in the storage layer--if it was, the > pages should be rejected as they are read from disk, before btrfs even > looks at them, and the insane transid should be the "found" one not the > "expected" one. 
That suggests there is either RAM corruption happening > _after_ the data is read from disk (i.e. while the pages are cached in > RAM), or a severe software bug in the kernel you're running. > > Try different kernel versions (e.g. 4.4.9 or 4.1.23) in case whoever > maintains your kernel had a bad day and merged a patch they should > not have. > > Try a minimal configuration with as few drivers as possible loaded, > especially GPU drivers and anything from the staging subdirectory--when > these drivers have bugs, they ruin everything. > > Try memtest86+ which has a few more/different tests than memtest86. > I have encountered RAM modules that pass memtest86 but fail memtest86+ > and vice versa. > > Try memtester, a memory tester that runs as a Linux process, so it can > detect corruption caused when device drivers spray data randomly into RAM, > or when the CPU thermal controls are influenced by Linux (an overheating > CPU-to-RAM bridge can really ruin your day, and some of the dumber laptop > designs rely on the OS for thermal management). > > Try running more than one memory testing process, in case there is a bug > in your hardware that affects interactions between multiple cores (memtest > is single-threaded). You can run memtest86 inside a kvm (e.g. kvm > -m 3072 -kernel /boot/memtest86.bin) to detect these kinds of issues. > > Kernel compiles are a bad way to test RAM. I've successfully built > kernels on hosts with known RAM failures. The kernels don't always work > properly, but it's quite rare to see a build fail outright. 
My original suggestion that prompted that part of the comment was to run a bunch of concurrent kernel builds (I only use kernel builds myself because it's a big project with essentially zero build dependencies; if I had the patience and space (and a LiveCD with the right tools and packages installed), I'd probably be using something like LibreOffice or Chromium instead), each run with as many jobs as CPUs (so on a quad-core system, run a dozen or so concurrently with make -j4). I don't use this as my sole test (I also use multiple other tools), but I find that this does a particularly good job of exercising things that memtest doesn't, and I don't just make sure the builds succeed, but also that the compiled kernel images all match, because if there's bad RAM, the resultant images will often be different in some way (and I had forgotten to mention this bit). This practice evolved out of the fact that the only bad RAM I've ever dealt with either completely failed to POST (which can have all kinds of interesting symptoms if it's just one module: some MBs refuse to boot, some report the error, others just disable the module and act like nothing happened), or passed all the memory testing tools I threw at it (memtest86, memtest86+, memtester, concurrent memtest86 invocations from Xen domains, inventive acrobatics with tmpfs and FIO, etc), but failed under heavy concurrent random access, which can be reliably produced by running a bunch of big software builds at the same time with the CPU insanely over-committed. I could probably produce a similar workload with tmpfs and FIO, but it's a lot quicker and easier to remember how to do a kernel build than it is to remember the complex incantations needed to get FIO to do anything interesting. ^ permalink raw reply [flat|nested] 25+ messages in thread
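The "do the compiled images all match?" step mentioned above can be sketched as grouping build outputs by checksum. This Python example is illustrative only and not the author's tooling: `sha256_of` and `group_by_digest` are hypothetical helpers, and the comparison assumes the builds were set up to be bit-for-bit reproducible (for instance by pinning the kernel's `KBUILD_BUILD_TIMESTAMP`, `KBUILD_BUILD_USER` and `KBUILD_BUILD_HOST` variables), since otherwise healthy builds can differ too. Temporary files stand in for the kernel images.

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def sha256_of(path, block_size=1 << 20):
    """Hash a file in fixed-size blocks so large images need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def group_by_digest(paths):
    """Map each distinct SHA-256 digest to the list of files that produced it."""
    groups = defaultdict(list)
    for path in paths:
        groups[sha256_of(path)].append(path)
    return dict(groups)


# Demonstration: three stand-in "images", one of which differs by a single byte.
with tempfile.TemporaryDirectory() as tmp:
    images = []
    for i, payload in enumerate([b"kernel image", b"kernel image", b"kernel imagE"]):
        path = os.path.join(tmp, f"build{i}.img")
        with open(path, "wb") as f:
            f.write(payload)
        images.append(path)
    groups = group_by_digest(images)

all_match = len(groups) == 1
print(len(groups), all_match)  # 2 False
```

More than one group among supposedly identical builds is the signal to look for; which group is the "correct" one cannot be told from the digests alone.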
* Re: btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair 2016-05-09 18:21 ` Austin S. Hemmelgarn @ 2016-05-09 19:18 ` Duncan 0 siblings, 0 replies; 25+ messages in thread From: Duncan @ 2016-05-09 19:18 UTC (permalink / raw) To: linux-btrfs Austin S. Hemmelgarn posted on Mon, 09 May 2016 14:21:57 -0400 as excerpted: > This practice evolved out of the fact that the only bad RAM I've ever > dealt with either completely failed to POST (which can have all kinds of > interesting symptoms if it's just one module, some MB's refuse to boot, > some report the error, others just disable the module and act like > nothing happened), or passed all the memory testing tools I threw at it > (memtest86, memtest86+, memtester, concurrent memtest86 invocations from > Xen domains, inventive acrobatics with tmpfs and FIO, etc), but failed > under heavy concurrent random access, which can be reliably produced by > running a bunch of big software builds at the same time with the CPU > insanely over-committed. My (likely much more limited) experience matches yours. Tho FWIW, in my case I did find that one of the more common memory failure indicators was bz2-ed tarball decompression, where the tarball would fail its decompression checksum safety checks. However, that most reliably happened in the context of a heavily loaded system doing other package builds in parallel to the package tarball extraction that failed. In my case, I even had ECC RAM, but it was apparently just slightly out of spec for its labeled and internally configured memory speeds (PC3200 DDR1 at the time), at least on my hardware. Once I got a BIOS update that let me, I slightly downclocked the memory (to PC3000, IIRC), and it was absolutely solid, no more errors, even with tightened up wait-state timings. Later I upgraded RAM, and the new RAM worked just fine at the same PC3200 speeds that were a problem for the older RAM. 
The problem was apparently that while the RAM cells that memtest checks were fine, it was testing in an otherwise calm environment (not much choice, since you can only boot to the test directly and can't do anything else at the same time), without all the other stuff going on in the hectic environment of a multi-package parallel build, which apparently happened to occasionally trigger the edge case that would corrupt things.

And FWIW, I still have major respect for how well reiserfs behaved under those conditions. No filesystem can be expected to be 100% reliable when it's getting corrupted data due to bad memory, but reiserfs held up remarkably well, far better than btrfs did under similar conditions (that time involving the PCI and SATA bus) a few years later. That forced me back to reiserfs for a time, which again continued to work like a champ, even under hardware conditions that were absolutely unworkable with btrfs.

I had a heat-related (AC went out, in Phoenix, in the summer, 40+ C outside, 50+ C inside, who knows what the disks were!?) head crash on a disk too, where the partitions that were mounted and likely had the head flying over them were damaged beyond (easy) recovery, but other partitions on the same disk were absolutely fine, and I actually continued to run off them for a few months after cooling everything back down.

That sort of experience is the reason I still use reiserfs on spinning rust, including my second and third level backups, even while I'm running btrfs on the SSDs for the working system and primary backup.

It's also the reason I continue to use a partitioned system with multiple independent filesystems (btrfs raid1 on a pair of SSDs for most of the working btrfs and primary backups, individual SSD btrfs in dup mode for /boot, and its backup on the other SSD), instead of putting my data eggs all in the same filesystem basket with subvolumes, where if the filesystem goes out all the subvolumes go with it!

--
Duncan - List replies preferred.
No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman
* Re: btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair
2016-05-09 16:29 ` Zygo Blaxell 2016-05-09 18:21 ` Austin S. Hemmelgarn @ 2016-05-12 14:35 ` Niccolò Belli 2016-05-12 15:43 ` Austin S. Hemmelgarn 2016-05-12 16:48 ` Zygo Blaxell 1 sibling, 2 replies; 25+ messages in thread
From: Niccolò Belli @ 2016-05-12 14:35 UTC
To: linux-btrfs
Cc: Clemens Eisserer, Austin S. Hemmelgarn, Patrik Lundquist, Chris Murphy, Qu Wenruo, Omar Sandoval, Zygo Blaxell, ahferroin7, 1i5t5.duncan

On Monday, 9 May 2016 18:29:41 CEST, Zygo Blaxell wrote:
> Did you also check the data matches the backup? btrfs check will only look at the metadata, which is 0.1% of what you've copied. From what you've written, there should be a lot of errors in the data too. If you have incorrect data but btrfs scrub finds no incorrect checksums, then your storage layer is probably fine and we have to look at CPU, host RAM, and software as possible culprits.
>
> The logs you've posted so far indicate that bad metadata (e.g. negative item lengths, nonsense transids in metadata references but sane transids in the referred pages) is getting into otherwise valid and well-formed btrfs metadata pages. Since these pages are protected by checksums, the corruption can't be originating in the storage layer--if it was, the pages should be rejected as they are read from disk, before btrfs even looks at them, and the insane transid should be the "found" one not the "expected" one. That suggests there is either RAM corruption happening _after_ the data is read from disk (i.e. while the pages are cached in RAM), or a severe software bug in the kernel you're running.

When doing the btrfs check I also always do a btrfs scrub, and it never found any error.
Once it didn't manage to finish the scrub because of:
BTRFS critical (device dm-0): corrupt leaf, slot offset bad: block=670597120,root=1, slot=6
and btrfs scrub status reported "was aborted after 00:00:10".

Talking about scrub, I created a systemd timer to run scrub hourly and I noticed 2 *uncorrectable* errors suddenly appeared on my system. So I immediately re-ran the scrub just to confirm it, and then I rebooted into the Arch live usb and ran btrfs check: the metadata were perfect. So I ran btrfs scrub from the live usb and there were no errors at all! I rebooted into my system and ran scrub once again, and the uncorrectable errors were really gone! It happened two times in the past few days.

> Try different kernel versions (e.g. 4.4.9 or 4.1.23) in case whoever maintains your kernel had a bad day and merged a patch they should not have.

Almost no patches get applied by the Arch kernel team:
https://git.archlinux.org/svntogit/packages.git/tree/trunk?h=packages/linux
At the moment the only one is a harmless "change-default-console-loglevel.patch".

> Try a minimal configuration with as few drivers as possible loaded, especially GPU drivers and anything from the staging subdirectory--when these drivers have bugs, they ruin everything.

The Arch kernel team is quite conservative regarding staging/experimental features; I remember they rejected some config patches I submitted because of this. Anyway, I will try to blacklist as many kernel modules as I can. Maybe blacklisting the GPU is too much, because if I can't actually use my laptop it will be much more difficult to reproduce the issue.

> Try memtest86+ which has a few more/different tests than memtest86. I have encountered RAM modules that pass memtest86 but fail memtest86+ and vice versa.
>
> Try memtester, a memory tester that runs as a Linux process, so it can detect corruption caused when device drivers spray data randomly into RAM, or when the CPU thermal controls are influenced by Linux (an overheating CPU-to-RAM bridge can really ruin your day, and some of the dumber laptop designs rely on the OS for thermal management).
>
> Try running more than one memory testing process, in case there is a bug in your hardware that affects interactions between multiple cores (memtest is single-threaded). You can run memtest86 inside a kvm (e.g. kvm -m 3072 -kernel /boot/memtest86.bin) to detect these kinds of issues.
>
> Kernel compiles are a bad way to test RAM. I've successfully built kernels on hosts with known RAM failures. The kernels don't always work properly, but it's quite rare to see a build fail outright.

I didn't use memtest86+ because of the lack of EFI support, but I just tried the shiny new memtest86 7.0 beta with improved tests for 12+ hours without issues. Also I ran "memtester 4G" and "systester-cli -gausslg 64M -threads 4 -turns 100000" together for 12 hours without any issue, so I think both my RAM and CPU are OK.

I can think of only two possible culprits now (correct me if I'm wrong):
1) A btrfs bug
2) Another module screwing things up

I can do nothing about btrfs bugs, so I will try to hunt down the second option.
This is the list of modules I'm running:

lsmod | awk '$4 == ""' | awk '{print $1}' | sort

8250_dw ac acpi_als acpi_pad aesni_intel ahci algif_skcipher ansi_cprng arc4 atkbd battery bnep btrfs btusb cdc_ether cmac coretemp crc32c_intel crc32_pclmul crct10dif_pclmul dell_laptop dell_wmi dm_crypt drbg ecb elan_i2c evdev ext4 fan fjes ghash_clmulni_intel gpio_lynxpoint hid_generic hid_multitouch hmac i2c_designware_platform i2c_hid i2c_i801 i915 input_leds int3400_thermal int3402_thermal int3403_thermal intel_hid intel_pch_thermal intel_powerclamp intel_rapl ip_tables iTCO_wdt iwlmvm jitterentropy_rng joydev kvm_intel lpc_ich mac_hid mei_me mos7720 mousedev msr nls_cp437 nls_iso8859_1 nvram pcspkr pl2303 processor processor_thermal_device psmouse r8152 rfcomm rtsx_pci_ms rtsx_pci_sdmmc sch_fq_codel sdhci_acpi sd_mod serio_raw sha256_ssse3 shpchp snd_hda_codec_hdmi snd_hda_intel snd_soc_ssm4567 snd_soc_sst_acpi snd_soc_sst_broadwell spi_pxa2xx_platform thermal tpm_crb tpm_tis uas usbhid uvcvideo vfat visor x86_pkg_temp_thermal xhci_pci

I will try to blacklist as many as I can while still keeping a somewhat usable system and see if I can reproduce it. If I am no longer able to reproduce it, then the hunt will begin. It will not be a fun one, as I already experienced with hid-multitouch, which gave me random kernel hangs at boot ONLY if loaded early into the initramfs:
https://bugzilla.kernel.org/show_bug.cgi?id=105251

Another option would be running it over with my car, hoping that thanks to my comprehensive insurance policy Dell will give me the next model (the Skylake one) as a replacement (and hoping that it will not suffer from the same issue as the Broadwell one).

Thanks,
Niccolò
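For readers wondering what the lsmod pipeline above selects: the fourth column of lsmod output lists the modules that depend on a given module, so an empty fourth column marks modules that no other loaded module depends on. A minimal sketch of the same filter as a single awk call, wrapped in a function reading from stdin (the name leaf_modules is made up for this example):

```shell
# Sketch only: list loaded modules that no other module depends on.
# Reads `lsmod` output on stdin; leaf_modules is a hypothetical name.
leaf_modules() {
    # NR > 1 skips the "Module Size Used by" header row explicitly;
    # $4 == "" keeps rows whose list of dependent modules is empty.
    awk 'NR > 1 && $4 == "" { print $1 }' | sort
}

# Assumed usage:
#   lsmod | leaf_modules
```

A module can still be in use (non-zero count in the third column) while having no dependent modules, which is why btrfs shows up in the list above.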
* Re: btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair
2016-05-12 14:35 ` Niccolò Belli @ 2016-05-12 15:43 ` Austin S. Hemmelgarn 2016-05-13 11:07 ` Niccolò Belli 2016-05-12 16:48 ` Zygo Blaxell 1 sibling, 1 reply; 25+ messages in thread
From: Austin S. Hemmelgarn @ 2016-05-12 15:43 UTC
To: Niccolò Belli, linux-btrfs
Cc: Clemens Eisserer, Patrik Lundquist, Chris Murphy, Qu Wenruo, Omar Sandoval, Zygo Blaxell, 1i5t5.duncan

On 2016-05-12 10:35, Niccolò Belli wrote:
> On Monday, 9 May 2016 18:29:41 CEST, Zygo Blaxell wrote:
>> Did you also check the data matches the backup? btrfs check will only look at the metadata, which is 0.1% of what you've copied. From what you've written, there should be a lot of errors in the data too. If you have incorrect data but btrfs scrub finds no incorrect checksums, then your storage layer is probably fine and we have to look at CPU, host RAM, and software as possible culprits.
>>
>> The logs you've posted so far indicate that bad metadata (e.g. negative item lengths, nonsense transids in metadata references but sane transids in the referred pages) is getting into otherwise valid and well-formed btrfs metadata pages. Since these pages are protected by checksums, the corruption can't be originating in the storage layer--if it was, the pages should be rejected as they are read from disk, before btrfs even looks at them, and the insane transid should be the "found" one not the "expected" one. That suggests there is either RAM corruption happening _after_ the data is read from disk (i.e. while the pages are cached in RAM), or a severe software bug in the kernel you're running.
>
> When doing the btrfs check I also always do a btrfs scrub, and it never found any error.
> Once it didn't manage to finish the scrub because of:
> BTRFS critical (device dm-0): corrupt leaf, slot offset bad: block=670597120,root=1, slot=6
> and btrfs scrub status reported "was aborted after 00:00:10".
>
> Talking about scrub, I created a systemd timer to run scrub hourly and I noticed 2 *uncorrectable* errors suddenly appeared on my system. So I immediately re-ran the scrub just to confirm it, and then I rebooted into the Arch live usb and ran btrfs check: the metadata were perfect. So I ran btrfs scrub from the live usb and there were no errors at all! I rebooted into my system and ran scrub once again, and the uncorrectable errors were really gone! It happened two times in the past few days.

This would indicate to me that you've either got bad RAM (most likely), or some other hardware component is not working correctly. It's not unusual for hardware issues to be intermittent.

>> Try different kernel versions (e.g. 4.4.9 or 4.1.23) in case whoever maintains your kernel had a bad day and merged a patch they should not have.
>
> Almost no patches get applied by the Arch kernel team:
> https://git.archlinux.org/svntogit/packages.git/tree/trunk?h=packages/linux
> At the moment the only one is a harmless "change-default-console-loglevel.patch".
>
>> Try a minimal configuration with as few drivers as possible loaded, especially GPU drivers and anything from the staging subdirectory--when these drivers have bugs, they ruin everything.
>
> The Arch kernel team is quite conservative regarding staging/experimental features; I remember they rejected some config patches I submitted because of this. Anyway, I will try to blacklist as many kernel modules as I can. Maybe blacklisting the GPU is too much, because if I can't actually use my laptop it will be much more difficult to reproduce the issue.
Disable the GPU driver, but make sure you have the VGA_CONSOLE config enabled, and you should be fine (you'll just get an 80x25 text-mode console instead of a high-resolution one).

>> Try memtest86+ which has a few more/different tests than memtest86. I have encountered RAM modules that pass memtest86 but fail memtest86+ and vice versa.
>>
>> Try memtester, a memory tester that runs as a Linux process, so it can detect corruption caused when device drivers spray data randomly into RAM, or when the CPU thermal controls are influenced by Linux (an overheating CPU-to-RAM bridge can really ruin your day, and some of the dumber laptop designs rely on the OS for thermal management).
>>
>> Try running more than one memory testing process, in case there is a bug in your hardware that affects interactions between multiple cores (memtest is single-threaded). You can run memtest86 inside a kvm (e.g. kvm -m 3072 -kernel /boot/memtest86.bin) to detect these kinds of issues.
>>
>> Kernel compiles are a bad way to test RAM. I've successfully built kernels on hosts with known RAM failures. The kernels don't always work properly, but it's quite rare to see a build fail outright.
>
> I didn't use memtest86+ because of the lack of EFI support, but I just tried the shiny new memtest86 7.0 beta with improved tests for 12+ hours without issues. Also I ran "memtester 4G" and "systester-cli -gausslg 64M -threads 4 -turns 100000" together for 12 hours without any issue, so I think both my RAM and CPU are OK.

That's probably a good indication of the CPU and the MB being OK, but not necessarily the RAM. There are two other possible options for testing the RAM that haven't been mentioned yet though (which I hadn't thought of myself until now):

1. If you have access to Windows, try the Windows Memory Diagnostic. This runs yet another slightly different set of tests from memtest86 and memtest86+, so it may catch issues they don't.
You can start this directly on an EFI system by loading /EFI/Microsoft/Boot/MEMTEST.EFI from the EFI system partition.

2. This is a Dell system. If you still have the utility partition which Dell ships all their pre-provisioned systems with, that should have a hardware diagnostics tool. I doubt that this will find anything (it's part of their QA procedure AFAICT), but it's probably worth trying, as its memory testing uses yet another slightly different implementation of the typical tests. You can usually find this in the boot interrupt menu accessed by hitting F12 before the boot-loader loads.

> I can think of only two possible culprits now (correct me if I'm wrong):
> 1) A btrfs bug
> 2) Another module screwing things up

It could still be the disk (not likely, but possible) or the storage controller. If you have a spare disk, I'd suggest trying with that (assuming of course it doesn't void your warranty).

> I can do nothing about btrfs bugs, so I will try to hunt down the second option.
> This is the list of modules I'm running:
>
> lsmod | awk '$4 == ""' | awk '{print $1}' | sort
>
> 8250_dw ac acpi_als acpi_pad aesni_intel ahci algif_skcipher ansi_cprng arc4 atkbd battery bnep btrfs btusb cdc_ether cmac coretemp crc32c_intel crc32_pclmul crct10dif_pclmul dell_laptop dell_wmi dm_crypt drbg ecb elan_i2c evdev ext4 fan fjes ghash_clmulni_intel gpio_lynxpoint hid_generic hid_multitouch hmac i2c_designware_platform i2c_hid i2c_i801 i915 input_leds int3400_thermal int3402_thermal int3403_thermal intel_hid intel_pch_thermal intel_powerclamp intel_rapl ip_tables iTCO_wdt iwlmvm jitterentropy_rng joydev kvm_intel lpc_ich mac_hid mei_me mos7720 mousedev msr nls_cp437 nls_iso8859_1 nvram pcspkr pl2303 processor processor_thermal_device psmouse r8152 rfcomm rtsx_pci_ms rtsx_pci_sdmmc sch_fq_codel sdhci_acpi sd_mod serio_raw sha256_ssse3 shpchp snd_hda_codec_hdmi snd_hda_intel snd_soc_ssm4567 snd_soc_sst_acpi snd_soc_sst_broadwell spi_pxa2xx_platform thermal tpm_crb tpm_tis uas usbhid uvcvideo vfat visor x86_pkg_temp_thermal xhci_pci
>
> I will try to blacklist as many as I can while still keeping a somewhat usable system and see if I can reproduce it. If I am no longer able to reproduce it, then the hunt will begin.
> It will not be a fun one, as I already experienced with hid-multitouch, which gave me random kernel hangs at boot ONLY if loaded early into the initramfs:
> https://bugzilla.kernel.org/show_bug.cgi?id=105251

Based on what you've got listed for modules, I'd expect the absolute minimum for a usable test system to be:

ac
acpi_als (you can probably remove this, it's for the ambient light sensor)
acpi_pad
ahci
atkbd
battery
btrfs
coretemp
dell_laptop
dell_wmi
elan_i2c
evdev
ext4
fan
gpio_lynxpoint
hid_generic
hid_multitouch
i2c_i801
i915 (this is your GPU module, you should still have a usable text console if this isn't loaded)
int3400_thermal
int3402_thermal
int3403_thermal
intel_hid
intel_pch_thermal
intel_powerclamp
intel_rapl
ip_tables (if you have no firewall configured, you can safely blacklist this)
iwlmvm (you might try removing this, but you will have no wifi without it)
lpc_ich
mousedev
nvram (you might be able to remove this, I don't remember if the dell modules depend on it or not)
processor
processor_thermal_device
psmouse
r8152 (you can try removing this too, but you will have no ethernet without it)
sch_fq_codel
serio_raw
spi_pxa2xx_platform
thermal
usbhid
vfat (if you avoid mounting your EFI system partition, you can probably pull this out)
x86_pkg_temp_thermal
xhci_pci

Note that this assumes you aren't testing on dm-crypt. Make absolutely certain, though, that you don't remove any of the *thermal modules, the fan module, or the dell modules; not having those may result in hardware damage.
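One way to turn a list of candidate modules into modprobe.d blacklist entries while honouring the warning above is sketched below. The function name make_blacklist and the output file name are hypothetical, and the skip patterns simply encode the advice in this message (thermal, fan and dell modules); they are not an authoritative list of what is safe on any given machine.

```shell
# Sketch only: emit modprobe.d "blacklist" lines for module names on stdin,
# refusing to blacklist thermal, fan and Dell platform modules.
# make_blacklist is a hypothetical helper name.
make_blacklist() {
    while read -r mod; do
        case "$mod" in
            '' | '#'*) continue ;;                  # skip blank lines and comments
            *thermal* | fan | dell_*) continue ;;   # never blacklist these
        esac
        printf 'blacklist %s\n' "$mod"
    done
}

# Assumed usage (file names are illustrative):
#   make_blacklist < candidates.txt > /etc/modprobe.d/test-blacklist.conf
```

Keep in mind that a blacklist entry only stops a module from being auto-loaded through its aliases; a module that is loaded explicitly, pulled in as a dependency, or already included in the initramfs can still end up loaded, so checking lsmod after a reboot is worthwhile.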
* Re: btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair
2016-05-12 15:43 ` Austin S. Hemmelgarn @ 2016-05-13 11:07 ` Niccolò Belli 2016-05-13 11:35 ` Austin S. Hemmelgarn 0 siblings, 1 reply; 25+ messages in thread
From: Niccolò Belli @ 2016-05-13 11:07 UTC
To: Austin S. Hemmelgarn
Cc: linux-btrfs, Clemens Eisserer, Patrik Lundquist, Chris Murphy, Qu Wenruo, Omar Sandoval, Zygo Blaxell, 1i5t5.duncan

On Thursday, 12 May 2016 17:43:38 CEST, Austin S. Hemmelgarn wrote:
> That's probably a good indication of the CPU and the MB being OK, but not necessarily the RAM. There are two other possible options for testing the RAM that haven't been mentioned yet though (which I hadn't thought of myself until now):
> 1. If you have access to Windows, try the Windows Memory Diagnostic. This runs yet another slightly different set of tests from memtest86 and memtest86+, so it may catch issues they don't. You can start this directly on an EFI system by loading /EFI/Microsoft/Boot/MEMTEST.EFI from the EFI system partition.
> 2. This is a Dell system. If you still have the utility partition which Dell ships all their pre-provisioned systems with, that should have a hardware diagnostics tool. I doubt that this will find anything (it's part of their QA procedure AFAICT), but it's probably worth trying, as its memory testing uses yet another slightly different implementation of the typical tests. You can usually find this in the boot interrupt menu accessed by hitting F12 before the boot-loader loads.

I tried the Dell System Test, including the enhanced optional RAM tests, and it was fine. I also tried the Microsoft one, which passed. BUT if I select the advanced test in the Microsoft one it always stops at 21% of the first test. The test menus are still working, but the fans get quiet and it keeps writing "test running... 21%" forever.
I tried it many times and it always got stuck at 21%, so I suspect a test suite bug rather than a RAM failure.

I also noticed some other interesting behaviour: while I was running the usual scrub+check (both were fine) from the livecd I noticed this in dmesg:
[ 261.301159] BTRFS info (device dm-0): bdev /dev/mapper/cryptroot errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
Corrupt? But both scrub and check were fine... I double-checked scrub and check and they were still fine.

This is what happened another time:
https://drive.google.com/open?id=0Bwe9Wtc-5xF1dGtPaWhTZ0w5aUU
I was making a backup of my partition USING DD from the livecd. It wasn't even mounted, if I recall correctly!

On Thursday, 12 May 2016 18:48:17 CEST, Zygo Blaxell wrote:
> That's what a RAM corruption problem looks like when you run btrfs scrub. Maybe the RAM itself is OK, but *something* is scribbling on it.
>
> Does the Arch live usb use the same kernel as your normal system?

Yes, except for the point release (the system is slightly ahead of the liveusb).

On Thursday, 12 May 2016 18:48:17 CEST, Zygo Blaxell wrote:
> Did you try an older (or newer) kernel? I've been running 4.5.x on a few canary systems, but so far none of them have survived more than a day.

No (except for point releases from 4.5.0 to 4.5.4), but I will try 4.4.

On Thursday, 12 May 2016 18:48:17 CEST, Zygo Blaxell wrote:
> It's possible there's a problem that affects only very specific chipsets. You seem to have eliminated RAM in isolation, but there could be a problem in the kernel that affects only your chipset.

Funny, considering it is sold as a Linux laptop. Unfortunately they only tested it with the ancient Ubuntu 14.04.

Niccolò
* Re: btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair
2016-05-13 11:07 ` Niccolò Belli @ 2016-05-13 11:35 ` Austin S. Hemmelgarn 2016-05-13 12:10 ` Niccolò Belli 0 siblings, 1 reply; 25+ messages in thread
From: Austin S. Hemmelgarn @ 2016-05-13 11:35 UTC
To: Niccolò Belli
Cc: linux-btrfs, Clemens Eisserer, Patrik Lundquist, Chris Murphy, Qu Wenruo, Omar Sandoval, Zygo Blaxell, 1i5t5.duncan

On 2016-05-13 07:07, Niccolò Belli wrote:
> On Thursday, 12 May 2016 17:43:38 CEST, Austin S. Hemmelgarn wrote:
>> That's probably a good indication of the CPU and the MB being OK, but not necessarily the RAM. There are two other possible options for testing the RAM that haven't been mentioned yet though (which I hadn't thought of myself until now):
>> 1. If you have access to Windows, try the Windows Memory Diagnostic. This runs yet another slightly different set of tests from memtest86 and memtest86+, so it may catch issues they don't. You can start this directly on an EFI system by loading /EFI/Microsoft/Boot/MEMTEST.EFI from the EFI system partition.
>> 2. This is a Dell system. If you still have the utility partition which Dell ships all their pre-provisioned systems with, that should have a hardware diagnostics tool. I doubt that this will find anything (it's part of their QA procedure AFAICT), but it's probably worth trying, as its memory testing uses yet another slightly different implementation of the typical tests. You can usually find this in the boot interrupt menu accessed by hitting F12 before the boot-loader loads.
>
> I tried the Dell System Test, including the enhanced optional RAM tests, and it was fine. I also tried the Microsoft one, which passed. BUT if I select the advanced test in the Microsoft one it always stops at 21% of the first test.
> The test menus are still working, but the fans get quiet and it keeps writing "test running... 21%" forever. I tried it many times and it always got stuck at 21%, so I suspect a test suite bug rather than a RAM failure.

I've actually seen this before on other systems (different completion percentage on each system, but otherwise the same). All of them ended up actually having a bad CPU or MB, although the ones with CPU issues were fine after BIOS updates which included newer microcode.

> I also noticed some other interesting behaviour: while I was running the usual scrub+check (both were fine) from the livecd I noticed this in dmesg:
> [ 261.301159] BTRFS info (device dm-0): bdev /dev/mapper/cryptroot errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
> Corrupt? But both scrub and check were fine... I double-checked scrub and check and they were still fine.

It's worth noting that these are running counts of errors since the last time the stats were reset (and they only get reset manually). If you haven't reset the stats, then this isn't all that surprising.

> This is what happened another time:
> https://drive.google.com/open?id=0Bwe9Wtc-5xF1dGtPaWhTZ0w5aUU
> I was making a backup of my partition USING DD from the livecd. It wasn't even mounted, if I recall correctly!

The fact that you're getting an OOPS involving core kernel threads (kswapd) is a pretty good indication that either there's a bug elsewhere in the kernel, or that something is wrong with your hardware. It's really difficult to be certain if you don't have a reliable test case though.

> On Thursday, 12 May 2016 18:48:17 CEST, Zygo Blaxell wrote:
>> That's what a RAM corruption problem looks like when you run btrfs scrub. Maybe the RAM itself is OK, but *something* is scribbling on it.
>>
>> Does the Arch live usb use the same kernel as your normal system?
>
> Yes, except for the point release (the system is slightly ahead of the liveusb).
>
> On Thursday, 12 May 2016 18:48:17 CEST, Zygo Blaxell wrote:
>> Did you try an older (or newer) kernel? I've been running 4.5.x on a few canary systems, but so far none of them have survived more than a day.
>
> No (except for point releases from 4.5.0 to 4.5.4), but I will try 4.4.

FWIW, I've been running 4.5 with almost no issues on my laptop since it came out (the few issues I have had are not unique to 4.5, and are all ultimately firmware issues (Lenovo has been getting _really_ bad recently about having broken ACPI and EFI implementations...)). Of course, I'm also running Gentoo, so everything is built locally, but I doubt that that has much impact on stability.

> On Thursday, 12 May 2016 18:48:17 CEST, Zygo Blaxell wrote:
>> It's possible there's a problem that affects only very specific chipsets. You seem to have eliminated RAM in isolation, but there could be a problem in the kernel that affects only your chipset.
>
> Funny, considering it is sold as a Linux laptop. Unfortunately they only tested it with the ancient Ubuntu 14.04.

Sadly, this is pretty typical for anything sold as a 'Linux' system that isn't a server. Even for the servers sold as such, it's not unusual for them to only be tested with old versions of CentOS.

Now, I hadn't thought of this before, but it's a Dell system, so you're trapping out to SMBIOS for everything under the sun, and if they don't pass a correct memory map (or correct ACPI tables) to the OS during boot, then there may be some sections of RAM that both Linux and the firmware think they can use, which could definitely result in symptoms like bad RAM while still consistently passing memory tests (because they don't make BIOS calls after they have the system info they need).
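Because the "errs:" counters discussed in this message are cumulative, a non-zero "corrupt" value only means something new if it grew since the last look. The same counters can be read with btrfs device stats on the mounted filesystem, and btrfs device stats -z resets them after printing. A small sketch for pulling a single counter out of a kernel log line like the one quoted above (the function name btrfs_err_counter is made up for this example):

```shell
# Sketch only: pull one cumulative counter out of a btrfs "errs:" kernel log line.
# Reads log lines on stdin; btrfs_err_counter is a hypothetical helper name.
btrfs_err_counter() (
    name=$1   # one of: wr, rd, flush, corrupt, gen
    # Print only the number that follows the requested counter name.
    sed -n "s/.*errs:.* $name \([0-9][0-9]*\).*/\1/p"
)

# Assumed usage: note the value before a scrub and again afterwards; only an
# increase points at new corruption, a constant non-zero value is history.
#   dmesg | grep 'errs:' | tail -n 1 | btrfs_err_counter corrupt
```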
* Re: btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair
2016-05-13 11:35 ` Austin S. Hemmelgarn @ 2016-05-13 12:10 ` Niccolò Belli 2016-05-13 21:54 ` Chris Murphy 0 siblings, 1 reply; 25+ messages in thread
From: Niccolò Belli @ 2016-05-13 12:10 UTC
To: Austin S. Hemmelgarn
Cc: linux-btrfs, Clemens Eisserer, Patrik Lundquist, Chris Murphy, Qu Wenruo, Omar Sandoval, Zygo Blaxell, 1i5t5.duncan

On Friday, 13 May 2016 13:35:01 CEST, Austin S. Hemmelgarn wrote:
> The fact that you're getting an OOPS involving core kernel threads (kswapd) is a pretty good indication that either there's a bug elsewhere in the kernel, or that something is wrong with your hardware. It's really difficult to be certain if you don't have a reliable test case though.

Talking about reliable test cases, I forgot to say that I definitely found an interesting one. It doesn't lead to an OOPS, but perhaps to something even more interesting. While running countless stress tests I tried running some games to stress the system in different ways. I chose openmw (an open source engine for Morrowind) and played it for a while on my second, external monitor (while I watched some monitoring tools on my first monitor). I noticed that after playing for a while I *always* lose my internet connection (I use a USB3 Gigabit Ethernet adapter). This isn't the only thing that happens: even though the game keeps running flawlessly and the system *seems* to work fine (I can drag windows, open the terminal...), lots of commands simply stall (for example mounting a partition, unmounting it, rebooting...). I can reliably reproduce it; it ALWAYS happens.

Niccolò
* Re: btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair
2016-05-13 12:10 ` Niccolò Belli @ 2016-05-13 21:54 ` Chris Murphy 0 siblings, 0 replies; 25+ messages in thread
From: Chris Murphy @ 2016-05-13 21:54 UTC
To: Niccolò Belli
Cc: Austin S. Hemmelgarn, Btrfs BTRFS, Clemens Eisserer, Patrik Lundquist, Chris Murphy, Qu Wenruo, Omar Sandoval, Zygo Blaxell, Duncan

On Fri, May 13, 2016 at 6:10 AM, Niccolò Belli <darkbasic@linuxsystems.it> wrote:
> On Friday, 13 May 2016 13:35:01 CEST, Austin S. Hemmelgarn wrote:
>>
>> The fact that you're getting an OOPS involving core kernel threads (kswapd) is a pretty good indication that either there's a bug elsewhere in the kernel, or that something is wrong with your hardware. It's really difficult to be certain if you don't have a reliable test case though.
>
> Talking about reliable test cases, I forgot to say that I definitely found an interesting one. It doesn't lead to an OOPS, but perhaps to something even more interesting. While running countless stress tests I tried running some games to stress the system in different ways. I chose openmw (an open source engine for Morrowind) and played it for a while on my second, external monitor (while I watched some monitoring tools on my first monitor). I noticed that after playing for a while I *always* lose my internet connection (I use a USB3 Gigabit Ethernet adapter). This isn't the only thing that happens: even though the game keeps running flawlessly and the system *seems* to work fine (I can drag windows, open the terminal...), lots of commands simply stall (for example mounting a partition, unmounting it, rebooting...). I can reliably reproduce it; it ALWAYS happens.

Well there are a bunch of kernel debug options.
If your kernel has CONFIG_SLUB_DEBUG=y and CONFIG_SLUB=y at compile time, you can boot with the boot parameter slub_debug=1 to enable it, and maybe there'll be something more revealing about the problems you're having. More aggressive is CONFIG_DEBUG_PAGEALLOC=y, but it'll slow things down quite noticeably.

And then there are some Btrfs debug options that are set at compile time and enabled with mount options. But I think the problem you're having isn't specific to Btrfs, or someone else would have run into it.

--
Chris Murphy
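Before rebooting with a debug parameter it can help to confirm that the running kernel was built with the options named above. The sketch below checks a kernel config supplied on stdin; the function name config_enabled is hypothetical, and reading /proc/config.gz assumes the kernel exposes its config there, which not every build does.

```shell
# Sketch only: succeed if the kernel config on stdin has "<OPTION>=y".
# config_enabled is a hypothetical helper name.
config_enabled() {
    # -x matches the whole line, so CONFIG_SLUB does not match CONFIG_SLUB_DEBUG.
    grep -qx -- "$1=y"
}

# Assumed usage (requires /proc/config.gz to exist on the running kernel):
#   zcat /proc/config.gz | config_enabled CONFIG_SLUB_DEBUG && echo "SLUB debug is built in"
#   cat /proc/cmdline    # after reboot, confirm the slub_debug parameter was passed
```

The kernel's SLUB documentation describes the parameter as either a bare slub_debug (all checks) or slub_debug= followed by flag letters such as FZP, so the exact value is worth checking against the documentation for the kernel in use.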
* Re: btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair
From: Zygo Blaxell @ 2016-05-12 16:48 UTC
To: Niccolò Belli
Cc: linux-btrfs, Clemens Eisserer, Austin S. Hemmelgarn, Patrik Lundquist, Chris Murphy, Qu Wenruo, Omar Sandoval, 1i5t5.duncan

On Thu, May 12, 2016 at 04:35:24PM +0200, Niccolò Belli wrote:
> When doing the btrfs check I also always do a btrfs scrub and it never found
> any error. Once it didn't manage to finish the scrub because of:
> BTRFS critical (device dm-0): corrupt leaf, slot offset bad:
> block=670597120,root=1, slot=6
> and btrfs scrub status reported "was aborted after 00:00:10".
>
> Talking about scrub I created a systemd timer to run scrub hourly and I
> noticed 2 *uncorrectable* errors suddenly appeared on my system. So I
> immediately re-ran the scrub just to confirm it and then I rebooted into the
> Arch live usb and ran btrfs check: the metadata were perfect. So I ran
> btrfs scrub from the live usb and there were no errors at all! I rebooted
> into my system and ran scrub once again and the uncorrectable errors
> were really gone! It happened two times in the past few days.

That's what a RAM corruption problem looks like when you run btrfs scrub. Maybe the RAM itself is OK, but *something* is scribbling on it.

Does the Arch live usb use the same kernel as your normal system?

> Almost no patches get applied by the Arch kernel team:
> https://git.archlinux.org/svntogit/packages.git/tree/trunk?h=packages/linux
> At the moment the only one is a harmless
> "change-default-console-loglevel.patch".

Did you try an older (or newer) kernel? I've been running 4.5.x on a few canary systems, but so far none of them have survived more than a day. Contrast with 4.1.x and 4.4.x, which run for months between reboots for me. Maybe there's a regression in 4.5.x, maybe I did something wrong in my config or build, or maybe I just have too few data points to draw any conclusions, but my data so far is telling me to stay on 4.4.x until something changes (i.e. wait for a 4.5.x stable update or skip directly to 4.6.x). :-/

It's always worth trying this if only to eliminate regression as a possible root cause early. In practice, every mainline kernel release has a regression that affects at least one combination of config options and hardware. btrfs is stable enough now that you can be running one or two releases behind to avoid a problem elsewhere in the kernel.

> Another option will be crashing it with my car's wheels hoping that because
> of my comprehensive insurance policy Dell will give me the next model (the
> Skylake one) as a replacement (hoping that it will not suffer from the same
> issue of the Broadwell one).

The first rule of Insurance Fraud Club: don't talk about Insurance Fraud Club. ;)

It's possible there's a problem that affects only very specific chipsets. You seem to have eliminated RAM in isolation, but there could be a problem in the kernel that affects only your chipset.
* Re: btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair
From: Lionel Bouton @ 2016-05-09 19:23 UTC
To: Niccolò Belli, linux-btrfs
Cc: Clemens Eisserer, Austin S. Hemmelgarn, Patrik Lundquist, Chris Murphy, Qu Wenruo, Omar Sandoval

Hi,

On 09/05/2016 16:53, Niccolò Belli wrote:
> On Sunday 8 May 2016 20:27:55 CEST, Patrik Lundquist wrote:
>> Are you using any power management tweaks?
>
> Yes, as stated in my very first post I use TLP with
> SATA_LINKPWR_ON_BAT=max_performance, but I managed to reproduce the
> bug even without TLP. Also in the past week I've always been on AC.
>
> On Monday 9 May 2016 13:52:16 CEST, Austin S. Hemmelgarn wrote:
>> Memtest doesn't replicate typical usage patterns very well. My usual
>> testing for RAM involves not just memtest, but also booting into a
>> LiveCD (usually SystemRescueCD), pulling down a copy of the kernel
>> source, and then running as many concurrent kernel builds as cores,
>> each with as many make jobs as cores (so if you've got a quad core
>> CPU (or a dual core with hyperthreading), it would be running 4
>> builds with -j4 passed to make). GCC seems to have memory usage
>> patterns that reliably trigger memory errors that aren't caught by
>> memtest, so this generally gives good results.
>
> Building kernel with 4 concurrent threads is not an issue for my
> system, in fact I do compile a lot and I never had any issue.

Note: I once had a server which would pass memtest86 and repeated kernel compilations maxing out the CPU threads, but couldn't reliably compile a kernel and copy large amounts of data at the same time. I think I lost my little automated test suite (I should definitely look for it again or code it from scratch), but what I have done on new servers since that time is:

1/ create a file larger than the system's RAM (this makes sure you will read and write all data from disk and not only caches, and might catch controller hardware problems too) with dd if=/dev/urandom (several gigabytes of random data exercise many different patterns, far more than what memtest86 would test), and compute its md5 checksum.
2/ launch a subprocess repeatedly compiling the kernel with more jobs than available CPU threads, stopping as soon as the make exit code is != 0.
3/ launch another subprocess repeatedly copying the random file to another location, exiting when the md5 checksum doesn't match the source.

Let it run as a burn-in test for as long as you can afford (from experience, if it's still running after 24 hours the probability that the test will find a problem becomes negligible). If one of the subprocesses stopped by itself, your hardware is not stable. This actually caught a few unstable systems before they could go into production for me.

Lionel
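Steps 1 and 3 of the procedure above can be sketched in Python. This is a minimal sketch, not Lionel's lost test suite: the function names, the temporary paths and the tiny demo size are assumptions chosen so the example finishes quickly. A real burn-in would use a file larger than RAM (otherwise reads are served from the page cache) and would run for hours alongside the compile loop from step 2.

```python
import hashlib
import os
import shutil
import tempfile

CHUNK = 1024 * 1024  # read and write in 1 MiB pieces


def write_random_file(path, size_bytes):
    """Fill `path` with random bytes and return the MD5 of what was written."""
    digest = hashlib.md5()
    remaining = size_bytes
    with open(path, "wb") as out:
        while remaining > 0:
            block = os.urandom(min(CHUNK, remaining))
            out.write(block)
            digest.update(block)
            remaining -= len(block)
    return digest.hexdigest()


def md5_of_file(path):
    """Re-read `path` and return its MD5 hex digest."""
    digest = hashlib.md5()
    with open(path, "rb") as src:
        for block in iter(lambda: src.read(CHUNK), b""):
            digest.update(block)
    return digest.hexdigest()


def copy_and_verify(source, target, expected_md5, rounds):
    """Copy `source` to `target` `rounds` times.

    Returns the 1-based round in which the copy's checksum first differed
    from `expected_md5`, or None if every round matched.
    """
    for round_no in range(1, rounds + 1):
        shutil.copyfile(source, target)
        if md5_of_file(target) != expected_md5:
            return round_no
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        src = os.path.join(workdir, "random.bin")
        dst = os.path.join(workdir, "copy.bin")
        checksum = write_random_file(src, 8 * CHUNK)  # tiny demo size
        failed = copy_and_verify(src, dst, checksum, rounds=3)
        print("mismatch in round", failed if failed else "none")
```

Computing the checksum while the file is being written means the reference value reflects the bytes that were intended, so a later mismatch points at something between memory and disk rather than at the generator.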
* Re: btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair
From: Chris Murphy @ 2016-05-09 21:30 UTC
To: Niccolò Belli
Cc: Btrfs BTRFS, Clemens Eisserer, Austin S. Hemmelgarn, Patrik Lundquist, Chris Murphy, Qu Wenruo, Omar Sandoval

On Mon, May 9, 2016 at 8:53 AM, Niccolò Belli <darkbasic@linuxsystems.it> wrote:
> I cannot manage to survive such an annoying workflow for long, so I really hope
> someone will manage to track the bug down soon.

I suggest perseverance :) despite how tedious this is. Btrfs is more aware of its state than other file systems, so if you give up and go to ext4 it's entirely possible corruption is still happening but you won't know it until there's a lot more damage. If you do have to give up, I'd at least suggest XFS, and make sure you're using xfsprogs 3.2.3 or newer, which will make a V5 file system that uses metadata checksumming by default.

--
Chris Murphy
* Re: btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair
From: Chris Murphy @ 2016-05-07 23:35 UTC
To: Niccolò Belli
Cc: Btrfs BTRFS, Chris Murphy, Qu Wenruo, Omar Sandoval

On Sat, May 7, 2016 at 9:45 AM, Niccolò Belli <darkbasic@linuxsystems.it> wrote:
> btrfs + dmcrypt + compress=lzo + autodefrag = corruption at first boot
> So discard is not the culprit. Will try to remove compress=lzo and
> autodefrag and see if it still happens.

You're making the troubleshooting unnecessarily difficult by continuing to use non-default options. *shrug*

Every single layer you add complicates the setup and troubleshooting. Of course all of it should work together, and for many people it does. But you're the one having the problem, so in order to demonstrate whether this is a software bug or a hardware problem, you need to test it with the most basic setup possible: btrfs on plain partitions with default mount options.

--
Chris Murphy
* Re: btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair
From: Qu Wenruo @ 2016-05-05 4:12 UTC
To: Niccolò Belli, linux-btrfs

Niccolò Belli wrote on 2016/05/05 01:21 +0200:
> I really need your help, because it's the second time btrfs ate my data
> in a couple of days and I can't use my laptop if I don't find the culprit.
>
> This was the mail I sent a couple of days ago:
> https://www.spinics.net/lists/linux-btrfs/msg54754.html

Output in that mail shows obvious tree block corruption:

checksum verify failed on 245498111 found C7652CC3 wanted 00000000
checksum verify failed on 245498111 found C7652CC3 wanted 00000000
checksum verify failed on 245498111 found C7652CC3 wanted 00000000
checksum verify failed on 245498111 found C7652CC3 wanted 00000000
bytenr mismatch, want=245498111, have=8454382400481263616

That's the root cause of the tons of errors that follow. I assume it may be the same cause this time.

> I previously thought the culprit was a bug in kernel 4.6-rc, but I was
> wrong.
>
> Then I reinstalled the whole system (Arch Linux) from scratch, and after
> just two days I lost some of my data, again. Once again btrfs check
> --repair got stuck in an infinite loop and I can't repair my fs. The
> system has always been shut down properly, except for a single time when
> I had to forcibly power it off just after the boot because I didn't see
> any signal on the screen.
>
> First the obvious things:
>
> - memory is ok
> (https://drive.google.com/open?id=0Bwe9Wtc-5xF1VnJ0SE9fT1FZMTg)
> - disk is ok
> (https://drive.google.com/open?id=0Bwe9Wtc-5xF1NGRhd2daVDRJVGc)
> - tlp has SATA_LINKPWR_ON_BAT=max_performance
> (https://drive.google.com/open?id=0Bwe9Wtc-5xF1dFAwUE5ETVpNWGM)
> - rootfs mount options:
> rw,noatime,compress=lzo,ssd,discard,space_cache,autodefrag,subvolid=257,subvol=/@
>
> - Command line: BOOT_IMAGE=/@/boot/vmlinuz-linux
> root=UUID=4fc2278e-f6e8-4a21-8876-cabbf885bb2e rw rootflags=subvol=@
> cryptdevice=/dev/disk/by-uuid/c7c8f501-507c-4bd2-a80a-8c7360651f02:cryptroot:allow-discards
> quiet
> - scrub didn't find any error:
> $ sudo btrfs scrub status /
> scrub status for 4fc2278e-f6e8-4a21-8876-cabbf885bb2e
> scrub started at Thu May 5 00:57:30 2016 and finished after 00:00:45
> total bytes scrubbed: 22.26GiB with 0 errors
>
> I have the whole rootfs encrypted, including boot. I followed these
> steps:
> https://wiki.archlinux.org/index.php/Dm-crypt/Encrypting_an_entire_system#Btrfs_subvolumes_with_swap
>

Would it be OK for you to test your btrfs on a plain ssd, without encryption?

I know this suggestion is quite rude, but this would hugely reduce the possible layers we need to investigate. And just as Chris Murphy said, reducing the mount options is also a pretty good starting point for debugging.

>
> Disk is a SAMSUNG SSD PM851 M.2 2280 256GB (Firmware Version: EXT25D0Q).
> Laptop is a Dell XPS 13 9343 QHD+.
> Distro is Arch Linux, kernel version is 4.5.1. btrfs-progs is 4.5.2.
>
> Two days after the previous data loss I finished reinstalling my
> distro from scratch, then I decided to do a full backup from a snapshot
> using tar. This is what I got while trying to back up my data:
>
> tar: usr/share/kig/icons/hicolor/32x32/actions/test.png: Read error at byte 0, while reading 810 bytes: Input/output error
> tar: usr/share/kig/icons/hicolor/32x32/actions/circlebpd.png: Cannot stat: Stale file handle
> tar: usr/share/kig/icons/hicolor/32x32/actions/pointOnLine.png: Cannot stat: Stale file handle
> tar: usr/share/kig/icons/hicolor/32x32/actions/bezierN.png: Cannot stat: Stale file handle
> tar: usr/share/kig/icons/hicolor/32x32/actions/convexhull.png: Cannot stat: Stale file handle
> tar: usr/share/kig/icons/hicolor/32x32/actions/centerofcurvature.png: Cannot stat: Stale file handle
> tar: usr/share/kig/icons/hicolor/32x32/actions/en.png: Cannot stat: Stale file handle
> tar: usr/share/kig/icons/hicolor/32x32/actions/circlebps.png: Cannot stat: Stale file handle
> tar: usr/share/kig/icons/hicolor/32x32/actions/directrix.png: Cannot stat: Stale file handle
> tar: usr/share/kig/icons/hicolor/32x32/actions/beziercurves.png: Cannot stat: Stale file handle
> tar: usr/share/kig/icons/hicolor/32x32/actions/segment_midpoint.png: Cannot stat: Stale file handle
> tar: usr/share/kig/icons/hicolor/32x32/actions/distance.png: Cannot stat: Stale file handle
> tar: usr/share/kig/icons/hicolor/32x32/actions/circlebcl.png: Cannot stat: Stale file handle
> tar: usr/share/kig/icons/hicolor/32x32/actions/conicb5p.png: Cannot stat: Stale file handle
> tar: usr/share/kig/icons/hicolor/32x32/actions/kig_polygon.png: Cannot stat: Stale file handle
> tar: usr/share/kig/icons/hicolor/32x32/actions/conicasymptotes.png: Cannot stat: Stale file handle
> tar: usr/share/kig/icons/hicolor/32x32/actions/pointxy.png: Cannot stat: Stale file handle
> tar: usr/share/kig/icons/hicolor/32x32/actions/attacher.png: Cannot stat: Stale file handle
> tar: usr/share/kig/icons/hicolor/32x32/actions/coniclineintersection.png: Cannot stat: Stale file handle
> tar: usr/share/kig/icons/hicolor/32x32/actions/vectorsum.png: Cannot stat: Stale file handle
> tar: usr/share/kig/icons/hicolor/32x32/actions/rbezier4.png: Cannot stat: Stale file handle
> tar: usr/share/kig/icons/hicolor/32x32/actions/ellipsebffp.png: Cannot stat: Stale file handle
> tar: usr/share/kig/icons/hicolor/32x32/actions/angle.png: Cannot stat: Stale file handle
> tar: usr/share/kig/icons/hicolor/32x32/actions/kig_text.png: Cannot stat: Stale file handle
> tar: usr/share/kig/icons/hicolor/32x32/actions/vectordifference.png: Cannot stat: Stale file handle
> tar: usr/share/kig/icons/hicolor/32x32/actions/segmentaxis.png: Cannot stat: Stale file handle
> tar: usr/share/kig/icons/hicolor/32x32/actions/radicalline.png: Cannot stat: Stale file handle
> tar: usr/share/kig/icons/hicolor/32x32/actions/polygonsides.png: Cannot stat: Stale file handle
> tar: usr/share/kig/icons/hicolor/32x32/actions/projection.png: Cannot stat: Stale file handle
> tar: usr/share/kig/icons/hicolor/32x32/actions/inversion.png: Cannot stat: Stale file handle
> tar: usr/share/kig/icons/hicolor/32x32/actions/bezier4.png: Cannot stat: Stale file handle
> tar: usr/share/kig/icons/hicolor/32x32/actions/equilateralhyperbolab4p.png: Cannot stat: Stale file handle
> tar: usr/share/kig/icons/hicolor/32x32/actions/areaCircle.png: Cannot stat: Stale file handle
> tar: var/lib/samba/private/msg.sock/666: socket ignored
> tar: Exiting with failure status due to previous errors
>
>
> [ 3057.008185] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283

The tree blocks are again heavily damaged.

The wanted transid is extremely large and definitely not sane, so the parent node is already corrupted, although the child transid, 283, seems quite valid.

> [ 3057.008195] BTRFS error (device dm-0): error loading props for ino 183988 (root 505): -5
> [ 3057.008417] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
> [ 3057.008631] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
> [ 3057.009165] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
> [ 3057.009389] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
> [ 3057.009734] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
> [ 3057.009960] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
> [ 3057.010664] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
> [ 3057.010888] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
> [ 3057.011201] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
> [ 3331.795474] verify_parent_transid: 57 callbacks suppressed
> [ 3331.795480] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
> [ 3331.795776] BTRFS error (device dm-0): parent transid verify failed on 528089088 wanted 3458764513820541211 found 283
>
> I made a copy of /dev/mapper/cryptroot with dd on an external drive and
> I ran btrfs check on it (btrfs-progs 4.5.2):
> https://drive.google.com/open?id=0Bwe9Wtc-5xF1SjJacXpMMU5mems (37MB)

Checked, but it seems the output is truncated?

Thanks,
Qu

>
> Then I tried to run btrfs check --repair on it but once again it got
> stuck in an infinite loop like this one
> (https://www.spinics.net/lists/linux-btrfs/msg54146.html) and after an
> hour of looping and several hundreds of MBs of logs I had to kill it.
> Here is the log, truncated to 30MB:
> https://drive.google.com/open?id=0Bwe9Wtc-5xF1SmRuVUlfeGRES3M
>
> They are probably not needed but here is snapper -c @ list:
> https://drive.google.com/open?id=0Bwe9Wtc-5xF1N0llOFpfVXVwNVk
> and btrfs subvolume list -p /:
> https://drive.google.com/open?id=0Bwe9Wtc-5xF1andCdWZzeV9VbDg
>
> This is the link to the whole gdrive directory with all the logs:
> https://drive.google.com/open?id=0Bwe9Wtc-5xF1UFltcXhtRmt4YjA
>
> I really don't know what may be the problem, maybe discard? I can't
> think about switching back to ext4 and losing snapshots, transactions,
> compression, incremental send/receive backups etc.
> I would really love being able to do something to fix it, but I don't
> have the slightest idea about what's the problem. Hopefully someone here
> will be smarter than me and find the problem, otherwise I will have to
> switch to ext4 because I need my laptop to work.
>
> Thanks,
> Niccolò
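The "wanted" and "found" transid values in the kernel log quoted above can be compared bit by bit. This small sketch uses only the two numbers from the log; the helper name is hypothetical, and reading the result as flipped bits is an interpretation, not something the thread establishes.

```python
# Values copied from the "parent transid verify failed" lines in the log.
wanted = 3458764513820541211
found = 283


def differing_bits(a, b):
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b
    return [pos for pos in range(diff.bit_length()) if (diff >> pos) & 1]


print(hex(wanted))                    # 0x300000000000011b
print(hex(found))                     # 0x11b
print(differing_bits(wanted, found))  # [60, 61]
```

The two values agree in their low 60 bits and differ only in bits 60 and 61. That pattern would be consistent with the memory corruption suspected elsewhere in the thread, though on its own it does not identify the cause.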
end of thread, other threads:[~2016-05-13 21:54 UTC | newest]

Thread overview: 25+ messages
2016-05-04 23:21 btrfs ate my data in just two days, after a fresh install. ram and disk are ok. it still mounts, but I cannot repair Niccolò Belli
2016-05-05 1:07 ` Chris Murphy
2016-05-05 10:36 ` Niccolò Belli
2016-05-05 17:48 ` Omar Sandoval
2016-05-06 11:38 ` Niccolò Belli
2016-05-07 15:45 ` Niccolò Belli
2016-05-07 15:58 ` Clemens Eisserer
2016-05-07 16:11 ` Niccolò Belli
2016-05-08 18:27 ` Patrik Lundquist
2016-05-09 11:52 ` Austin S. Hemmelgarn
2016-05-09 14:53 ` Niccolò Belli
2016-05-09 16:29 ` Zygo Blaxell
2016-05-09 18:21 ` Austin S. Hemmelgarn
2016-05-09 19:18 ` Duncan
2016-05-12 14:35 ` Niccolò Belli
2016-05-12 15:43 ` Austin S. Hemmelgarn
2016-05-13 11:07 ` Niccolò Belli
2016-05-13 11:35 ` Austin S. Hemmelgarn
2016-05-13 12:10 ` Niccolò Belli
2016-05-13 21:54 ` Chris Murphy
2016-05-12 16:48 ` Zygo Blaxell
2016-05-09 19:23 ` Lionel Bouton
2016-05-09 21:30 ` Chris Murphy
2016-05-07 23:35 ` Chris Murphy
2016-05-05 4:12 ` Qu Wenruo