* "csum failed" that was not detected by scrub @ 2014-05-02 9:42 Jaap Pieroen 2014-05-02 10:20 ` Duncan 2014-05-02 11:13 ` Shilong Wang 0 siblings, 2 replies; 9+ messages in thread From: Jaap Pieroen @ 2014-05-02 9:42 UTC (permalink / raw) To: linux-btrfs Hi all, I completed a full scrub: root@nasbak:/home/jpieroen# btrfs scrub status /home/ scrub status for 7ca5f38e-308f-43ab-b3ea-31b3bcd11a0d scrub started at Wed Apr 30 08:30:19 2014 and finished after 144131 seconds total bytes scrubbed: 4.76TiB with 0 errors Then tried to remove a device: root@nasbak:/home/jpieroen# btrfs device delete /dev/sdb /home This triggered bug_on, with the following error in dmesg: csum failed ino 258 off 1395560448 csum 2284440321 expected csum 319628859 How can there still be csum failures directly after a scrub? If I rerun the scrub it still won't find any errors. I know this, because I've had the same issue 3 times in a row. Each time running a scrub and still being unable to remove the device. Kind Regards, Jaap -------------------------------------------------------------- Details: root@nasbak:/home/jpieroen# uname -a Linux nasbak 3.14.1-031401-generic #201404141220 SMP Mon Apr 14 16:21:48 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux root@nasbak:/home/jpieroen# btrfs --version Btrfs v3.14.1 root@nasbak:/home/jpieroen# btrfs fi df /home Data, RAID5: total=4.57TiB, used=4.55TiB System, RAID1: total=32.00MiB, used=352.00KiB Metadata, RAID1: total=7.00GiB, used=5.59GiB root@nasbak:/home/jpieroen# btrfs fi show Label: 'btrfs_storage' uuid: 7ca5f38e-308f-43ab-b3ea-31b3bcd11a0d Total devices 6 FS bytes used 4.56TiB devid 1 size 1.82TiB used 1.31TiB path /dev/sde devid 2 size 1.82TiB used 1.31TiB path /dev/sdf devid 3 size 1.82TiB used 1.31TiB path /dev/sdg devid 4 size 931.51GiB used 25.00GiB path /dev/sdb devid 6 size 2.73TiB used 994.03GiB path /dev/sdh devid 7 size 2.73TiB used 994.03GiB path /dev/sdi Btrfs v3.14.1 jpieroen@nasbak:~$ dmesg [227248.656438] BTRFS info (device sdi): 
relocating block group 9735225016320 flags 129 [227261.713860] BTRFS info (device sdi): found 9 extents [227264.531019] BTRFS info (device sdi): found 9 extents [227265.011826] BTRFS info (device sdi): relocating block group 76265029632 flags 129 [227274.052249] BTRFS info (device sdi): csum failed ino 258 off 1395560448 csum 2284440321 expected csum 319628859 [227274.052354] BTRFS info (device sdi): csum failed ino 258 off 1395564544 csum 3646299263 expected csum 319628859 [227274.052402] BTRFS info (device sdi): csum failed ino 258 off 1395568640 csum 281259278 expected csum 319628859 [227274.052449] BTRFS info (device sdi): csum failed ino 258 off 1395572736 csum 2594807184 expected csum 319628859 [227274.052492] BTRFS info (device sdi): csum failed ino 258 off 1395576832 csum 4288971971 expected csum 319628859 [227274.052537] BTRFS info (device sdi): csum failed ino 258 off 1395580928 csum 752615894 expected csum 319628859 [227274.052581] BTRFS info (device sdi): csum failed ino 258 off 1395585024 csum 3828951500 expected csum 319628859 [227274.061279] ------------[ cut here ]------------ [227274.061354] kernel BUG at /home/apw/COD/linux/fs/btrfs/extent_io.c:2116! 
[227274.061445] invalid opcode: 0000 [#1] SMP [227274.061509] Modules linked in: cuse deflate [227274.061573] BTRFS info (device sdi): csum failed ino 258 off 1395560448 csum 2284440321 expected csum 319628859 [227274.061707] ctr twofish_generic twofish_x86_64_3way twofish_x86_64 twofish_common camellia_generic camellia_x86_64 serpent_sse2_x86_64 xts serpent_generic lrw gf128mul glue_helper blowfish_generic blowfish_x86_64 blowfish_common cast5_generic cast_common ablk_helper cryptd des_generic cmac xcbc rmd160 crypto_null af_key xfrm_algo nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache dm_crypt ip6t_REJECT ppdev xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT xt_comment xt_LOG kvm xt_recent microcode xt_multiport xt_limit xt_tcpudp psmouse serio_raw xt_addrtype k10temp edac_core ipt_MASQUERADE edac_mce_amd iptable_nat nf_nat_ipv4 sp5100_tco nf_conntrack_ipv4 nf_defrag_ipv4 ftdi_sio i2c_piix4 usbserial xt_conntrack ip6table_filter ip6_tables joydev nf_conntrack_netbios_ns nf_conntrack_broadcast snd_hda_codec_via nf_nat_ftp snd_hda_codec_hdmi nf_nat snd_hda_codec_generic nf_conntrack_ftp nf_conntrack snd_hda_intel iptable_filter ir_lirc_codec(OF) lirc_dev(OF) ip_tables snd_hda_codec ir_mce_kbd_decoder(OF) x_tables snd_hwdep ir_sony_decoder(OF) rc_tbs_nec(OF) ir_jvc_decoder(OF) snd_pcm ir_rc6_decoder(OF) ir_rc5_decoder(OF) saa716x_tbs_dvb(OF) tbs6982fe(POF) tbs6680fe(POF) ir_nec_decoder(OF) tbs6923fe(POF) tbs6985se(POF) tbs6928se(POF) tbs6982se(POF) tbs6991fe(POF) tbs6618fe(POF) saa716x_core(OF) tbs6922fe(POF) tbs6928fe(POF) tbs6991se(POF) stv090x(OF) dvb_core(OF) rc_core(OF) snd_timer snd soundcore asus_atk0110 parport_pc shpchp mac_hid lp parport btrfs xor raid6_pq pata_acpi hid_generic usbhid hid usb_storage radeon pata_atiixp r8169 mii i2c_algo_bit sata_sil24 ttm drm_kms_helper drm ahci libahci wmi [227274.064118] CPU: 1 PID: 15543 Comm: btrfs-endio-4 Tainted: PF O 3.14.1-031401-generic #201404141220 [227274.064246] Hardware name: System manufacturer 
System Product Name/M4A78LT-M, BIOS 0802 08/24/2010 [227274.064368] task: ffff88030a0e31e0 ti: ffff8800a15b8000 task.ti: ffff8800a15b8000 [227274.064467] RIP: 0010:[<ffffffffa0304c33>] [<ffffffffa0304c33>] clean_io_failure+0x1a3/0x1b0 [btrfs] [227274.064623] RSP: 0018:ffff8800a15b9cd8 EFLAGS: 00010246 [227274.064694] RAX: 0000000000000000 RBX: ffff88010b2869b8 RCX: 0000000000000000 [227274.064789] RDX: ffff8802cad30f00 RSI: 00000000720071fe RDI: ffff88010b286884 [227274.064883] RBP: ffff8800a15b9d28 R08: 0000000000000000 R09: 0000000000000000 [227274.064977] R10: 0000000000000200 R11: 0000000000000000 R12: ffffea000102b080 [227274.065071] R13: ffff880004366c00 R14: ffff88010b286800 R15: 00000000532ef000 [227274.065166] FS: 00007f16670b0740(0000) GS:ffff88031fc40000(0000) knlGS:0000000000000000 [227274.065271] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [227274.065348] CR2: 00007f9c5c3b0000 CR3: 00000002dd8a8000 CR4: 00000000000007e0 [227274.065443] Stack: [227274.065471] 00000000000532ef ffff88030a14c000 0000000000000000 ffff880004366c00 [227274.065584] ffff88030a95f780 ffffea000102b080 ffff8801026cc4b0 ffff88010b2869b8 [227274.065697] 00000000532ef000 0000000000000000 ffff8800a15b9db8 ffffffffa0304f1b [227274.065809] Call Trace: [227274.065872] [<ffffffffa0304f1b>] end_bio_extent_readpage+0x2db/0x3d0 [btrfs] [227274.065971] [<ffffffff8120a013>] bio_endio+0x53/0xa0 [227274.066042] [<ffffffff8120a072>] bio_endio_nodec+0x12/0x20 [227274.066137] [<ffffffffa02dde81>] end_workqueue_fn+0x41/0x50 [btrfs] [227274.066243] [<ffffffffa03157d0>] worker_loop+0xa0/0x330 [btrfs] [227274.066345] [<ffffffffa0315730>] ? check_pending_worker_creates.isra.1+0xe0/0xe0 [btrfs] [227274.066455] [<ffffffff8108ffa9>] kthread+0xc9/0xe0 [227274.066522] [<ffffffff8108fee0>] ? flush_kthread_worker+0xb0/0xb0 [227274.066606] [<ffffffff817721bc>] ret_from_fork+0x7c/0xb0 [227274.066680] [<ffffffff8108fee0>] ? 
flush_kthread_worker+0xb0/0xb0 [227274.066761] Code: 00 00 83 f8 01 0f 8e 49 ff ff ff 49 8b 4d 18 49 8b 55 10 4d 89 e0 45 8b 4d 2c 48 8b 7d b8 4c 89 fe e8 72 fc ff ff e9 29 ff ff ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 [227274.067266] RIP [<ffffffffa0304c33>] clean_io_failure+0x1a3/0x1b0 [btrfs] [227274.067380] RSP <ffff8800a15b9cd8> ^ permalink raw reply [flat|nested] 9+ messages in thread
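[For context on the log above: btrfs stores a CRC-32C checksum for each 4 KiB data block and verifies it on every read, and a mismatch produces exactly the "csum failed ino ... csum X expected csum Y" lines in the dmesg output. A minimal sketch of that check in Python; the bitwise CRC-32C is the real algorithm btrfs uses for data, but the block contents and the corruption below are illustrative, not taken from this filesystem.]

```python
def crc32c(data: bytes, crc: int = 0xFFFFFFFF) -> int:
    """Bitwise CRC-32C (Castagnoli), the checksum btrfs uses for data blocks."""
    POLY = 0x82F63B78  # reflected Castagnoli polynomial
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # shift right; xor in the polynomial iff the low bit was set
            crc = (crc >> 1) ^ (POLY & -(crc & 1))
    return crc ^ 0xFFFFFFFF

# Standard CRC-32C check value:
assert crc32c(b"123456789") == 0xE3069283

# Simulate what a read-time csum check does for one 4 KiB block:
block = bytes(4096)                 # block contents at write time
stored = crc32c(block)              # csum recorded in the csum tree
corrupted = b"\xff" + block[1:]     # one corrupted byte on disk
assert crc32c(corrupted) != stored  # -> "csum failed ... expected csum ..."
```

[The point of the thread is that this per-read check fired even though scrub, which walks the same checksums, reported zero errors, because scrub did not yet handle RAID5/6 correctly.]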
* Re: "csum failed" that was not detected by scrub 2014-05-02 9:42 "csum failed" that was not detected by scrub Jaap Pieroen @ 2014-05-02 10:20 ` Duncan 2014-05-02 17:48 ` Jaap Pieroen 2014-05-03 13:57 ` "csum failed" that was not detected by scrub Marc MERLIN 2014-05-02 11:13 ` Shilong Wang 1 sibling, 2 replies; 9+ messages in thread From: Duncan @ 2014-05-02 10:20 UTC (permalink / raw) To: linux-btrfs Jaap Pieroen posted on Fri, 02 May 2014 11:42:35 +0200 as excerpted: > I completed a full scrub: > root@nasbak:/home/jpieroen# btrfs scrub status /home/ > scrub status for 7ca5f38e-308f-43ab-b3ea-31b3bcd11a0d > scrub started at Wed Apr 30 08:30:19 2014 > and finished after 144131 seconds > total bytes scrubbed: 4.76TiB with 0 errors > > Then tried to remove a device: > root@nasbak:/home/jpieroen# btrfs device delete /dev/sdb /home > > This triggered bug_on, with the following error in dmesg: csum failed > ino 258 off 1395560448 csum 2284440321 expected csum 319628859 > > How can there still be csum failures directly after a scrub? Simple enough, really... > root@nasbak:/home/jpieroen# btrfs fi df /home > Data, RAID5: total=4.57TiB, used=4.55TiB > System, RAID1: total=32.00MiB, used=352.00KiB > Metadata, RAID1: total=7.00GiB, used=5.59GiB To those that know the details, this tells the story. Btrfs raid5/6 modes are not yet code-complete, and scrub is one of the incomplete bits. btrfs scrub doesn't know how to deal with raid5/6 properly just yet. While the operational bits of raid5/6 support are there, parity is calculated and written, scrub, and recovery from a lost device, are not yet code complete. 
Thus, at this point it's effectively a slower, lower-capacity raid0 without scrub support. The upside is that when the code is complete, you'll get an automatic "free" upgrade to full raid5 or raid6, because the operational bits have been working since they were introduced; only the recovery and scrub bits were incomplete. In reliability terms that makes it a raid0: lose one device and you've lost them all.

That's the big picture anyway. Marc Merlin recently did quite a bit of raid5/6 testing and there's a page on the wiki now with what he found. Additionally, I saw a scrub support for raid5/6 modes patch on the list recently, but while it may be in integration, I believe it's too new to have reached release yet.

Wiki, for memory or bookmark: https://btrfs.wiki.kernel.org

Direct user documentation link for bookmark:
https://btrfs.wiki.kernel.org/index.php/Main_Page#Guides_and_usage_information

The raid5/6 page (which I didn't otherwise see conveniently linked; I dug it out of the recent changes list since I knew it was there from on-list discussion):
https://btrfs.wiki.kernel.org/index.php/RAID56

@ Marc or Hugo or someone with a wiki account: Can this be more visibly linked from the user-docs contents, added to the user docs category list, and probably linked from at least the multiple devices and (for now) the gotchas pages?

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: 2014-05-02 10:20 ` Duncan @ 2014-05-02 17:48 ` Jaap Pieroen 2014-05-03 3:10 ` btrfs raid56 Was: "csum failed" that was not detected by scrub Duncan 2014-05-03 13:31 ` Frank Holton 2014-05-03 13:57 ` "csum failed" that was not detected by scrub Marc MERLIN 1 sibling, 2 replies; 9+ messages in thread From: Jaap Pieroen @ 2014-05-02 17:48 UTC (permalink / raw) To: linux-btrfs Duncan <1i5t5.duncan <at> cox.net> writes: > > To those that know the details, this tells the story. > > Btrfs raid5/6 modes are not yet code-complete, and scrub is one of the > incomplete bits. btrfs scrub doesn't know how to deal with raid5/6 > properly just yet. > > While the operational bits of raid5/6 support are there, parity is > calculated and written, scrub, and recovery from a lost device, are not > yet code complete. Thus, it's effectively a slower, lower capacity raid0 > without scrub support at this point, except that when the code is > complete, you'll get an automatic "free" upgrade to full raid5 or raid6, > because the operational bits have been working since they were > introduced, just the recovery and scrub bits were bad, making it > effectively a raid0 in reliability terms, lose one and you've lost them > all. > > That's the big picture anyway. Marc Merlin recently did quite a bit of > raid5/6 testing and there's a page on the wiki now with what he found. > Additionally, I saw a scrub support for raid5/6 modes patch on the list > recently, but while it may be in integration, I believe it's too new to > have reached release yet. 
> > Wiki, for memory or bookmark: https://btrfs.wiki.kernel.org
> >
> > Direct user documentation link for bookmark:
> > https://btrfs.wiki.kernel.org/index.php/Main_Page#Guides_and_usage_information
> >
> > The raid5/6 page (which I didn't otherwise see conveniently linked, I dug
> > it out of the recent changes list since I knew it was there from on-list
> > discussion):
> >
> > https://btrfs.wiki.kernel.org/index.php/RAID56
> >
> > <at> Marc or Hugo or someone with a wiki account: Can this be more visibly
> > linked from the user-docs contents, added to the user docs category list,
> > and probably linked from at least the multiple devices and (for now) the
> > gotchas pages?

So raid5 is much more useless than I assumed. I read Marc's blog and figured that btrfs was ready enough.

I'm really in trouble now. I tried to get rid of raid5 by doing a convert balance to raid1. But of course this triggered the same issue. And now I have a dead system, because the first thing btrfs does after mounting is continue the balance, which will crash the system and send me into a vicious loop.

- How can I stop btrfs from continuing the balance?
- How can I salvage this situation and convert to raid1?

Unfortunately I have few spare drives left. Not enough to contain 4.7TiB of data.. :(

^ permalink raw reply [flat|nested] 9+ messages in thread
* btrfs raid56 Was: "csum failed" that was not detected by scrub 2014-05-02 17:48 ` Jaap Pieroen @ 2014-05-03 3:10 ` Duncan 2014-05-03 7:53 ` btrfs raid56 Was: Jaap Pieroen 2014-05-03 13:31 ` Frank Holton 1 sibling, 1 reply; 9+ messages in thread From: Duncan @ 2014-05-03 3:10 UTC (permalink / raw) To: linux-btrfs Jaap Pieroen posted on Fri, 02 May 2014 17:48:13 +0000 as excerpted: > Duncan <1i5t5.duncan <at> cox.net> writes: > > >> To those that know the details, this tells the story. >> >> Btrfs raid5/6 modes are not yet code-complete, and scrub is one of the >> incomplete bits. btrfs scrub doesn't know how to deal with raid5/6 >> properly just yet. >> The raid5/6 page (which I didn't otherwise see conveniently linked, I >> dug it out of the recent changes list since I knew it was there from >> on-list discussion): >> >> https://btrfs.wiki.kernel.org/index.php/RAID56 > So raid5 is much more useless than I assumed. I read Marc's blog and > figured that btrfs was ready enough. > > I' really in trouble now. I tried to get rid of raid5 by doing a convert > balance to raid1. But of course this triggered the same issue. And now I > have a dead system because the first thing btrfs does after mounting is > continue the balance which will crash the system and send me into a > vicious loop. > > - How can I stop btrfs from continuing balancing? That one's easy. See the Documentation/filesystems/btrfs.txt file in the kernel tree or the wiki for btrfs mount options, one of which is "skip_balance", to address this very sort of problem! =:^) Alternatively, mounting it read-only should prevent further changes including the balance, at least allowing you to get the data off the filesystem. > - How can I salvage this situation and convert to raid1? > > Unfortunately I have little spare drives left. Not enough to contain > 4.7TiB of data.. :( [OK, this goes a bit philosophical, but it's something to think about...] 
If you've done your research and followed the advice of the warnings when you do a mkfs.btrfs or on the wiki, not a problem, since you know that btrfs is still under heavy development and that as a result, it's even more critical to have current tested backups for anything you value anyway. Simply use those backups.

Which, by definition, means that if you don't have such backups, you didn't consider the data all that valuable after all, actions perhaps giving the lie to your claims. And no excuse for not doing the research either, since if you really care about your data, you research a filesystem you're not familiar with before trusting your data to it. So again, if you didn't know btrfs was experimental and thus didn't have those backups, by definition your actions say you didn't really care about the data you put on it, no matter what your words might say.

OTOH, there *IS* such a thing as not realizing the value of something until you're in the process of losing it... that I do understand. But of course try telling that to, for instance, someone who has just lost a loved one that they never actually /told/ them that... Sometimes it's simply too late. Tho if it's going to happen, at least here I'd much rather it happen to some data, than one of my own loved ones...

Anyway, at least for now you should still be able to recover most of the data using skip_balance or read-only mounting. My guess is that if push comes to shove you can either prioritize that data and give up a TiB or two if it comes to that, or scrimp here and there, putting a few gigs on the odd blank DVD you may have lying around or downgrading a few meals to ramen noodles to afford the $100 or so shipped that pricewatch says a new 3 TB drive costs, these days. I've been there, and have found that if I think I need it bad enough, that $100 has a way of appearing, like I said even if I'm noodling it for a few meals to do it.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman ^ permalink raw reply [flat|nested] 9+ messages in thread
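[Duncan's advice above, as concrete commands. This is a hedged sketch: the device and mountpoint names are the ones reported earlier in this thread, and none of it was verified against the poster's filesystem; adjust to your own system.]

```shell
# Mount without resuming the interrupted balance
# (skip_balance is documented in Documentation/filesystems/btrfs.txt):
mount -o skip_balance /dev/sde /home

# With the filesystem mounted, cancel the paused balance outright:
btrfs balance cancel /home

# Or, if the goal is only to copy data off, mount read-only instead:
mount -o ro /dev/sde /home
```

[Any member device of the multi-device filesystem can be named in the mount command; btrfs assembles the rest by UUID.]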
* Re: btrfs raid56 Was: 2014-05-03 3:10 ` btrfs raid56 Was: "csum failed" that was not detected by scrub Duncan @ 2014-05-03 7:53 ` Jaap Pieroen 0 siblings, 0 replies; 9+ messages in thread
From: Jaap Pieroen @ 2014-05-03 7:53 UTC (permalink / raw)
To: linux-btrfs

Duncan <1i5t5.duncan <at> cox.net> writes:
> > - How can I salvage this situation and convert to raid1?
> >
> > Unfortunately I have little spare drives left. Not enough to contain
> > 4.7TiB of data.. :(
>
> [OK, this goes a bit philosophical, but it's something to think about...]
>
> ...
>
> Anyway, at least for now you should still be able to recover most of the
> data using skip_balance or read-only mounting. My guess is that if push
> comes to shove you can either prioritize that data and give up a TiB or
> two if it comes to that, or scrimp here and there, putting a few gigs on
> the odd blank DVD you may have lying around or downgrading a few meals to
> Raman-noodle to afford the $100 or so shipped that pricewatch says a new
> 3 TB drive costs, these days. I've been there, and have found that if I
> think I need it bad enough, that $100 has a way of appearing, like I said
> even if I'm noodling it for a few meals to do it.

Thanks for the philosophical response. Both telling me I can't simply convert, and reminding me that this was an outcome I was prepared to face. :) Because you are right. When push comes to shove, it's data I'm prepared to lose.

I'm going to hedge my bets and convince the Mrs to let me invest in some new hardware.

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: 2014-05-02 17:48 ` Jaap Pieroen 2014-05-03 3:10 ` btrfs raid56 Was: "csum failed" that was not detected by scrub Duncan @ 2014-05-03 13:31 ` Frank Holton 1 sibling, 0 replies; 9+ messages in thread From: Frank Holton @ 2014-05-03 13:31 UTC (permalink / raw) To: Jaap Pieroen; +Cc: linux-btrfs Hi Jaap, This patch http://www.spinics.net/lists/linux-btrfs/msg33025.html made it into 3.15 RC2 so if you're willing to build your own RC kernel you may have better luck with scrub in 3.15. The patch only scrubs the data blocks in RAID5/6 so hopefully your parity blocks are intact. I'm not sure if it would help any but it may be worth a try. On Fri, May 2, 2014 at 1:48 PM, Jaap Pieroen <jaap@pieroen.nl> wrote: > Duncan <1i5t5.duncan <at> cox.net> writes: > >> >> To those that know the details, this tells the story. >> >> Btrfs raid5/6 modes are not yet code-complete, and scrub is one of the >> incomplete bits. btrfs scrub doesn't know how to deal with raid5/6 >> properly just yet. >> >> While the operational bits of raid5/6 support are there, parity is >> calculated and written, scrub, and recovery from a lost device, are not >> yet code complete. Thus, it's effectively a slower, lower capacity raid0 >> without scrub support at this point, except that when the code is >> complete, you'll get an automatic "free" upgrade to full raid5 or raid6, >> because the operational bits have been working since they were >> introduced, just the recovery and scrub bits were bad, making it >> effectively a raid0 in reliability terms, lose one and you've lost them >> all. >> >> That's the big picture anyway. Marc Merlin recently did quite a bit of >> raid5/6 testing and there's a page on the wiki now with what he found. >> Additionally, I saw a scrub support for raid5/6 modes patch on the list >> recently, but while it may be in integration, I believe it's too new to >> have reached release yet. 
>> >> Wiki, for memory or bookmark: https://btrfs.wiki.kernel.org >> >> Direct user documentation link for bookmark (unwrap as necessary): >> >> https://btrfs.wiki.kernel.org/index.php/ >> Main_Page#Guides_and_usage_information >> >> The raid5/6 page (which I didn't otherwise see conveniently linked, I dug >> it out of the recent changes list since I knew it was there from on-list >> discussion): >> >> https://btrfs.wiki.kernel.org/index.php/RAID56 >> >> <at> Marc or Hugo or someone with a wiki account: Can this be more visibly >> linked from the user-docs contents, added to the user docs category list, >> and probably linked from at least the multiple devices and (for now) the >> gotchas pages? >> > > So raid5 is much more useless than I assumed. I read Marc's blog and > figured that btrfs was ready enough. > > I' really in trouble now. I tried to get rid of raid5 by doing a convert > balance to raid1. But of course this triggered the same issue. And now > I have a dead system because the first thing btrfs does after mounting > is continue the balance which will crash the system and send me into > a vicious loop. > > - How can I stop btrfs from continuing balancing? > - How can I salvage this situation and convert to raid1? > > Unfortunately I have little spare drives left. Not enough to contain > 4.7TiB of data.. :( > > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: "csum failed" that was not detected by scrub 2014-05-02 10:20 ` Duncan 2014-05-02 17:48 ` Jaap Pieroen @ 2014-05-03 13:57 ` Marc MERLIN 1 sibling, 0 replies; 9+ messages in thread
From: Marc MERLIN @ 2014-05-03 13:57 UTC (permalink / raw)
To: Duncan; +Cc: linux-btrfs

On Fri, May 02, 2014 at 10:20:03AM +0000, Duncan wrote:
> The raid5/6 page (which I didn't otherwise see conveniently linked, I dug

It's linked off
https://btrfs.wiki.kernel.org/index.php/FAQ#Can_I_use_RAID.5B56.5D_on_my_Btrfs_filesystem.3F

> it out of the recent changes list since I knew it was there from on-list
> discussion):
>
> https://btrfs.wiki.kernel.org/index.php/RAID56
>
> @ Marc or Hugo or someone with a wiki account: Can this be more visibly

An "@ Marc" relies a lot on me actually seeing it, never mind at the bottom of a message when my inbox is over 900 and I'm boarding a plane in a few hours ;) More seriously, please Cc me (and I'd say generally others) if you're trying to get their attention. I typically also put a one-liner at the top to tell the Cced person to look for a bit with their name.

> linked from the user-docs contents, added to the user docs category list,
> and probably linked from at least the multiple devices and (for now) the
> gotchas pages?

I added it here:
https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices
Note that it's the first result on Google for raid56. Also, searching for raid5 btrfs brings you to
https://btrfs.wiki.kernel.org/index.php/FAQ#Case_study:_btrfs-raid_5.2F6_versus_MD-RAID_5.2F6
which also links to the raid56 page.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems .... .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: "csum failed" that was not detected by scrub 2014-05-02 9:42 "csum failed" that was not detected by scrub Jaap Pieroen 2014-05-02 10:20 ` Duncan @ 2014-05-02 11:13 ` Shilong Wang 2014-05-02 17:55 ` Jaap Pieroen 1 sibling, 1 reply; 9+ messages in thread
From: Shilong Wang @ 2014-05-02 11:13 UTC (permalink / raw)
To: Jaap Pieroen; +Cc: linux-btrfs

Hello,

2014-05-02 17:42 GMT+08:00 Jaap Pieroen <jaap@pieroen.nl>:
> Hi all,
>
> I completed a full scrub:
> root@nasbak:/home/jpieroen# btrfs scrub status /home/
> scrub status for 7ca5f38e-308f-43ab-b3ea-31b3bcd11a0d
> scrub started at Wed Apr 30 08:30:19 2014 and finished after 144131 seconds
> total bytes scrubbed: 4.76TiB with 0 errors
>
> Then tried to remove a device:
> root@nasbak:/home/jpieroen# btrfs device delete /dev/sdb /home
>
> This triggered bug_on, with the following error in dmesg: csum failed
> ino 258 off 1395560448 csum 2284440321 expected csum 319628859
>
> How can there still be csum failures directly after a scrub?
> If I rerun the scrub it still won't find any errors. I know this,
> because I've had the same issue 3 times in a row. Each time running a
> scrub and still being unable to remove the device.

There is a known RAID5/6 bug; I sent a patch to address this problem. Could you please double check if your kernel source includes the following commit:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=3b080b2564287be91605bfd1d5ee985696e61d3c

With it, RAID5/6 scrub can detect a checksum mismatch, but it cannot fix the errors yet.
Thanks, Wang > > Kind Regards, > Jaap > > -------------------------------------------------------------- > Details: > > root@nasbak:/home/jpieroen# uname -a > Linux nasbak 3.14.1-031401-generic #201404141220 SMP Mon Apr 14 > 16:21:48 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux > > root@nasbak:/home/jpieroen# btrfs --version > Btrfs v3.14.1 > > root@nasbak:/home/jpieroen# btrfs fi df /home > Data, RAID5: total=4.57TiB, used=4.55TiB > System, RAID1: total=32.00MiB, used=352.00KiB > Metadata, RAID1: total=7.00GiB, used=5.59GiB > > root@nasbak:/home/jpieroen# btrfs fi show > Label: 'btrfs_storage' uuid: 7ca5f38e-308f-43ab-b3ea-31b3bcd11a0d > Total devices 6 FS bytes used 4.56TiB > devid 1 size 1.82TiB used 1.31TiB path /dev/sde > devid 2 size 1.82TiB used 1.31TiB path /dev/sdf > devid 3 size 1.82TiB used 1.31TiB path /dev/sdg > devid 4 size 931.51GiB used 25.00GiB path /dev/sdb > devid 6 size 2.73TiB used 994.03GiB path /dev/sdh > devid 7 size 2.73TiB used 994.03GiB path /dev/sdi > > Btrfs v3.14.1 > > jpieroen@nasbak:~$ dmesg > [227248.656438] BTRFS info (device sdi): relocating block group > 9735225016320 flags 129 > [227261.713860] BTRFS info (device sdi): found 9 extents > [227264.531019] BTRFS info (device sdi): found 9 extents > [227265.011826] BTRFS info (device sdi): relocating block group > 76265029632 flags 129 > [227274.052249] BTRFS info (device sdi): csum failed ino 258 off > 1395560448 csum 2284440321 expected csum 319628859 > [227274.052354] BTRFS info (device sdi): csum failed ino 258 off > 1395564544 csum 3646299263 expected csum 319628859 > [227274.052402] BTRFS info (device sdi): csum failed ino 258 off > 1395568640 csum 281259278 expected csum 319628859 > [227274.052449] BTRFS info (device sdi): csum failed ino 258 off > 1395572736 csum 2594807184 expected csum 319628859 > [227274.052492] BTRFS info (device sdi): csum failed ino 258 off > 1395576832 csum 4288971971 expected csum 319628859 > [227274.052537] BTRFS info (device sdi): csum failed ino 258 off 
> 1395580928 csum 752615894 expected csum 319628859 > [227274.052581] BTRFS info (device sdi): csum failed ino 258 off > 1395585024 csum 3828951500 expected csum 319628859 > [227274.061279] ------------[ cut here ]------------ > [227274.061354] kernel BUG at /home/apw/COD/linux/fs/btrfs/extent_io.c:2116! > [227274.061445] invalid opcode: 0000 [#1] SMP > [227274.061509] Modules linked in: cuse deflate > [227274.061573] BTRFS info (device sdi): csum failed ino 258 off > 1395560448 csum 2284440321 expected csum 319628859 > [227274.061707] ctr twofish_generic twofish_x86_64_3way > twofish_x86_64 twofish_common camellia_generic camellia_x86_64 > serpent_sse2_x86_64 xts serpent_generic lrw gf128mul glue_helper > blowfish_generic blowfish_x86_64 blowfish_common cast5_generic > cast_common ablk_helper cryptd des_generic cmac xcbc rmd160 > crypto_null af_key xfrm_algo nfsd auth_rpcgss nfs_acl nfs lockd sunrpc > fscache dm_crypt ip6t_REJECT ppdev xt_hl ip6t_rt nf_conntrack_ipv6 > nf_defrag_ipv6 ipt_REJECT xt_comment xt_LOG kvm xt_recent microcode > xt_multiport xt_limit xt_tcpudp psmouse serio_raw xt_addrtype k10temp > edac_core ipt_MASQUERADE edac_mce_amd iptable_nat nf_nat_ipv4 > sp5100_tco nf_conntrack_ipv4 nf_defrag_ipv4 ftdi_sio i2c_piix4 > usbserial xt_conntrack ip6table_filter ip6_tables joydev > nf_conntrack_netbios_ns nf_conntrack_broadcast snd_hda_codec_via > nf_nat_ftp snd_hda_codec_hdmi nf_nat snd_hda_codec_generic > nf_conntrack_ftp nf_conntrack snd_hda_intel iptable_filter > ir_lirc_codec(OF) lirc_dev(OF) ip_tables snd_hda_codec > ir_mce_kbd_decoder(OF) x_tables snd_hwdep ir_sony_decoder(OF) > rc_tbs_nec(OF) ir_jvc_decoder(OF) snd_pcm ir_rc6_decoder(OF) > ir_rc5_decoder(OF) saa716x_tbs_dvb(OF) tbs6982fe(POF) tbs6680fe(POF) > ir_nec_decoder(OF) tbs6923fe(POF) tbs6985se(POF) tbs6928se(POF) > tbs6982se(POF) tbs6991fe(POF) tbs6618fe(POF) saa716x_core(OF) > tbs6922fe(POF) tbs6928fe(POF) tbs6991se(POF) stv090x(OF) dvb_core(OF) > rc_core(OF) snd_timer snd soundcore 
asus_atk0110 parport_pc shpchp > mac_hid lp parport btrfs xor raid6_pq pata_acpi hid_generic usbhid hid > usb_storage radeon pata_atiixp r8169 mii i2c_algo_bit sata_sil24 ttm > drm_kms_helper drm ahci libahci wmi > [227274.064118] CPU: 1 PID: 15543 Comm: btrfs-endio-4 Tainted: PF > O 3.14.1-031401-generic #201404141220 > [227274.064246] Hardware name: System manufacturer System Product > Name/M4A78LT-M, BIOS 0802 08/24/2010 > [227274.064368] task: ffff88030a0e31e0 ti: ffff8800a15b8000 task.ti: > ffff8800a15b8000 > [227274.064467] RIP: 0010:[<ffffffffa0304c33>] [<ffffffffa0304c33>] > clean_io_failure+0x1a3/0x1b0 [btrfs] > [227274.064623] RSP: 0018:ffff8800a15b9cd8 EFLAGS: 00010246 > [227274.064694] RAX: 0000000000000000 RBX: ffff88010b2869b8 RCX: > 0000000000000000 > [227274.064789] RDX: ffff8802cad30f00 RSI: 00000000720071fe RDI: > ffff88010b286884 > [227274.064883] RBP: ffff8800a15b9d28 R08: 0000000000000000 R09: > 0000000000000000 > [227274.064977] R10: 0000000000000200 R11: 0000000000000000 R12: > ffffea000102b080 > [227274.065071] R13: ffff880004366c00 R14: ffff88010b286800 R15: > 00000000532ef000 > [227274.065166] FS: 00007f16670b0740(0000) GS:ffff88031fc40000(0000) > knlGS:0000000000000000 > [227274.065271] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > [227274.065348] CR2: 00007f9c5c3b0000 CR3: 00000002dd8a8000 CR4: > 00000000000007e0 > [227274.065443] Stack: > [227274.065471] 00000000000532ef ffff88030a14c000 0000000000000000 > ffff880004366c00 > [227274.065584] ffff88030a95f780 ffffea000102b080 ffff8801026cc4b0 > ffff88010b2869b8 > [227274.065697] 00000000532ef000 0000000000000000 ffff8800a15b9db8 > ffffffffa0304f1b > [227274.065809] Call Trace: > [227274.065872] [<ffffffffa0304f1b>] > end_bio_extent_readpage+0x2db/0x3d0 [btrfs] > [227274.065971] [<ffffffff8120a013>] bio_endio+0x53/0xa0 > [227274.066042] [<ffffffff8120a072>] bio_endio_nodec+0x12/0x20 > [227274.066137] [<ffffffffa02dde81>] end_workqueue_fn+0x41/0x50 [btrfs] > [227274.066243] 
[<ffffffffa03157d0>] worker_loop+0xa0/0x330 [btrfs] > [227274.066345] [<ffffffffa0315730>] ? > check_pending_worker_creates.isra.1+0xe0/0xe0 [btrfs] > [227274.066455] [<ffffffff8108ffa9>] kthread+0xc9/0xe0 > [227274.066522] [<ffffffff8108fee0>] ? flush_kthread_worker+0xb0/0xb0 > [227274.066606] [<ffffffff817721bc>] ret_from_fork+0x7c/0xb0 > [227274.066680] [<ffffffff8108fee0>] ? flush_kthread_worker+0xb0/0xb0 > [227274.066761] Code: 00 00 83 f8 01 0f 8e 49 ff ff ff 49 8b 4d 18 49 > 8b 55 10 4d 89 e0 45 8b 4d 2c 48 8b 7d b8 4c 89 fe e8 72 fc ff ff e9 > 29 ff ff ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 > 48 89 > [227274.067266] RIP [<ffffffffa0304c33>] clean_io_failure+0x1a3/0x1b0 [btrfs] > [227274.067380] RSP <ffff8800a15b9cd8> > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 9+ messages in thread
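[Shilong's "double check if your kernel source includes the following commit" can be done mechanically inside a clone of the mainline kernel tree. A sketch; the commit id is the one from the mail above, and v3.15-rc1 is assumed as the tag of interest.]

```shell
# The RAID5/6 scrub fix referenced in this thread:
FIX=3b080b2564287be91605bfd1d5ee985696e61d3c

# Exit status 0 iff the fix is an ancestor of (i.e. contained in) the tag:
if git merge-base --is-ancestor "$FIX" v3.15-rc1; then
    echo "fix is in v3.15-rc1"
fi

# Coarser view: list every tag that already contains the fix.
git tag --contains "$FIX"
```

[For a running distro kernel without source, the changelog published alongside the kernel package, as Jaap checks below, is the practical alternative.]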
* Re: "csum failed" that was not detected by scrub 2014-05-02 11:13 ` Shilong Wang @ 2014-05-02 17:55 ` Jaap Pieroen 0 siblings, 0 replies; 9+ messages in thread
From: Jaap Pieroen @ 2014-05-02 17:55 UTC (permalink / raw)
To: linux-btrfs

Shilong Wang <wangshilong1991 <at> gmail.com> writes:
>
> Hello,
>
> There is a known RAID5/6 bug, I sent a patch to address this problem.
> Could you please double check if your kernel source includes the
> following commit:
>
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=3b080b2564287be91605bfd1d5ee985696e61d3c
>
> RAID5/6 should detect checksum mismatch, it can not fix errors now.
>
> Thanks,
> Wang

Your patch seems to be in 3.15rc1:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.15-rc1-trusty/CHANGES

I tried rc3 but that made my system crash on boot.. I'm having bad luck.

^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2014-05-03 13:57 UTC | newest]

Thread overview: 9+ messages
2014-05-02  9:42 "csum failed" that was not detected by scrub Jaap Pieroen
2014-05-02 10:20 ` Duncan
2014-05-02 17:48 ` Jaap Pieroen
2014-05-03  3:10 ` btrfs raid56 Was: "csum failed" that was not detected by scrub Duncan
2014-05-03  7:53 ` btrfs raid56 Was: Jaap Pieroen
2014-05-03 13:31 ` Frank Holton
2014-05-03 13:57 ` "csum failed" that was not detected by scrub Marc MERLIN
2014-05-02 11:13 ` Shilong Wang
2014-05-02 17:55 ` Jaap Pieroen