Date: Sat, 2 Feb 2013 10:20:35 -0500
From: Chris Mason
To: Arnd Bergmann
CC: linux-kernel@vger.kernel.org, linux-btrfs@vger.kernel.org, arnd@linaro.org
Subject: Re: Oops when mounting btrfs partition
Message-ID: <20130202152035.GA24264@shiny>
In-Reply-To: <4028366.UQxPtEU6If@wuerfel>

Hi Arnd,

First things first, nospace_cache is a safe thing to use.
It is slow because it's finding free extents, but it's just a cache and
always safe to discard. With your other errors, I'd just mount it
readonly and then you won't waste time on atime updates.

I'll take a look at the BUG you got during log recovery. We've fixed a
few of those during the 3.8 rc cycle.

> Feb 1 22:57:37 localhost kernel: [ 8561.599482] Kernel BUG at ffffffffa01fdcf7 [verbose debug info unavailable]

> Jan 14 19:18:42 localhost kernel: [1060055.746373] btrfs csum failed ino 15619835 off 454656 csum 2755731641 private 864823192
> Jan 14 19:18:42 localhost kernel: [1060055.746381] btrfs: bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 17, gen 0
> ...
> Jan 21 16:35:40 localhost kernel: [1655047.701147] parent transid verify failed on 17006399488 wanted 54700 found 54764

These aren't good. With a few exceptions for really tight races in fsx
use cases, csum errors are bad data from the disk.

The "parent transid verify failed" message shows we wanted to find a
metadata block from generation 54700 but found 54764 instead:

54700 = 0xD5AC
54764 = 0xD5EC

This same bad block comes up a few different times.

> Jan 21 16:35:40 localhost kernel: [1655047.752692] btrfs read error corrected: ino 1 off 17006399488 (dev /dev/sdb1 sector 64689288)

This shows we pulled from the second copy of this block and got the
right answer, and then wrote the right answer to the duplicate. Inode 1
means it was metadata.

But for some reason it still aborted the transaction. It could have been
an EIO on the correction, but the auto correction code in 3.5 did work
well.

I think your plan to pull the data off and reformat is a good one. I'd
also look hard at your RAM since drives don't usually send back single
bit errors.

-chris
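
A quick way to see the single bit error in the two generation numbers
quoted above is to XOR them and list the bits that differ. This is only
a minimal sketch in Python; the helper name differing_bits is made up
for illustration and is not part of btrfs or any tool.

```python
def differing_bits(wanted, found):
    """Return the positions (0 = least significant) of the bits that differ."""
    diff = wanted ^ found
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


# Generation numbers taken from the "parent transid verify failed" line.
wanted, found = 54700, 54764
bits = differing_bits(wanted, found)
print(f"{wanted:#06x} ^ {found:#06x} = {wanted ^ found:#x}, differing bits: {bits}")
```

The XOR comes out as 0x40, so only bit 6 differs between the wanted and
found values, which is consistent with a flipped bit in memory rather
than a bad read from the drive.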