linux-btrfs.vger.kernel.org archive mirror
* btrfs Bug?
@ 2010-04-06 17:08 yoosty69
  2010-04-08 14:04 ` Chris Mason
  0 siblings, 1 reply; 7+ messages in thread
From: yoosty69 @ 2010-04-06 17:08 UTC (permalink / raw)
  To: linux-btrfs

Background:
Was checking E-mail and browsing the internet when suddenly Pidgin crashed out. I thought that was pretty weird so I went to go re-start Pidgin when I noticed the machine hang really hard for about 30 seconds. The machine finally came back and that's when I noticed that my E-mail client (Claws Mail) had stopped responding. I 'touch'ed a file in my home dir and that was fine, but then I went to md5sum a large file and it came back with an I/O error. I ran dmesg and found that there had been a kernel dump (or whatever the proper term is) related to BTRFS. I went to shut down my programs gracefully and do a reboot, unfortunately none of my programs (FF, Pidgin, Claws-Mail, one or two others) wanted to respond so I just used the power-button.

I switched my Intel X-25M (2nd gen, latest FW as of about a month ago) to a different SATA cable and on a different port on the motherboard (Supermicro C2SBX) to see if there was some sort of hardware problem there. I booted again into Gentoo and the boot failed (I'm guessing it failed after trying to mount the root partition as RO the first time).

I booted in to System Rescue CD 1.5.1 and tried to mount the partition and mount returned with a SegFault and dmesg spit out the following:

[code]

[   75.218065] device label root devid 1 transid 4446 /dev/sda3
[   75.225843] btrfs: sda3 checksum verify failed on 42488987648 wanted FC733AC3 found F7794308 level 1
[   75.226049] btrfs: sda3 checksum verify failed on 42488987648 wanted FC733AC3 found F7794308 level 1
[   75.226238] btrfs: sda3 checksum verify failed on 42488987648 wanted FC733AC3 found F7794308 level 1
[   75.226271] Btrfs detected SSD devices, enabling SSD mode
[   75.226490] ------------[ cut here ]------------
[   75.226492] kernel BUG at fs/btrfs/extent-tree.c:3541!
[   75.226494] invalid opcode: 0000 [#1] SMP
[   75.226497] last sysfs file: /sys/kernel/uevent_seqnum
[   75.226499] CPU 0
[   75.226500] Modules linked in: video nvidiafb output shpchp pci_hotplug hid_apple i2c_i801 processor button container i2c_core pcspkr psmouse serio_raw vgastate evdev iTCO_wdt iTCO_vendor_support x38_edac edac_core raid10 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1 raid0 multipath linear md_mod sg sd_mod sr_mod crc_t10dif cdrom usbhid hid uhci_hcd ahci libata e1000e ehci_hcd scsi_mod thermal usbcore thermal_sys
[   75.226534] Pid: 1804, comm: mount Not tainted 2.6.32.10-std151-amd64 #1 C2SBX
[   75.226536] RIP: 0010:[<ffffffff81298c44>]  [<ffffffff81298c44>] btrfs_pin_extent+0x28/0xab
[   75.226545] RSP: 0018:ffff88013abeba48  EFLAGS: 00010246
[   75.226547] RAX: 0000000000000000 RBX: 00000009e492c000 RCX: 00000007c1bfffff
[   75.226549] RDX: 0000000000000000 RSI: ffff88013a93e000 RDI: 0000000040000000
[   75.226552] RBP: 0000000000001000 R08: ffff88013abebb68 R09: 0000000000080050
[   75.226554] R10: 000000000000027c R11: 00000000000338c6 R12: ffff88013a414000
[   75.226556] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff812cb15f
[   75.226564] FS:  0000000000000000(0000) GS:ffff880005400000(0063) knlGS:00000000f75e4b60
[   75.226566] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
[   75.226568] CR2: 00000000f76b2890 CR3: 000000013b324000 CR4: 00000000000006f0
[   75.226570] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   75.226572] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   75.226574] Process mount (pid: 1804, threadinfo ffff88013abea000, task ffff88013ab71500)
[   75.226575] Stack:
[   75.226576]  ffff880134781a20 ffff88013abebb68 000000000000115f 0000000000001000
[   75.226579] <0> ffff88013abebb68 ffffffff812cb18b ffff88013abebb14 ffff880134781b40
[   75.226582] <0> ffff88013dd8f800 ffffffff812caa63 fffffffffffffffa 00000009e492c000
[   75.226585] Call Trace:
[   75.226589]  [<ffffffff812cb18b>] ? process_one_buffer+0x2c/0x5e
[   75.226592]  [<ffffffff812caa63>] ? walk_down_log_tree+0x2c3/0x362
[   75.226595]  [<ffffffff812cab7a>] ? walk_log_tree+0x78/0x183
[   75.226598]  [<ffffffff812a723f>] ? join_transaction+0x174/0x1a0
[   75.226601]  [<ffffffff812ce073>] ? btrfs_recover_log_trees+0x92/0x283
[   75.226603]  [<ffffffff812a2b14>] ? btree_get_extent+0x0/0x18b
[   75.226606]  [<ffffffff812cb15f>] ? process_one_buffer+0x0/0x5e
[   75.226609]  [<ffffffff812a2a62>] ? btree_read_extent_buffer_pages+0x65/0xa3
[   75.226612]  [<ffffffff812a6362>] ? open_ctree+0xee5/0x1137
[   75.226615]  [<ffffffff8133a08d>] ? vsnprintf+0x3f4/0x42d
[   75.226619]  [<ffffffff8128fe79>] ? btrfs_get_sb+0x1ad/0x3a2
[   75.226623]  [<ffffffff810ecbb8>] ? vfs_kern_mount+0x96/0x15b
[   75.226626]  [<ffffffff810eccdc>] ? do_kern_mount+0x49/0xe7
[   75.226629]  [<ffffffff8110029c>] ? do_mount+0x73e/0x7a4
[   75.226633]  [<ffffffff8111ce06>] ? compat_sys_mount+0x1f6/0x231
[   75.226636]  [<ffffffff81037472>] ? ia32_sysret+0x0/0x5
[   75.226637] Code: 41 5d c3 41 56 41 55 41 89 cd 41 54 55 48 89 d5 53 4c 8b a7 28 01 00 00 48 89 f3 4c 89 e7 e8 d7 e1 ff ff 48 85 c0 49 89 c6 75 04 <0f> 0b eb fe 48 8b b8 90 00 00 00 48 81 c7 b8 00 00 00 e8 65 de
[   75.226658] RIP  [<ffffffff81298c44>] btrfs_pin_extent+0x28/0xab
[   75.226662]  RSP <ffff88013abeba48>
[   75.226664] ---[ end trace 0ab19e2d653aad66 ]---
[/code]

I tried to mount it again to see if I got a different error but then the machine hung and never came back from its vacation.

After another hard restart I tried mounting the filesystem again and got the same exact kernel dump (AFAICT anyways, I'm sure the memory locations are different but the rest looks the same).

After yet another hard restart (the previous mount attempt didn't want to let me do a reboot) I tried a btrfsck and it spit out basically the same checksum errors:

[code]

checksum verify failed on 42488987648 wanted FC733AC3 found F7794308
checksum verify failed on 42488987648 wanted FC733AC3 found F7794308
checksum verify failed on 42488987648 wanted FC733AC3 found F7794308

[/code]

and then btrfsck segfaulted. While I ran the btrfsck, 'tail -f /var/log/messages' showed the following:

[code]

Apr  5 18:31:10 sysresccd kernel: [  262.847992] btrfsck[1849]: segfault at a8 ip 0000000008054269 sp 00000000ffc862a0 error 4 in btrfsck[8048000+1c000]

[/code]
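
For anyone comparing logs like the ones above: a minimal Python sketch (not something posted in the thread) that pulls the block address and the two checksum values out of "checksum verify failed" lines, so repeated reports of the same block collapse into one entry. The regular expression is an assumption based only on the two line formats quoted in this message; other kernel or btrfsck versions may word the message differently.

```python
import re

# Matches both the kernel form ("btrfs: sda3 checksum verify failed on ...
# level 1") and the shorter btrfsck form quoted above. The pattern is an
# assumption derived from those two samples only.
CSUM_RE = re.compile(
    r"checksum verify failed on (?P<bytenr>\d+) "
    r"wanted (?P<wanted>[0-9A-Fa-f]+) found (?P<found>[0-9A-Fa-f]+)"
    r"(?: level (?P<level>\d+))?"
)


def parse_csum_failures(log_text):
    """Return the distinct (bytenr, wanted, found, level) tuples in a log."""
    seen = []
    for match in CSUM_RE.finditer(log_text):
        level = match.group("level")
        entry = (
            int(match.group("bytenr")),
            int(match.group("wanted"), 16),
            int(match.group("found"), 16),
            int(level) if level is not None else None,
        )
        if entry not in seen:
            seen.append(entry)
    return seen


sample = """\
[   75.225843] btrfs: sda3 checksum verify failed on 42488987648 wanted FC733AC3 found F7794308 level 1
[   75.226049] btrfs: sda3 checksum verify failed on 42488987648 wanted FC733AC3 found F7794308 level 1
checksum verify failed on 42488987648 wanted FC733AC3 found F7794308
"""

for bytenr, wanted, found, level in parse_csum_failures(sample):
    print(bytenr, f"{wanted:08X}", f"{found:08X}", level)
```

Every line in both logs points at the same block (42488987648) with the same pair of checksum values, which suggests a single damaged block rather than widespread corruption.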

I'm using gentoo-sources-2.6.33, and System Rescue CD 1.5.1 uses 2.6.32.10.


   Justin



* Re: btrfs Bug?
  2010-04-06 17:08 btrfs Bug? yoosty69
@ 2010-04-08 14:04 ` Chris Mason
  0 siblings, 0 replies; 7+ messages in thread
From: Chris Mason @ 2010-04-08 14:04 UTC (permalink / raw)
  To: yoosty69; +Cc: linux-btrfs

On Tue, Apr 06, 2010 at 05:08:12PM +0000, yoosty69@netzero.com wrote:
> Background:
> Was checking E-mail and browsing the internet when suddenly Pidgin crashed out. I thought that was pretty weird so I went to go re-start Pidgin when I noticed the machine hang really hard for about 30 seconds. The machine finally came back and that's when I noticed that my E-mail client (Claws Mail) had stopped responding. I 'touch'ed a file in my home dir and that was fine, but then I went to md5sum a large file and it came back with an I/O error. I ran dmesg and found that there had been a kernel dump (or whatever the proper term is) related to BTRFS. I went to shut down my programs gracefully and do a reboot, unfortunately none of my programs (FF, Pidgin, Claws-Mail, one or two others) wanted to respond so I just used the power-button.
> 
> I switched my Intel X-25M (2nd gen, latest FW as of about a month ago) to a different SATA cable and on a different port on the motherboard (Supermicro C2SBX) to see if there was some sort of hardware problem there. I booted again into Gentoo and the boot failed (I'm guessing it failed after trying to mount the root partition as RO the first time).
> 
> I booted in to System Rescue CD 1.5.1 and tried to mount the partition and mount returned with a SegFault and dmesg spit out the following:
> 
> [code]
> 
> [   75.218065] device label root devid 1 transid 4446 /dev/sda3
> [   75.225843] btrfs: sda3 checksum verify failed on 42488987648 wanted FC733AC3 found F7794308 level 1
> [   75.226049] btrfs: sda3 checksum verify failed on 42488987648 wanted FC733AC3 found F7794308 level 1
> [   75.226238] btrfs: sda3 checksum verify failed on 42488987648 wanted FC733AC3 found F7794308 level 1

Ok, this "checksum verify failed" message means the block was corrupted.  Do you
still have this image?  We can pull the data off that one block and see
what was really there.

The crc errors are most likely from an error on the drive.  But I can
help you pull the data off if you haven't already reformatted.

-chris
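
To illustrate what a checksum mismatch on a block means, here is a rough Python sketch (not from the thread) that recomputes a CRC-32C over a saved metadata block and compares it with the value stored in the block. The layout is an assumption: it takes the first 32 bytes of the block to be the checksum field, with a little-endian CRC-32C in the first 4 bytes covering everything after that field. check_block and CSUM_SIZE are names made up for this example.

```python
def _make_table():
    """Build the byte-wise lookup table for CRC-32C (Castagnoli)."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            # 0x82F63B78 is the bit-reflected Castagnoli polynomial.
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data):
    """Standard CRC-32C: initial value and final XOR are both 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


# Assumption: size of the checksum field at the start of a metadata block.
CSUM_SIZE = 32


def check_block(block):
    """Return (stored, computed, matches) for one metadata block."""
    stored = int.from_bytes(block[:4], "little")
    computed = crc32c(block[CSUM_SIZE:])
    return stored, computed, stored == computed


assert crc32c(b"123456789") == 0xE3069283  # standard CRC-32C check value
```

If the stored and computed values differ, as in the log above, the block's contents changed after the checksum was written; the checksum alone cannot say whether the drive, the cable, or memory was responsible.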


* Re: btrfs Bug?
  2010-04-10  3:10 Justin
@ 2010-04-12 14:03 ` Chris Mason
  0 siblings, 0 replies; 7+ messages in thread
From: Chris Mason @ 2010-04-12 14:03 UTC (permalink / raw)
  To: Justin; +Cc: linux-btrfs

On Sat, Apr 10, 2010 at 03:10:55AM +0000, Justin wrote:
> As far as I know TRIM was enabled. I didn't forcibly disable it and I'm under the assumption that btrfs enables it when an SSD is detected.

Btrfs won't use trim unless you do mount -o discard.  So, if you weren't
doing this that wasn't the cause.

-chris
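
Since SSD mode and discard are separate mount options, a quick way to see which ones are actually active is to look at the mount table. The Python sketch below (not from the thread) parses text in the /proc/mounts format and reports, per btrfs mount point, whether "ssd" and "discard" appear in the option list. The function name and the sample text are made up for illustration, and the assumption is that active options are listed there under exactly these names (a "discard=..." form is also counted).

```python
def btrfs_mount_flags(mounts_text):
    """Map each btrfs mount point to its 'ssd' and 'discard' status."""
    status = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, ...
        if len(fields) < 4 or fields[2] != "btrfs":
            continue
        options = fields[3].split(",")
        status[fields[1]] = {
            "ssd": "ssd" in options,
            "discard": any(
                opt == "discard" or opt.startswith("discard=")
                for opt in options
            ),
        }
    return status


# Hypothetical sample; on a live system pass open("/proc/mounts").read().
sample_mounts = (
    "/dev/sda3 / btrfs rw,relatime,ssd 0 0\n"
    "/dev/sdb1 /data btrfs rw,ssd,discard 0 0\n"
    "/dev/sda1 /boot ext2 rw 0 0\n"
)

for mount_point, flags in btrfs_mount_flags(sample_mounts).items():
    print(mount_point, flags)
```

In the sample, the root filesystem runs in SSD mode without discard, which is the situation described above when "mount -o discard" was never used.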

> 
> 
> ---------- Original Message ----------
> From: Chris Mason <chris.mason@oracle.com>
> To: Justin <yoosty69@netzero.com>
> Cc: linux-btrfs@vger.kernel.org
> Subject: Re: btrfs Bug?
> Date: Fri, 9 Apr 2010 07:18:44 -0400
> 
> On Thu, Apr 08, 2010 at 06:46:40PM +0000, Justin wrote:
> > Unfortunately I did reformat.
> > Actually, I did a complete zero-out of the drive with dd, and then I ran "badblocks -w" on the drive, which returned 0 bad blocks (not sure if this is really a good test for SSDs, as there's some amount of internal voodoo on the drive itself).
> > 
> > For future reference, how would I go about getting an image of the drive without being able to use btrfs-image?
> 
> Well, we'll have to fixup btrfs-image to make it more tolerant of
> errors.  It needs options to skip corrupted sections of the btree and
> encode what it can.
> 
> In this case, I would have had you run btrfs-map-logical, which will
> just read the one bad block and save its contents.
> 
> We've had cases on ssd where every other byte was ff, so I was curious
> how the bad block looked on your intel.
> 
> Were you running with trim enabled?
> 
> -chris
> 
> 
> 

* Re: btrfs Bug?
@ 2010-04-10  3:10 Justin
  2010-04-12 14:03 ` Chris Mason
  0 siblings, 1 reply; 7+ messages in thread
From: Justin @ 2010-04-10  3:10 UTC (permalink / raw)
  To: chris.mason; +Cc: linux-btrfs

As far as I know TRIM was enabled. I didn't forcibly disable it and I'm under the assumption that btrfs enables it when an SSD is detected.


---------- Original Message ----------
From: Chris Mason <chris.mason@oracle.com>
To: Justin <yoosty69@netzero.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: btrfs Bug?
Date: Fri, 9 Apr 2010 07:18:44 -0400

On Thu, Apr 08, 2010 at 06:46:40PM +0000, Justin wrote:
> Unfortunately I did reformat.
> Actually, I did a complete zero-out of the drive with dd, and then I ran "badblocks -w" on the drive, which returned 0 bad blocks (not sure if this is really a good test for SSDs, as there's some amount of internal voodoo on the drive itself).
> 
> For future reference, how would I go about getting an image of the drive without being able to use btrfs-image?

Well, we'll have to fixup btrfs-image to make it more tolerant of
errors.  It needs options to skip corrupted sections of the btree and
encode what it can.

In this case, I would have had you run btrfs-map-logical, which will
just read the one bad block and save its contents.

We've had cases on ssd where every other byte was ff, so I was curious
how the bad block looked on your intel.

Were you running with trim enabled?

-chris




* Re: btrfs Bug?
  2010-04-08 18:46 Justin
@ 2010-04-09 11:18 ` Chris Mason
  0 siblings, 0 replies; 7+ messages in thread
From: Chris Mason @ 2010-04-09 11:18 UTC (permalink / raw)
  To: Justin; +Cc: linux-btrfs

On Thu, Apr 08, 2010 at 06:46:40PM +0000, Justin wrote:
> Unfortunately I did reformat.
> Actually, I did a complete zero-out of the drive with dd, and then I ran "badblocks -w" on the drive, which returned 0 bad blocks (not sure if this is really a good test for SSDs, as there's some amount of internal voodoo on the drive itself).
> 
> For future reference, how would I go about getting an image of the drive without being able to use btrfs-image?

Well, we'll have to fixup btrfs-image to make it more tolerant of
errors.  It needs options to skip corrupted sections of the btree and
encode what it can.

In this case, I would have had you run btrfs-map-logical, which will
just read the one bad block and save its contents.

We've had cases on ssd where every other byte was ff, so I was curious
how the bad block looked on your intel.

Were you running with trim enabled?

-chris
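
Once a single bad block has been saved to a file, checking it for the "every other byte was ff" pattern is straightforward. The Python sketch below (not from the thread) does only that check; the function name and the dump file name in the usage note are made up for illustration.

```python
def every_other_byte_is_ff(block):
    """Report whether all even-offset or all odd-offset bytes are 0xFF.

    Returns "even", "odd", or None if neither pattern holds.
    """
    if len(block) < 2:
        return None
    if all(byte == 0xFF for byte in block[0::2]):
        return "even"
    if all(byte == 0xFF for byte in block[1::2]):
        return "odd"
    return None


print(every_other_byte_is_ff(bytes([0xFF, 0x12, 0xFF, 0x34])))
print(every_other_byte_is_ff(bytes(range(8))))
```

With a saved block, this would be called as every_other_byte_is_ff(open("bad-block.bin", "rb").read()), where bad-block.bin is a hypothetical name for the dumped block. A real dump may show the pattern in only part of the block, so a hex dump is still worth reading by eye.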



* Re: btrfs Bug?
@ 2010-04-08 18:46 Justin
  2010-04-09 11:18 ` Chris Mason
  0 siblings, 1 reply; 7+ messages in thread
From: Justin @ 2010-04-08 18:46 UTC (permalink / raw)
  To: chris.mason; +Cc: linux-btrfs

Unfortunately I did reformat.
Actually, I did a complete zero-out of the drive with dd, and then I ran "badblocks -w" on the drive, which returned 0 bad blocks (not sure if this is really a good test for SSDs, as there's some amount of internal voodoo on the drive itself).

For future reference, how would I go about getting an image of the drive without being able to use btrfs-image?


   Justin



* Re: btrfs Bug?
@ 2010-04-07  0:37 yoosty69
  0 siblings, 0 replies; 7+ messages in thread
From: yoosty69 @ 2010-04-07  0:37 UTC (permalink / raw)
  To: yoosty69; +Cc: linux-btrfs

After reading around a bit on the btrfs wiki (the Getting_Started page and Gotchas page specifically) I found that I might be able to at least capture an image of the drive in case any devs needed to take a look at it; unfortunately btrfs-image failed with the same error.
I deduced that a repair of the FS requires it to be mounted with "mount -o degraded <dev> <mount_point>", but trying to mount in degraded mode also failed with the same error.
Not too sure where to go from here except shedding a tear for the few files that I didn't have backed up and starting over (or returning the drive? badblocks didn't return anything however..).


   Justin



end of thread, other threads:[~2010-04-12 14:03 UTC | newest]

Thread overview: 7+ messages
2010-04-06 17:08 btrfs Bug? yoosty69
2010-04-08 14:04 ` Chris Mason
2010-04-07  0:37 yoosty69
2010-04-08 18:46 Justin
2010-04-09 11:18 ` Chris Mason
2010-04-10  3:10 Justin
2010-04-12 14:03 ` Chris Mason
