linux-btrfs.vger.kernel.org archive mirror
* volume broken? btrfsck fails
@ 2010-06-26 22:15 Yee-Ting Li
  2010-07-01 12:51 ` Daniel Kozlowski
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Yee-Ting Li @ 2010-06-26 22:15 UTC (permalink / raw)
  To: The development of BTRFS

Hi,

i think my btrfs volume is hosed.... it mounts okay, but iostat shows /dev/sdg on 100% load. dmesg shows lots of 'parent transid verify failed on x wanted y found z'. then after a while i can't read from it (access to the filesystem freezes).

the machine had crashed (prob from some other process), and upon reboot i've been experiencing this problem since.

can anyone provide any guidance in how to proceed?

cheers,

Yee.

$ sudo /usr/local/bin/btrfs-show 
failed to read /dev/sr0

Label: none  uuid: ea7ea0b3-bc42-4b0c-9173-346df61d4454
	Total devices 3 FS bytes used 3.56TB
	devid    3 size 1.82TB used 0.00 path /dev/sde
	devid    1 size 1.82TB used 1.82TB path /dev/sdf
	devid    2 size 1.82TB used 1.82TB path /dev/sdg

Btrfs v0.19-16-g075587c


$ sudo /usr/local/bin/btrfsck /dev/sdf 
failed to read /dev/sr0
parent transid verify failed on 2703873638400 wanted 9074 found 9016
parent transid verify failed on 2703884750848 wanted 9074 found 9055
parent transid verify failed on 2703884763136 wanted 9074 found 9060
parent transid verify failed on 2703883599872 wanted 9074 found 9034
parent transid verify failed on 2703920717824 wanted 9066 found 7543
parent transid verify failed on 2703912325120 wanted 9066 found 7543
parent transid verify failed on 2703912034304 wanted 9066 found 7543
parent transid verify failed on 2703881900032 wanted 9071 found 9060
parent transid verify failed on 2703881793536 wanted 9069 found 9057
bad block 2703860367360
Extent back ref already exists for 2703873536000 parent 0 root 2 
bad block 2703860621312
bad block 2703861547008
Extent back ref already exists for 2703876689920 parent 0 root 2 
Extent back ref already exists for 2703881900032 parent 0 root 2 
Extent back ref already exists for 2703879290880 parent 0 root 2 
Extent back ref already exists for 2703873753088 parent 0 root 2 
parent transid verify failed on 2703921885184 wanted 9066 found 7543
parent transid verify failed on 2703921889280 wanted 9066 found 7543
parent transid verify failed on 2703879036928 wanted 9069 found 9061
parent transid verify failed on 2703881867264 wanted 9075 found 9065
parent transid verify failed on 2703873536000 wanted 9074 found 9062
parent transid verify failed on 2703883190272 wanted 9075 found 9061
parent transid verify failed on 2703869997056 wanted 9073 found 9060
parent transid verify failed on 2703922012160 wanted 9066 found 7543
parent transid verify failed on 2703921975296 wanted 9066 found 7543
parent transid verify failed on 2703867707392 wanted 9071 found 9060
parent transid verify failed on 2703922679808 wanted 9066 found 7543
parent transid verify failed on 2703922032640 wanted 9066 found 7543
parent transid verify failed on 2703881891840 wanted 9075 found 9057
parent transid verify failed on 2703882297344 wanted 9075 found 9061
parent transid verify failed on 2703884488704 wanted 9074 found 9057
parent transid verify failed on 2703884353536 wanted 9074 found 9057
parent transid verify failed on 2703884365824 wanted 9074 found 9055
parent transid verify failed on 2703921500160 wanted 9066 found 7543
parent transid verify failed on 2703883177984 wanted 9075 found 9061
parent transid verify failed on 2703921487872 wanted 9066 found 7543
parent transid verify failed on 2703922683904 wanted 9066 found 7543
parent transid verify failed on 2703873753088 wanted 9074 found 9062
parent transid verify failed on 2703874314240 wanted 9074 found 9056
Extent back ref already exists for 2703865823232 parent 0 root 2 
Extent back ref already exists for 2703866810368 parent 0 root 2 
Extent back ref already exists for 2703866986496 parent 0 root 2 
Extent back ref already exists for 2703867031552 parent 0 root 2 
Extent back ref already exists for 2703867625472 parent 0 root 2 
Extent back ref already exists for 2703867609088 parent 0 root 2 
Extent back ref already exists for 2703868829696 parent 0 root 2 
Extent back ref already exists for 2703869734912 parent 0 root 2 
Extent back ref already exists for 2703870255104 parent 0 root 2 
Extent back ref already exists for 2703870562304 parent 0 root 2 
Extent back ref already exists for 2703871201280 parent 0 root 2 
Extent back ref already exists for 2703871168512 parent 0 root 2 
Extent back ref already exists for 2703873040384 parent 0 root 2 
Extent back ref already exists for 2703872610304 parent 0 root 2 
Extent back ref already exists for 2703874686976 parent 0 root 2 
Extent back ref already exists for 2703873318912 parent 0 root 2 
Extent back ref already exists for 2703873740800 parent 0 root 2 
Extent back ref already exists for 2703874465792 parent 0 root 2 
Extent back ref already exists for 2703876370432 parent 0 root 2 
Extent back ref already exists for 2703877046272 parent 0 root 2 
Extent back ref already exists for 2703877050368 parent 0 root 2 
Extent back ref already exists for 2703878647808 parent 0 root 2 
Extent back ref already exists for 2703876407296 parent 0 root 2 
Extent back ref already exists for 2703872782336 parent 0 root 2 
Extent back ref already exists for 2703907266560 parent 0 root 2 
Extent back ref already exists for 2703906869248 parent 0 root 2 
Extent back ref already exists for 2703907241984 parent 0 root 2 
Extent back ref already exists for 2703907553280 parent 0 root 2 
Extent back ref already exists for 2703907942400 parent 0 root 2 
Extent back ref already exists for 2703910154240 parent 0 root 2 
Extent back ref already exists for 2703915515904 parent 0 root 2 
Extent back ref already exists for 2703916965888 parent 0 root 2 
Extent back ref already exists for 2703875280896 parent 0 root 2 
Extent back ref already exists for 2703878635520 parent 0 root 2 
Extent back ref already exists for 2221635985408 parent 0 root 2 
Extent back ref already exists for 2703883841536 parent 0 root 2 
Extent back ref already exists for 2703882489856 parent 0 root 2 
Extent back ref already exists for 2703883186176 parent 0 root 2 
Extent back ref already exists for 2221711962112 parent 0 root 2 
parent transid verify failed on 2703875964928 wanted 9066 found 9064
parent transid verify failed on 2703920701440 wanted 9066 found 7543
parent transid verify failed on 2703921225728 wanted 9066 found 7543
parent transid verify failed on 2703919247360 wanted 9066 found 7543
parent transid verify failed on 2703921467392 wanted 9066 found 7543
parent transid verify failed on 2703919116288 wanted 9066 found 7543
parent transid verify failed on 2703920193536 wanted 9066 found 7543
leaf parent key incorrect 2703862099968
bad block 2703862099968
parent transid verify failed on 2703869194240 wanted 9069 found 9062
parent transid verify failed on 2703872065536 wanted 9075 found 9060
leaf parent key incorrect 2703865634816
bad block 2703865634816
parent transid verify failed on 2703872434176 wanted 9077 found 9059
leaf parent key incorrect 2703868116992
bad block 2703868116992
leaf parent key incorrect 2703869460480
bad block 2703869460480
parent transid verify failed on 2703878242304 wanted 9075 found 9065
leaf parent key incorrect 2703871660032
bad block 2703871660032
leaf parent key incorrect 2703872061440
bad block 2703872061440
bad block 2703873073152
parent transid verify failed on 2703873613824 wanted 9077 found 9025
bad block 2703873536000
bad block 2703876689920
leaf parent key incorrect 2703877709824
bad block 2703877709824
parent transid verify failed on 2703897231360 wanted 9077 found 9061
parent transid verify failed on 2703901822976 wanted 9077 found 9061
parent transid verify failed on 2703879938048 wanted 9075 found 9065
leaf parent key incorrect 2703879299072
bad block 2703879299072
bad block 2703881900032
leaf parent key incorrect 2703882805248
bad block 2703882805248
Extent back ref already exists for 2703885160448 parent 0 root 2 
leaf parent key incorrect 2703883829248
bad block 2703883829248
parent transid verify failed on 2703878213632 wanted 9077 found 9061
bad block 2703896338432
Extent back ref already exists for 531120128 parent 0 root 2 
Extent back ref already exists for 3624028745728 parent 0 root 2 
Extent back ref already exists for 458403840 parent 0 root 2 
Extent back ref already exists for 3624039575552 parent 0 root 2 
Extent back ref already exists for 2221892575232 parent 0 root 2 
Extent back ref already exists for 538480640 parent 0 root 2 
Extent back ref already exists for 2221926707200 parent 0 root 2 
Extent back ref already exists for 2221926719488 parent 0 root 2 
Extent back ref already exists for 746985025536 parent 0 root 2 
Extent back ref already exists for 2703867379712 parent 0 root 2 
Extent back ref already exists for 2703877795840 parent 0 root 2 
Extent back ref already exists for 3624023527424 parent 0 root 2 
Extent back ref already exists for 3624023547904 parent 0 root 2 
Extent back ref already exists for 3624029978624 parent 0 root 2 
Extent back ref already exists for 2221998817280 parent 0 root 2 
Extent back ref already exists for 747239817216 parent 0 root 2 
Extent back ref already exists for 1497120432128 parent 0 root 2 
Extent back ref already exists for 1497285292032 parent 0 root 2 
Extent back ref already exists for 1497514807296 parent 0 root 2 
Extent back ref already exists for 1497549565952 parent 0 root 2 
Extent back ref already exists for 746363998208 parent 0 root 2 
Extent back ref already exists for 2703878045696 parent 0 root 2 
Extent back ref already exists for 2221998825472 parent 0 root 2 
Extent back ref already exists for 3624204349440 parent 0 root 2 
Extent back ref already exists for 484401152 parent 0 root 2 
Extent back ref already exists for 2221929988096 parent 0 root 2 
Extent back ref already exists for 707141632 parent 0 root 2 
Extent back ref already exists for 2221930053632 parent 0 root 2 
Extent back ref already exists for 2703875485696 parent 0 root 2 
Extent back ref already exists for 3624161251328 parent 0 root 2 
Extent back ref already exists for 3624024666112 parent 0 root 2 
Extent back ref already exists for 165191680 parent 0 root 2 
Extent back ref already exists for 3623966523392 parent 0 root 2 
Extent back ref already exists for 2221876412416 parent 0 root 2 
Extent back ref already exists for 1496842756096 parent 0 root 2 
Extent back ref already exists for 2221936676864 parent 0 root 2 
Extent back ref already exists for 1497422680064 parent 0 root 2 
Extent back ref already exists for 1497454501888 parent 0 root 2 
Extent back ref already exists for 2221823078400 parent 0 root 2 
Extent back ref already exists for 3624937074688 parent 0 root 2 
Extent back ref already exists for 3624953167872 parent 0 root 2 
Extent back ref already exists for 3624268865536 parent 0 root 2 
Extent back ref already exists for 2221718986752 parent 0 root 2 
Extent back ref already exists for 414621696 parent 0 root 2 
Extent back ref already exists for 2221929848832 parent 0 root 2 
Extent back ref already exists for 3624936488960 parent 0 root 2 
Extent back ref already exists for 3623950848000 parent 0 root 2 
Extent back ref already exists for 733777920 parent 0 root 2 
Extent back ref already exists for 3624953176064 parent 0 root 2 
Extent back ref already exists for 2221928071168 parent 0 root 2 
Extent back ref already exists for 3624310071296 parent 0 root 2 
Extent back ref already exists for 2221906374656 parent 0 root 2 
Extent back ref already exists for 2221906382848 parent 0 root 2 
Extent back ref already exists for 2703871188992 parent 0 root 2 
Extent back ref already exists for 2703879311360 parent 0 root 2 
Extent back ref already exists for 761036800 parent 0 root 2 
Extent back ref already exists for 751378432 parent 0 root 2 
Extent back ref already exists for 2221916528640 parent 0 root 2 
parent transid verify failed on 2703899471872 wanted 9077 found 9061
parent transid verify failed on 2703876403200 wanted 9078 found 9055
parent transid verify failed on 2703880609792 wanted 9069 found 9065
parent transid verify failed on 2703904714752 wanted 9066 found 5091
leaf parent key incorrect 2703904018432
bad block 2703904018432
parent transid verify failed on 2703921881088 wanted 9066 found 7543
parent transid verify failed on 2703883845632 wanted 9074 found 9061
parent transid verify failed on 2703887519744 wanted 9076 found 9056
btrfsck: disk-io.c:410: find_and_setup_root: Assertion `!(ret)' failed.
Aborted
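
A long btrfsck run like the one above is easier to reason about once the messages are tallied by kind. The following is only a minimal sketch, assuming the output has been saved as plain text lines; the names `PATTERNS` and `tally` are hypothetical helpers and not part of btrfs-progs. It counts each message type and records which "found" generations appear in the transid failures.

```python
import re
from collections import Counter

# One pattern per message kind seen in the btrfsck output above.
# These names are illustrative only; btrfsck itself defines no such categories.
PATTERNS = {
    "transid": re.compile(
        r"^parent transid verify failed on \d+ wanted \d+ found (\d+)"
    ),
    "bad_block": re.compile(r"^bad block \d+"),
    "backref": re.compile(r"^Extent back ref already exists for \d+"),
    "leaf_key": re.compile(r"^leaf parent key incorrect \d+"),
}


def tally(lines):
    """Return (Counter of message kinds, Counter of 'found' generations)."""
    kinds = Counter()
    found_gens = Counter()
    for line in lines:
        for kind, pattern in PATTERNS.items():
            match = pattern.match(line)
            if match:
                kinds[kind] += 1
                if kind == "transid":
                    found_gens[int(match.group(1))] += 1
                break  # a line belongs to at most one kind
    return kinds, found_gens


sample = [
    "parent transid verify failed on 2703873638400 wanted 9074 found 9016",
    "parent transid verify failed on 2703920717824 wanted 9066 found 7543",
    "bad block 2703860367360",
    "Extent back ref already exists for 2703873536000 parent 0 root 2 ",
    "leaf parent key incorrect 2703862099968",
    "Aborted",
]
kinds, found_gens = tally(sample)
print(dict(kinds))
print("oldest generation found on disk:", min(found_gens))
```

A cluster of failures that all report the same much older "found" generation (7543 in the output above) may be worth mentioning when reporting the problem, since it suggests a group of blocks that were never rewritten after that transaction.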



* Re: volume broken? btrfsck fails
  2010-06-26 22:15 volume broken? btrfsck fails Yee-Ting Li
@ 2010-07-01 12:51 ` Daniel Kozlowski
  2010-07-04  6:57   ` Yee-Ting Li
  2010-07-07  0:19   ` Chris Mason
  2010-07-07  0:16 ` Chris Mason
  2010-07-11  8:19 ` Yee-Ting Li
  2 siblings, 2 replies; 16+ messages in thread
From: Daniel Kozlowski @ 2010-07-01 12:51 UTC (permalink / raw)
  To: linux-btrfs

Yee-Ting Li <yee379 <at> gmail.com> writes:

> 
> Hi,
> 
> i think my btrfs volume is hosed.... it mounts okay, but iostat shows
> /dev/sdg on 100% load. dmesg shows lots of 'parent transid verify failed
> on x wanted y found z'. then after a while i can't read from it (access
> to the filesystem freezes).
>
> the machine had crashed (prob from some other process), and upon reboot
> i've been experiencing this problem since.
> 
> can anyone provide any guidance in how to proceed?
> 
> cheers,
> 
> Yee.

I am also having the same problem with a slightly different setup. In my case I
cannot mount the filesystem. mount, btrfs-endio-met and kblockd/0 will all
continually run until the system freezes up and requires a power cycle. I have
both the kernel module and the tools checked out from git so if you have any
ideas on fixes I can build them and test them out.

here is some information about my setup 

[root@solution ~]# uname -a
Linux solution.bcig 2.6.35-0.13.rc3.git2.fc14.x86_64 #1 SMP Mon Jun 28 19:27:35 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux
[root@solution ~]# 

[root@solution ~]# btrfs-show 
Label: store  uuid: 4ba1cc6b-e12a-454a-a064-f4019312c063
	Total devices 7 FS bytes used 1.15TB
	devid    1 size 931.51GB used 415.55GB path /dev/sdb
	devid    2 size 931.51GB used 518.50GB path /dev/sdc
	devid    3 size 931.51GB used 342.04GB path /dev/sdd
	devid    4 size 931.51GB used 523.54GB path /dev/sde
	devid    5 size 465.76GB used 402.54GB path /dev/sdf
	devid    6 size 465.76GB used 382.54GB path /dev/sdg
	devid    7 size 465.76GB used 367.54GB path /dev/sdh

Btrfs v0.19-16-g075587c-dirty
[root@solution ~]# 

[root@solution ~]# tail  -n 12 /var/log/messages
Jul  1 04:47:03 solution kernel: parent transid verify failed on 1682196926464 wanted 285263 found 283510
Jul  1 04:47:08 solution kernel: verify_parent_transid: 9244 callbacks suppressed
Jul  1 04:47:08 solution kernel: parent transid verify failed on 1682196926464 wanted 285263 found 283510
Jul  1 04:47:08 solution kernel: parent transid verify failed on 1682196926464 wanted 285263 found 283510
Jul  1 04:47:08 solution kernel: parent transid verify failed on 1682196926464 wanted 285263 found 283510
Jul  1 04:47:08 solution kernel: parent transid verify failed on 1682196926464 wanted 285263 found 283510
Jul  1 04:47:08 solution kernel: parent transid verify failed on 1682196926464 wanted 285263 found 283510
Jul  1 04:47:08 solution kernel: parent transid verify failed on 1682196926464 wanted 285263 found 283510
Jul  1 04:47:08 solution kernel: parent transid verify failed on 1682196926464 wanted 285263 found 283510
Jul  1 04:47:08 solution kernel: parent transid verify failed on 1682196926464 wanted 285263 found 283510
Jul  1 04:47:08 solution kernel: parent transid verify failed on 1682196926464 wanted 285263 found 283510
Jul  1 04:47:08 solution kernel: parent transid verify failed on 1682196926464 wanted 285263 found 283510
[root@solution ~]# 





* Re: volume broken? btrfsck fails
  2010-07-01 12:51 ` Daniel Kozlowski
@ 2010-07-04  6:57   ` Yee-Ting Li
  2010-07-07  0:19   ` Chris Mason
  1 sibling, 0 replies; 16+ messages in thread
From: Yee-Ting Li @ 2010-07-04  6:57 UTC (permalink / raw)
  To: Daniel Kozlowski; +Cc: linux-btrfs


On 1 Jul 2010, at 05:51, Daniel Kozlowski wrote:
> I am also having the same problem with a slightly different setup. In my case I
> cannot mount the filesystem. mount, btrfs-endio-met and kblockd/0 will all 
> continually run until the system freezes up and requires a power cycle.

have you tried mounting with '-o degraded'?

having monitored the system for a while, i also think that in fact it's btrfs that's killing my system. i'm on ubuntu 10.04 with:

$ uname -a
Linux htpc 2.6.32-22-server #36-Ubuntu SMP Thu Jun 3 20:38:33 UTC 2010 x86_64 GNU/Linux

using the default kernel module, but git'd out the tools.

following the other thread 'Is there a more aggressive fixer than btrfsck?' i suspect that we'll just have to wait until some actual fsck operations are available for btrfs :(

on my system, it's btrfs-endio-met (only 1 out of 4) and btrfs-transacti (1 out of 2) that are taking up all the cpu/io wait cycles.

i wonder if it's only certain files on the array that are hosed; if that's the case, is there a way i can map the kernel messages to a real filename? i don't mind losing the odd file on this array, but i don't fancy copying it all over to somewhere else (yeah-yeah, up to date backups blah blah!) - i figured given the momentum btrfs was gaining it would be much more stable than this :(

Yee.


* Re: volume broken? btrfsck fails
  2010-06-26 22:15 volume broken? btrfsck fails Yee-Ting Li
  2010-07-01 12:51 ` Daniel Kozlowski
@ 2010-07-07  0:16 ` Chris Mason
  2010-07-07  5:23   ` Yee-Ting Li
  2010-08-04 18:48   ` Thomas Kuther
  2010-07-11  8:19 ` Yee-Ting Li
  2 siblings, 2 replies; 16+ messages in thread
From: Chris Mason @ 2010-07-07  0:16 UTC (permalink / raw)
  To: Yee-Ting Li; +Cc: The development of BTRFS

On Sat, Jun 26, 2010 at 03:15:04PM -0700, Yee-Ting Li wrote:
> Hi,
> 
> i think my btrfs volume is hosed.... it mounts okay, but iostat shows /dev/sdg on 100% load. dmesg shows lots of 'parent transid verify failed on x wanted y found z'. then after a while i can't read from it (access to the filesystem freezes).
> 
> the machine had crashed (prob from some other process), and upon reboot i've been experiencing this problem since.
> 
> can anyone provide any guidance in how to proceed?

These are definitely corruptions, and they probably came from the crash.
Can you tell me more about the crash? (Power failure, what is the
storage underneath etc, what are the write cache settings).  We don't
expect these kinds of corruptions to happen.

Yan Zheng is making a lot of progress on btrfsck, but I don't think
you'll want to be one of the first testers there.  I can definitely help
copy things off if you're having trouble accessing the FS.

-chris


* Re: volume broken? btrfsck fails
  2010-07-01 12:51 ` Daniel Kozlowski
  2010-07-04  6:57   ` Yee-Ting Li
@ 2010-07-07  0:19   ` Chris Mason
  2010-07-08  0:21     ` Daniel Kozlowski
  1 sibling, 1 reply; 16+ messages in thread
From: Chris Mason @ 2010-07-07  0:19 UTC (permalink / raw)
  To: Daniel Kozlowski; +Cc: linux-btrfs

On Thu, Jul 01, 2010 at 12:51:04PM +0000, Daniel Kozlowski wrote:
> Yee-Ting Li <yee379 <at> gmail.com> writes:
> 
> > 
> > Hi,
> > 
> > i think my btrfs volume is hosed.... it mounts okay, but iostat shows
> > /dev/sdg on 100% load. dmesg shows lots of 'parent transid verify failed
> > on x wanted y found z'. then after a while i can't read from it (access
> > to the filesystem freezes).
> > 
> > the machine had crashed (prob from some other process), and upon reboot
> > i've been experiencing this problem since.
> > 
> > can anyone provide any guidance in how to proceed?
> > 
> > cheers,
> > 
> > Yee.
> 
> I am also having the same problem with a slightly different setup. In my case I
> cannot mount the filesystem.

What is your hardware setup here?  Including write cache settings.  Did
you have crashes with 2.6.35-rc1 or rc2?

> mount, btrfs-endio-met and kblockd/0 will all 
> continually run until the system freezes up and requires a power cycle. I have 
> both the kernel module and the tools checked out from git so if you have any
> ideas on fixes I can build them and test them out.
> 
> here is some information about my setup 
> [root@solution ~]# uname -a
> Linux solution.bcig 2.6.35-0.13.rc3.git2.fc14.x86_64 #1 SMP Mon Jun 28 19:27:35 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux
> [root@solution ~]# 
> 
> [root@solution ~]# btrfs-show 
> Label: store  uuid: 4ba1cc6b-e12a-454a-a064-f4019312c063
> 	Total devices 7 FS bytes used 1.15TB
> 	devid    1 size 931.51GB used 415.55GB path /dev/sdb
> 	devid    2 size 931.51GB used 518.50GB path /dev/sdc
> 	devid    3 size 931.51GB used 342.04GB path /dev/sdd
> 	devid    4 size 931.51GB used 523.54GB path /dev/sde
> 	devid    5 size 465.76GB used 402.54GB path /dev/sdf
> 	devid    6 size 465.76GB used 382.54GB path /dev/sdg
> 	devid    7 size 465.76GB used 367.54GB path /dev/sdh
> 
> Btrfs v0.19-16-g075587c-dirty
> [root@solution ~]# 
> 
> [root@solution ~]# tail  -n 12 /var/log/messages
> Jul  1 04:47:03 solution kernel: parent transid verify failed on 1682196926464 wanted 285263 found 283510
> Jul  1 04:47:08 solution kernel: verify_parent_transid: 9244 callbacks suppressed
> Jul  1 04:47:08 solution kernel: parent transid verify failed on 1682196926464 wanted 285263 found 283510
> Jul  1 04:47:08 solution kernel: parent transid verify failed on 1682196926464 wanted 285263 found 283510
> Jul  1 04:47:08 solution kernel: parent transid verify failed on 1682196926464 wanted 285263 found 283510
> Jul  1 04:47:08 solution kernel: parent transid verify failed on 1682196926464 wanted 285263 found 283510
> Jul  1 04:47:08 solution kernel: parent transid verify failed on 1682196926464 wanted 285263 found 283510
> Jul  1 04:47:08 solution kernel: parent transid verify failed on 1682196926464 wanted 285263 found 283510
> Jul  1 04:47:08 solution kernel: parent transid verify failed on 1682196926464 wanted 285263 found 283510
> Jul  1 04:47:08 solution kernel: parent transid verify failed on 1682196926464 wanted 285263 found 283510
> Jul  1 04:47:08 solution kernel: parent transid verify failed on 1682196926464 wanted 285263 found 283510
> Jul  1 04:47:08 solution kernel: parent transid verify failed on 1682196926464 wanted 285263 found 283510

Looks like we're looping on a single block.  What happens when you
dmesg -n1 to cut down on the console traffic?

If that doesn't help we can change it to spit a stack trace to figure
out where the looping is happening.  We should be erroring out instead
of hitting it over and over again.
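
One way to confirm from a saved log that the kernel really is retrying a single block is to count identical failure triples. This is a rough sketch under stated assumptions: the log has been captured to a string (for example from /var/log/messages), and `repeated_blocks` plus its `threshold` parameter are hypothetical names chosen for illustration, not an existing tool.

```python
import re
from collections import Counter

# \s+ between tokens also tolerates entries that a mail client wrapped
# onto a second line, as happened to the log excerpt quoted above.
FAIL_RE = re.compile(
    r"parent transid verify failed on\s+(\d+)\s+wanted\s+(\d+)\s+found\s+(\d+)"
)


def repeated_blocks(log_text, threshold=3):
    """Map (block, wanted, found) to its count, for triples seen >= threshold times."""
    hits = Counter(match.groups() for match in FAIL_RE.finditer(log_text))
    return {key: count for key, count in hits.items() if count >= threshold}


# Four wrapped copies of the looping entry plus one unrelated failure.
looping = (
    "kernel: parent transid verify failed on 1682196926464\n"
    "wanted 285263 found 283510\n"
)
other = "kernel: parent transid verify failed on 1682586464256 wanted 285114 found 11257\n"
log_text = looping * 4 + other

suspects = repeated_blocks(log_text)
for (block, wanted, found), count in suspects.items():
    print(f"block {block}: wanted {wanted}, found {found}, seen {count} times")
```

Note that the kernel rate-limits these messages ("callbacks suppressed"), so the counts are a lower bound on how often the block was actually retried.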

-chris



* Re: volume broken? btrfsck fails
  2010-07-07  0:16 ` Chris Mason
@ 2010-07-07  5:23   ` Yee-Ting Li
  2010-08-04 18:48   ` Thomas Kuther
  1 sibling, 0 replies; 16+ messages in thread
From: Yee-Ting Li @ 2010-07-07  5:23 UTC (permalink / raw)
  To: Chris Mason; +Cc: The development of BTRFS


On 6 Jul 2010, at 17:16, Chris Mason wrote:
> These are definitely corruptions, and they probably came from the crash.
> Can you tell me more about the crash? (Power failure, what is the
> storage underneath etc, what are the write cache settings).  We don't
> expect these kinds of corruptions to happen.

i think what happened was that the power got pulled accidentally. at the time i had a drive (sde) on an external usb controller. the other two drives are internal on an nForce 730i chipset. they are all 2TB WD drives (combination of EADS and EARS drives). according to hdparm all the drives have write-caching on.

> Yan Zheng is making a lot of progress on btrfsck, but I don't think
> you'll want to be one of the first testers there.  I can definitely help
> copy things off if you're having trouble accessing the FS.

i'm performing rsyncs at the moment to get some of the data off. i can read the drive fine, but after a while (i guess when something tries to access the corrupt file) i get the dmesgs again, and high cpu on the two btrfs-transacti and btrfs-endio-met threads.

is there a way i can determine the actual filenames that may be corrupt?

also, i'm not using the /dev/sde drive (btrfs-show gives used 0.00TB) since i didn't do a balance after i installed it. is there a way i can degrade the array to recover that disk and keep the array with just two disks? then i will have enough storage to copy the 'good' files off :)

once i have a replica, then i can test whatever code you'd like to throw at me :)

cheers,

Yee.


* Re: volume broken? btrfsck fails
  2010-07-07  0:19   ` Chris Mason
@ 2010-07-08  0:21     ` Daniel Kozlowski
  2010-07-08  2:39       ` Daniel Kozlowski
  2010-07-08  8:43       ` Daniel J Blueman
  0 siblings, 2 replies; 16+ messages in thread
From: Daniel Kozlowski @ 2010-07-08  0:21 UTC (permalink / raw)
  To: Chris Mason, Daniel Kozlowski, linux-btrfs

On Tue, Jul 6, 2010 at 8:19 PM, Chris Mason <chris.mason@oracle.com> wrote:
>> I am also having the same problem with a slightly different setup. In my case I
>> cannot mount the filesystem.
>
> What is your hardware setup here?  Including write cache settings.  Did
> you have crashes with 2.6.35-rc1 or rc2?

My setup is

Eight hard drives
four 1TB Drives
four 500GB Drives
All drives are connected through a 3ware Inc 9550SX SATA-II RAID PCI-X card
The card is configured to export all drives essentially acting as a
SATA port multiplier. (drives show up sdb - sdi)
Drives are configured in btrfs raid0
Filesystem is mounted using:
mount -t btrfs /dev/sdb /opt

I have been able to lock up the system on
2.6.33.5-124.fc13.x86_64
2.6.35-0.13.rc3.git2.fc14.x86_64
2.6.35-0.23.rc3.git6.fc14.x86_64
and
2.6.35-0.23.rc3.git6.fc14.x86_64 with a DKMS build of the btrfs module
(Btrfs v0.19-16-g075587c-dirty)

If you would like me to pull out another version of the kernel or roll
back specific commits from the kernel module I can

I have been able to get different responses from different versions
2.6.33.* - This will mount the volume but will hang shortly after
mounting when reading data from the filesystem (ls /opt); writes a
bunch of transid verify failed messages, then hangs on ls
2.6.34.* - Will not mount at all; still gives the transid verify failed
messages and hangs on mount

>
> Looks like we're looping on a single block.  What happens when you
> dmesg -n1 to cut down on the console traffic?
>
Nothing changes I still have endless repeats of

parent transid verify failed on 1682586464256 wanted 285114 found 11257

> If that doesn't help we can change it to spit a stack trace to figure
> out where the looping is happening.  We should be erroring out instead
> of hitting it over and over again.

In my kernel noviceness i tried attaching gdb to the btrfs-endio-met,
however apparently you can't attach gdb to a kernel thread like that
If you could assist me in obtaining a call trace I will gladly attempt
to resolve the matter.

Dan Kozlowski

-- 
S.D.G.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: volume broken? btrfsck fails
  2010-07-08  0:21     ` Daniel Kozlowski
@ 2010-07-08  2:39       ` Daniel Kozlowski
  2010-07-12  0:50         ` Chris Mason
  2010-07-08  8:43       ` Daniel J Blueman
  1 sibling, 1 reply; 16+ messages in thread
From: Daniel Kozlowski @ 2010-07-08  2:39 UTC (permalink / raw)
  To: Chris Mason, Daniel Kozlowski, linux-btrfs

>> Looks like we're looping on a single block.  What happens when you
>> dmesg -n1 to cut down on the console traffic?
>>
> Nothing changes I still have endless repeats of
>
> parent transid verify failed on 1682586464256 wanted 285114 found 11257
>
>> If that doesn't help we can change it to spit a stack trace to figure
>> out where the looping is happening.  We should be erroring out instead
>> of hitting it over and over again.
>
> In my kernel noviceness i tried attaching gdb to the btrfs-endio-met,
> however apparently you can't attach gdb to a kernel thread like that
> If you could assist me in obtaining a call trace I will gladly attempt
> to resolve the matter.

Ok, I had some free time and decided to exercise my google-fu and came
up with this trace:

parent transid verify failed on 3241193205760 wanted 285287 found 281382
Pid: 2163, comm: mount Not tainted 2.6.35-0.23.rc3.git6.fc14.x86_64 #1
Call Trace:
 [<ffffffffa047c376>] verify_parent_transid+0xb7/0xfe [btrfs]
 [<ffffffffa047c4f2>] btrfs_buffer_uptodate+0x49/0x59 [btrfs]
 [<ffffffffa04686a2>] read_block_for_search+0x8f/0x289 [btrfs]
 [<ffffffffa046d554>] btrfs_search_slot+0x3ae/0x513 [btrfs]
 [<ffffffffa0470ece>] btrfs_read_block_groups+0x73/0x526 [btrfs]
 [<ffffffff8149b0a3>] ? _raw_spin_unlock+0x2b/0x2f
 [<ffffffffa0469f56>] ? btrfs_root_node+0x2a/0x32 [btrfs]
 [<ffffffffa047d287>] ? find_and_setup_root+0xab/0xbc [btrfs]
 [<ffffffffa04800eb>] open_ctree+0xf19/0x143a [btrfs]
 [<ffffffffa0467960>] btrfs_get_sb+0x1ce/0x40b [btrfs]
 [<ffffffff810e9cfd>] ? free_pages+0x49/0x4e
 [<ffffffff8112c9f9>] vfs_kern_mount+0xbd/0x19b
 [<ffffffff8112cb3f>] do_kern_mount+0x4d/0xed
 [<ffffffff81143742>] do_mount+0x776/0x7ed
 [<ffffffff81143841>] sys_mount+0x88/0xc2
 [<ffffffff81009c32>] system_call_fastpath+0x16/0x1b


> Dan Kozlowski
>
> --
> S.D.G.
>



-- 
S.D.G.


* Re: volume broken? btrfsck fails
  2010-07-08  0:21     ` Daniel Kozlowski
  2010-07-08  2:39       ` Daniel Kozlowski
@ 2010-07-08  8:43       ` Daniel J Blueman
  1 sibling, 0 replies; 16+ messages in thread
From: Daniel J Blueman @ 2010-07-08  8:43 UTC (permalink / raw)
  To: Daniel Kozlowski; +Cc: Chris Mason, linux-btrfs

On 8 July 2010 01:21, Daniel Kozlowski <dan.kozlowski@gmail.com> wrote:
> On Tue, Jul 6, 2010 at 8:19 PM, Chris Mason <chris.mason@oracle.com> =
wrote:
>>> I am also having the same problem with a slightly different setup. =
In My case I
>>> cannot mount the filesystem.
>>
>> What is your hardware setup here? =A0Including write cache settings.=
 =A0Did
>> you have craces with 2.6.35-rc1 or rc2?
>
> My setup is
>
> Eight hard Drive
> four 1TB Drives
> four 500GB Drives
> All drives are connected through a 3ware Inc 9550SX SATA-II RAID PCI-=
X card
> The card is configured to export all drives essentially acting as a
> SATA port multiplier. (drives show up sdb - sdi)
> Drives are configured in btrfs raid0
> Filesystem is mounted using:
> mount -t btrfs /dev/sdb /opt
>
> I have been able to lock up the system on
> 2.6.33.5-124.fc13.x86_64
> 2.6.35-0.13.rc3.git2.fc14.x86_64
> 2.6.35-0.23.rc3.git6.fc14.x86_64
> and
> 2.6.35-0.23.rc3.git6.fc14.x86_64 with a DKMS build of the btrfs modul=
e
> (Btrfs v0.19-16-g075587c-dirty)
>
> If you would like me to pull out another version of the kernel or rol=
l
> back specific commits from the kernel module I can
>
> I have been able to get different responses from different versions:
> 2.6.33.* - This will mount the volume but will hang shortly after
> mounting when reading data from the filesystem (ls /opt); it writes a
> bunch of transid verify failed messages and hangs on ls
> 2.6.34.* - Will not mount at all; still gives the transid verify failed
> messages and hangs on mount
>
>>
>> Looks like we're looping on a single block.  What happens when you
>> dmesg -n1 to cut down on the console traffic?
>>
> Nothing changes, I still have endless repeats of
>
> parent transid verify failed on 1682586464256 wanted 285114 found 11257
>
>> If that doesn't help we can change it to spit a stack trace to figure
>> out where the looping is happening.  We should be erroring out instead
>> of hitting it over and over again.
>
> In my kernel noviceness I tried attaching gdb to the btrfs-endio-met,
> however apparently you can't attach gdb to a kernel thread like that.
> If you could assist me in obtaining a call trace I will gladly attempt
> to resolve the matter.

For grabbing kernel backtraces:

$ sudo -s
# dmesg -c >/dev/null
# echo t >/proc/sysrq-trigger
# dmesg >backtraces.txt
(there are other ways with

The problem is that you'll be taking instantaneous snapshots, which
may or may not be representative of the main looping, but over a few
shots should be.
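
To judge whether a few snapshots agree, one option is to count which functions show up in the call traces of each dump. The following is only a sketch: it assumes each snapshot was saved to text the way the commands above do, and that frames look like `[<addr>] func+0xoff/0xlen`. The helper names `count_frames` and `common_frames` are made up for illustration. Frames marked with `?` are skipped because the kernel flags them as unreliable.

```python
import re
from collections import Counter

# Matches kernel call-trace frames such as
#   [<ffffffffa047c376>] verify_parent_transid+0xb7/0xfe [btrfs]
# Group 1 is set when the frame carries the "?" (unreliable) marker.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x")


def count_frames(dump_text):
    """Count how often each reliable (non-'?') frame appears in one dump."""
    counts = Counter()
    for line in dump_text.splitlines():
        match = FRAME_RE.search(line)
        if match and not match.group(1):
            counts[match.group(2)] += 1
    return counts


def common_frames(dumps, min_fraction=0.5):
    """Return functions present in at least min_fraction of the snapshots."""
    seen = Counter()
    for dump in dumps:
        # set() so a function counts once per snapshot, not once per frame.
        seen.update(set(count_frames(dump)))
    threshold = min_fraction * len(dumps)
    return sorted(name for name, n in seen.items() if n >= threshold)
```

Functions that appear in most snapshots are the likelier candidates for the loop; anything seen only once is probably incidental.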

Thanks,
  Daniel
-- 
Daniel J Blueman

* Re: volume broken? btrfsck fails
  2010-06-26 22:15 volume broken? btrfsck fails Yee-Ting Li
  2010-07-01 12:51 ` Daniel Kozlowski
  2010-07-07  0:16 ` Chris Mason
@ 2010-07-11  8:19 ` Yee-Ting Li
  2010-07-12  0:43   ` Chris Mason
  2 siblings, 1 reply; 16+ messages in thread
From: Yee-Ting Li @ 2010-07-11  8:19 UTC (permalink / raw)
  To: The development of BTRFS

so after leaving the array for a while, with the disk churning away for a few days, it stopped. i copied some files off the disk (everything seems okay) and decided to unmount and run btrfsck again - this time i get a different error:

$ sudo /usr/local/bin/btrfsck /dev/sdf
failed to read /dev/sr0
parent transid verify failed on 2703919247360 wanted 9066 found 7543
parent transid verify failed on 2703914500096 wanted 9066 found 7543
parent transid verify failed on 2703873781760 wanted 9074 found 9022
parent transid verify failed on 2703877693440 wanted 9070 found 9062
parent transid verify failed on 2703921868800 wanted 9066 found 7543
parent transid verify failed on 2703922647040 wanted 9066 found 7543
parent transid verify failed on 2703919247360 wanted 9066 found 7543
parent transid verify failed on 2703919255552 wanted 9066 found 7543
parent transid verify failed on 2703917125632 wanted 9066 found 7543
parent transid verify failed on 2703879294976 wanted 9075 found 9055
parent transid verify failed on 2703883194368 wanted 9075 found 9057
parent transid verify failed on 2703922688000 wanted 9066 found 7543
parent transid verify failed on 2703873781760 wanted 9074 found 9022
parent transid verify failed on 2703877693440 wanted 9070 found 9062
parent transid verify failed on 2703921868800 wanted 9066 found 7543
parent transid verify failed on 2703922647040 wanted 9066 found 7543
parent transid verify failed on 2703919247360 wanted 9066 found 7543
parent transid verify failed on 2703919255552 wanted 9066 found 7543
bad block 2703873781760
Extent back ref already exists for 365342720 parent 0 root 2 
Extent back ref already exists for 2221870616576 parent 0 root 2 
Extent back ref already exists for 383959040 parent 0 root 2 
Extent back ref already exists for 367714304 parent 0 root 2 
Extent back ref already exists for 706744320 parent 0 root 2 
Extent back ref already exists for 368672768 parent 0 root 2 
Extent back ref already exists for 315338752 parent 0 root 2 
Extent back ref already exists for 377356288 parent 0 root 2 
Extent back ref already exists for 368914432 parent 0 root 2 
Extent back ref already exists for 369807360 parent 0 root 2 
Extent back ref already exists for 2221957713920 parent 0 root 2 
Extent back ref already exists for 370139136 parent 0 root 2 
Extent back ref already exists for 369811456 parent 0 root 2 
Extent back ref already exists for 370122752 parent 0 root 2 
Extent back ref already exists for 365936640 parent 0 root 2 
Extent back ref already exists for 2221948424192 parent 0 root 2 
Extent back ref already exists for 3624002596864 parent 0 root 2 
Extent back ref already exists for 706789376 parent 0 root 2 
Extent back ref already exists for 2703778734080 parent 0 root 2 
Extent back ref already exists for 372252672 parent 0 root 2 
Extent back ref already exists for 372109312 parent 0 root 2 
Extent back ref already exists for 372989952 parent 0 root 2 
Extent back ref already exists for 373657600 parent 0 root 2 
Extent back ref already exists for 374521856 parent 0 root 2 
Extent back ref already exists for 374628352 parent 0 root 2 
Extent back ref already exists for 374976512 parent 0 root 2 
Extent back ref already exists for 2221948403712 parent 0 root 2 
Extent back ref already exists for 375586816 parent 0 root 2 
Extent back ref already exists for 375906304 parent 0 root 2 
Extent back ref already exists for 376639488 parent 0 root 2 
Extent back ref already exists for 706818048 parent 0 root 2 
Extent back ref already exists for 383778816 parent 0 root 2 
Extent back ref already exists for 377626624 parent 0 root 2 
leaf parent key incorrect 2703874203648
bad block 2703874203648
leaf 2222080487424 items 37 free space 1183 generation 10279 owner 2
fs uuid ea7ea0b3-bc42-4b0c-9173-346df61d4454
chunk uuid 886b0dfb-fa34-49c7-9ab0-2589603f8ae4
	item 0 key (364388352 EXTENT_ITEM 4096) itemoff 3944 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200172044288) level 0
		tree block backref root 7
	item 1 key (364392448 EXTENT_ITEM 4096) itemoff 3893 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200220258304) level 0
		tree block backref root 7
	item 2 key (364396544 EXTENT_ITEM 4096) itemoff 3842 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200179384320) level 0
		tree block backref root 7
	item 3 key (364400640 EXTENT_ITEM 4096) itemoff 3791 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200220258304) level 0
		tree block backref root 7
	item 4 key (364404736 EXTENT_ITEM 4096) itemoff 3740 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200184492032) level 0
		tree block backref root 7
	item 5 key (364408832 EXTENT_ITEM 4096) itemoff 3689 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200183775232) level 0
		tree block backref root 7
	item 6 key (364412928 EXTENT_ITEM 4096) itemoff 3638 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200192163840) level 0
		tree block backref root 7
	item 7 key (364417024 EXTENT_ITEM 4096) itemoff 3587 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200189181952) level 0
		tree block backref root 7
	item 8 key (364421120 EXTENT_ITEM 4096) itemoff 3536 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200220258304) level 0
		tree block backref root 7
	item 9 key (364425216 EXTENT_ITEM 4096) itemoff 3485 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200198520832) level 0
		tree block backref root 7
	item 10 key (364429312 EXTENT_ITEM 4096) itemoff 3434 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200201732096) level 0
		tree block backref root 7
	item 11 key (364433408 EXTENT_ITEM 4096) itemoff 3383 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200208547840) level 0
		tree block backref root 7
	item 12 key (364437504 EXTENT_ITEM 4096) itemoff 3332 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200215756800) level 0
		tree block backref root 7
	item 13 key (364441600 EXTENT_ITEM 4096) itemoff 3281 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200219885568) level 0
		tree block backref root 7
	item 14 key (364445696 EXTENT_ITEM 4096) itemoff 3230 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200213725184) level 0
		tree block backref root 7
	item 15 key (364449792 EXTENT_ITEM 4096) itemoff 3179 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200208547840) level 0
		tree block backref root 7
	item 16 key (364453888 EXTENT_ITEM 4096) itemoff 3128 itemsize 51
		extent refs 1 gen 8461 flags 2
		tree block key (104423 1 0) level 0
		tree block backref root 5
	item 17 key (364462080 EXTENT_ITEM 4096) itemoff 3077 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200657076224) level 0
		tree block backref root 7
	item 18 key (364466176 EXTENT_ITEM 4096) itemoff 3026 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200663105536) level 0
		tree block backref root 7
	item 19 key (364470272 EXTENT_ITEM 4096) itemoff 2975 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200674902016) level 0
		tree block backref root 7
	item 20 key (364474368 EXTENT_ITEM 4096) itemoff 2924 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200666513408) level 0
		tree block backref root 7
	item 21 key (364478464 EXTENT_ITEM 4096) itemoff 2873 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200687484928) level 0
		tree block backref root 7
	item 22 key (364482560 EXTENT_ITEM 4096) itemoff 2822 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200670707712) level 0
		tree block backref root 7
	item 23 key (364486656 EXTENT_ITEM 4096) itemoff 2771 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200699871232) level 0
		tree block backref root 7
	item 24 key (364490752 EXTENT_ITEM 4096) itemoff 2720 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200674902016) level 0
		tree block backref root 7
	item 25 key (364494848 EXTENT_ITEM 4096) itemoff 2669 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200683290624) level 0
		tree block backref root 7
	item 26 key (364498944 EXTENT_ITEM 4096) itemoff 2618 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200708128768) level 0
		tree block backref root 7
	item 27 key (364503040 EXTENT_ITEM 4096) itemoff 2567 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200699871232) level 0
		tree block backref root 7
	item 28 key (364507136 EXTENT_ITEM 4096) itemoff 2516 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200720515072) level 0
		tree block backref root 7
	item 29 key (364511232 EXTENT_ITEM 4096) itemoff 2465 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200704000000) level 0
		tree block backref root 7
	item 30 key (364515328 EXTENT_ITEM 4096) itemoff 2414 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200712257536) level 0
		tree block backref root 7
	item 31 key (364519424 EXTENT_ITEM 4096) itemoff 2363 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200724643840) level 0
		tree block backref root 7
	item 32 key (364523520 EXTENT_ITEM 4096) itemoff 2312 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200666513408) level 0
		tree block backref root 7
	item 33 key (364527616 EXTENT_ITEM 4096) itemoff 2261 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200669659136) level 0
		tree block backref root 7
	item 34 key (364531712 EXTENT_ITEM 4096) itemoff 2210 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200670707712) level 0
		tree block backref root 7
	item 35 key (364535808 EXTENT_ITEM 4096) itemoff 2159 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200691613696) level 0
		tree block backref root 7
	item 36 key (364539904 EXTENT_ITEM 4096) itemoff 2108 itemsize 51
		extent refs 1 gen 1061 flags 2
		tree block key (18446744073709551606 80 200682242048) level 0
		tree block backref root 7
failed to find block number 364457984
Aborted


any ideas?
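
Output like the above is easier to compare between runs once it is reduced to the distinct failing blocks. A rough sketch, assuming only the message format seen in this thread (`parent transid verify failed on X wanted Y found Z`); the function names are invented for illustration and are not part of btrfs-progs:

```python
import re

# Whitespace is matched loosely so rewrapped mail quotes still parse.
TRANSID_RE = re.compile(
    r"parent\s+transid\s+verify\s+failed\s+on\s+(\d+)\s+wanted\s+(\d+)\s+found\s+(\d+)"
)


def summarize_transid_failures(output):
    """Map each failing block's byte number to its (wanted, found) generations."""
    failures = {}
    for match in TRANSID_RE.finditer(output):
        block, wanted, found = (int(g) for g in match.groups())
        failures[block] = (wanted, found)
    return failures


def largest_lag(failures):
    """Return (block, lag) for the block whose found generation trails most."""
    block = max(failures, key=lambda b: failures[b][0] - failures[b][1])
    wanted, found = failures[block]
    return block, wanted - found
```

Repeated lines for the same block collapse into one entry, so the size of the dictionary gives the number of distinct blocks rather than the number of messages.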


* Re: volume broken? btrfsck fails
  2010-07-11  8:19 ` Yee-Ting Li
@ 2010-07-12  0:43   ` Chris Mason
  2010-07-12  4:05     ` Yee-Ting Li
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Mason @ 2010-07-12  0:43 UTC (permalink / raw)
  To: Yee-Ting Li; +Cc: The development of BTRFS

On Sun, Jul 11, 2010 at 01:19:34AM -0700, Yee-Ting Li wrote:
> so after leaving the array for a while, with the disk churning away for a few days, it stopped. i copied some files off the disk (everything seems okay) and decided to unmount and run btrfsck again - this time i get a different error:
> 
> $ sudo /usr/local/bin/btrfsck /dev/sdf
[ ... ick ... ]

> failed to find block number 364457984
> Aborted

Was this after a fresh mkfs?  Clearly things are very corrupt on this
original drive.  It would be a good test case for Yan Zheng's new fsck
code, but first I'd like to figure out if you're still seeing the old
corruption or if you've started over.

-chris



* Re: volume broken? btrfsck fails
  2010-07-08  2:39       ` Daniel Kozlowski
@ 2010-07-12  0:50         ` Chris Mason
  0 siblings, 0 replies; 16+ messages in thread
From: Chris Mason @ 2010-07-12  0:50 UTC (permalink / raw)
  To: Daniel Kozlowski; +Cc: linux-btrfs

On Wed, Jul 07, 2010 at 10:39:48PM -0400, Daniel Kozlowski wrote:
> >> Looks like we're looping on a single block.  What happens when you
> >> dmesg -n1 to cut down on the console traffic?
> >>
> > Nothing changes, I still have endless repeats of
> >
> > parent transid verify failed on 1682586464256 wanted 285114 found 11257
> >
> >> If that doesn't help we can change it to spit a stack trace to figure
> >> out where the looping is happening.  We should be erroring out instead
> >> of hitting it over and over again.
> >
> > In my kernel noviceness I tried attaching gdb to the btrfs-endio-met,
> > however apparently you can't attach gdb to a kernel thread like that.
> > If you could assist me in obtaining a call trace I will gladly attempt
> > to resolve the matter.
> 
> Ok I had some free time and decided to exercise my googlefoo and came
> up with this trace
> 
> parent transid verify failed on 3241193205760 wanted 285287 found 281382
> Pid: 2163, comm: mount Not tainted 2.6.35-0.23.rc3.git6.fc14.x86_64 #1
> Call Trace:
>  [<ffffffffa047c376>] verify_parent_transid+0xb7/0xfe [btrfs]
>  [<ffffffffa047c4f2>] btrfs_buffer_uptodate+0x49/0x59 [btrfs]
>  [<ffffffffa04686a2>] read_block_for_search+0x8f/0x289 [btrfs]
>  [<ffffffffa046d554>] btrfs_search_slot+0x3ae/0x513 [btrfs]
>  [<ffffffffa0470ece>] btrfs_read_block_groups+0x73/0x526 [btrfs]
>  [<ffffffff8149b0a3>] ? _raw_spin_unlock+0x2b/0x2f
>  [<ffffffffa0469f56>] ? btrfs_root_node+0x2a/0x32 [btrfs]
>  [<ffffffffa047d287>] ? find_and_setup_root+0xab/0xbc [btrfs]
>  [<ffffffffa04800eb>] open_ctree+0xf19/0x143a [btrfs]
>  [<ffffffffa0467960>] btrfs_get_sb+0x1ce/0x40b [btrfs]
>  [<ffffffff810e9cfd>] ? free_pages+0x49/0x4e
>  [<ffffffff8112c9f9>] vfs_kern_mount+0xbd/0x19b
>  [<ffffffff8112cb3f>] do_kern_mount+0x4d/0xed
>  [<ffffffff81143742>] do_mount+0x776/0x7ed
>  [<ffffffff81143841>] sys_mount+0x88/0xc2
>  [<ffffffff81009c32>] system_call_fastpath+0x16/0x1b


Ok, so we're never getting out of mount.  A recent change to
read_block_for_search is causing this problem.  We're looping over and
over again because it is returning -EAGAIN instead of -EIO.

Thanks for nailing this trace down, I'll get a fix in for the looping.
I'm afraid it won't bring back the filesystem though, you'll end up
failing in mount.  Would you like some help copying the data off?

-chris
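
The difference between the two return codes can be shown with a toy model. This is not the kernel code and not the eventual fix, just a user-space sketch of the general pattern: a caller that retries on -EAGAIN never finishes when the underlying failure is permanent, while -EIO ends the search at once. `read_block` and `search_with_retry` are hypothetical names, and the attempt cap exists only so the example terminates.

```python
import errno


def read_block(transid_ok, permanent_error):
    """Hypothetical stand-in for a block read; returns 0 or a negative errno."""
    if transid_ok:
        return 0
    return -permanent_error


def search_with_retry(read, max_attempts=1000):
    """Retry while the read asks for it; report how many attempts were made."""
    for attempt in range(1, max_attempts + 1):
        ret = read()
        if ret != -errno.EAGAIN:
            # Success or a hard error: either way the search stops here.
            return ret, attempt
    # Cap reached: without it, this caller would have spun forever.
    return None, max_attempts
```

With a block that always fails verification, reporting EAGAIN exhausts every attempt, whereas reporting EIO returns after the first read and lets the mount fail cleanly.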


* Re: volume broken? btrfsck fails
  2010-07-12  0:43   ` Chris Mason
@ 2010-07-12  4:05     ` Yee-Ting Li
  0 siblings, 0 replies; 16+ messages in thread
From: Yee-Ting Li @ 2010-07-12  4:05 UTC (permalink / raw)
  To: Chris Mason; +Cc: The development of BTRFS


On 11 Jul 2010, at 17:43, Chris Mason wrote:
> Was this after a fresh mkfs?  Clearly things are very corrupt on this
> original drive.  It would be a good test case for Yan Zheng's new fsck
> code, but first I'd like to figure out if you're still seeing the old
> corruption or if you've started over.

nope, same disk as before when the btrfsck exited with:

btrfsck: disk-io.c:410: find_and_setup_root: Assertion `!(ret)' failed.

the strange thing was that i'm pretty sure that btrfs crashed the system a couple of times (hung). after reboot the mounted drive would basically churn away for hours and spit out lots of the parent transid messages. but after a while it stops and everything seems fine again.

i don't mind losing files on the disk array, but it would be nice if it could tell me the actual filenames which are corrupt.

Yee.


* Re: volume broken? btrfsck fails
  2010-07-07  0:16 ` Chris Mason
  2010-07-07  5:23   ` Yee-Ting Li
@ 2010-08-04 18:48   ` Thomas Kuther
  2010-08-05  1:30     ` Chris Mason
  1 sibling, 1 reply; 16+ messages in thread
From: Thomas Kuther @ 2010-08-04 18:48 UTC (permalink / raw)
  To: The development of BTRFS; +Cc: Chris Mason

On Tue, 06.07.10 20:16 Chris Mason <chris.mason@oracle.com> wrote:

> On Sat, Jun 26, 2010 at 03:15:04PM -0700, Yee-Ting Li wrote:
> > Hi,
> > 
> > i think my btrfs volume is hosed.... it mounts okay, but iostat
> > shows /dev/sdg on 100% load. dmesg shows lots of 'parent transid
> > verify failed on x wanted y found z'. then after a while i can't
> > read from it (access to the filesystem freezes).
> > 
> > the machine had crashed (prob from some other process), and upon
> > reboot i've been experience this problem since.
> > 
> > can anyone provide any guidance in how to proceed?
> 
> These are definitely corruptions, and they probably came from the
> crash. Can you tell me more about the crash? (Power failure, what is
> the storage underneath etc, what are the write cache settings).  We
> don't expect these kinds corruptions to happen.
> 
> Yan Zheng is making a lot of progress on btrfsck, but I don't think
> you'll want to be one of the first testers there.  I can definitely
> help copy things off if you're having trouble accessing the FS.
> 
> -chris

Hello Chris,

sorry if I'm hijacking this thread. I got a similar problem, probably
caused by a system crash due to faulty/badly timed memory dimms. The
system suddenly hardlocked during write activity.

- kernel is 2.6.35
- btrfs on top of a md raid5, which looks healthy. Desktop SATA disks.

# cat /proc/mdstat|grep -A1 md0
md0 : active raid5 sdb1[0] sdd1[1] sdc1[2]
      2930271872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

# btrfsck
usage: btrfsck dev
Btrfs v0.19-16-g075587c-dirty

# btrfsck /dev/md0
parent transid verify failed on 2419218964480 wanted 127839 found 127260
parent transid verify failed on 2419218964480 wanted 127839 found 127260
parent transid verify failed on 2419218915328 wanted 127839 found 127260
parent transid verify failed on 2419218915328 wanted 127839 found 127260
parent transid verify failed on 2419214266368 wanted 127839 found 127837
parent transid verify failed on 2419214266368 wanted 127839 found 127837
parent transid verify failed on 2419214266368 wanted 127839 found 127837
Segmentation fault

Mount endlessly loops, like explained in this thread.

If there is a way, I would really like some aid copying the data off.
The backup is quite out of date, shame on me.

Best regards,
Thomas
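
For anyone wanting to script the "looks healthy" check on the md array, here is a small sketch that reads the `[3/3] [UUU]` status from /proc/mdstat text like the excerpt above. It assumes the usual layout in which the first number is the expected member count, the second the active count, and `_` marks a missing slot; `md_array_health` is a made-up name. All members being up says nothing about whether the data on them is consistent after a crash.

```python
import re

# e.g. "[3/3] [UUU]" for a healthy array, "[3/2] [U_U]" for a degraded one
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def md_array_health(mdstat_text, name):
    """Return (expected, active, missing_slots) for array `name`, or None."""
    lines = mdstat_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(name + " :"):
            # The status counters sit on a following line of the same stanza.
            for follow in lines[i + 1:]:
                if not follow.strip():
                    break  # blank line ends the stanza
                match = STATUS_RE.search(follow)
                if match:
                    expected, active = int(match.group(1)), int(match.group(2))
                    missing = [k for k, c in enumerate(match.group(3)) if c == "_"]
                    return expected, active, missing
    return None
```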


* Re: volume broken? btrfsck fails
  2010-08-04 18:48   ` Thomas Kuther
@ 2010-08-05  1:30     ` Chris Mason
  2010-08-14 11:08       ` Thomas Kuther
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Mason @ 2010-08-05  1:30 UTC (permalink / raw)
  To: Thomas Kuther; +Cc: The development of BTRFS

On Wed, Aug 04, 2010 at 08:48:40PM +0200, Thomas Kuther wrote:
> On Tue, 06.07.10 20:16 Chris Mason <chris.mason@oracle.com> wrote:
> 
> > On Sat, Jun 26, 2010 at 03:15:04PM -0700, Yee-Ting Li wrote:
> > > Hi,
> > > 
> > > i think my btrfs volume is hosed.... it mounts okay, but iostat
> > > shows /dev/sdg on 100% load. dmesg shows lots of 'parent transid
> > > verify failed on x wanted y found z'. then after a while i can't
> > > read from it (access to the filesystem freezes).
> > > 
> > > the machine had crashed (prob from some other process), and upon
> > > reboot i've been experience this problem since.
> > > 
> > > can anyone provide any guidance in how to proceed?
> > 
> > These are definitely corruptions, and they probably came from the
> > crash. Can you tell me more about the crash? (Power failure, what is
> > the storage underneath etc, what are the write cache settings).  We
> > don't expect these kinds corruptions to happen.
> > 
> > Yan Zheng is making a lot of progress on btrfsck, but I don't think
> > you'll want to be one of the first testers there.  I can definitely
> > help copy things off if you're having trouble accessing the FS.
> > 
> > -chris
> 
> Hello Chris,
> 
> sorry if I'm hijacking this thread. I got a similar problem, probably
> caused by a system crash due to faulty/badly timed memory dimms. The
> system suddenly hardlocked during write activity.
> 
> - kernel is 2.6.35
> - btrfs on top of a md raid5, which looks healthy. Desktop SATA disks.
> 
> # cat /proc/mdstat|grep -A1 md0
> md0 : active raid5 sdb1[0] sdd1[1] sdc1[2]
>       2930271872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
> 
> # btrfsck
> usage: btrfsck dev
> Btrfs v0.19-16-g075587c-dirty
> 
> # btrfsck /dev/md0
> parent transid verify failed on 2419218964480 wanted 127839 found 127260
> parent transid verify failed on 2419218964480 wanted 127839 found 127260
> parent transid verify failed on 2419218915328 wanted 127839 found 127260
> parent transid verify failed on 2419218915328 wanted 127839 found 127260
> parent transid verify failed on 2419214266368 wanted 127839 found 127837
> parent transid verify failed on 2419214266368 wanted 127839 found 127837
> parent transid verify failed on 2419214266368 wanted 127839 found 127837
> Segmentation fault
> 
> Mount endlessly loops, like explained in this thread.
> 
> If there is a way, I would really like some aid copying the data off.
> The backup is quite out of date, shame on me.

No problem, I'll get a test patch out in the morning.

-chris



* Re: volume broken? btrfsck fails
  2010-08-05  1:30     ` Chris Mason
@ 2010-08-14 11:08       ` Thomas Kuther
  0 siblings, 0 replies; 16+ messages in thread
From: Thomas Kuther @ 2010-08-14 11:08 UTC (permalink / raw)
  To: Chris Mason; +Cc: The development of BTRFS

[-- Attachment #1: Type: text/plain, Size: 3016 bytes --]

On Wed, 04.08.10 21:30 Chris Mason <chris.mason@oracle.com> wrote:

> On Wed, Aug 04, 2010 at 08:48:40PM +0200, Thomas Kuther wrote:
> > On Tue, 06.07.10 20:16 Chris Mason <chris.mason@oracle.com> wrote:
> > 
> > > On Sat, Jun 26, 2010 at 03:15:04PM -0700, Yee-Ting Li wrote:
> > > > Hi,
> > > > 
> > > > i think my btrfs volume is hosed.... it mounts okay, but iostat
> > > > shows /dev/sdg on 100% load. dmesg shows lots of 'parent transid
> > > > verify failed on x wanted y found z'. then after a while i can't
> > > > read from it (access to the filesystem freezes).
> > > > 
> > > > the machine had crashed (prob from some other process), and upon
> > > > reboot i've been experience this problem since.
> > > > 
> > > > can anyone provide any guidance in how to proceed?
> > > 
> > > These are definitely corruptions, and they probably came from the
> > > crash. Can you tell me more about the crash? (Power failure, what
> > > is the storage underneath etc, what are the write cache
> > > settings).  We don't expect these kinds corruptions to happen.
> > > 
> > > Yan Zheng is making a lot of progress on btrfsck, but I don't
> > > think you'll want to be one of the first testers there.  I can
> > > definitely help copy things off if you're having trouble
> > > accessing the FS.
> > > 
> > > -chris
> > 
> > Hello Chris,
> > 
> > sorry if I'm hijacking this thread. I got a similar problem,
> > probably caused by a system crash due to faulty/badly timed memory
> > dimms. The system suddenly hardlocked during write activity.
> > 
> > - kernel is 2.6.35
> > - btrfs on top of a md raid5, which looks healthy. Desktop SATA
> > disks.
> > 
> > # cat /proc/mdstat|grep -A1 md0
> > md0 : active raid5 sdb1[0] sdd1[1] sdc1[2]
> >       2930271872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
> > 
> > # btrfsck
> > usage: btrfsck dev
> > Btrfs v0.19-16-g075587c-dirty
> > 
> > # btrfsck /dev/md0
> > parent transid verify failed on 2419218964480 wanted 127839 found 127260
> > parent transid verify failed on 2419218964480 wanted 127839 found 127260
> > parent transid verify failed on 2419218915328 wanted 127839 found 127260
> > parent transid verify failed on 2419218915328 wanted 127839 found 127260
> > parent transid verify failed on 2419214266368 wanted 127839 found 127837
> > parent transid verify failed on 2419214266368 wanted 127839 found 127837
> > parent transid verify failed on 2419214266368 wanted 127839 found 127837
> > Segmentation fault
> > 
> > Mount endlessly loops, like explained in this thread.
> > 
> > If there is a way, I would really like some aid copying the data
> > off. The backup is quite out of date, shame on me.
> 
> No problem, I'll get a test patch out in the morning.
> 
> -chris
> 

Hi Chris,

did you find the time to get that patch done meanwhile?
I'm willing to test.

Seems more people get this error after power outages, suspending or
similar.

Thanks in advance.

~Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]


end of thread, other threads:[~2010-08-14 11:08 UTC | newest]

Thread overview: 16+ messages
2010-06-26 22:15 volume broken? btrfsck fails Yee-Ting Li
2010-07-01 12:51 ` Daniel Kozlowski
2010-07-04  6:57   ` Yee-Ting Li
2010-07-07  0:19   ` Chris Mason
2010-07-08  0:21     ` Daniel Kozlowski
2010-07-08  2:39       ` Daniel Kozlowski
2010-07-12  0:50         ` Chris Mason
2010-07-08  8:43       ` Daniel J Blueman
2010-07-07  0:16 ` Chris Mason
2010-07-07  5:23   ` Yee-Ting Li
2010-08-04 18:48   ` Thomas Kuther
2010-08-05  1:30     ` Chris Mason
2010-08-14 11:08       ` Thomas Kuther
2010-07-11  8:19 ` Yee-Ting Li
2010-07-12  0:43   ` Chris Mason
2010-07-12  4:05     ` Yee-Ting Li
