* coredump in btrfsck
@ 2014-01-01 21:27 Oliver Mangold
  2014-01-01 21:58 ` Chris Murphy
  0 siblings, 1 reply; 7+ messages in thread
From: Oliver Mangold @ 2014-01-01 21:27 UTC
  To: linux-btrfs

I fear I broke my FS by running btrfsck. I tried 'btrfsck --repair' and 
it fixed several problems but finally crashed with some debug message 
from 'extent-tree.c', so I also tried 'btrfsck --repair 
--init-extent-tree'. Since then I can't mount the FS anymore:

 > mount -t btrfs /dev/mapper/primary-home /home

produces log messages:

Jan 01 21:45:09 home kernel: btrfs: device fsid 
31a5d433-4f7b-49cc-9bc0-9422471f5194 devid 1 transid 4793 
/dev/mapper/primary-home
Jan 01 21:45:09 home kernel: btrfs: disk space caching is enabled
Jan 01 21:45:09 home kernel: parent transid verify failed on 2176851968 
wanted 4792 found 4793
Jan 01 21:45:09 home kernel: parent transid verify failed on 2176851968 
wanted 4792 found 4793
Jan 01 21:45:09 home kernel: btrfs: open_ctree failed

The FS runs in RAID1 mode across two block devices,
/dev/mapper/primary-home and /dev/mapper/secondary-home. Rerunning
'btrfsck --repair' or 'btrfsck --repair --init-extent-tree' (on either
device) still aborts with an assertion failure:

 > btrfsck --repair /dev/mapper/primary-home
parent transid verify failed on 2176851968 wanted 4792 found 4793
Ignoring transid failure
checking extents
bad block 2783195136
ref mismatch on [1103101952 4096] extent item 0, found 1
incorrect offsets 3200 4794
btrfsck: extent-tree.c:2717: alloc_reserved_tree_block: Assertion 
`!(ret)' failed.
enabling repair mode
Checking filesystem on /dev/mapper/primary-home
UUID: 31a5d433-4f7b-49cc-9bc0-9422471f5194

 > btrfsck --repair /dev/mapper/secondary-home
parent transid verify failed on 2176851968 wanted 4792 found 4793
Ignoring transid failure
checking extents
bad block 2783195136
ref mismatch on [1103101952 4096] extent item 0, found 1
incorrect offsets 3200 4794
btrfsck: extent-tree.c:2717: alloc_reserved_tree_block: Assertion 
`!(ret)' failed.
enabling repair mode
Checking filesystem on /dev/mapper/secondary-home
UUID: 31a5d433-4f7b-49cc-9bc0-9422471f5194

 > btrfsck --repair --init-extent-tree /dev/mapper/primary-home
parent transid verify failed on 2176851968 wanted 4792 found 4793
Ignoring transid failure
btrfsck: root-tree.c:80: btrfs_update_root: Assertion `!(ret != 0)' failed.
enabling repair mode
Checking filesystem on /dev/mapper/primary-home
UUID: 31a5d433-4f7b-49cc-9bc0-9422471f5194
Creating a new extent tree

 > btrfsck --repair --init-extent-tree /dev/mapper/secondary-home
parent transid verify failed on 2176851968 wanted 4792 found 4793
Ignoring transid failure
btrfsck: root-tree.c:80: btrfs_update_root: Assertion `!(ret != 0)' failed.
enabling repair mode
Checking filesystem on /dev/sda1
UUID: 31a5d433-4f7b-49cc-9bc0-9422471f5194
Creating a new extent tree

Can the FS be repaired, or can at least the data be recovered? Apparently I
found a bug in btrfsck which needs fixing. In case it helps, I have attached
the output of 'btrfs-debug-tree -e'.

[-- Attachment #2: debug-tree.log.gz --]
[-- Type: application/gzip, Size: 27584 bytes --]

* Re: coredump in btrfsck
  2014-01-01 21:27 coredump in btrfsck Oliver Mangold
@ 2014-01-01 21:58 ` Chris Murphy
  2014-01-01 22:35   ` Oliver Mangold
  0 siblings, 1 reply; 7+ messages in thread
From: Chris Murphy @ 2014-01-01 21:58 UTC
  To: Btrfs BTRFS


On Jan 1, 2014, at 2:27 PM, Oliver Mangold <o.mangold@gmail.com> wrote:

> I fear, I broke my FS by running btrfsck. I tried 'btrfsck --repair' and it fixed several problems but finally crashed with some debug message from 'extent-tree.c', so I also tried 'btrfsck --repair --init-extent-tree'.

It is sort of a (near) last resort, you know this, right? What did you try before btrfsck? Did you set dmesg -n7, then mount -o recovery, and if so, what was recorded in dmesg?

> 
> produces log messages:
> 
> Jan 01 21:45:09 home kernel: btrfs: device fsid 31a5d433-4f7b-49cc-9bc0-9422471f5194 devid 1 transid 4793 /dev/mapper/primary-home
> Jan 01 21:45:09 home kernel: btrfs: disk space caching is enabled
> Jan 01 21:45:09 home kernel: parent transid verify failed on 2176851968 wanted 4792 found 4793
> Jan 01 21:45:09 home kernel: parent transid verify failed on 2176851968 wanted 4792 found 4793
> Jan 01 21:45:09 home kernel: btrfs: open_ctree failed

If you previously tried -o recovery, did you also try btrfs-zero-log before btrfsck, and if so, what were the results on the console and in dmesg?


Chris Murphy


* Re: coredump in btrfsck
  2014-01-01 21:58 ` Chris Murphy
@ 2014-01-01 22:35   ` Oliver Mangold
  2014-01-02 17:37     ` Chris Murphy
  0 siblings, 1 reply; 7+ messages in thread
From: Oliver Mangold @ 2014-01-01 22:35 UTC
  To: linux-btrfs

On 01.01.2014 22:58, Chris Murphy wrote:
> On Jan 1, 2014, at 2:27 PM, Oliver Mangold <o.mangold@gmail.com> wrote:
>
>> I fear, I broke my FS by running btrfsck. I tried 'btrfsck --repair' and it fixed several problems but finally crashed with some debug message from 'extent-tree.c', so I also tried 'btrfsck --repair --init-extent-tree'.
> It is sort of a (near) last resort, you know this, right? What did you try before btrfsck? Did you set dmesg -n7, then mount -o recovery, and if so, what was recorded in dmesg?
Ehm, actually, no. Before I ran btrfsck there was no reason to use '-o
recovery' or anything like it, because the filesystem seemed to work. But I
was worried after running btrfsck, because the FS apparently was in an
inconsistent state. So I tried 'btrfsck --repair' and, when that crashed,
'btrfsck --init-extent-tree'. I didn't know it is considered a 'last
resort'. It did the trick for several previous problems and seemed to
have no negative consequences, so I tried it again this time.

But it looks like I can still recover my data with 'btrfs restore', so
it's less critical than I assumed.

Sorry that I can't give you the logs you would have liked. I didn't
expect anything bad to happen. I just wish that btrfsck could fix
that kind of problem. Let me know if I can help.
>> produces log messages:
>>
>> Jan 01 21:45:09 home kernel: btrfs: device fsid 31a5d433-4f7b-49cc-9bc0-9422471f5194 devid 1 transid 4793 /dev/mapper/primary-home
>> Jan 01 21:45:09 home kernel: btrfs: disk space caching is enabled
>> Jan 01 21:45:09 home kernel: parent transid verify failed on 2176851968 wanted 4792 found 4793
>> Jan 01 21:45:09 home kernel: parent transid verify failed on 2176851968 wanted 4792 found 4793
>> Jan 01 21:45:09 home kernel: btrfs: open_ctree failed
> If you previously tried -o recovery, before btrfsck did you try btrfs-zero-log and if so what were the results in console and in dmesg?
>
>
> Chris Murphy


* Re: coredump in btrfsck
  2014-01-01 22:35   ` Oliver Mangold
@ 2014-01-02 17:37     ` Chris Murphy
  2014-01-03 12:33       ` Marc MERLIN
  0 siblings, 1 reply; 7+ messages in thread
From: Chris Murphy @ 2014-01-02 17:37 UTC
  To: Btrfs BTRFS, Oliver Mangold


On Jan 1, 2014, at 3:35 PM, Oliver Mangold <o.mangold@gmail.com> wrote:

> On 01.01.2014 22:58, Chris Murphy wrote:
>> On Jan 1, 2014, at 2:27 PM, Oliver Mangold <o.mangold@gmail.com> wrote:
>> 
>>> I fear, I broke my FS by running btrfsck. I tried 'btrfsck --repair' and it fixed several problems but finally crashed with some debug message from 'extent-tree.c', so I also tried 'btrfsck --repair --init-extent-tree'.
>> It is sort of a (near) last resort, you know this, right? What did you try before btrfsck? Did you set dmesg -n7, then mount -o recovery, and if so, what was recorded in dmesg?
> Ehm, actually, no.

https://btrfs.wiki.kernel.org/index.php/FAQ#When_will_Btrfs_have_a_fsck_like_tool.3F

This is a bit dated, but the general idea is not to use --repair except on the advice of a developer, since it still carries some risk. Just a week or so ago, one developer said it was still a little dangerous. So yeah, -o recovery should be the first choice.
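
As a rough sketch of that ordering, assuming the device from the original report and a made-up rescue directory (recovery_plan is a hypothetical helper that only prints the commands and runs nothing; whether each step applies depends on the actual failure):

```shell
#!/bin/sh
# Hypothetical helper: print, but do not run, a least-invasive-first order
# of the commands mentioned in this thread. Treat it as a checklist only.
recovery_plan() {
    dev=$1      # block device holding the filesystem
    mnt=$2      # mount point
    rescue=$3   # assumed scratch directory on a different filesystem
    cat <<EOF
# 1. raise the console log level so kernel messages are captured
dmesg -n 7
# 2. try a recovery mount first
mount -t btrfs -o recovery $dev $mnt
# 3. only relevant when log replay is what fails at mount time
btrfs-zero-log $dev
# 4. copy data off without writing to the damaged filesystem
btrfs restore $dev $rescue
# 5. last resort, on a developer's advice
btrfsck --repair $dev
EOF
}

recovery_plan /dev/mapper/primary-home /home /mnt/rescue
```

The point of printing rather than executing is that each step should be a deliberate decision made after reading dmesg from the previous one.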


> Before I ran btrfsck there was no reason to use '-o recovery' or something, because the filesystem seemed to work.

Ahh, so you ran btrfsck without --repair first? It found problems, so you then ran it with --repair?


> But I was worried after running btrfsck, because the FS apparently was in an inconsistent state. So I tried 'btrfsck --repair' and when that crashed 'btrfsck --init-extent-tree'. Didn't know it is considered 'last resort'. It did the trick for several previous problems and seemed to have no negative consequences, so I tried it now also.

It's sort of a sledgehammer.


> But it looks like I can still recover my data with 'btrfs restore', so it's less critical than assumed.

That's good news.


> Sorry, that I can't give you the logs you would have liked.

Yeah, I'm not certain anyone can give you much advice without more details. Which kernel version and btrfs-progs version were you using at the time of the problem, and when you ran btrfsck? You report a crash but have no dmesg output from it?


> Didn't expect anything bad to happen. I would just wish that btrfsck could fix that kind of problem. 

Sure, but the problem is unclear, and there are no logs to show what actually happened.



Chris Murphy

* Re: coredump in btrfsck
  2014-01-02 17:37     ` Chris Murphy
@ 2014-01-03 12:33       ` Marc MERLIN
  2014-01-04  0:14         ` Chris Murphy
  0 siblings, 1 reply; 7+ messages in thread
From: Marc MERLIN @ 2014-01-03 12:33 UTC
  To: Chris Murphy; +Cc: Btrfs BTRFS, Oliver Mangold

On Thu, Jan 02, 2014 at 10:37:28AM -0700, Chris Murphy wrote:
> 
> On Jan 1, 2014, at 3:35 PM, Oliver Mangold <o.mangold@gmail.com> wrote:
> 
> > On 01.01.2014 22:58, Chris Murphy wrote:
> >> On Jan 1, 2014, at 2:27 PM, Oliver Mangold <o.mangold@gmail.com> wrote:
> >> 
> >>> I fear, I broke my FS by running btrfsck. I tried 'btrfsck --repair' and it fixed several problems but finally crashed with some debug message from 'extent-tree.c', so I also tried 'btrfsck --repair --init-extent-tree'.
> >> It is sort of a (near) last resort, you know this, right? What did you try before btrfsck? Did you set dmesg -n7, then mount -o recovery, and if so, what was recorded in dmesg?
> > Ehm, actually, no.
> 
> https://btrfs.wiki.kernel.org/index.php/FAQ#When_will_Btrfs_have_a_fsck_like_tool.3F
> 
> This is a bit dated, but the general idea is to not use repair except on advice of a developer, and also there are still some risks. Just a week or so ago, one said it was a little dangerous still. So yeah, -o recovery should be the first choice.
 
I was thinking about this:
Considering that everyone out there has been conditioned to run fsck on
any filesystem if there is a problem, and considering that btrfs has been
different and likely will be for the foreseeable future, I'd like to
suggest the following:

In order to accommodate more users trying btrfs, the documentation for
btrfsck really needs to be changed. Neither the tool's help nor the man
page says anything about 'this is not the fsck you're looking for', nor
do they point to the wiki above.

See:
gandalfthegreat:~# btrfsck 
usage: btrfs check [options] <device>

    Check an unmounted btrfs filesystem.
(...)
and
man btrfsck

Would it be possible for whoever maintains btrfs-tools to change both
the man page and the help included in the tool to clearly state that
running the fsck tool is unlikely to be the right course of action
and talk about btrfs-zero-log as well as mount -o recovery?

Cheers,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

* Re: coredump in btrfsck
  2014-01-03 12:33       ` Marc MERLIN
@ 2014-01-04  0:14         ` Chris Murphy
  2014-01-05  6:13           ` Marc MERLIN
  0 siblings, 1 reply; 7+ messages in thread
From: Chris Murphy @ 2014-01-04  0:14 UTC
  To: Btrfs BTRFS, Hugo Mills; +Cc: Marc MERLIN, Eric Sandeen


On Jan 3, 2014, at 5:33 AM, Marc MERLIN <marc@merlins.org> wrote:
> 
> Would it be possible for whoever maintains btrfs-tools to change both
> the man page and the help included in the tool to clearly state that
> running the fsck tool is unlikely to be the right course of action
> and talk about btrfs-zero-log as well as mount -o recovery?

The Problem FAQ doesn't even mention btrfsck, so I think people are either bypassing that page or making assumptions.
https://btrfs.wiki.kernel.org/index.php/Problem_FAQ

Should btrfs check (btrfsck without --repair) work similarly to xfs_repair when the file system is not cleanly unmounted? If an XFS volume is not cleanly unmounted, running xfs_repair will instruct the user to first mount the volume so that the journal is replayed, then umount the volume, then run xfs_repair.

A possible variant of this for btrfs check: inform the user that the first step in repairing a problem Btrfs volume is to use -o recovery, and point to the Btrfs FAQ <url> for additional problem-solving recommendations.

?
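
Something like the following wrapper is roughly what I have in mind, purely as a sketch: btrfs_check_guarded, BTRFS_BIN and I_KNOW_THE_RISKS are made-up names, none of this is part of btrfs-progs, and the wording of the notice is only a placeholder.

```shell
#!/bin/sh
# Sketch of a guard around 'btrfs check': read-only checks pass through,
# repair options are refused unless the caller explicitly opts in.
# All names here are hypothetical and not part of btrfs-progs.
btrfs_check_guarded() {
    for arg in "$@"; do
        case $arg in
            --repair|--init-extent-tree)
                if [ "${I_KNOW_THE_RISKS:-0}" != 1 ]; then
                    echo "btrfs check: $arg can make things worse." >&2
                    echo "Try 'mount -o recovery' first; see the Btrfs FAQ." >&2
                    return 1
                fi
                ;;
        esac
    done
    # BTRFS_BIN lets the sketch be exercised without touching a real device.
    "${BTRFS_BIN:-btrfs}" check "$@"
}
```

With this, 'btrfs_check_guarded /dev/mapper/primary-home' would run a plain check, while adding --repair prints the notice and returns non-zero unless I_KNOW_THE_RISKS=1 is set.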

Chris Murphy

* Re: coredump in btrfsck
  2014-01-04  0:14         ` Chris Murphy
@ 2014-01-05  6:13           ` Marc MERLIN
  0 siblings, 0 replies; 7+ messages in thread
From: Marc MERLIN @ 2014-01-05  6:13 UTC
  To: Chris Murphy; +Cc: Btrfs BTRFS, Hugo Mills, Eric Sandeen

On Fri, Jan 03, 2014 at 05:14:56PM -0700, Chris Murphy wrote:
> 
> On Jan 3, 2014, at 5:33 AM, Marc MERLIN <marc@merlins.org> wrote:
> > 
> > Would it be possible for whoever maintains btrfs-tools to change both
> > the man page and the help included in the tool to clearly state that
> > running the fsck tool is unlikely to be the right course of action
> > and talk about btrfs-zero-log as well as mount -o recovery?
> 
> The problem FAQ doesn't even mention btrfsck so I think people are just getting around that page or making assumptions.
> https://btrfs.wiki.kernel.org/index.php/Problem_FAQ

It's easy to find btrfsck without the wiki, whether it's with dpkg -L,
rpm -ql, or command line completion.
My point is that, as you said, it's most often not the command to use and
can even do more harm than good, yet neither its command line help nor
the man page warns of anything dangerous or bad in using it.

Telling people they should have read a wiki instead of the canonical man
page isn't the right way to go longer term, nor is it how things are
usually done on Linux.
 
> Should btrfs check (btrfsck without --repair) work similar to xfs_repair when the file system is not cleanly unmounted? If an XFS volume is not cleanly unmounted, running xfs_repair will instruct the user to first mount the volume so that the journal is replayed, then umount the volume, then run xfs_repair.

I don't know what the actual tool does when it works; I've never
had it do anything useful for me, so I can't comment, except to say
that it should warn users: "I'm not the fsck you're used to or
are likely looking for".

> A possible variant of this for btrfs check: inform the user the first step in repairing a problem Btrfs volume is to use -o recovery, for more information see Btrfs FAQ <url> for additional problem solving recommendations.

Yes, along with tweaking the man page to say the same.

Thanks,
Marc
