* Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
@ 2012-10-08 14:16 Richard W.M. Jones
  2012-10-08 14:27 ` Chris Mason
  0 siblings, 1 reply; 23+ messages in thread
From: Richard W.M. Jones @ 2012-10-08 14:16 UTC (permalink / raw)
  To: linux-btrfs


I'm tracking this bug here:

https://bugzilla.redhat.com/show_bug.cgi?id=863978

Since approx. last week I'm seeing lots of failures in btrfs.  The
common factor seems to be that the filesystem is created (mkfs.btrfs
/dev/sda1) and then it is immediately used -- eg.  mounted or some
btrfs subtool is run on it.  There is no pause or sync between the
operations.

Typical errors include:

mkfs.btrfs /dev/sda1
mount -o  /dev/sda1 /sysroot/
[   96.384211] device fsid 962db3c0-4153-450b-9ca7-c9216e81afe3 devid 1 transid 3 /dev/sda1
[   96.385314] device fsid 962db3c0-4153-450b-9ca7-c9216e81afe3 devid 1 transid 3 /dev/sda1
[   96.394158] btrfs: disk space caching is enabled
[   96.428656] btrfs: failed to recover relocation
[   96.437190] btrfs: open_ctree failed

and:

btrfsck /dev/sda1
Check tree block failed, want=139264, have=0
Check tree block failed, want=139264, have=0
Check tree block failed, want=139264, have=0
read block failed check_tree_block
Couldn't read chunk root

(There are plenty of others; see the bug link above.)

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://et.redhat.com/~rjones/virt-top

* Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
  2012-10-08 14:16 Anyone seeing lots of "Check tree block failed" and other errors with latest kernel? Richard W.M. Jones
@ 2012-10-08 14:27 ` Chris Mason
  2012-10-08 14:57   ` Richard W.M. Jones
  0 siblings, 1 reply; 23+ messages in thread
From: Chris Mason @ 2012-10-08 14:27 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: linux-btrfs

On Mon, Oct 08, 2012 at 08:16:42AM -0600, Richard W.M. Jones wrote:
> 
> I'm tracking this bug here:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=863978
> 
> Since approx. last week I'm seeing lots of failures in btrfs.  The
> common factor seems to be that the filesystem is created (mkfs.btrfs
> /dev/sda1) and then it is immediately used -- eg.  mounted or some
> btrfs subtool is run on it.  There is no pause or sync between the
> operations.

This was a problem on older btrfs-progs, but this commit:

btrfs-progs-0.19.20120817git043a639-1.fc19.i686

(043a639) has long had the fixes to flush things after mkfs.  Is there
any chance the guest you're testing had an ancient progs on it?

-chris

* Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
  2012-10-08 14:27 ` Chris Mason
@ 2012-10-08 14:57   ` Richard W.M. Jones
  2012-10-08 15:04     ` Chris Mason
  0 siblings, 1 reply; 23+ messages in thread
From: Richard W.M. Jones @ 2012-10-08 14:57 UTC (permalink / raw)
  To: Chris Mason, linux-btrfs

On Mon, Oct 08, 2012 at 10:27:57AM -0400, Chris Mason wrote:
> On Mon, Oct 08, 2012 at 08:16:42AM -0600, Richard W.M. Jones wrote:
> > 
> > I'm tracking this bug here:
> > 
> > https://bugzilla.redhat.com/show_bug.cgi?id=863978
> > 
> > Since approx. last week I'm seeing lots of failures in btrfs.  The
> > common factor seems to be that the filesystem is created (mkfs.btrfs
> > /dev/sda1) and then it is immediately used -- eg.  mounted or some
> > btrfs subtool is run on it.  There is no pause or sync between the
> > operations.
> 
> This was a problem on older btrfs-progs, but this commit:
> 
> btrfs-progs-0.19.20120817git043a639-1.fc19.i686
> 
> (043a639) has long had the fixes to flush things after mkfs.  Is there
> any chance the guest you're testing had an ancient progs on it?

We have a couple of guests where this fails.  One has
btrfs-progs-0.19.20120817git043a639-1.fc19.i686.  The other has
btrfs-progs-0.19-20.fc18 which appears to be based on
btrfs-progs-0.19.20120817git043a639.tar.bz2 plus some upstream
patches.

What is the commit which we need?  I can't see anything related to
this in the btrfs-progs git log.

I should note this was all working fine until very recently (less than 5
days ago).  Nothing has changed in btrfs-progs in Fedora for a few
months.  Could this be related to a kernel change?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://et.redhat.com/~rjones/virt-df/

* Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
  2012-10-08 14:57   ` Richard W.M. Jones
@ 2012-10-08 15:04     ` Chris Mason
  2012-10-08 15:15       ` Richard W.M. Jones
  0 siblings, 1 reply; 23+ messages in thread
From: Chris Mason @ 2012-10-08 15:04 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: Chris Mason, linux-btrfs

On Mon, Oct 08, 2012 at 08:57:30AM -0600, Richard W.M. Jones wrote:
> On Mon, Oct 08, 2012 at 10:27:57AM -0400, Chris Mason wrote:
> > On Mon, Oct 08, 2012 at 08:16:42AM -0600, Richard W.M. Jones wrote:
> > > 
> > > I'm tracking this bug here:
> > > 
> > > https://bugzilla.redhat.com/show_bug.cgi?id=863978
> > > 
> > > Since approx. last week I'm seeing lots of failures in btrfs.  The
> > > common factor seems to be that the filesystem is created (mkfs.btrfs
> > > /dev/sda1) and then it is immediately used -- eg.  mounted or some
> > > btrfs subtool is run on it.  There is no pause or sync between the
> > > operations.
> > 
> > This was a problem on older btrfs-progs, but this commit:
> > 
> > btrfs-progs-0.19.20120817git043a639-1.fc19.i686
> > 
> > (043a639) has long had the fixes to flush things after mkfs.  Is there
> > any chance the guest you're testing had an ancient progs on it?
> 
> We have a couple of guests where this fails.  One has
> btrfs-progs-0.19.20120817git043a639-1.fc19.i686.  The other has
> btrfs-progs-0.19-20.fc18 which appears to be based on
> btrfs-progs-0.19.20120817git043a639.tar.bz2 plus some upstream
> patches.
> 
> What is the commit which we need?  I can't see anything related to
> this in the btrfs-progs git log.

Sorry, I was remembering wrong.  I fixed this up in the kernel by
running invalidate_bdev during mount.  I just double checked and the
invalidates look right, so something strange must be going on.

If it is possible to reproduce this reliably, could you please check and
see if syncs do fix it?  We saw this often with xfstests in the past,
but haven't seen it since the invalidates were added.

-chris

* Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
  2012-10-08 15:04     ` Chris Mason
@ 2012-10-08 15:15       ` Richard W.M. Jones
  2012-10-08 15:18         ` Chris Mason
                           ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Richard W.M. Jones @ 2012-10-08 15:15 UTC (permalink / raw)
  To: Chris Mason, Chris Mason, linux-btrfs

On Mon, Oct 08, 2012 at 11:04:19AM -0400, Chris Mason wrote:
> On Mon, Oct 08, 2012 at 08:57:30AM -0600, Richard W.M. Jones wrote:
> > On Mon, Oct 08, 2012 at 10:27:57AM -0400, Chris Mason wrote:
> > > On Mon, Oct 08, 2012 at 08:16:42AM -0600, Richard W.M. Jones wrote:
> > > > 
> > > > I'm tracking this bug here:
> > > > 
> > > > https://bugzilla.redhat.com/show_bug.cgi?id=863978
> > > > 
> > > > Since approx. last week I'm seeing lots of failures in btrfs.  The
> > > > common factor seems to be that the filesystem is created (mkfs.btrfs
> > > > /dev/sda1) and then it is immediately used -- eg.  mounted or some
> > > > btrfs subtool is run on it.  There is no pause or sync between the
> > > > operations.
> > > 
> > > This was a problem on older btrfs-progs, but this commit:
> > > 
> > > btrfs-progs-0.19.20120817git043a639-1.fc19.i686
> > > 
> > > (043a639) has long had the fixes to flush things after mkfs.  Is there
> > > any chance the guest you're testing had an ancient progs on it?
> > 
> > We have a couple of guests where this fails.  One has
> > btrfs-progs-0.19.20120817git043a639-1.fc19.i686.  The other has
> > btrfs-progs-0.19-20.fc18 which appears to be based on
> > btrfs-progs-0.19.20120817git043a639.tar.bz2 plus some upstream
> > patches.
> > 
> > What is the commit which we need?  I can't see anything related to
> > this in the btrfs-progs git log.
> 
> Sorry, I was remembering wrong.  I fixed this up in the kernel by
> running invalidate_bdev during mount.  I just double checked and the
> invalidates look right, so something strange must be going on.
> 
> If it is possible to reproduce this reliably, could you please check and
> see if syncs do fix it?  We saw this often with xfstests in the past,
> but haven't seen it since the invalidates were added.

Unfortunately I'm struggling to reproduce this outside of our build
system (Koji).  I will keep you informed if I do manage to reproduce
it locally.  Adding fsync /dev/sda1 was also my first instinct :-)

Rich.

* Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
  2012-10-08 15:15       ` Richard W.M. Jones
@ 2012-10-08 15:18         ` Chris Mason
  2012-10-08 16:42         ` David Sterba
  2012-10-08 21:22         ` Richard W.M. Jones
  2 siblings, 0 replies; 23+ messages in thread
From: Chris Mason @ 2012-10-08 15:18 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: Chris Mason, linux-btrfs

On Mon, Oct 08, 2012 at 09:15:14AM -0600, Richard W.M. Jones wrote:
> On Mon, Oct 08, 2012 at 11:04:19AM -0400, Chris Mason wrote:
> > On Mon, Oct 08, 2012 at 08:57:30AM -0600, Richard W.M. Jones wrote:
> > > On Mon, Oct 08, 2012 at 10:27:57AM -0400, Chris Mason wrote:
> > > > On Mon, Oct 08, 2012 at 08:16:42AM -0600, Richard W.M. Jones wrote:
> > > > > 
> > > > > I'm tracking this bug here:
> > > > > 
> > > > > https://bugzilla.redhat.com/show_bug.cgi?id=863978
> > > > > 
> > > > > Since approx. last week I'm seeing lots of failures in btrfs.  The
> > > > > common factor seems to be that the filesystem is created (mkfs.btrfs
> > > > > /dev/sda1) and then it is immediately used -- eg.  mounted or some
> > > > > btrfs subtool is run on it.  There is no pause or sync between the
> > > > > operations.
> > > > 
> > > > This was a problem on older btrfs-progs, but this commit:
> > > > 
> > > > btrfs-progs-0.19.20120817git043a639-1.fc19.i686
> > > > 
> > > > (043a639) has long had the fixes to flush things after mkfs.  Is there
> > > > any chance the guest you're testing had an ancient progs on it?
> > > 
> > > We have a couple of guests where this fails.  One has
> > > btrfs-progs-0.19.20120817git043a639-1.fc19.i686.  The other has
> > > btrfs-progs-0.19-20.fc18 which appears to be based on
> > > btrfs-progs-0.19.20120817git043a639.tar.bz2 plus some upstream
> > > patches.
> > > 
> > > What is the commit which we need?  I can't see anything related to
> > > this in the btrfs-progs git log.
> > 
> > Sorry, I was remembering wrong.  I fixed this up in the kernel by
> > running invalidate_bdev during mount.  I just double checked and the
> > invalidates look right, so something strange must be going on.
> > 
> > If it is possible to reproduce this reliably, could you please check and
> > see if syncs do fix it?  We saw this often with xfstests in the past,
> > but haven't seen it since the invalidates were added.
> 
> Unfortunately I'm struggling to reproduce this outside of our build
> system (Koji).  I will keep you informed if I do manage to reproduce
> it locally.  Adding fsync /dev/sda1 was also my first instinct :-)

When we saw this during xfstests, the fsync wasn't sufficient.  It was
really pretty maddening, and the invalidate was a nuke-it-from-orbit
style solution.

The kernel side of the invalidate may have changed, so your first
instinct of a kernel change is probably right.
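
For comparison, userspace can request a similar flush-and-invalidate of a
block device's buffer cache with the Linux BLKFLSBUF ioctl, which is what
'blockdev --flushbufs' issues; in the kernel it writes the device back
and then calls invalidate_bdev().  A minimal C sketch, assuming Linux:
flush_bufs() is a hypothetical helper, the call needs CAP_SYS_ADMIN and a
block device, and this is not the mount-time fix described above.

```c
#define _XOPEN_SOURCE 700     /* for O_CLOEXEC in glibc */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>         /* BLKFLSBUF */

/* Hypothetical helper: like 'blockdev --flushbufs DEV', ask the kernel
 * to write back and then drop the cached buffers of a block device.
 * Requires CAP_SYS_ADMIN; on a non-block file the ioctl fails
 * (typically with ENOTTY).  Returns 0 on success, -1 on failure. */
static int flush_bufs(const char *dev)
{
    int fd = open(dev, O_RDONLY | O_CLOEXEC);
    if (fd == -1) {
        perror(dev);
        return -1;
    }

    int ret = ioctl(fd, BLKFLSBUF, 0);
    if (ret == -1)
        perror(dev);

    close(fd);
    return ret == -1 ? -1 : 0;
}
```

Run as root, e.g. flush_bufs("/dev/sda1") between mkfs.btrfs and mount.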

-chris


* Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
  2012-10-08 15:15       ` Richard W.M. Jones
  2012-10-08 15:18         ` Chris Mason
@ 2012-10-08 16:42         ` David Sterba
  2012-10-08 17:01           ` Richard W.M. Jones
  2012-10-08 21:22         ` Richard W.M. Jones
  2 siblings, 1 reply; 23+ messages in thread
From: David Sterba @ 2012-10-08 16:42 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: Chris Mason, Chris Mason, linux-btrfs

On Mon, Oct 08, 2012 at 04:15:14PM +0100, Richard W.M. Jones wrote:
> Unfortunately I'm struggling to reproduce this outside of our build
> system (Koji).  I will keep you informed if I do manage to reproduce
> it locally.  Adding fsync /dev/sda1 was also my first instinct :-)

Have you updated the VM/guest related packages recently? This may be a
bug in the VM drivers.

david

* Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
  2012-10-08 16:42         ` David Sterba
@ 2012-10-08 17:01           ` Richard W.M. Jones
  0 siblings, 0 replies; 23+ messages in thread
From: Richard W.M. Jones @ 2012-10-08 17:01 UTC (permalink / raw)
  To: Chris Mason, Chris Mason, linux-btrfs

On Mon, Oct 08, 2012 at 06:42:27PM +0200, David Sterba wrote:
> On Mon, Oct 08, 2012 at 04:15:14PM +0100, Richard W.M. Jones wrote:
> > Unfortunately I'm struggling to reproduce this outside of our build
> > system (Koji).  I will keep you informed if I do manage to reproduce
> > it locally.  Adding fsync /dev/sda1 was also my first instinct :-)
> 
> Have you updated the VM/guest related packages recently? This may be a
> bug in the VM drivers.

qemu hasn't been updated for over a week.  However I'm having a hard
time understanding how even a change to qemu's caching would in any
way affect only btrfs and nothing else.  The libguestfs test suite is
extremely comprehensive and tests many other filesystems, and none of
them are failing.

These guests are built on the fly from the latest packages in Fedora,
so any other package might be the cause, but it seems like the kernel
is the most likely candidate.

Rich.

* Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
  2012-10-08 15:15       ` Richard W.M. Jones
  2012-10-08 15:18         ` Chris Mason
  2012-10-08 16:42         ` David Sterba
@ 2012-10-08 21:22         ` Richard W.M. Jones
  2012-10-09  0:00           ` Chris Mason
  2012-10-10 11:49           ` Richard W.M. Jones
  2 siblings, 2 replies; 23+ messages in thread
From: Richard W.M. Jones @ 2012-10-08 21:22 UTC (permalink / raw)
  To: Chris Mason, linux-btrfs

On Mon, Oct 08, 2012 at 04:15:14PM +0100, Richard W.M. Jones wrote:
> On Mon, Oct 08, 2012 at 11:04:19AM -0400, Chris Mason wrote:
> > On Mon, Oct 08, 2012 at 08:57:30AM -0600, Richard W.M. Jones wrote:
> > > On Mon, Oct 08, 2012 at 10:27:57AM -0400, Chris Mason wrote:
> > > > On Mon, Oct 08, 2012 at 08:16:42AM -0600, Richard W.M. Jones wrote:
> > > > > 
> > > > > I'm tracking this bug here:
> > > > > 
> > > > > https://bugzilla.redhat.com/show_bug.cgi?id=863978
> > > > > 
> > > > > Since approx. last week I'm seeing lots of failures in btrfs.  The
> > > > > common factor seems to be that the filesystem is created (mkfs.btrfs
> > > > > /dev/sda1) and then it is immediately used -- eg.  mounted or some
> > > > > btrfs subtool is run on it.  There is no pause or sync between the
> > > > > operations.
> > > > 
> > > > This was a problem on older btrfs-progs, but this commit:
> > > > 
> > > > btrfs-progs-0.19.20120817git043a639-1.fc19.i686
> > > > 
> > > > (043a639) has long had the fixes to flush things after mkfs.  Is there
> > > > any chance the guest you're testing had an ancient progs on it?
> > > 
> > > We have a couple of guests where this fails.  One has
> > > btrfs-progs-0.19.20120817git043a639-1.fc19.i686.  The other has
> > > btrfs-progs-0.19-20.fc18 which appears to be based on
> > > btrfs-progs-0.19.20120817git043a639.tar.bz2 plus some upstream
> > > patches.
> > > 
> > > What is the commit which we need?  I can't see anything related to
> > > this in the btrfs-progs git log.
> > 
> > Sorry, I was remembering wrong.  I fixed this up in the kernel by
> > running invalidate_bdev during mount.  I just double checked and the
> > invalidates look right, so something strange must be going on.
> > 
> > If it is possible to reproduce this reliably, could you please check and
> > see if syncs do fix it?  We saw this often with xfstests in the past,
> > but haven't seen it since the invalidates were added.
> 
> Unfortunately I'm struggling to reproduce this outside of our build
> system (Koji).  I will keep you informed if I do manage to reproduce
> it locally.  Adding fsync /dev/sda1 was also my first instinct :-)

I have now reproduced this bug locally.

Adding sync() + fsync of each /dev/sd* device after the mkfs command
does appear to fix the problem.

However it's a little bit difficult to know for sure because I might
just be changing the timing of things by adding these calls.
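
A minimal C sketch of that workaround: call sync(), then open each device
node and fsync() it before mounting.  fsync_path() and flush_after_mkfs()
are hypothetical names, not the libguestfs code, and, as noted earlier in
the thread, flushing alone may only shift the timing rather than fix the
underlying bug.

```c
#define _XOPEN_SOURCE 700   /* for sync() and O_CLOEXEC in glibc */
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Open a device node (or any file) and fsync() it.
 * Returns 0 on success, -1 on failure (error printed to stderr). */
static int fsync_path(const char *path)
{
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd == -1) {
        perror(path);
        return -1;
    }

    int ret = fsync(fd);
    if (ret == -1)
        perror(path);
    if (close(fd) == -1)
        ret = -1;
    return ret;
}

/* Hypothetical post-mkfs step: sync() all filesystems, then fsync()
 * each listed device, e.g. { "/dev/sda", "/dev/sdb" }.
 * Returns the number of devices that could not be flushed. */
static int flush_after_mkfs(const char *const paths[], size_t n)
{
    int failures = 0;

    sync();
    for (size_t i = 0; i < n; i++)
        if (fsync_path(paths[i]) != 0)
            failures++;
    return failures;
}
```

It would be called with the whole-disk nodes right after mkfs.btrfs
returns and before the mount.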

Rich.

* Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
  2012-10-08 21:22         ` Richard W.M. Jones
@ 2012-10-09  0:00           ` Chris Mason
  2012-10-09  7:20             ` Richard W.M. Jones
  2012-10-10 11:49           ` Richard W.M. Jones
  1 sibling, 1 reply; 23+ messages in thread
From: Chris Mason @ 2012-10-09  0:00 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: Chris Mason, linux-btrfs

On Mon, Oct 08, 2012 at 03:22:30PM -0600, Richard W.M. Jones wrote:
> 
> I have now reproduced this bug locally.
> 
> Adding sync() + fsync of each /dev/sd* device after the mkfs command
> does appear to fix the problem.
> 
> However it's a little bit difficult to know for sure because I might
> just be changing the timing of things by adding these calls.

Ok, what's a rough idea of the mainline git equiv of the buggy kernel?

-chris


* Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
  2012-10-09  0:00           ` Chris Mason
@ 2012-10-09  7:20             ` Richard W.M. Jones
  2012-10-09  7:33               ` Richard W.M. Jones
  2012-10-09  9:16               ` David Sterba
  0 siblings, 2 replies; 23+ messages in thread
From: Richard W.M. Jones @ 2012-10-09  7:20 UTC (permalink / raw)
  To: Chris Mason, Chris Mason, linux-btrfs

On Mon, Oct 08, 2012 at 08:00:51PM -0400, Chris Mason wrote:
> On Mon, Oct 08, 2012 at 03:22:30PM -0600, Richard W.M. Jones wrote:
> > 
> > I have now reproduced this bug locally.
> > 
> > Adding sync() + fsync of each /dev/sd* device after the mkfs command
> > does appear to fix the problem.
> > 
> > However it's a little bit difficult to know for sure because I might
> > just be changing the timing of things by adding these calls.
> 
> Ok, what's a rough idea of the mainline git equiv of the buggy kernel?

On my local machine, I'm reproducing this with what Fedora calls
3.7.0-0.rc0.git2.4.fc19.x86_64 (note I found an unrelated but very
serious bug in this kernel:
http://marc.info/?l=linux-kernel&m=134973394826408&w=2 )

In Fedora we apply several patches on top, but none of them
appear as if they would affect btrfs or sync/invalidate paths:

http://pkgs.fedoraproject.org/cgit/kernel.git/tree/

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org

* Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
  2012-10-09  7:20             ` Richard W.M. Jones
@ 2012-10-09  7:33               ` Richard W.M. Jones
  2012-10-09  9:00                 ` David Sterba
  2012-10-09  9:16               ` David Sterba
  1 sibling, 1 reply; 23+ messages in thread
From: Richard W.M. Jones @ 2012-10-09  7:33 UTC (permalink / raw)
  To: Chris Mason, Chris Mason, linux-btrfs

On Tue, Oct 09, 2012 at 08:20:02AM +0100, Richard W.M. Jones wrote:
> On Mon, Oct 08, 2012 at 08:00:51PM -0400, Chris Mason wrote:
> > On Mon, Oct 08, 2012 at 03:22:30PM -0600, Richard W.M. Jones wrote:
> > > 
> > > I have now reproduced this bug locally.
> > > 
> > > Adding sync() + fsync of each /dev/sd* device after the mkfs command
> > > does appear to fix the problem.
> > > 
> > > However it's a little bit difficult to know for sure because I might
> > > just be changing the timing of things by adding these calls.
> > 
> > Ok, what's a rough idea of the mainline git equiv of the buggy kernel?
> 
> On my local machine, I'm reproducing this with what Fedora calls
> 3.7.0-0.rc0.git2.4.fc19.x86_64

OK, that's not very helpful, is it? :-)  AFAIK it should be possible
to reproduce this with Linus's git kernel, but I haven't proven
that yet.

Rich.

* Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
  2012-10-09  7:33               ` Richard W.M. Jones
@ 2012-10-09  9:00                 ` David Sterba
  2012-10-10 12:38                   ` Chris Mason
  0 siblings, 1 reply; 23+ messages in thread
From: David Sterba @ 2012-10-09  9:00 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: Chris Mason, Chris Mason, linux-btrfs

On Tue, Oct 09, 2012 at 08:33:57AM +0100, Richard W.M. Jones wrote:
> On Tue, Oct 09, 2012 at 08:20:02AM +0100, Richard W.M. Jones wrote:
> > On Mon, Oct 08, 2012 at 08:00:51PM -0400, Chris Mason wrote:
> > > Ok, what's a rough idea of the mainline git equiv of the buggy kernel?
> > 
> > On my local machine, I'm reproducing this with what Fedora calls
> > 3.7.0-0.rc0.git2.4.fc19.x86_64
> 
> OK, that's not very helpful, is it? :-)  AFAIK it should be possible
> to reproduce this with Linus's git kernel, but I haven't proven
> that yet.

Found the same error message in my logs with master+next:

Oct  8 15:07:25 kernel: [13048.856283] device fsid cd15a893-e955-49cc-989c-4fd952a838a6 devid 1 transid 3 /dev/sda9
Oct  8 15:07:25 kernel: [13048.866880] btrfs: disk space caching is enabled
Oct  8 15:07:25 kernel: [13048.875767] btrfs: failed to recover relocation
Oct  8 15:07:25 kernel: [13048.884662] btrfs: open_ctree failed

There are some xfstests that triggered the related bug with stale data,
I'm investigating further.


david

* Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
  2012-10-09  7:20             ` Richard W.M. Jones
  2012-10-09  7:33               ` Richard W.M. Jones
@ 2012-10-09  9:16               ` David Sterba
  2012-10-09  9:26                 ` Richard W.M. Jones
  1 sibling, 1 reply; 23+ messages in thread
From: David Sterba @ 2012-10-09  9:16 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: Chris Mason, Chris Mason, linux-btrfs

On Tue, Oct 09, 2012 at 08:20:02AM +0100, Richard W.M. Jones wrote:
> On my local machine, I'm reproducing this with what Fedora calls
> 3.7.0-0.rc0.git2.4.fc19.x86_64 (note I found an unrelated but very
> serious bug in this kernel:
> http://marc.info/?l=linux-kernel&m=134973394826408&w=2 )

And it's going to be fixed

http://marc.info/?l=linux-fsdevel&m=134973493227376&w=2

david

* Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
  2012-10-09  9:16               ` David Sterba
@ 2012-10-09  9:26                 ` Richard W.M. Jones
  0 siblings, 0 replies; 23+ messages in thread
From: Richard W.M. Jones @ 2012-10-09  9:26 UTC (permalink / raw)
  To: linux-btrfs

On Tue, Oct 09, 2012 at 11:16:57AM +0200, David Sterba wrote:
> On Tue, Oct 09, 2012 at 08:20:02AM +0100, Richard W.M. Jones wrote:
> > On my local machine, I'm reproducing this with what Fedora calls
> > 3.7.0-0.rc0.git2.4.fc19.x86_64 (note I found an unrelated but very
> > serious bug in this kernel:
> > http://marc.info/?l=linux-kernel&m=134973394826408&w=2 )
> 
> And it's going to be fixed
> 
> http://marc.info/?l=linux-fsdevel&m=134973493227376&w=2

And this one too ...

http://marc.info/?l=linux-fsdevel&m=134977414011004&w=2

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming blog: http://rwmj.wordpress.com
Fedora now supports 80 OCaml packages (the OPEN alternative to F#)
http://cocan.org/getting_started_with_ocaml_on_red_hat_and_fedora

* Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
  2012-10-08 21:22         ` Richard W.M. Jones
  2012-10-09  0:00           ` Chris Mason
@ 2012-10-10 11:49           ` Richard W.M. Jones
  1 sibling, 0 replies; 23+ messages in thread
From: Richard W.M. Jones @ 2012-10-10 11:49 UTC (permalink / raw)
  To: Chris Mason, linux-btrfs

On Mon, Oct 08, 2012 at 10:22:30PM +0100, Richard W.M. Jones wrote:
> Adding sync() + fsync of each /dev/sd* device after the mkfs command
> does appear to fix the problem.
> 
> However it's a little bit difficult to know for sure because I might
> just be changing the timing of things by adding these calls.

An update:

Although doing the sync + fsync certainly makes the bug much, much
rarer, it does not entirely eliminate it.  I have now seen one case
where this still happened (log below).

If there's anything else you'd like me to test, including kernel
patches, just let me know.

Rich.

Extract from the full log at:
http://kojipkgs.fedoraproject.org//work/tasks/7186/4577186/build.log

modprobe btrfs
[   15.823412] Btrfs loaded
grep ^[[:space:]]*btrfs$ /proc/filesystems
mkfs.btrfs /dev/sda1 /dev/sdb1
[   16.740868] device fsid 25aaca9b-0192-4cfd-a9eb-37fd222c2c8f devid 1 transid 1 /dev/sda1
[   16.743227] device fsid 25aaca9b-0192-4cfd-a9eb-37fd222c2c8f devid 1 transid 1 /dev/sda1
[   17.446334] device fsid 25aaca9b-0192-4cfd-a9eb-37fd222c2c8f devid 2 transid 3 /dev/sdb1
fsync /dev/sda
fsync /dev/sdb
fsync /dev/sdc
fsync /dev/sdd
libguestfs: recv_from_daemon: 40 bytes: 20 00 f5 f5 | 00 00 00 04 | 00 00 01 3d | 00 00 00 01 | 00 12 34 04 | ...
libguestfs: trace: mkfs_btrfs = 0
libguestfs: trace: mount "/dev/sda1" "/"
libguestfs: send_to_daemon: 68 bytes: 00 00 00 40 | 20 00 f5 f5 | 00 00 00 04 | 00 00 00 01 | 00 00 00 00 | ...
guestfsd: main_loop: proc 317 (mkfs_btrfs) took 2.39 seconds
guestfsd: main_loop: new request, len 0x40
mount -o  /dev/sda1 /sysroot/
[   17.838747] device fsid 25aaca9b-0192-4cfd-a9eb-37fd222c2c8f devid 2 transid 4 /dev/sdb1
[   17.917277] device fsid 25aaca9b-0192-4cfd-a9eb-37fd222c2c8f devid 1 transid 4 /dev/sda1
[   18.084520] device fsid 25aaca9b-0192-4cfd-a9eb-37fd222c2c8f devid 1 transid 4 /dev/sda1
[   18.132447] device fsid 25aaca9b-0192-4cfd-a9eb-37fd222c2c8f devid 1 transid 4 /dev/sda1
[   18.176566] btrfs: disk space caching is enabled
[   18.200901] btrfs bad tree block start 0 135168
[   18.213456] btrfs: open_ctree failed
mount: wrong fs type, bad option, bad superblock on /dev/sda1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so
guestfsd: error: /dev/sda1 on / (options: ''): mount: wrong fs type, bad option, bad superblock on /dev/sda1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so
libguestfs: recv_from_daemon: 284 bytes: 20 00 f5 f5 | 00 00 00 04 | 00 00 00 01 | 00 00 00 01 | 00 12 34 05 | ...
libguestfs: trace: mount = -1 (error)

* Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
  2012-10-09  9:00                 ` David Sterba
@ 2012-10-10 12:38                   ` Chris Mason
  2012-10-10 19:38                     ` Richard W.M. Jones
  0 siblings, 1 reply; 23+ messages in thread
From: Chris Mason @ 2012-10-10 12:38 UTC (permalink / raw)
  To: David Sterba; +Cc: Richard W.M. Jones, Chris Mason, linux-btrfs

On Tue, Oct 09, 2012 at 03:00:12AM -0600, David Sterba wrote:
> On Tue, Oct 09, 2012 at 08:33:57AM +0100, Richard W.M. Jones wrote:
> > On Tue, Oct 09, 2012 at 08:20:02AM +0100, Richard W.M. Jones wrote:
> > > On Mon, Oct 08, 2012 at 08:00:51PM -0400, Chris Mason wrote:
> > > > Ok, what's a rough idea of the mainline git equiv of the buggy kernel?
> > > 
> > > On my local machine, I'm reproducing this with what Fedora calls
> > > 3.7.0-0.rc0.git2.4.fc19.x86_64
> > 
> > OK, that's not very helpful, is it? :-)  AFAIK it should be possible
> > to reproduce this with Linus's git kernel, but I haven't proven
> > that yet.
> 
> Found the same error message in my logs with master+next:
> 
> Oct  8 15:07:25 kernel: [13048.856283] device fsid cd15a893-e955-49cc-989c-4fd952a838a6 devid 1 transid 3 /dev/sda9
> Oct  8 15:07:25 kernel: [13048.866880] btrfs: disk space caching is enabled
> Oct  8 15:07:25 kernel: [13048.875767] btrfs: failed to recover relocation
> Oct  8 15:07:25 kernel: [13048.884662] btrfs: open_ctree failed
> 
> There are some xfstests that triggered the related bug with stale data,
> I'm investigating further.

Check your progs; this commit was updated to continue instead of break.

https://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs.git;a=commitdiff;h=6eba9002956ac40db87d42fb653a0524dc568810;hp=bc130ecd0260e4ee6ffe07ae43fc90db281a4daa

The original commit triggered those errors during xfstests test 204.

-chris


* Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
  2012-10-10 12:38                   ` Chris Mason
@ 2012-10-10 19:38                     ` Richard W.M. Jones
  2012-10-10 19:41                       ` Chris Mason
  0 siblings, 1 reply; 23+ messages in thread
From: Richard W.M. Jones @ 2012-10-10 19:38 UTC (permalink / raw)
  To: Chris Mason, David Sterba, Chris Mason, linux-btrfs

On Wed, Oct 10, 2012 at 08:38:08AM -0400, Chris Mason wrote:
> On Tue, Oct 09, 2012 at 03:00:12AM -0600, David Sterba wrote:
> > Found the same error message in my logs with master+next:
> > 
> > Oct  8 15:07:25 kernel: [13048.856283] device fsid cd15a893-e955-49cc-989c-4fd952a838a6 devid 1 transid 3 /dev/sda9
> > Oct  8 15:07:25 kernel: [13048.866880] btrfs: disk space caching is enabled
> > Oct  8 15:07:25 kernel: [13048.875767] btrfs: failed to recover relocation
> > Oct  8 15:07:25 kernel: [13048.884662] btrfs: open_ctree failed
> > 
> > There are some xfstests that triggered the related bug with stale data,
> > I'm investigating further.
> 
> Check your progs, this commit was updated to continue instead of break.
> 
> https://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs.git;a=commitdiff;h=6eba9002956ac40db87d42fb653a0524dc568810;hp=bc130ecd0260e4ee6ffe07ae43fc90db281a4daa
> 
> The original commit triggered those errors during 204.

It does seem as if adding that commit to btrfs-progs fixes the
original bug I was reporting.  As before, my test isn't very reliable,
so I cannot be 100% sure.  I will continue running tests.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming blog: http://rwmj.wordpress.com
Fedora now supports 80 OCaml packages (the OPEN alternative to F#)
http://cocan.org/getting_started_with_ocaml_on_red_hat_and_fedora


* Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
  2012-10-10 19:38                     ` Richard W.M. Jones
@ 2012-10-10 19:41                       ` Chris Mason
  2012-10-10 19:46                         ` Richard W.M. Jones
  0 siblings, 1 reply; 23+ messages in thread
From: Chris Mason @ 2012-10-10 19:41 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: Chris Mason, David Sterba, linux-btrfs

On Wed, Oct 10, 2012 at 01:38:53PM -0600, Richard W.M. Jones wrote:
> On Wed, Oct 10, 2012 at 08:38:08AM -0400, Chris Mason wrote:
> > Check your progs, this commit was updated to continue instead of break.
> > 
> > https://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs.git;a=commitdiff;h=6eba9002956ac40db87d42fb653a0524dc568810;hp=bc130ecd0260e4ee6ffe07ae43fc90db281a4daa
> > 
> > The original commit triggered those errors during 204.
> 
> It does seem as if adding that commit to btrfs-progs fixes the
> original bug I was reporting.  As before, my test isn't very reliable,
> so I cannot be 100% sure.  I will continue running tests.

I didn't mention that one earlier because the git commit id in your
progs version string never had the buggy commit.

-chris



* Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
  2012-10-10 19:41                       ` Chris Mason
@ 2012-10-10 19:46                         ` Richard W.M. Jones
  2012-10-11  7:28                           ` Richard W.M. Jones
  0 siblings, 1 reply; 23+ messages in thread
From: Richard W.M. Jones @ 2012-10-10 19:46 UTC (permalink / raw)
  To: Chris Mason, Chris Mason, David Sterba, linux-btrfs

On Wed, Oct 10, 2012 at 03:41:13PM -0400, Chris Mason wrote:
> On Wed, Oct 10, 2012 at 01:38:53PM -0600, Richard W.M. Jones wrote:
> > On Wed, Oct 10, 2012 at 08:38:08AM -0400, Chris Mason wrote:
> > > Check your progs, this commit was updated to continue instead of break.
> > > 
> > > https://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs.git;a=commitdiff;h=6eba9002956ac40db87d42fb653a0524dc568810;hp=bc130ecd0260e4ee6ffe07ae43fc90db281a4daa
> > > 
> > > The original commit triggered those errors during 204.
> > 
> > It does seem as if adding that commit to btrfs-progs fixes the
> > original bug I was reporting.  As before, my test isn't very reliable,
> > so I cannot be 100% sure.  I will continue running tests.
> 
> I didn't mention that one earlier because the git commit id in your
> progs version string never had the buggy commit.

The git commit we have in Fedora is 043a639, which is earlier than
this commit.  In any case it doesn't matter because I've suggested in
the Fedora bug that we upgrade to the latest git.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into Xen guests.
http://et.redhat.com/~rjones/virt-p2v


* Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
  2012-10-10 19:46                         ` Richard W.M. Jones
@ 2012-10-11  7:28                           ` Richard W.M. Jones
  2012-10-11 11:26                             ` Chris Mason
  0 siblings, 1 reply; 23+ messages in thread
From: Richard W.M. Jones @ 2012-10-11  7:28 UTC (permalink / raw)
  To: Chris Mason, Chris Mason, David Sterba, linux-btrfs

Well the bad news is that the bug happened again overnight, even
though we were definitely using btrfs-progs with the 6eba90029 patch
added, _and_ it was doing a sync + fsync between the mkfs and the
mount.

Here is the log:

modprobe btrfs
[   15.716610] Btrfs loaded
grep ^[[:space:]]*btrfs$ /proc/filesystems
mkfs.btrfs /dev/sda1 /dev/sdb1

[   16.656467] device fsid fb238b09-83b9-4b18-9545-d1fae8f5d489 devid 1 transid 1 /dev/sda1
[   16.657467] device fsid fb238b09-83b9-4b18-9545-d1fae8f5d489 devid 1 transid 1 /dev/sda1
[   17.227381] device fsid fb238b09-83b9-4b18-9545-d1fae8f5d489 devid 2 transid 3 /dev/sdb1
fsync /dev/sda
fsync /dev/sdb
fsync /dev/sdc
fsync /dev/sdd
libguestfs: recv_from_daemon: 40 bytes: 20 00 f5 f5 | 00 00 00 04 | 00 00 01 3d | 00 00 00 01 | 00 12 34 04 | ...
libguestfs: trace: mkfs_btrfs = 0
libguestfs: trace: mount "/dev/sda1" "/"
libguestfs: send_to_daemon: 68 bytes: 00 00 00 40 | 20 00 f5 f5 | 00 00 00 04 | 00 00 00 01 | 00 00 00 00 | ...
guestfsd: main_loop: proc 317 (mkfs_btrfs) took 2.22 seconds
guestfsd: main_loop: new request, len 0x40
mount -o  /dev/sda1 /sysroot/
[   17.512337] device fsid fb238b09-83b9-4b18-9545-d1fae8f5d489 devid 1 transid 4 /dev/sda1
[   17.758300] device fsid fb238b09-83b9-4b18-9545-d1fae8f5d489 devid 2 transid 4 /dev/sdb1
[   17.857285] device fsid fb238b09-83b9-4b18-9545-d1fae8f5d489 devid 1 transid 4 /dev/sda1
[   17.893279] device fsid fb238b09-83b9-4b18-9545-d1fae8f5d489 devid 1 transid 4 /dev/sda1
[   17.909277] btrfs: disk space caching is enabled
[   17.943272] btrfs bad tree block start 0 135168
[   17.955270] btrfs: open_ctree failed
mount: wrong fs type, bad option, bad superblock on /dev/sda1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so
guestfsd: error: /dev/sda1 on / (options: ''): mount: wrong fs type, bad option, bad superblock on /dev/sda1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so
libguestfs: recv_from_daemon: 284 bytes: 20 00 f5 f5 | 00 00 00 04 | 00 00 00 01 | 00 00 00 01 | 00 12 34 05 | ...
libguestfs: trace: mount = -1 (error)

http://kojipkgs.fedoraproject.org//work/tasks/9775/4579775/build.log
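The sync + fsync step between mkfs and mount (visible in the log as `fsync /dev/sda` through `fsync /dev/sdd`) can be sketched in C as below. This is a minimal illustration, assuming that "fsync" here means a global sync(2) followed by opening each block device node and calling fsync(2) on it. The helper names `flush_node` and `flush_devices` are hypothetical and are not taken from libguestfs. As the log above shows, the mount still failed with this step in place.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: flush one device node (or regular file) to stable
 * storage by opening it and calling fsync(2). Returns 0 or -1. */
static int flush_node(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror(path);
        return -1;
    }
    int ret = fsync(fd);
    if (ret < 0)
        perror("fsync");
    close(fd);
    return ret;
}

/* Hypothetical helper: sync(2) everything, then fsync each listed node,
 * mirroring the "sync + fsync /dev/sdX" step between mkfs and mount.
 * Returns 0 if every node was flushed, -1 if any of them failed. */
static int flush_devices(const char *const *paths, int n)
{
    int failed = 0;
    sync();
    for (int i = 0; i < n; i++)
        if (flush_node(paths[i]) < 0)
            failed = 1;
    return failed ? -1 : 0;
}
```

For example, `const char *devs[] = { "/dev/sda", "/dev/sdb" }; flush_devices(devs, 2);` would run after `mkfs.btrfs /dev/sda1 /dev/sdb1` and before the `mount`. Every node is attempted even if an earlier one fails, so a single bad path does not skip the rest.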

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
New in Fedora 11: Fedora Windows cross-compiler. Compile Windows
programs, test, and build Windows installers. Over 70 libraries supprt'd
http://fedoraproject.org/wiki/MinGW http://www.annexia.org/fedora_mingw


* Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
  2012-10-11  7:28                           ` Richard W.M. Jones
@ 2012-10-11 11:26                             ` Chris Mason
  2012-10-29 14:52                               ` Richard W.M. Jones
  0 siblings, 1 reply; 23+ messages in thread
From: Chris Mason @ 2012-10-11 11:26 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: Chris Mason, David Sterba, linux-btrfs

On Thu, Oct 11, 2012 at 01:28:21AM -0600, Richard W.M. Jones wrote:
> Well the bad news is that the bug happened again overnight, even
> though we were definitely using btrfs-progs with the 6eba90029 patch
> added, _and_ it was doing a sync + fsync between the mkfs and the
> mount.

This is good just because it makes the most sense.  The only thing worse
than a bug is a bug that disappears for the wrong reasons ;)

> 
> Here is the log:
> [   17.943272] btrfs bad tree block start 0 135168
> [   17.955270] btrfs: open_ctree failed

This is also good because it really points to the invalidate.  You've
got zeros where we wrote 135168, and pretty much the only way to get
zeros on a disk block is if the kernel did a memset.  Sure some app
could have written the zeros there, but that block offset is unlikely to
get allocated as a data block by the other filesystems.

So, I'll go back to the invalidate code ;)

-chris



* Re: Anyone seeing lots of "Check tree block failed" and other errors with latest kernel?
  2012-10-11 11:26                             ` Chris Mason
@ 2012-10-29 14:52                               ` Richard W.M. Jones
  0 siblings, 0 replies; 23+ messages in thread
From: Richard W.M. Jones @ 2012-10-29 14:52 UTC (permalink / raw)
  To: Chris Mason, Chris Mason, David Sterba, linux-btrfs

On Thu, Oct 11, 2012 at 07:26:28AM -0400, Chris Mason wrote:
> On Thu, Oct 11, 2012 at 01:28:21AM -0600, Richard W.M. Jones wrote:
> > Well the bad news is that the bug happened again overnight, even
> > though we were definitely using btrfs-progs with the 6eba90029 patch
> > added, _and_ it was doing a sync + fsync between the mkfs and the
> > mount.
> 
> This is good just because it makes the most sense.  The only thing worse
> than a bug is a bug that disappears for the wrong reasons ;)
> 
> > 
> > Here is the log:
> > [   17.943272] btrfs bad tree block start 0 135168
> > [   17.955270] btrfs: open_ctree failed
> 
> This is also good because it really points to the invalidate.  You've
> got zeros where we wrote 135168, and pretty much the only way to get
> zeros on a disk block is if the kernel did a memset.  Sure some app
> could have written the zeros there, but that block offset is unlikely to
> get allocated as a data block by the other filesystems.
> 
> So, I'll go back to the invalidate code ;)

Any luck on this?  It's still happening in the latest kernels.  If
there's anything / patch you want me to try, let me know.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming blog: http://rwmj.wordpress.com
Fedora now supports 80 OCaml packages (the OPEN alternative to F#)
http://cocan.org/getting_started_with_ocaml_on_red_hat_and_fedora


Thread overview: 23+ messages
2012-10-08 14:16 Anyone seeing lots of "Check tree block failed" and other errors with latest kernel? Richard W.M. Jones
2012-10-08 14:27 ` Chris Mason
2012-10-08 14:57   ` Richard W.M. Jones
2012-10-08 15:04     ` Chris Mason
2012-10-08 15:15       ` Richard W.M. Jones
2012-10-08 15:18         ` Chris Mason
2012-10-08 16:42         ` David Sterba
2012-10-08 17:01           ` Richard W.M. Jones
2012-10-08 21:22         ` Richard W.M. Jones
2012-10-09  0:00           ` Chris Mason
2012-10-09  7:20             ` Richard W.M. Jones
2012-10-09  7:33               ` Richard W.M. Jones
2012-10-09  9:00                 ` David Sterba
2012-10-10 12:38                   ` Chris Mason
2012-10-10 19:38                     ` Richard W.M. Jones
2012-10-10 19:41                       ` Chris Mason
2012-10-10 19:46                         ` Richard W.M. Jones
2012-10-11  7:28                           ` Richard W.M. Jones
2012-10-11 11:26                             ` Chris Mason
2012-10-29 14:52                               ` Richard W.M. Jones
2012-10-09  9:16               ` David Sterba
2012-10-09  9:26                 ` Richard W.M. Jones
2012-10-10 11:49           ` Richard W.M. Jones
