* Re: ext4 metadata corruption bug?
       [not found] ` <CAGagf4eEzY4+3cfNWSEENTo1PKe40nq1Ne6ZzOLGm-O78W7RcA@mail.gmail.com>
@ 2014-04-10  5:04   ` Nathaniel W Filardo
  2014-04-10 14:03     ` Theodore Ts'o
  0 siblings, 1 reply; 19+ messages in thread
From: Nathaniel W Filardo @ 2014-04-10  5:04 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Mike Rubin, Frank Mayhar, admins, linux-ext4

On Wed, Apr 09, 2014 at 10:55:48PM -0400, Theodore Tso wrote:
> Hi Nathaniel,
> 
> In general, it's best if you send these sorts of requests for help to the
> linux-ext4@vger.kernel.org mailing list.

Added to CC.

> The fact that we see the "error count" line early in the boot message
> suggests to me that your VM is not running fsck to fix up the errors before
> mounting the file system.  (Well, either that or you're using a really
> ancient version of e2fsck, but given that you're using a bleeding-edge
> kernel, I'm guessing you're using a reasonably recent version of
> e2fsck.  But that would be good for you to check.)

e2fsck version is 1.42.9 using the same library version.
 
> The ext4 error message is due to the file system getting corrupted.  How
> the file system got corrupted isn't 100% clear, but one potential cause is
> how the disk is configured with qemu.
>[snip]

We use QEMU directives like

        -drive format=raw,file=rbd:rbdafs-mirror/mirror-0,id=drive5,if=none,cache=writeback \
        -device driver=ide-hd,drive=drive5,discard_granularity=512,bus=ahci0.3

We've never had, so far as I know, an unexpected shutdown of the QEMU
process, so I don't think that unexpected loss of cache contents is to
blame.

Perhaps the dmesg I sent was not representative; some days ago, only
(comparatively!) late in the machine's uptime, we saw:

[309894.428685] EXT4-fs (sdd): pa ffff88000d9f9440: logic 832, phys.  957458972, len 192
[309894.430023] EXT4-fs error (device sdd): ext4_mb_release_inode_pa:3729: group 29219, free 192, pa_free 191
[309894.431822] Aborting journal on device sdd-8.
[309894.442913] EXT4-fs (sdd): Remounting filesystem read-only

with Debian kernel 3.13.5-1; sdd here is the same filesystem as in the
earlier dmesg.

I'll capture any subsequent crashes and follow up.

Thanks much!
--nwf;

* Re: ext4 metadata corruption bug?
  2014-04-10  5:04   ` ext4 metadata corruption bug? Nathaniel W Filardo
@ 2014-04-10 14:03     ` Theodore Ts'o
  2014-04-10 16:33       ` Nathaniel W Filardo
  0 siblings, 1 reply; 19+ messages in thread
From: Theodore Ts'o @ 2014-04-10 14:03 UTC (permalink / raw)
  To: Nathaniel W Filardo
  Cc: Theodore Tso, Mike Rubin, Frank Mayhar, admins, linux-ext4

On Thu, Apr 10, 2014 at 01:04:28AM -0400, Nathaniel W Filardo wrote:
> We use QEMU directives like
> 
>         -drive format=raw,file=rbd:rbdafs-mirror/mirror-0,id=drive5,if=none,cache=writeback \
>         -device driver=ide-hd,drive=drive5,discard_granularity=512,bus=ahci0.3
> 
> We've never had, so far as I know, an unexpected shutdown of the QEMU
> process, so I don't think that unexpected loss of cache contents is to
> blame.
> 
> Perhaps the dmesg I sent was not representative; some days ago, we saw, only
> (comparatively!) late in the machine's uptime:
> 
> [309894.428685] EXT4-fs (sdd): pa ffff88000d9f9440: logic 832, phys.  957458972, len 192
> [309894.430023] EXT4-fs error (device sdd): ext4_mb_release_inode_pa:3729: group 29219, free 192, pa_free 191
> [309894.431822] Aborting journal on device sdd-8.
> [309894.442913] EXT4-fs (sdd): Remounting filesystem read-only
> 
> with Debian kernel 3.13.5-1; sdd here is the same filesystem as in the
> earlier dmesg.

What is your workload?  Can you reproduce this easily?  And can you
try using a local disk to see if the problem goes away, so we can
start to bisect which software components might be at fault?

I'm not aware of any corruption problem with a 3.13-based kernel which
matches your signature, and the ext4 errors that you are showing
(minor accounting discrepancies in the number of free blocks and number
of free inodes between the allocation bitmap and the summary
statistics in the block group descriptors) very closely match the
signature of some part of the storage stack not honoring FLUSH CACHE
("barrier") operations, either by ignoring them completely, or
reordering writes across a barrier / flush cache request.

Cheers,

					- Ted

* Re: ext4 metadata corruption bug?
  2014-04-10 14:03     ` Theodore Ts'o
@ 2014-04-10 16:33       ` Nathaniel W Filardo
  2014-04-10 22:17         ` Theodore Ts'o
  0 siblings, 1 reply; 19+ messages in thread
From: Nathaniel W Filardo @ 2014-04-10 16:33 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Mike Rubin, Frank Mayhar, admins, linux-ext4

On Thu, Apr 10, 2014 at 10:03:16AM -0400, Theodore Ts'o wrote:
> On Thu, Apr 10, 2014 at 01:04:28AM -0400, Nathaniel W Filardo wrote:
> >[snip]
> What is your workload?  Can you reproduce this easily?  And can you
> try using a local disk to see if the problem goes away, so we can
> start to bisect which software components might be at fault?

We're running an OpenAFS fileserver; the partition that most often causes us
trouble is the one containing our mirrors (Debian, Fedora, etc.), which has
~5T of its 10T capacity filled.  At least every few days, we trip over one
of these errors.

I will see what I can do about getting some local storage hooked up to the
VM, but my fear is that this is related to the size of the volume and the
amount of data therein.  If that's true, any amount of local storage I could
muster will not shake this out.

> I'm not aware of any corruption problem with a 3.13 based kernel which
> matches your signature, and the ext4 errors that you are showing
> (minor accounting discrepancies in the number free blocks and number
> of free inodes between the allocation bitmap and the summary
> statistics in the block group descriptors) is very closely matches the
> signature of some part of the storage stack not honoring FLUSH CACHE
> ("barrier") operations, either by ignoring them completely, or
> reordring writes across a barrier / flush cache request.

Shouldn't cache reordering or a failure to flush correctly only matter if the
machine is crashing or otherwise losing power?  I suppose it's possible
there's a bug that would cause the cache to fail to write a block at all,
rather than simply writing it "too late".  But as I said before, we've not
had any crashes or otherwise lost uptime anywhere: host, guest, storage
providers, etc.

That said, we do occasionally (though much less often than we get reports of
corrupted metadata) get messages from the ATA stack that I don't know how to
decode, though naively they all seemed to be successfully resolved
transients.  One of our VMs, nearly identically configured, though not the
one that's been reporting corruption on its filesystem, spat this out the
other day, for example:

[532625.888251] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[532625.888762] ata1.00: failed command: FLUSH CACHE
[532625.889128] ata1.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[532625.889128]          res 40/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x4 (time out)
[532625.889945] ata1.00: status: { DRDY }
[532630.928064] ata1: link is slow to respond, please be patient (ready=0)
[532635.912178] ata1: device not ready (errno=-16), forcing hardreset
[532635.912220] ata1: soft resetting link
[532636.070087] ata1.00: configured for MWDMA2
[532636.070701] ata1.01: configured for MWDMA2
[532636.070705] ata1.00: retrying FLUSH 0xe7 Emask 0x4
[532651.068208] ata1.00: qc timeout (cmd 0xe7)
[532651.068216] ata1.00: FLUSH failed Emask 0x4
[532651.236146] ata1: soft resetting link
[532651.393918] ata1.00: configured for MWDMA2
[532651.394533] ata1.01: configured for MWDMA2
[532651.394537] ata1.00: retrying FLUSH 0xe7 Emask 0x4
[532651.395550] ata1.00: device reported invalid CHS sector 0
[532651.395564] ata1: EH complete

This appears to have been during a stall in Ceph's ability to write data to
the OSDs.  I don't know what caused that, and the machine has been happy
ever since (though maybe it's all about to go south?).  I'll keep my eyes
open, especially for any problems that show up between an fsck and the next
time block bitmap corruption is detected.

Would I be better off attaching the RBDs via virtio rather than AHCI?  Do I
need to do anything special on the guest to make that work?
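
I imagine the switch would be something like the following, reusing the
drive definition from before (an untested sketch on my part, so please
correct me if that's not the right device type):

        -drive format=raw,file=rbd:rbdafs-mirror/mirror-0,id=drive5,if=none,cache=writeback \
        -device virtio-blk-pci,drive=drive5

with the disk then presumably showing up as /dev/vdX in the guest.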

Thanks very much, again,
--nwf;

* Re: ext4 metadata corruption bug?
  2014-04-10 16:33       ` Nathaniel W Filardo
@ 2014-04-10 22:17         ` Theodore Ts'o
  2014-04-20 16:32           ` Nathaniel W Filardo
  0 siblings, 1 reply; 19+ messages in thread
From: Theodore Ts'o @ 2014-04-10 22:17 UTC (permalink / raw)
  To: Nathaniel W Filardo; +Cc: Mike Rubin, Frank Mayhar, admins, linux-ext4

On Thu, Apr 10, 2014 at 12:33:51PM -0400, Nathaniel W Filardo wrote:
> 
> Shouldn't cache reordering or fail to flush correctly only matter if the
> machine is crashing or otherwise losing power?  I suppose it's possible
> there's a bug that would cause the cache to fail to write a block at all,
> rather than simply "too late".  But as I said before, we've not had any
> crashes or otherwise lost uptime anywhere: host, guest, storage providers,
> etc.

If it's a cache flush problem, yes, it would only matter if there had
been a crash.  Knowing that what you are doing is an AFS mirror, this
seems even stranger, since writes would be very rare, and it's not
like there would be a whole lot of opportunities for races ---
when you mirror an FTP site, you write a new file sequentially, and
it's not like there are multiple CPUs trying to modify the file at the
same time, etc.

And if you are just seeing the results of random bit flips, one would
expect other types of corruption to be reported as well.  So I don't know.
This is a mystery so far...

> That said, we do occasionally, though much less often than we get reports of
> corrupted metadata, get messages that I don't know how to decode from the
> ATA stack (though naively they all seemed to be successfully resolved
> transients)?  One of our VMs, nearly identically configured, though not the
> one that's been reporting corruption on its filesystem, spat this out the
> other day, for example:
> 
> [532625.888251] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> [532625.888762] ata1.00: failed command: FLUSH CACHE
> [532625.889128] ata1.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0
> [532625.889128]          res 40/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x4 (time out)
> [532625.889945] ata1.00: status: { DRDY }
> [532630.928064] ata1: link is slow to respond, please be patient (ready=0)
> [532635.912178] ata1: device not ready (errno=-16), forcing hardreset
> [532635.912220] ata1: soft resetting link
> [532636.070087] ata1.00: configured for MWDMA2
> [532636.070701] ata1.01: configured for MWDMA2
> [532636.070705] ata1.00: retrying FLUSH 0xe7 Emask 0x4
> [532651.068208] ata1.00: qc timeout (cmd 0xe7)
> [532651.068216] ata1.00: FLUSH failed Emask 0x4
> [532651.236146] ata1: soft resetting link
> [532651.393918] ata1.00: configured for MWDMA2
> [532651.394533] ata1.01: configured for MWDMA2
> [532651.394537] ata1.00: retrying FLUSH 0xe7 Emask 0x4
> [532651.395550] ata1.00: device reported invalid CHS sector 0
> [532651.395564] ata1: EH complete

Yeah, that doesn't look good, but you're using some kind of remote
block device here, right?  I'm not sure how qemu is translating that
into pseudo ATA commands.  Maybe that corresponds with a broken
network connection which required creating a new TCP connection or
some such?  I'm not really that familiar with the remote block device
code.  So I also can't really give you any advice about whether it
would be better to use virtio versus AHCI.  I would expect that virtio
will probably be faster, but it might not matter for your application.

Cheers,

						- Ted

* Re: ext4 metadata corruption bug?
  2014-04-10 22:17         ` Theodore Ts'o
@ 2014-04-20 16:32           ` Nathaniel W Filardo
  2014-04-20 17:57             ` Theodore Ts'o
  2014-04-23  7:23             ` Sander Smeenk
  0 siblings, 2 replies; 19+ messages in thread
From: Nathaniel W Filardo @ 2014-04-20 16:32 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Mike Rubin, Frank Mayhar, admins, linux-ext4

We just got

> [817576.492013] EXT4-fs (vdd): pa ffff88000dea9b90: logic 0, phys.  1934464544, len 32
> [817576.492468] EXT4-fs error (device vdd): ext4_mb_release_inode_pa:3729: group 59035, free 14, pa_free 12
> [817576.492987] Aborting journal on device vdd-8.
> [817576.493919] EXT4-fs (vdd): Remounting filesystem read-only

Upon unmount, further

> [825457.072206] EXT4-fs error (device vdd): ext4_put_super:791: Couldn't clean up the journal

fscking generated

> fsck from util-linux 2.20.1
> e2fsck 1.42.9 (4-Feb-2014)
> /dev/vdd: recovering journal
> /dev/vdd contains a file system with errors, check forced.
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Block bitmap differences:  +(1934464544--1934464545)
> Fix<y>? yes
> Free blocks count wrong (1379876836, counted=1386563079).
> Fix<y>? yes
> Free inodes count wrong (331897442, counted=331912336).
> Fix<y>? yes
>
> /dev/vdd: ***** FILE SYSTEM WAS MODIFIED *****
> /dev/vdd: 3631984/335544320 files (1.6% non-contiguous), 1297791481/2684354560 blocks

The particular error reported by the kernel seems to correspond to the first
of the three fixes, but the other two look like leaks?  A huge number of
inodes (14894) and blocks (6686243, or about 25GiB of storage!) were marked
busy in a way that fsck didn't believe, if I am reading that right?
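(For reference, those are just the deltas from the fsck output above:
331912336 - 331897442 = 14894 inodes and 1386563079 - 1379876836 = 6686243
blocks, which at this filesystem's 4KiB block size works out to roughly
25GiB.)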

/dev/vdd is virtio on Ceph RBD, using write-through caching.  We have had a
crash on one of the Ceph OSDs recently in a way that seems to have generated
inconsistent data in Ceph, but subsequent repair commands seem to have made
everything happy again, at least so far as Ceph tells us.

The guest `uname -a` sayeth

> Linux afsscratch-kvm 3.13-1-amd64 #1 SMP Debian 3.13.7-1 (2014-03-25) x86_64 GNU/Linux

And in case it's relevant, host QEMU emulator is version 1.7.0 (Debian
1.7.0+dfsg-3) [modified locally to include rbd]; guest ceph, librbd, etc.
are Debian package 0.72.2-1~bpo70+1 .

Cheers,
--nwf;

* Re: ext4 metadata corruption bug?
  2014-04-20 16:32           ` Nathaniel W Filardo
@ 2014-04-20 17:57             ` Theodore Ts'o
  2014-04-23  7:23             ` Sander Smeenk
  1 sibling, 0 replies; 19+ messages in thread
From: Theodore Ts'o @ 2014-04-20 17:57 UTC (permalink / raw)
  To: Nathaniel W Filardo; +Cc: Mike Rubin, Frank Mayhar, admins, linux-ext4

On Sun, Apr 20, 2014 at 12:32:12PM -0400, Nathaniel W Filardo wrote:
> We just got
> 
> > [817576.492013] EXT4-fs (vdd): pa ffff88000dea9b90: logic 0, phys.  1934464544, len 32
> > [817576.492468] EXT4-fs error (device vdd): ext4_mb_release_inode_pa:3729: group 59035, free 14, pa_free 12

OK, so what this means is that ext4 had preallocated 32 blocks (starting at
logical block #0 of the file) for a file that was being written.  When we
are done writing the file, and the file is closed (or truncated, or in a
number of other cases), ext4 releases the unwritten blocks back to the file
system so they can be used for some other file.

According to the preallocation accounting data, there should have been
12 leftover blocks to be released to the file system.  However,
when the function looked at the on-disk bitmap, it found 14 leftover
blocks.  The only way this could happen is (a) a memory hardware error,
(b) a storage device error, or (c) a programming error.

> > [817576.492987] Aborting journal on device vdd-8.
> > [817576.493919] EXT4-fs (vdd): Remounting filesystem read-only

So at this point we abort the journal and remount the file system
read-only in order to avoid potential further corruption.

> Upon unmount, further
> 
> > [825457.072206] EXT4-fs error (device vdd): ext4_put_super:791: Couldn't clean up the journal

That's an error message which should be expected, because the journal
was aborted due to the fs error.  So that's not a big deal.

(Yes, some of the error messages could be improved to be less
confusing; sorry about that.  Something we should fix....)

> fscking generated
> 
> > fsck from util-linux 2.20.1
> > e2fsck 1.42.9 (4-Feb-2014)
> > /dev/vdd: recovering journal
> > /dev/vdd contains a file system with errors, check forced.
> > Pass 1: Checking inodes, blocks, and sizes
> > Pass 2: Checking directory structure
> > Pass 3: Checking directory connectivity
> > Pass 4: Checking reference counts
> > Pass 5: Checking group summary information
> > Block bitmap differences:  +(1934464544--1934464545)
> > Fix<y>? yes

These two blocks were actually in use (i.e., referenced by some inode)
but not marked as in use by the bitmap.  That matches up with the
ext4_error message described above.  Somehow, either the storage
device flipped the bits associated with blocks 1934464544 and
1934464545 on disk, or the request to set those bits never made it to disk.

So fortunately, the file system was marked read-only, because
otherwise these two blocks could have gotten allocated and assigned to
some other file, and that would have meant two different files trying
to use the same blocks, which of course means at least one of the
files would have suffered data loss.
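
If you're curious which file ended up owning those blocks, debugfs can map
them back for you (it opens the device read-only by default); something like

        debugfs -R "icheck 1934464544 1934464545" /dev/vdd
        debugfs -R "ncheck <inode number printed by icheck>" /dev/vdd

should get you from block numbers to an inode and then to a pathname.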

> > Free blocks count wrong (1379876836, counted=1386563079).
> > Fix<y>? yes
> > Free inodes count wrong (331897442, counted=331912336).
> > Fix<y>? yes

These two messages are harmless; you don't need to worry about them.
We no longer update the total number of free blocks and free inodes
except when the file system is cleanly unmounted.  Otherwise, every
single CPU that tried to allocate or release blocks or inodes would end
up taking a global lock on these fields, which would be a massive
scalability bottleneck.  Instead, we just maintain per-block-group
counts for the free blocks and free inodes, and we generate the total
number of free blocks and inodes on demand when the user executes the
statfs(2) system call (for commands like df), or when the file system
is unmounted cleanly.

Since the file system was forcibly remounted read-only due to the
problem that we had found, the summary free block/inode counts never
got updated.
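
(You can actually see this on a live file system: the free blocks number in
the on-disk superblock, which is what dumpe2fs prints, can lag behind what
statfs(2) reports, so something like

        dumpe2fs -h /dev/vdd | grep -i 'free blocks'
        df -B 4096 <mountpoint>

may show two different numbers until the next clean unmount or fsck; the
<mountpoint> is whatever path vdd is mounted on, of course.)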

> /dev/vdd is virtio on Ceph RBD, using write-through caching.  We have had a
> crash on one of the Ceph OSDs recently in a way that seems to have generated
> inconsistent data in Ceph, but subsequent repair commands seem to have made
> everything happy again, at least so far as Ceph tells us.
> 
> The guest `uname -a` sayeth
> 
> > Linux afsscratch-kvm 3.13-1-amd64 #1 SMP Debian 3.13.7-1 (2014-03-25) x86_64 GNU/Linux
> 
> And in case it's relevant, host QEMU emulator is version 1.7.0 (Debian
> 1.7.0+dfsg-3) [modified locally to include rbd]; guest ceph, librbd, etc.
> are Debian package 0.72.2-1~bpo70+1 .

No one else has reported any bugs like this, nor has anything like
this turned up in our stress tests.  It's possible that your workload
is doing something strange that no one else would experience, and
which isn't getting picked up by our stress tests, but it's also just
as likely (and possibly more so) that the problem is caused by some
portion of the storage stack below ext4 --- i.e., virtio, qemu, the
remote block device, etc.  So if you can find a way to substitute a
local disk for the rbd, that would be a really good first step in
trying to bisect which portion of the system might be causing the fs
corruption.
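
(For such a test I'd keep everything else identical and change only the
backing store, e.g. reuse the -drive/-device lines you posted earlier and
swap just the file= target for a local LVM volume or raw image --- the path
below is only a placeholder:

        -drive format=raw,file=/dev/mapper/vg0-ext4test,id=drive5,if=none,cache=writeback

That way, if the errors stop we can point the finger at the rbd side, and if
they don't, we get to look harder at qemu and ext4.)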

Regards,

						- Ted

* Re: ext4 metadata corruption bug?
  2014-04-20 16:32           ` Nathaniel W Filardo
  2014-04-20 17:57             ` Theodore Ts'o
@ 2014-04-23  7:23             ` Sander Smeenk
  2014-04-23 14:36               ` Theodore Ts'o
  1 sibling, 1 reply; 19+ messages in thread
From: Sander Smeenk @ 2014-04-23  7:23 UTC (permalink / raw)
  To: linux-ext4

On Sun, Apr 20, 2014 at 12:32:12PM -0400, Nathaniel W Filardo wrote:

> We just got
> > [817576.492013] EXT4-fs (vdd): pa ffff88000dea9b90: logic 0, phys. 1934464544, len 32
> > [817576.492468] EXT4-fs error (device vdd):
> > ext4_mb_release_inode_pa:3729: group 59035, free 14, pa_free 12

I'm jumping in on this thread as a co-worker pointed me to it after i
reported somewhere else about a similar ext4 metadata corruption issue
i'm running into.

Quite a similar situation to the OP's in this thread:
QEMU 2.0.0~rc1+dfsg-0ubuntu3, with guest and host both running Linux 3.13.0;
the guest has one 'big' volume (10T, 5.5T in use) as /dev/vdb, which in its
entirety is used as an ext4 filesystem.  No partitioning.

The trouble starts with:

| EXT4-fs (vdb): pa ffff880078754c30: logic 274378, phys. 1617823779, len 54
| EXT4-fs error (device vdb): ext4_mb_release_inode_pa:3729: group 49372, free 52, pa_free 50
| Aborting journal on device vdb-8.

After which the system remounts the disk read-only and logs some more
ext4 trouble which i've pasted here: https://8n1.org/9763/4928

The system doesn't crash as this isn't the "OS disk". Running fsck on
the disk reports bitmap corruption and some incorrect free block/inode
counts after which the FS seems to work again.

It only happens to the guest with the large disk. All the other guests
on the same hypervisor and the same backend storage have no issues at
all. No IO errors are logged other than the ext4 errors from the guest.
The workload on this guest is not at all spectacular. A bit of random IO
on a set of files, the rest is mostly archive and rarely touched.

HTH,
-Sander.
-- 
| You dig around for a while but you fail to find any treasure.
| 4096R/20CC6CD2 - 6D40 1A20 B9AA 87D4 84C7  FBD6 F3A9 9442 20CC 6CD2

* Re: ext4 metadata corruption bug?
  2014-04-23  7:23             ` Sander Smeenk
@ 2014-04-23 14:36               ` Theodore Ts'o
  2014-04-23 15:30                 ` Nathaniel W Filardo
                                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Theodore Ts'o @ 2014-04-23 14:36 UTC (permalink / raw)
  To: Sander Smeenk, Nathaniel W Filardo; +Cc: linux-ext4

OK, with the two of you reporting this problem, can you do me the
following so we can try to seriously dig into this:

First of all, can you go through your log files and find me as many
instances as possible of this pair of ext4 error messages:

EXT4-fs (vdd): pa ffff88000dea9b90: logic 0, phys. 1934464544, len 32
EXT4-fs error (device vdd): ext4_mb_release_inode_pa:3729: group 59035, free 14, pa_free 12

I want to see if there's any pattern in the physical block number (in
the two samples I have, they are always fairly large numbers), and in
the difference between the free and pa_free numbers.  
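
(Something like

        zgrep -e 'ext4_mb_release_inode_pa' -e 'EXT4-fs (.*): pa ' /var/log/kern.log*

should pull them all out, rotated and compressed logs included; adjust the
path for wherever your kernel messages end up.)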

Secondly, can you send me the output of dumpe2fs -h for the file
systems in question.

Finally, since the both of you are seeing these messages fairly
frequently, would you be willing to run with a patched kernel?
Specifically, can you add a WARN_ON(1) to fs/ext4/mballoc.c here:

	if (free != pa->pa_free) {
		ext4_msg(e4b->bd_sb, KERN_CRIT,
			 "pa %p: logic %lu, phys. %lu, len %lu",
			 pa, (unsigned long) pa->pa_lstart,
			 (unsigned long) pa->pa_pstart,
			 (unsigned long) pa->pa_len);
		ext4_grp_locked_error(sb, group, 0, 0, "free %u, pa_free %u",
					free, pa->pa_free);
		WARN_ON(1); <---------------- add this line			
		/*
		 * pa is already deleted so we use the value obtained
		 * from the bitmap and continue.
		 */
	}

Then when it triggers, can you send me the stack trace that gets
printed by the WARN_ON.
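
(The WARN_ON splat lands in the kernel log like any other warning, so
something along the lines of

        dmesg | grep -B 2 -A 30 'WARNING: CPU'

run shortly afterwards should capture the whole trace.)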

The two really interesting commonalities which I've seen so far are:

1)  You are both using virtualization via qemu/kvm

2)  You are both using file systems > 8TB.

Yes?  And Sander, you're not using a remote block device, correct?
You're using a local disk to back the large filesystem on the host OS
side?

Cheers,

					- Ted

* Re: ext4 metadata corruption bug?
  2014-04-23 14:36               ` Theodore Ts'o
@ 2014-04-23 15:30                 ` Nathaniel W Filardo
  2014-04-23 18:05                 ` Sander Smeenk
                                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 19+ messages in thread
From: Nathaniel W Filardo @ 2014-04-23 15:30 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: admins, linux-ext4

On Wed, Apr 23, 2014 at 10:36:42AM -0400, Theodore Ts'o wrote:
> OK, with the two of you reporting this problem, can you do me the
> following so we can try to seriously dig into this:
>
> First, of all, can you go through your log files and find me as many
> instances of these two pairs of ext4 error messges:
>
> EXT4-fs (vdd): pa ffff88000dea9b90: logic 0, phys. 1934464544, len 32
> EXT4-fs error (device vdd): ext4_mb_release_inode_pa:3729: group 59035, free 14, pa_free 12
>
> I want to see if there's any pattern in the physical block number (in
> the two samples I have, they are always fairly large numbers), and in
> the difference between the free and pa_free numbers.

The current set of logs on the machine contains only two instances of that
pair, which may not be enough to extract a pattern from:

kern.log.1:Apr 20 09:11:13 afsscratch-kvm kernel: [817576.492013] EXT4-fs (vdd): pa ffff88000dea9b90: logic 0, phys. 1934464544, len 32
kern.log.1:Apr 20 09:11:13 afsscratch-kvm kernel: [817576.492468] EXT4-fs error (device vdd): ext4_mb_release_inode_pa:3729: group 59035, free 14, pa_free 12

kern.log.3.gz:Apr  3 14:01:04 afsscratch-kvm kernel: [309894.428685] EXT4-fs (sdd): pa ffff88000d9f9440: logic 832, phys. 957458972, len 192
kern.log.3.gz:Apr  3 14:01:04 afsscratch-kvm kernel: [309894.430023] EXT4-fs error (device sdd): ext4_mb_release_inode_pa:3729: group 29219, free 192, pa_free 191

A good many of our errors come from the allocation side, rather than the
release side; I don't know if this is helpful or useless, but here is
everything in the logs from that side:

kern.log.2.gz:Apr  9 02:10:21 afsscratch-kvm kernel: [384951.911190] EXT4-fs error (device sdd): ext4_mb_generate_buddy:756: group 51014, 2813 clusters in bitmap, 2811 in gd; block bitmap corrupt.
kern.log.2.gz:Apr  9 13:01:23 afsscratch-kvm kernel: [11422.362996] EXT4-fs error (device sdd): ext4_mb_generate_buddy:756: group 42947, 10106 clusters in bitmap, 10105 in gd; block bitmap corrupt.
kern.log.2.gz:Apr  9 17:31:18 afsscratch-kvm kernel: [   24.426020] EXT4-fs error (device sdb): ext4_mb_generate_buddy:756: group 42947, 11128 clusters in bitmap, 11127 in gd; block bitmap corrupt.
kern.log.2.gz:Apr  9 17:36:07 afsscratch-kvm kernel: [  313.312122] EXT4-fs (sdb): error count: 3
kern.log.2.gz:Apr  9 17:36:07 afsscratch-kvm kernel: [  313.312895] EXT4-fs (sdb): initial error at 1397062883: ext4_mb_generate_buddy:756
kern.log.2.gz:Apr  9 17:36:07 afsscratch-kvm kernel: [  313.314256] EXT4-fs (sdb): last error at 1397079078: ext4_mb_generate_buddy:756
kern.log.2.gz:Apr 12 04:41:30 afsscratch-kvm kernel: [110192.817447] EXT4-fs error (device vdd): ext4_mb_generate_buddy:756: group 53425, 84 clusters in bitmap, 82 in gd; block bitmap corrupt.
kern.log.3.gz:Apr  3 21:08:51 afsscratch-kvm kernel: [25112.853350] EXT4-fs error (device sdf): ext4_mb_generate_buddy:756: group 29219, 22572 clusters in bitmap, 22571 in gd; block bitmap corrupt.
kern.log.3.gz:Apr  4 07:44:37 afsscratch-kvm kernel: [34909.921245] EXT4-fs error (device sdd): ext4_mb_generate_buddy:756: group 29219, 22572 clusters in bitmap, 22571 in gd; block bitmap corrupt.
kern.log.3.gz:Apr  4 12:29:47 afsscratch-kvm kernel: [  238.509158] EXT4-fs error (device sdd): ext4_mb_generate_buddy:756: group 29219, 22572 clusters in bitmap, 22571 in gd; block bitmap corrupt.
kern.log.4.gz:Mar 25 14:10:04 afsscratch-kvm kernel: [1801025.178704] EXT4-fs error (device sdf): ext4_mb_generate_buddy:756: group 50994, 3915 clusters in bitmap, 3913 in gd; block bitmap corrupt.
kern.log.4.gz:Mar 30 22:52:28 afsscratch kernel: [2264368.806787] EXT4-fs error (device sdf): ext4_mb_generate_buddy:756: group 52439, 3034 clusters in bitmap, 3032 in gd; block bitmap corrupt.
kern.log.4.gz:Mar 30 23:42:36 afsscratch-kvm kernel: [ 2603.487997] EXT4-fs error (device sdd): ext4_mb_generate_buddy:756: group 52439, 3034 clusters in bitmap , 3032 in gd; block bitmap corrupt.

> Secondly, can you send me the output of dumpe2fs -h for the file
> systems in question.

Filesystem volume name:   <none>
Last mounted on:          /vicepm
Filesystem UUID:          cd47ccd7-92e7-4155-87b2-772828019d52
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              335544320
Block count:              2684354560
Reserved block count:     134217728
Free blocks:              1386563079
Free inodes:              331912336
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      384
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         4096
Inode blocks per group:   256
RAID stride:              1024
RAID stripe width:        1024
Flex block group size:    16
Filesystem created:       Tue Feb 25 18:06:24 2014
Last mount time:          Sun Apr 20 14:08:10 2014
Last write time:          Sun Apr 20 14:08:10 2014
Mount count:              2
Maximum mount count:      -1
Last checked:             Sun Apr 20 11:22:42 2014
Check interval:           0 (<none>)
Lifetime writes:          4834 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      3c362c29-250b-421a-bdc2-472c611b219d
Journal backup:           inode blocks
Journal features:         journal_incompat_revoke
Journal size:             128M
Journal length:           32768
Journal sequence:         0x000e5c4b
Journal start:            25891

> Finally, since the both of you are seeing these messages fairly
> frequently, would you be willing to run with a patched kernel?
> Specifically, can you add a WARN_ON(1) to fs/ext4/mballoc.c here:
>
> 	if (free != pa->pa_free) {
> 		ext4_msg(e4b->bd_sb, KERN_CRIT,
> 			 "pa %p: logic %lu, phys. %lu, len %lu",
> 			 pa, (unsigned long) pa->pa_lstart,
> 			 (unsigned long) pa->pa_pstart,
> 			 (unsigned long) pa->pa_len);
> 		ext4_grp_locked_error(sb, group, 0, 0, "free %u, pa_free %u",
> 					free, pa->pa_free);
> 		WARN_ON(1); <---------------- add this line			
> 		/*
> 		 * pa is already deleted so we use the value obtained
> 		 * from the bitmap and continue.
> 		 */
> 	}
>
> Then when it triggers, can you send me the stack trace that will be
> triggered by the WARN_ON.

I will not be able to roll that kernel immediately, but I can at some point
(possibly this weekend).

> The two really interesting commonalities which I've seen so far is:
>
> 1)  You are both using virtualization via qemu/kvm
>
> 2)  You are both using file systems > 8TB.
>
> Yes?  And Sander, you're not using a remote block device, correct?
> You're using a local disk to back the large fileystem on the host OS
> side?

Yes.

Cheers,
--nwf;

* Re: ext4 metadata corruption bug?
  2014-04-23 14:36               ` Theodore Ts'o
  2014-04-23 15:30                 ` Nathaniel W Filardo
@ 2014-04-23 18:05                 ` Sander Smeenk
  2014-04-29 15:22                 ` Nathaniel W Filardo
                                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 19+ messages in thread
From: Sander Smeenk @ 2014-04-23 18:05 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Nathaniel W Filardo, linux-ext4

Quoting Theodore Ts'o (tytso@mit.edu):
> First, of all, can you go through your log files and find me as many
> instances of these two pairs of ext4 error messges:
> EXT4-fs (vdd): pa ffff88000dea9b90: logic 0, phys. 1934464544, len 32
> EXT4-fs error (device vdd): ext4_mb_release_inode_pa:3729: group 59035, free 14, pa_free 12

I've got quite a few of them (yay, remote syslog) and i will keep them
pasted on https://8n1.org/9765/e6d5


> Secondly, can you send me the output of dumpe2fs -h for the file
> systems in question.

The FS was created with 'mkfs.ext4 -m 0 /dev/vdb', iirc.
Dumpe2fs output:
| Filesystem volume name:   <none>
| Last mounted on:          /srv/storage
| Filesystem UUID:          02acfb89-2752-4b82-8604-72b035933f8c
| Filesystem magic number:  0xEF53
| Filesystem revision #:    1 (dynamic)
| Filesystem features:      has_journal ext_attr resize_inode dir_index
|     filetype needs_recovery extent flex_bg sparse_super large_file
|     huge_file uninit_bg dir_nlink extra_isize
| Filesystem flags:         signed_directory_hash 
| Default mount options:    user_xattr acl
| Filesystem state:         clean
| Errors behavior:          Continue
| Filesystem OS type:       Linux
| Inode count:              671088640
| Block count:              2684354560
| Reserved block count:     0
| Free blocks:              1158458306
| Free inodes:              670928082
| First block:              0
| Block size:               4096
| Fragment size:            4096
| Reserved GDT blocks:      384
| Blocks per group:         32768
| Fragments per group:      32768
| Inodes per group:         8192
| Inode blocks per group:   512
| Flex block group size:    16
| Filesystem created:       Sat Jul 20 19:24:38 2013
| Last mount time:          Wed Apr 23 08:59:15 2014
| Last write time:          Wed Apr 23 08:59:15 2014
| Mount count:              1
| Maximum mount count:      -1
| Last checked:             Wed Apr 23 08:53:15 2014
| Check interval:           0 (<none>)
| Lifetime writes:          3444 GB
| Reserved blocks uid:      0 (user root)
| Reserved blocks gid:      0 (group root)
| First inode:              11
| Inode size:           256
| Required extra isize:     28
| Desired extra isize:      28
| Journal inode:            8
| Default directory hash:   half_md4
| Directory Hash Seed:      4e54f4fb-479e-464c-80ba-1478cc56181a
| Journal backup:           inode blocks
| Journal features:         journal_incompat_revoke
| Journal size:             128M
| Journal length:           32768
| Journal sequence:         0x000ada49
| Journal start:            1


> Finally, since the both of you are seeing these messages fairly
> frequently, would you be willing to run with a patched kernel?
> Specifically, can you add a WARN_ON(1) to fs/ext4/mballoc.c here:

I can test away on this box. As long as my data stays safe. :-)
I have to admit i haven't compiled my own *kernel* since 2.4.x so i took
the Ubuntu package and patched that with the WARN_ON(1) call. Building
takes ages, but i will report my findings.
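
(For anyone wanting to do the same, the recipe is roughly the standard
Ubuntu one -- from memory, so treat it as a sketch rather than gospel:

| apt-get build-dep linux-image-$(uname -r)
| apt-get source linux-image-$(uname -r)
| cd linux-*    # then add the WARN_ON(1) to fs/ext4/mballoc.c
| fakeroot debian/rules clean
| fakeroot debian/rules binary-headers binary-generic

and then install the resulting .deb in the guest and reboot.)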


> The two really interesting commonalities which I've seen so far is:
> 1)  You are both using virtualization via qemu/kvm
> 2)  You are both using file systems > 8TB.
> Yes? And Sander, you're not using a remote block device, correct?
> You're using a local disk to back the large fileystem on the host OS
> side?

This is all correct.  The host runs LVM; one logical volume is
'exported' to the guest through qemu (2.0) with the virtio driver.


-Sndr.
-- 
| A bicycle can't stand alone; it is two tired. 
| 4096R/20CC6CD2 - 6D40 1A20 B9AA 87D4 84C7  FBD6 F3A9 9442 20CC 6CD2

* Re: ext4 metadata corruption bug?
  2014-04-23 14:36               ` Theodore Ts'o
  2014-04-23 15:30                 ` Nathaniel W Filardo
  2014-04-23 18:05                 ` Sander Smeenk
@ 2014-04-29 15:22                 ` Nathaniel W Filardo
  2014-05-01 16:25                 ` Nathaniel W Filardo
  2014-05-01 17:02                 ` Sander Smeenk
  4 siblings, 0 replies; 19+ messages in thread
From: Nathaniel W Filardo @ 2014-04-29 15:22 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-ext4, admins

On Wed, Apr 23, 2014 at 10:36:42AM -0400, Theodore Ts'o wrote:
> OK, with the two of you reporting this problem, can you do me the
> following so we can try to seriously dig into this:
>[snip]

Just got another instance of corruption; this time on the allocation side:
[715863.874936] EXT4-fs error (device vdb): ext4_mb_generate_buddy:756: group 50737, 485 clusters in bitmap, 484 in gd; block bitmap corrupt.

This time it is from a different, though still >8TB, filesystem, whose
dumpe2fs information sayeth:

Filesystem volume name:   <none>
Last mounted on:          /viceps
Filesystem UUID:          5bd1df60-53df-43f9-9785-5a35da485c81
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean with errors
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              335544320
Block count:              2684354560
Reserved block count:     134217728
Free blocks:              1767616367
Free inodes:              328163234
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      384
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         4096
Inode blocks per group:   256
RAID stride:              1024
RAID stripe width:        1024
Flex block group size:    16
Filesystem created:       Fri Jan 24 18:01:34 2014
Last mount time:          Sun Apr 20 14:08:10 2014
Last write time:          Tue Apr 29 11:16:22 2014
Mount count:              5
Maximum mount count:      -1
Last checked:             Wed Apr  9 17:39:32 2014
Check interval:           0 (<none>)
Lifetime writes:          3574 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      d51561e1-73f7-40c6-bfac-61d8e46a4da4
Journal backup:           inode blocks
Journal features:         journal_incompat_revoke
Journal size:             128M
Journal length:           32768
Journal sequence:         0x000ab65b
Journal start:            0

Cheers,
--nwf;

* Re: ext4 metadata corruption bug?
  2014-04-23 14:36               ` Theodore Ts'o
                                   ` (2 preceding siblings ...)
  2014-04-29 15:22                 ` Nathaniel W Filardo
@ 2014-05-01 16:25                 ` Nathaniel W Filardo
  2014-05-06 15:42                   ` Theodore Ts'o
  2014-05-01 17:02                 ` Sander Smeenk
  4 siblings, 1 reply; 19+ messages in thread
From: Nathaniel W Filardo @ 2014-05-01 16:25 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-ext4, admins

Here's another kernel report, this time from /dev/sda1, which is a QEMU-IDE
view of a local LVM volume and is only 4060864 blocks big, so it falls into
neither the "Ceph's fault" nor "8TB is special" bins:

[922646.672586] EXT4-fs error (device sda1): ext4_mb_generate_buddy:756: group 17, 24652 clusters in bitmap, 24651 in gd; block bitmap corrupt.
[922646.673295] Aborting journal on device sda1-8.
[922646.673904] EXT4-fs (sda1): Remounting filesystem read-only
[922646.684017] ------------[ cut here ]------------
[922646.685564] WARNING: CPU: 0 PID: 10001 at /build/linux-oxWk_8/linux-3.13.7/fs/ext4/ext4_jbd2.c:259 __ext4_handle_dirty_metadata+0x17e/0x190 [ext4]()
[922646.685566] Modules linked in: openafs(PO) loop ttm drm_kms_helper drm evdev psmouse processor parport_pc pcspkr serio_raw parport i2c_piix4 thermal_sys button i2c_core ext4 crc16 mbcache jbd2 sg sd_mod crc_t10dif crct10dif_common ata_generic virtio_blk floppy ata_piix ahci libahci virtio_pci virtio_ring virtio libata e1000 scsi_mod
[922646.692926] CPU: 0 PID: 10001 Comm: logrotate Tainted: P           O 3.13-1-amd64 #1 Debian 3.13.7-1
[922646.692929] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[922646.692931] 0000000000000009 ffffffff814a1327 0000000000000000 ffffffff8105ba72
[922646.692935] ffff88000356c2a0 00000000ffffffe2 0000000000000000 ffff88000435f408
[922646.692938] ffffffffa017aa80 ffffffffa015b30e ffff8800083784c8 ffff88001f4ba000
[922646.692941] Call Trace:
[922646.692977] [<ffffffff814a1327>] ? dump_stack+0x41/0x51
[922646.696017] [<ffffffff8105ba72>] ? warn_slowpath_common+0x72/0x90
[922646.699226] [<ffffffffa015b30e>] ? __ext4_handle_dirty_metadata+0x17e/0x190 [ext4]
[922646.699234] [<ffffffffa0135c50>] ? ext4_dirty_inode+0x20/0x50 [ext4]
[922646.699243] [<ffffffffa0163667>] ? ext4_free_blocks+0x5e7/0xb90 [ext4]
[922646.699260] [<ffffffff81096b9c>] ? wake_up_bit+0xc/0x20
[922646.699269] [<ffffffffa015779c>] ? ext4_ext_remove_space+0x7bc/0xff0 [ext4]
[922646.699278] [<ffffffffa0159e48>] ? ext4_ext_truncate+0x98/0xc0 [ext4]
[922646.699284] [<ffffffffa0134169>] ? ext4_truncate+0x379/0x3c0 [ext4]
[922646.699291] [<ffffffffa0134cc9>] ? ext4_evict_inode+0x459/0x4b0 [ext4]
[922646.699312] [<ffffffff811929c3>] ? evict+0xa3/0x190
[922646.699316] [<ffffffff8118ece8>] ? dentry_kill+0x1e8/0x230
[922646.699319] [<ffffffff8118ed84>] ? dput+0x54/0xf0
[922646.699334] [<ffffffff8117afe8>] ? __fput+0x148/0x210
[922646.699346] [<ffffffff81078ea7>] ? task_work_run+0x97/0xd0
[922646.699356] [<ffffffff81012969>] ? do_notify_resume+0x59/0x90
[922646.699366] [<ffffffff814ae7b2>] ? int_signal+0x12/0x17
[922646.699385] ---[ end trace bfc86d6bd5d1e863 ]---
[922646.712017] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
[922646.712017] IP: [<ffffffffa014a7cc>] __ext4_error_inode+0x2c/0x150 [ext4]
[922646.712017] PGD 1bcf1067 PUD 1f3bc067 PMD 0
[922646.712017] Oops: 0000 [#1] SMP
[922646.712017] Modules linked in: openafs(PO) loop ttm drm_kms_helper drm evdev psmouse processor parport_pc pcspkr serio_raw parport i2c_piix4 thermal_sys button i2c_core ext4 crc16 mbcache jbd2 sg sd_mod crc_t10dif crct10dif_common ata_generic virtio_blk floppy ata_piix ahci libahci virtio_pci virtio_ring virtio libata e1000 scsi_mod
[922646.712017] CPU: 0 PID: 10001 Comm: logrotate Tainted: P        W  O 3.13-1-amd64 #1 Debian 3.13.7-1
[922646.712017] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[922646.712017] task: ffff880000f11800 ti: ffff880000e90000 task.ti: ffff880000e90000
[922646.712017] RIP: 0010:[<ffffffffa014a7cc>]  [<ffffffffa014a7cc>] __ext4_error_inode+0x2c/0x150 [ext4]
[922646.712017] RSP: 0018:ffff880000e91b58  EFLAGS: 00010292
[922646.712017] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000080001
[922646.712017] RDX: 00000000000012ea RSI: ffffffffa018291d RDI: 0000000000000000
[922646.712017] RBP: ffff880000e91be8 R08: ffffffffa017fb38 R09: 0000000000000005
[922646.712017] R10: 0000000000000000 R11: ffff880000e9192e R12: 0000000000080001
[922646.712017] R13: ffffffffa017aa80 R14: 00000000000012ea R15: ffffffffa017fb38
[922646.712017] FS:  00007f4357fb0800(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000
[922646.712017] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[922646.712017] CR2: 00000000001f8000 CR3: 000000001bd00000 CR4: 00000000000006f0
[922646.712017] Stack:
[922646.712017] ffffffffa015b30e 00000000000012ea ffffffff8149efa2 0000000000000010
[922646.712017] ffff880000e91bc8 ffff880000e91b88 0000000000000103 bfc86d6bd5d1e863
[922646.712017] 0000000000003232 ffff88000435f408 ffffffffa015ae88 0000000000000227
[922646.712017] Call Trace:
[922646.712017] [<ffffffffa015b30e>] ? __ext4_handle_dirty_metadata+0x17e/0x190 [ext4]
[922646.712017] [<ffffffff8149efa2>] ? printk+0x4f/0x51
[922646.712017] [<ffffffffa015ae88>] ? ext4_journal_abort_handle+0x38/0xb0 [ext4]
[922646.712017] [<ffffffffa015b285>] ? __ext4_handle_dirty_metadata+0xf5/0x190 [ext4]
[922646.712017] [<ffffffffa0163667>] ? ext4_free_blocks+0x5e7/0xb90 [ext4]
[922646.712017] [<ffffffff81096b9c>] ? wake_up_bit+0xc/0x20
[922646.712017] [<ffffffffa015779c>] ? ext4_ext_remove_space+0x7bc/0xff0 [ext4]
[922646.712017] [<ffffffffa0159e48>] ? ext4_ext_truncate+0x98/0xc0 [ext4]
[922646.712017] [<ffffffffa0134169>] ? ext4_truncate+0x379/0x3c0 [ext4]
[922646.712017] [<ffffffffa0134cc9>] ? ext4_evict_inode+0x459/0x4b0 [ext4]
[922646.712017] [<ffffffff811929c3>] ? evict+0xa3/0x190
[922646.712017] [<ffffffff8118ece8>] ? dentry_kill+0x1e8/0x230
[922646.712017] [<ffffffff8118ed84>] ? dput+0x54/0xf0
[922646.712017] [<ffffffff8117afe8>] ? __fput+0x148/0x210
[922646.712017] [<ffffffff81078ea7>] ? task_work_run+0x97/0xd0
[922646.712017] [<ffffffff81012969>] ? do_notify_resume+0x59/0x90
[922646.712017] [<ffffffff814ae7b2>] ? int_signal+0x12/0x17
[922646.712017] Code: 48 89 e5 41 57 4d 89 c7 41 56 41 89 d6 41 55 49 89 f5 48 c7 c6 1d 29 18 a0 41 54 49 89 cc 53 48 89 fb 48 83 ec 68 4c 89 4c 24 60 <48> 8b 47 28 48 8b 57 40 48 8b 80 f8 02 00 00 48 8b 40 68 89 90
[922646.712017] RIP  [<ffffffffa014a7cc>] __ext4_error_inode+0x2c/0x150 [ext4]
[922646.712017] RSP <ffff880000e91b58>
[922646.712017] CR2: 0000000000000028
[922646.753519] ---[ end trace bfc86d6bd5d1e864 ]---

dumpe2fs on /dev/sda1 reports:

Filesystem volume name:   <none>
Last mounted on:          /
Filesystem UUID:          680c70f2-6ff5-49f9-9823-4f1d48062a0c
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean with errors
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              262144
Block count:              1048064
Reserved block count:     10480
Free blocks:              808533
Free inodes:              220750
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      255
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Filesystem created:       Fri Feb 21 02:30:29 2014
Last mount time:          Sun Apr 20 14:08:06 2014
Last write time:          Sun Apr 20 14:08:06 2014
Mount count:              29
Maximum mount count:      -1
Last checked:             Fri Feb 21 02:30:29 2014
Check interval:           0 (<none>)
Lifetime writes:          15 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
First orphan inode:       131314
Default directory hash:   half_md4
Directory Hash Seed:      ef50dd1a-28f7-4ef3-baa2-b8e579d852a3
Journal backup:           inode blocks
FS Error count:           1
First error time:         Thu May  1 06:25:13 2014
First error function:     ext4_mb_generate_buddy
First error line #:       756
First error inode #:      0
First error block #:      0
Last error time:          Thu May  1 06:25:13 2014
Last error function:      ext4_mb_generate_buddy
Last error line #:        756
Last error inode #:       0
Last error block #:       0
Journal features:         journal_incompat_revoke
Journal size:             64M
Journal length:           16384
Journal sequence:         0x0003f20f
Journal start:            175
Journal errno:            -5

Cheers,
--nwf;

* Re: ext4 metadata corruption bug?
  2014-04-23 14:36               ` Theodore Ts'o
                                   ` (3 preceding siblings ...)
  2014-05-01 16:25                 ` Nathaniel W Filardo
@ 2014-05-01 17:02                 ` Sander Smeenk
  2014-05-06 14:22                   ` Sander Smeenk
  4 siblings, 1 reply; 19+ messages in thread
From: Sander Smeenk @ 2014-05-01 17:02 UTC (permalink / raw)
  To: linux-ext4

Quoting Theodore Ts'o (tytso@mit.edu):

> Finally, since the both of you are seeing these messages fairly
> frequently, would you be willing to run with a patched kernel?

JFYI, i am running the patched kernel but i haven't had any 'luck' yet.

-Sndr.
-- 
| Happiness isn't enough for me. I demand EUPHORIA!
| 4096R/20CC6CD2 - 6D40 1A20 B9AA 87D4 84C7  FBD6 F3A9 9442 20CC 6CD2

* Re: ext4 metadata corruption bug?
  2014-05-01 17:02                 ` Sander Smeenk
@ 2014-05-06 14:22                   ` Sander Smeenk
  2014-05-26 14:59                     ` Sander Smeenk
  0 siblings, 1 reply; 19+ messages in thread
From: Sander Smeenk @ 2014-05-06 14:22 UTC (permalink / raw)
  To: linux-ext4

Quoting Sander Smeenk (ssmeenk@freshdot.net):

> > Finally, since the both of you are seeing these messages fairly
> > frequently, would you be willing to run with a patched kernel?
> JFYI, i am running the patched kernel but i havent had any 'luck' yet.

FWIW, the VM crashed this morning.
Alas, nothing was logged or visible on the console. :(

-- 
| I can can-can! Can you?
| 4096R/20CC6CD2 - 6D40 1A20 B9AA 87D4 84C7  FBD6 F3A9 9442 20CC 6CD2

* Re: ext4 metadata corruption bug?
  2014-05-01 16:25                 ` Nathaniel W Filardo
@ 2014-05-06 15:42                   ` Theodore Ts'o
  2014-05-06 15:51                     ` Nathaniel W Filardo
  0 siblings, 1 reply; 19+ messages in thread
From: Theodore Ts'o @ 2014-05-06 15:42 UTC (permalink / raw)
  To: Nathaniel W Filardo; +Cc: linux-ext4, admins

On Thu, May 01, 2014 at 12:25:03PM -0400, Nathaniel W Filardo wrote:
> Here's another kernel report, this time from /dev/sda1, which is a QEMU-IDE
> view of a local LVM volume and is only 4060864 blocks big, so it falls into
> neither the "Ceph's fault" nor "8TB is special" bins:
> 
> [922646.672586] EXT4-fs error (device sda1): ext4_mb_generate_buddy:756: group 17, 24652 clusters in bitmap, 24651 in gd; block bitmap corrupt.

So this is a different report from the ones where we see this error:

[817576.492468] EXT4-fs error (device vdd): ext4_mb_release_inode_pa:3729: group 59035, free 14, pa_free 12

Have you seen any more of these errors?

> [922646.712017] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
> [922646.712017] IP: [<ffffffffa014a7cc>] __ext4_error_inode+0x2c/0x150 [ext4]

FYI, this BUG (which can happen after certain jbd2 errors, which in
your case happened after the journal was aborted) is fixed with commit
66a4cb187b9, which will be in v3.15.
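
(If you want to check whether a given tree already has it, something like

        git describe --contains 66a4cb187b9

in a kernel git checkout should name the first tag containing the fix.)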

	    	  				- Ted



* Re: ext4 metadata corruption bug?
  2014-05-06 15:42                   ` Theodore Ts'o
@ 2014-05-06 15:51                     ` Nathaniel W Filardo
  2014-07-31  2:37                       ` Theodore Ts'o
  0 siblings, 1 reply; 19+ messages in thread
From: Nathaniel W Filardo @ 2014-05-06 15:51 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-ext4, admins

On Tue, May 06, 2014 at 11:42:39AM -0400, Theodore Ts'o wrote:
> On Thu, May 01, 2014 at 12:25:03PM -0400, Nathaniel W Filardo wrote:
> > Here's another kernel report, this time from /dev/sda1, which is a QEMU-IDE
> > view of a local LVM volume and is only 4060864 blocks big, so it falls into
> > neither the "Ceph's fault" nor "8TB is special" bins:

Ack, oops; my bad.  So I just checked the configuration and realized that,
while /dev/sda1 was in fact once upon a time a local view of LVM, it is now
in Ceph.  So it does eliminate the "8TB is special" bin but "Ceph's fault"
is still in play.

> > [922646.672586] EXT4-fs error (device sda1): ext4_mb_generate_buddy:756: group 17, 24652 clusters in bitmap, 24651 in gd; block bitmap corrupt.
> 
> So this is a different report from the ones where we see this error:
> 
> [817576.492468] EXT4-fs error (device vdd): ext4_mb_release_inode_pa:3729: group 59035, free 14, pa_free 12
> 
> Have you seen any more of these errors?

I think so, yes; I recall seeing bugs in both the allocation and the free
side of things, but I will keep an eye out.

> > [922646.712017] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
> > [922646.712017] IP: [<ffffffffa014a7cc>] __ext4_error_inode+0x2c/0x150 [ext4]
> 
> FYI, this BUG (which can happens after certain jbd2 errors, which in
> your case happened after the journal was aborted) is fixed with commit
> 66a4cb187b9 which will be in v3.15.

Excellent; I look forward to the new release and will stop nagging you with
these. :)

--nwf;

* Re: ext4 metadata corruption bug?
  2014-05-06 14:22                   ` Sander Smeenk
@ 2014-05-26 14:59                     ` Sander Smeenk
  0 siblings, 0 replies; 19+ messages in thread
From: Sander Smeenk @ 2014-05-26 14:59 UTC (permalink / raw)
  To: linux-ext4

Quoting Sander Smeenk (ssmeenk@freshdot.net):

> > > Finally, since the both of you are seeing these messages fairly
> > > frequently, would you be willing to run with a patched kernel?
> > JFYI, i am running the patched kernel but i havent had any 'luck' yet.
> FWIW, the VM crashed this morning.
> Alas, nothing was logged or visible on the console. :(

'Crashed' today, but no traces were produced because apparently the xfs and
btrfs modules (neither of which is even used) have 'bad taint':

| xfs: module has bad taint, not creating trace events
| btrfs: module has bad taint, not creating trace events
| EXT4-fs error (device vda1): ext4_mb_generate_buddy:756: group 75, 11202 clusters in bitmap, 11200 in gd; block bitmap corrupt. 
| Aborting journal on device vda1-8.
| EXT4-fs (vda1): Remounting filesystem read-only

Also, this is vda1, not vdb. vda1 is my root fs and is only 10GB in
size.

-- 
| Bakers trade bread recipes on a knead to know basis.  
| 4096R/20CC6CD2 - 6D40 1A20 B9AA 87D4 84C7  FBD6 F3A9 9442 20CC 6CD2

* Re: ext4 metadata corruption bug?
  2014-05-06 15:51                     ` Nathaniel W Filardo
@ 2014-07-31  2:37                       ` Theodore Ts'o
  2014-08-06  8:53                         ` Sander Smeenk
  0 siblings, 1 reply; 19+ messages in thread
From: Theodore Ts'o @ 2014-07-31  2:37 UTC (permalink / raw)
  To: Nathaniel W Filardo; +Cc: linux-ext4, admins, Sander Smeenk

Hi,

Are you folks still seeing ext4_mb_generate_buddy or
ext4_mb_release_inode_pa EXT4-fs errors?

I think I may have found a fix for this problem.  Or at least, I've
found one of the causes.  If we get a memory allocation failure from
ext4_mb_new_inode_pa(), it can cause these errors.  Actually
triggering it is a bit tricky, but it looks like we saw it hit when a
task which was running under high memory pressure (because it was
running right up against its cgroup memory limit) dumped core.

Anyway, if you are still seeing these issues, this patch might help you out:

	http://patchwork.ozlabs.org/patch/375106/
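
If it's easier to apply, patchwork can usually hand you an mbox you can feed
straight to "git am" (append mbox/ to that URL, assuming the usual patchwork
layout):

        wget -O ext4-pa-fix.mbox http://patchwork.ozlabs.org/patch/375106/mbox/
        git am ext4-pa-fix.mbox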

If it works out for you, please let me know.

Cheers,

							- Ted

* Re: ext4 metadata corruption bug?
  2014-07-31  2:37                       ` Theodore Ts'o
@ 2014-08-06  8:53                         ` Sander Smeenk
  0 siblings, 0 replies; 19+ messages in thread
From: Sander Smeenk @ 2014-08-06  8:53 UTC (permalink / raw)
  To: linux-ext4

Quoting Theodore Ts'o (tytso@mit.edu):

> Are you folks still seeing ext4_mb_generate_buddy or
> ext4_mb_release_inode_pa EXT4-fs errors?
> 
> I think I may have found a fix for this problem.  Or at least, I've
> found one of the causes.  If we get a memory allocation failure from
> ext4_mb_new_inode_pa(), it can cause these errors.

Good to hear a probable cause was found. Unfortunately for this bug's
progress, i've switched to a newer kernel altogether, since the 3.13
tree in Ubuntu has had a lot of other problems besides this
EXT4 ext4_mb_generate_buddy issue.

With 3.15.0 i have not experienced any ext4 issues at all.

If you want me to, i would be willing to boot that VM in 3.13 with this
patch, but as said, it can take ages before the bug is triggered...

Thanks for your hard work.

-Sndr.
-- 
| > Because we read from top to bottom.
| > > Why should i start my email reply *below* the quoted text?
| 4096R/20CC6CD2 - 6D40 1A20 B9AA 87D4 84C7  FBD6 F3A9 9442 20CC 6CD2
