On Wed, Apr 09, 2014 at 10:55:48PM -0400, Theodore Tso wrote:
> Hi Nathaniel,
> 
> In general, it's best if you send these sorts of requests for help to the
> linux-ext4@vger.kernel.org mailing list.

Added to CC.

> The fact that we see the "error count" line early in the boot message
> suggests to me that your VM is not running fsck to fix up the errors before
> mounting the file system.  (Well, either that or you're using a really
> ancient version of e2fsck, but given that you're using a bleeding edge
> kernel, I'm guessing you're using a reasonably recent version of
> e2fsck.  But that would be good for you to check.)

e2fsck is version 1.42.9, and it reports the same library version.
 
> The ext4 error message is due to the file system getting corrupted.  How
> the file system got corrupted isn't 100% clear, but one potential cause is
> how the disk is configured with qemu.
>[snip]

We use QEMU directives like

        -drive format=raw,file=rbd:rbdafs-mirror/mirror-0,id=drive5,if=none,cache=writeback \
        -device driver=ide-hd,drive=drive5,discard_granularity=512,bus=ahci0.3

We've never had, so far as I know, an unexpected shutdown of the QEMU
process, so I don't think that unexpected loss of cache contents is to
blame.

Perhaps the dmesg I sent was not representative.  Some days ago we saw the
following, and only (comparatively!) late in the machine's uptime:

[309894.428685] EXT4-fs (sdd): pa ffff88000d9f9440: logic 832, phys.  957458972, len 192
[309894.430023] EXT4-fs error (device sdd): ext4_mb_release_inode_pa:3729: group 29219, free 192, pa_free 191
[309894.431822] Aborting journal on device sdd-8.
[309894.442913] EXT4-fs (sdd): Remounting filesystem read-only

This was with Debian kernel 3.13.5-1; sdd here is the same filesystem as in
the earlier dmesg.
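
As a quick sanity check on the log above, the physical block number and the
group number appear consistent with each other.  This is only a sketch: it
assumes the mke2fs defaults of 32768 blocks per group (4 KiB blocks) and a
first data block of 0, neither of which I have confirmed against this
filesystem's superblock.  The names below are just for illustration.

```python
# Assumed geometry (mke2fs defaults for 4 KiB blocks); NOT read from the
# real superblock of sdd.
BLOCKS_PER_GROUP = 32768
FIRST_DATA_BLOCK = 0


def locate(phys_block):
    """Return (group, offset within group) for an absolute block number."""
    return divmod(phys_block - FIRST_DATA_BLOCK, BLOCKS_PER_GROUP)


# Values taken from the EXT4-fs messages above.
pa_phys = 957458972   # "phys." field of the preallocation
pa_len = 192          # "len" field of the preallocation

group, offset = locate(pa_phys)
print(group, offset)  # 29219 10780

# The whole preallocation fits inside that one group under these assumptions.
fits_in_group = offset + pa_len <= BLOCKS_PER_GROUP
print(fits_in_group)  # True
```

Under those assumptions the preallocation lies entirely within group 29219,
the group named in the error, so the mismatch reported there (free 192,
pa_free 191) concerns that same range.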

I'll capture any subsequent crashes and follow up.

Thanks much!
--nwf;