Nature of ext4 corruption fixed by recent patch?

From: josh @ 2015-05-18 22:58 UTC
  To: Lukas Czerner, tytso, linux-kernel

Hi,

I recently had my server's filesystem implode, and I'm currently in the
process of cleaning it up.  It had widespread corruption in files and
directories scattered across the filesystem, though all of them had been
changed fairly recently.  Directories appeared corrupted or truncated,
various files showed up as piles of NULs, and 5000+ files and
directories ended up in
lost+found.  I observed this corruption shortly after a reboot into
4.0.2 (from a previous kernel of 3.16), with ext4 noticing an
inconsistency and mounting the filesystem read-only.  The underlying
disks had no errors.

The corruption issue fixed by commit
d2dc317d564a46dfc683978a2e5a4f91434e9711 ("ext4: fix data corruption
caused by unwritten and delayed extents") sounds like a plausible cause.  Can
that strike both file data and directory data, assuming all of that data
ended up grouped with a delayed extent?  Would that bug manifest as
corrupted directories and files filled with NULs?  The system is a
72-way server on which I was doing piles of parallel git pulls and
builds, so hitting a race seems plausible.

I'm trying to track down potential causes of this so that I can feel
comfortable trusting that system again.

Thanks,
Josh Triplett
