From: Curt Wohlgemuth <curtw@google.com>
To: ext4 development <linux-ext4@vger.kernel.org>
Subject: Odd "leak" of extent info into data blocks?
Date: Sat, 22 Aug 2009 16:10:56 -0700
Message-ID: <6601abe90908221610p60629809qcde6848308b8affe@mail.gmail.com>

On the off chance that this sounds familiar to anyone out there...

I've got a situation in which data files written by an application
occasionally show checksum errors.  The data files are all around 8MB
long, written using O_DIRECT into fallocated space.
(The entire fallocated space for the example file below is written to
with valid data; i.e., no holes, no truncation, no uninitialized
extents.)

When these checksum failures show up, the data in the files is rather
odd.  I've seen 4 cases of this so far.  In each one, the "bad" data
starts on a block boundary, and its first 12 bytes are identical to
what an extent header would look like (for a header at the start of a
block of extents or extent indexes).

Here's the "od -Ad -x" output from one such file:

             8388608 f30a 0000 0154 0000 0000 0000 0000 0000

(I.e., the first 2 bytes are EXT4_EXT_MAGIC (0xf30a), and bytes 4-5
are 0x154 (340), which is what eh_max would be for a block size of
4096 bytes: (4096 - 12) / 12, rounded down.)
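
To make the comparison with an extent header concrete, here is a
minimal Python sketch that decodes the first 12 bytes of the dump
above using the layout of struct ext4_extent_header.  It assumes the
dump was taken on a little-endian host (so the od word "f30a" is the
on-disk bytes 0a f3); parse_extent_header and expected_eh_max are
names made up for this illustration.

```python
import struct

# struct ext4_extent_header, little-endian on disk (12 bytes total):
#   __le16 eh_magic, eh_entries, eh_max, eh_depth; __le32 eh_generation
EXT4_EXT_MAGIC = 0xF30A
HEADER_FMT = "<HHHHI"
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 12
ENTRY_SIZE = 12  # size of one extent or one extent index entry


def parse_extent_header(buf):
    """Decode the first 12 bytes of buf as an extent header."""
    magic, entries, eh_max, depth, generation = struct.unpack_from(HEADER_FMT, buf)
    return {
        "magic": magic,
        "entries": entries,
        "max": eh_max,
        "depth": depth,
        "generation": generation,
    }


def expected_eh_max(block_size):
    """Entries that fit in a full extent block after its header."""
    return (block_size - HEADER_SIZE) // ENTRY_SIZE


# The eight 16-bit words printed by "od -Ad -x" at offset 8388608.
# Assumption: od ran on a little-endian machine, so packing the words
# as little-endian reproduces the bytes as they sit on disk.
dump_words = [0xF30A, 0x0000, 0x0154, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000]
raw = struct.pack("<8H", *dump_words)

hdr = parse_extent_header(raw)
print(hex(hdr["magic"]), hdr["entries"], hdr["max"], hdr["depth"])
print(expected_eh_max(4096))
```

With these inputs the decoded eh_max and the value computed for a
4096-byte block agree, and eh_entries and eh_depth are both zero.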

In this case, the "bad" data starts at block 2048.  Two cases have
this pattern at block 2048; two at block 2050.  A syscall trace of one
such corrupted file shows that this block was written with a single
write encompassing many adjacent blocks:

         write(fd=10, size=192512, offset=8204288)

The file in question has only two (in-inode) extents, which I
verified look valid.  Block 2048 is covered by the second extent:
logical blocks 2037-2050.
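
The offset arithmetic behind these block numbers can be checked with
a short sketch.  It assumes a 4096-byte block size (consistent with
the eh_max value above); block_of and blocks_written are hypothetical
helper names, not anything from ext4.

```python
BLOCK_SIZE = 4096  # assumed block size, consistent with eh_max == 0x154


def block_of(byte_offset, block_size=BLOCK_SIZE):
    """Logical block number containing the given byte offset."""
    return byte_offset // block_size


def blocks_written(offset, size, block_size=BLOCK_SIZE):
    """Inclusive (first, last) logical block range touched by a write."""
    first = offset // block_size
    last = (offset + size - 1) // block_size
    return first, last


# Offset of the "bad" data in the od output.
bad_block = block_of(8388608)

# The write seen in the syscall trace.
first, last = blocks_written(offset=8204288, size=192512)

# Second extent as reported: logical blocks 2037-2050, inclusive.
second_extent = (2037, 2050)

print(bad_block, first, last)  # 2048 2003 2049
print(first <= bad_block <= last)  # True
print(second_extent[0] <= bad_block <= second_extent[1])  # True
```

By these numbers the write is block aligned, covers 47 blocks, and
starts below logical block 2037, so it would appear to span both
extents rather than only the second one.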

The amount of "bad" data (including the "extent header" above) is
pretty variable: between 70 and 800 bytes.  I haven't been able to
correlate the rest of the bad data with any particular ext4 data
structures.
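
Since the bad data always starts on a block boundary with something
that looks like an extent header, one way to hunt for more cases is
to scan each data file for block-aligned positions that match.  The
following is only a sketch of that idea, assuming 4096-byte blocks
and a header for a full extent block (eh_max of 340); the function
names are made up, and a hit is a hint rather than proof, since valid
application data could contain the same bytes.

```python
import struct

EXT4_EXT_MAGIC = 0xF30A
HEADER_FMT = "<HHHHI"  # eh_magic, eh_entries, eh_max, eh_depth, eh_generation
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 12 bytes
ENTRY_SIZE = 12  # size of one extent or extent index entry


def looks_like_extent_header(block, block_size=4096):
    """True if the first 12 bytes resemble a full-block extent header."""
    if len(block) < HEADER_SIZE:
        return False
    magic, entries, eh_max, _depth, _gen = struct.unpack_from(HEADER_FMT, block)
    full_block_max = (block_size - HEADER_SIZE) // ENTRY_SIZE
    return (
        magic == EXT4_EXT_MAGIC
        and eh_max == full_block_max
        and entries <= eh_max
    )


def suspicious_blocks(data, block_size=4096):
    """Logical block numbers in data whose start resembles an extent header."""
    hits = []
    for offset in range(0, len(data), block_size):
        if looks_like_extent_header(data[offset:offset + block_size], block_size):
            hits.append(offset // block_size)
    return hits


def scan_file(path, block_size=4096):
    """Scan one data file; reads it whole, which is fine for ~8MB files."""
    with open(path, "rb") as f:
        return suspicious_blocks(f.read(), block_size)


# Synthetic check: three zeroed blocks with a fake header placed at block 2.
demo = bytearray(3 * 4096)
struct.pack_into(HEADER_FMT, demo, 2 * 4096, EXT4_EXT_MAGIC, 0, 340, 0, 0)
print(suspicious_blocks(bytes(demo)))  # [2]
```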

My guess is that a block of extents from a truncated or removed file
was reused for data for this file, and somehow was not written
correctly.  This seems (slightly) more plausible to me than the extent
metadata of an existing file having been "leaked" into this one.

Does any of this ring a bell to anybody?

Thanks,
Curt
