From: Samuel Mendoza-Jonas <samjonas@amazon.com>
To: <linux-ext4@vger.kernel.org>, <tytso@mit.edu>,
	<adilger.kernel@dilger.ca>
Cc: <benh@amazon.com>
Subject: Debugging ext4 corruption with nojournal & extents
Date: Mon, 8 Nov 2021 09:35:20 -0800
Message-ID: <20211108173520.xp6xphodfhcen2sy@u87e72aa3c6c25c.ant.amazon.com>

Hi all,

Recently I've been digging into a corruption issue which I think is just about
pinned down, but I'd appreciate some more expert ext4 eyes to confirm we're on
the right path.

What we have boils down to a system with
- An ext4 filesystem with the journal disabled
- A workload[0] which, in a loop:
  - Creates a lot of small files
  - Occasionally deletes these files and collects them into a single larger
    "compound" file
  - Periodically checks the header of all of these files to ensure they're
    correct

After a while this check fails. When inspecting the "bad" file, its contents
turn out to be an ext4 extent structure, for example:

[ec2-user@ip-172-31-0-206 ~]$ hexdump -C _2w.si
00000000  0a f3 05 00 54 01 00 00  00 00 00 00 00 00 00 00  |....T...........|
00000010  01 00 00 00 63 84 08 05  01 00 00 00 ff 01 00 00  |....c...........|
00000020  75 8a 1c 02 00 02 00 00  00 02 00 00 00 9c 1c 02  |u...............|
00000030  00 04 00 00 dc 00 00 00  00 ac 1c 02 dc 04 00 00  |................|
00000040  08 81 00 00 dc ac 1c 02  00 00 00 00 00 00 00 00  |................|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000170  00 00 00                                          |...|
00000173

This starts with EXT4_EXT_MAGIC (cpu_to_le16(0xf30a)), and when parsed as an
extent header plus extent array it has 5 extent entries at depth 0 (i.e. leaf
level).
By the time the file is checked, the file that these extents presumably
belonged to appears to have been deleted, but reading the physical blocks they
point to shows what looks like the data of one of the larger files this test
creates.


Based on that, what I think is happening is:
- A file with separate (i.e. non-inline) extents is synced / written to disk
  (in this case, one of the large "compound" files)
- ext4_end_io_end() kicks off writeback of extent metadata
  - AIUI this marks the related buffers dirty but does not wait on them in the
    no-journal case
- The file is deleted, causing the extents to be "removed" and the blocks where
  they were stored are marked unused
- A new file is created (any file, separate extents not required)
- The new file is allocated the block that was just freed (the physical block
  where the old extents were located)

Some time between this point and when the file is next read, the dirty extent
buffer hits the disk instead of the intended data for the new file.
A big-hammer hack in __ext4_handle_dirty_metadata() to always sync metadata
blocks appears to avoid the issue but isn't ideal - most likely a better
solution would be to ensure any dirty metadata buffers are synced before the
inode is dropped.
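
To make the suspected ordering problem concrete, here is a toy Python model of
the sequence above. It is not kernel code and makes no claim about how ext4
actually tracks buffers: ToyFS, its methods, and the flush_on_free switch are
all hypothetical names invented for this sketch. The only thing it models is a
dirty metadata buffer that outlives the block it was meant for.

```python
class ToyFS:
    """Toy model: one dict as the 'disk', one dict of dirty metadata buffers."""

    def __init__(self, flush_on_free=False):
        self.disk = {}          # block number -> bytes currently on disk
        self.dirty_meta = {}    # block number -> metadata waiting for writeback
        self.free_blocks = []   # blocks available for reuse
        self.flush_on_free = flush_on_free

    def dirty_extent_block(self, blk, payload):
        # Metadata is marked dirty but nothing waits for it to reach disk.
        self.dirty_meta[blk] = payload

    def free_block(self, blk):
        if self.flush_on_free and blk in self.dirty_meta:
            # Hypothetical mitigation: write the buffer out before the
            # block can be handed to anyone else.
            self.disk[blk] = self.dirty_meta.pop(blk)
        self.free_blocks.append(blk)

    def write_new_file(self, payload):
        # The allocator hands back the most recently freed block.
        blk = self.free_blocks.pop()
        self.disk[blk] = payload
        return blk

    def background_writeback(self):
        # Any buffer still dirty lands on disk, whoever owns the block now.
        for blk, payload in self.dirty_meta.items():
            self.disk[blk] = payload
        self.dirty_meta.clear()


def run_scenario(flush_on_free):
    fs = ToyFS(flush_on_free=flush_on_free)
    fs.dirty_extent_block(100, b"EXTENT BLOCK")   # large file is written
    fs.free_block(100)                            # large file is deleted
    blk = fs.write_new_file(b"NEW FILE DATA")     # new file reuses block 100
    fs.background_writeback()                     # stale buffer may land now
    return fs.disk[blk]


print("no flush:", run_scenario(flush_on_free=False))
print("flush on free:", run_scenario(flush_on_free=True))
```

Without the flush, the new file's block ends up holding the old extent data,
which matches the symptom; with the flush, the new file's data survives. A real
fix could equally well discard the stale buffer rather than write it, so treat
flush_on_free only as a stand-in for "the buffer is dealt with before the block
is reused".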

Overall does this summary sound valid, or have I wandered into the weeds somewhere?

Cheers,
Sam Mendoza-Jonas

[0] This is an Elasticsearch/Lucene workload, running the esrally tests to hit
    the issue.


Thread overview:
2021-11-08 17:35 Samuel Mendoza-Jonas [this message]
2021-11-09  3:14 ` Debugging ext4 corruption with nojournal & extents Theodore Ts'o
2021-11-15 23:55   ` Samuel Mendoza-Jonas