linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Theodore Ts'o" <tytso@mit.edu>
To: Baokun Li <libaokun1@huawei.com>
Cc: "Luís Henriques" <lhenriques@suse.de>,
	"Andreas Dilger" <adilger.kernel@dilger.ca>,
	linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org
Subject: Re: [PATCH v2] ext4: fix a NULL pointer when validating an inode bitmap
Date: Tue, 29 Nov 2022 16:00:08 -0500	[thread overview]
Message-ID: <Y4Zy2HHOmak3k637@mit.edu> (raw)
In-Reply-To: <d357e15b-e44a-1e3b-41c3-0b732e4685ed@huawei.com>

On Tue, Nov 29, 2022 at 11:18:34AM +0800, Baokun Li wrote:
> 
> In my opinion, the s_journal_inum should not be modified when the
> file system is mounted, especially after we have successfully loaded
> and replayed the journal with the current s_journal_inum. Even if
> the s_journal_inumon the disk is modified, we should use the current
> one. This is how journal_devnum is handled in ext4_load_journal():
> 
>          if (!really_read_only && journal_devnum &&
>              journal_devnum != le32_to_cpu(es->s_journal_dev)) {
>                  es->s_journal_dev = cpu_to_le32(journal_devnum);
> 
>                  /* Make sure we flush the recovery flag to disk. */
>                  ext4_commit_super(sb);
>          }
> 
> We can avoid this problem by adding a similar check for journal_inum in
> ext4_load_journal().

This check you've pointed out wasn't actually intended to protect
against the problem where the journal_inum is getting overwritten by
the journal replay.  The s_journal_dev field is a hint about where to
find the external journal.  However, this can change over time --- for
example, if a SCSI disk is removed from the system, so /dev/sdcXX
becomes /dev/sbdXX.  The official way to find the journal device is
via the external journal's UUID.  So userspace might use a command
like:

  mount -t ext4 -o journal_path="$(blkid -U <journal uuid>)" UUID=<fs uuid> /mnt

So s_journal_devnum might get updated, and we don't want the hint to
get overwritten by the journal replay.  So that's why the code that
you've quoted exists (and this goes all the way back to ext3).  It's a
code path that can be quite legitimately triggered when the location
of the external journal device changes (or when the device's
major/minor numbers get renumbered).

Now, we *could* do something like this for s_journal_inum, but it
would be for a different purpose.  In practice, this would never
happen in real life due to random bit flips, since the journal is
protected using checksum.  It can only happen when there are
deliberately, maliciously fuzzed file system images, such as was the
case here.  And s_journal_inum is only one of any number of superblock
fields that shouldn't ever be modified by the journal replay.  We have
had previous failures caused by we validated the superblock fields to
be valid, but then after that, we replay the journal, and then it
turns out the superblock fields are incorrect.  (And then some Red Hat
principal engineer will try to call it a high severity CVE, which is
really bullshit, since if you allow random unprivileged processes to
mount arbitrary file system images, you've got other problems.  Don't
do that.)

If we *really* cared about these sorts of problems, we should special
case the journal replay of certain blocks, such as the superblock, and
validate those blocks to make sure they are not crazy --- and if it is
crazy, we should abort the journal replay right then and there.

Alternatively, one *could* consider making a copy of certain blocks
(in particular the superblock and block group descriptors), and then
do a post-hoc validation of the superblock after the replay --- and if
it is invalid, we could put the old superblock back.  But we need to
remember that sometimes superblock fields *can* change.  For example,
in the case of online resize, we can't just say that if the old
superblock value is different from the new superblock value, something
Must Be Wrong.  That being said, if the result of the journal replay
ends up with a result where s_block_count is greater than the physical
block device, *that's* something that is probably wrong.

That being said, the question is whether all of this complexity is
really all *that* necessary, since again, thanks to checksums, short
of Malicious File System Fuzzing, this should never happen.  If all of
this complexity is only needed to protect against malicious fuzzed
file systems, then maybe it's not the worth it.

If we can protect against the problem by adding a check that has other
value as well (such as making usre that when ext4_iget fetches a
special inode, we enforce that i_links_couint must be > 0), maybe
that's worth it.

So ultimately it's all a question of engineering tradeoffs.  Is it
worth it to check for s_journal_inum changing, and changing it back?
Meh.  At the very least we would want to add a warning, since this is
only going happen in case of a malicious fuzzed file system.  And
since we're only undoing the s_journal_inum update, there may be other
superblock fields beyond s_journal_inum that could get modified by the
malicious file system fuzzer.  So how many post hoc checks do we
really want to be adding here?  Should we also do similar checks for
s_blocks_per_group?  s_inodes_per_group?  s_log_block_size?
s_first_ino?  s_first_data_block?  I realize this is a slipperly slope
argument; but the bottom line that it's not immediately obvious that
it's good idea to only worry about s_journal_inum.

Cheers,

						- Ted

  reply	other threads:[~2022-11-29 21:00 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-10 14:20 [PATCH] ext4: fix a NULL pointer when validating an inode bitmap Luís Henriques
2022-10-11 15:56 ` [PATCH v2] " Luís Henriques
2022-11-06  0:32   ` Theodore Ts'o
2022-11-08 14:06     ` Luís Henriques
2022-11-28 22:28       ` Theodore Ts'o
2022-11-29  3:18         ` Baokun Li
2022-11-29 21:00           ` Theodore Ts'o [this message]
2022-11-30  3:20             ` Baokun Li
2022-12-01  4:32               ` Theodore Ts'o
2022-12-01  6:20                 ` Baokun Li
2022-10-12 13:13 ` [PATCH v2] ext4: fix BUG_ON() when directory entry has invalid rec_len Luís Henriques
2022-10-12 13:16   ` Luís Henriques
2022-10-12 14:21     ` Theodore Ts'o
2022-10-12 15:18       ` Luís Henriques
2022-11-06  6:16   ` Theodore Ts'o

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y4Zy2HHOmak3k637@mit.edu \
    --to=tytso@mit.edu \
    --cc=adilger.kernel@dilger.ca \
    --cc=lhenriques@suse.de \
    --cc=libaokun1@huawei.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).