linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* ext2fs_link() corrupting a directory
@ 2020-01-17 12:24 Jan Kara
  2020-01-17 16:29 ` Theodore Y. Ts'o
  0 siblings, 1 reply; 3+ messages in thread
From: Jan Kara @ 2020-01-17 12:24 UTC (permalink / raw)
  To: linux-ext4; +Cc: Ted Tso, Darrick J. Wong

Hello,

I was tracking down a filesystem corruption issue with one proposed
xfstests testcase. After some debugging I've found out that the problem
actually is not in the kernel but rather in e2fsck (libext2fs
respectively). The testcase deletes lost+found, e2fsck recreates it. But
after the testcase / is h-tree directory. So ext2fs_link() creates
lost+found in / and clears EXT4_INODE_INDEX flag. Now because the
filesystem has metadata checksums, clearing the index flag needs to also
rewrite all directory blocks with h-tree index blocks because the layout
now needs to be different.  ext2fs_link() actually tries to do this in its
link_proc() but if the space for new directory entry is found before we
walk all the h-tree index blocks, we terminate the iteration and some index
blocks remain unconverted resulting in checksum errors and other weirdness
later on.

The question is how to best fix this. The easiest fix is to just make
link_proc() iterate through all directory blocks when it needs to do the
index blocks conversion. But this seems somewhat stupid. Also there's
another problem with clearing EXT4_INODE_INDEX in ext2fs_link() - if the
directory has more than 65000 subdirectories, clearing EXT4_INODE_INDEX is
not allowed because large directory link count handling is supported only
for EXT4_INODE_INDEX directories.

So what do we do with this? For e2fsck, we could just link the new entry
into the directory and force rehashing later. But ext2fs_link() can be
called also from other tools and it should be a self-contained API... Any
ideas? Should we just bite the bullet and implement ext2fs_link() for
h-tree dirs properly?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: ext2fs_link() corrupting a directory
  2020-01-17 12:24 ext2fs_link() corrupting a directory Jan Kara
@ 2020-01-17 16:29 ` Theodore Y. Ts'o
  2020-01-30 16:25   ` Jan Kara
  0 siblings, 1 reply; 3+ messages in thread
From: Theodore Y. Ts'o @ 2020-01-17 16:29 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, Darrick J. Wong

On Fri, Jan 17, 2020 at 01:24:20PM +0100, Jan Kara wrote:
> 
> I was tracking down a filesystem corruption issue with one proposed
> xfstests testcase. After some debugging I've found out that the problem
> actually is not in the kernel but rather in e2fsck (libext2fs
> respectively). The testcase deletes lost+found, e2fsck recreates it. But
> after the testcase / is h-tree directory. So ext2fs_link() creates
> lost+found in / and clears EXT4_INODE_INDEX flag. Now because the
> filesystem has metadata checksums, clearing the index flag needs to also
> rewrite all directory blocks with h-tree index blocks because the layout
> now needs to be different.  ext2fs_link() actually tries to do this in its
> link_proc() but if the space for new directory entry is found before we
> walk all the h-tree index blocks, we terminate the iteration and some index
> blocks remain unconverted resulting in checksum errors and other weirdness
> later on.

Ouch.  Nice catch.

I suspect we have a similar problem when the kernel discovers that the
htree is corrupted; it will clear EXT4_INODE_INDEX flag, since before
how we added metadata checksum, it was safe to do that.  Given our
choice of where the stash the checksum, in order to not decrease the
fanout of htree directories, this is no no longer safe to do.

I think what we will need to do is to set an in-memory "fallback"
flag, which simply ignores the index blocks, but doesn't actually clear
the EXT4_INODE_INDEX flag in the case where metadata_csum is enabled.

> The question is how to best fix this. The easiest fix is to just make
> link_proc() iterate through all directory blocks when it needs to do the
> index blocks conversion. But this seems somewhat stupid. Also there's
> another problem with clearing EXT4_INODE_INDEX in ext2fs_link() - if the
> directory has more than 65000 subdirectories, clearing EXT4_INODE_INDEX is
> not allowed because large directory link count handling is supported only
> for EXT4_INODE_INDEX directories.
> 
> So what do we do with this? For e2fsck, we could just link the new entry
> into the directory and force rehashing later. But ext2fs_link() can be
> called also from other tools and it should be a self-contained API... Any
> ideas? Should we just bite the bullet and implement ext2fs_link() for
> h-tree dirs properly?

Yes, I think that's what we are going to need to do; we had cheated
and not bothered to implement htree support in ext2fs library, but if
we're going to implement it for link_proc, we might as well also
implement it for lookups as well.

					- Ted

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: ext2fs_link() corrupting a directory
  2020-01-17 16:29 ` Theodore Y. Ts'o
@ 2020-01-30 16:25   ` Jan Kara
  0 siblings, 0 replies; 3+ messages in thread
From: Jan Kara @ 2020-01-30 16:25 UTC (permalink / raw)
  To: Theodore Y. Ts'o; +Cc: Jan Kara, linux-ext4, Darrick J. Wong

On Fri 17-01-20 11:29:27, Theodore Y. Ts'o wrote:
> On Fri, Jan 17, 2020 at 01:24:20PM +0100, Jan Kara wrote:
> > 
> > I was tracking down a filesystem corruption issue with one proposed
> > xfstests testcase. After some debugging I've found out that the problem
> > actually is not in the kernel but rather in e2fsck (libext2fs
> > respectively). The testcase deletes lost+found, e2fsck recreates it. But
> > after the testcase / is h-tree directory. So ext2fs_link() creates
> > lost+found in / and clears EXT4_INODE_INDEX flag. Now because the
> > filesystem has metadata checksums, clearing the index flag needs to also
> > rewrite all directory blocks with h-tree index blocks because the layout
> > now needs to be different.  ext2fs_link() actually tries to do this in its
> > link_proc() but if the space for new directory entry is found before we
> > walk all the h-tree index blocks, we terminate the iteration and some index
> > blocks remain unconverted resulting in checksum errors and other weirdness
> > later on.
> 
> Ouch.  Nice catch.
> 
> I suspect we have a similar problem when the kernel discovers that the
> htree is corrupted; it will clear EXT4_INODE_INDEX flag, since before
> how we added metadata checksum, it was safe to do that.  Given our
> choice of where the stash the checksum, in order to not decrease the
> fanout of htree directories, this is no no longer safe to do.
> 
> I think what we will need to do is to set an in-memory "fallback"
> flag, which simply ignores the index blocks, but doesn't actually clear
> the EXT4_INODE_INDEX flag in the case where metadata_csum is enabled.

Yeah, that might be a good fix for the kernel.

> > The question is how to best fix this. The easiest fix is to just make
> > link_proc() iterate through all directory blocks when it needs to do the
> > index blocks conversion. But this seems somewhat stupid. Also there's
> > another problem with clearing EXT4_INODE_INDEX in ext2fs_link() - if the
> > directory has more than 65000 subdirectories, clearing EXT4_INODE_INDEX is
> > not allowed because large directory link count handling is supported only
> > for EXT4_INODE_INDEX directories.
> > 
> > So what do we do with this? For e2fsck, we could just link the new entry
> > into the directory and force rehashing later. But ext2fs_link() can be
> > called also from other tools and it should be a self-contained API... Any
> > ideas? Should we just bite the bullet and implement ext2fs_link() for
> > h-tree dirs properly?
> 
> Yes, I think that's what we are going to need to do; we had cheated
> and not bothered to implement htree support in ext2fs library, but if
> we're going to implement it for link_proc, we might as well also
> implement it for lookups as well.

JFYI (since I won't make today's ext4 call) I have full-featured insertion
into indexed dirs implemented, just need to write some decent testing
because it's quite a bit of new code and not exactly trivial...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-01-31 17:14 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-17 12:24 ext2fs_link() corrupting a directory Jan Kara
2020-01-17 16:29 ` Theodore Y. Ts'o
2020-01-30 16:25   ` Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).