linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3] ext4: fix bug for rename with RENAME_WHITEOUT
@ 2021-01-05  6:28 yangerkun
  2021-01-05 14:27 ` Jan Kara
  2021-01-14  3:51 ` Theodore Ts'o
  0 siblings, 2 replies; 8+ messages in thread
From: yangerkun @ 2021-01-05  6:28 UTC (permalink / raw)
  To: linux-ext4, tytso, adilger.kernel, jack
  Cc: yi.zhang, lihaotian9, lutianxiong, linfeilong, yangerkun

We got a "deleted inode referenced" warning cross our fsstress test. The
bug can be reproduced easily with following steps:

  cd /dev/shm
  mkdir test/
  fallocate -l 128M img
  mkfs.ext4 -b 1024 img
  mount img test/
  dd if=/dev/zero of=test/foo bs=1M count=128
  mkdir test/dir/ && cd test/dir/
  for ((i=0;i<1000;i++)); do touch file$i; done # consume all block
  cd ~ && renameat2(AT_FDCWD, /dev/shm/test/dir/file1, AT_FDCWD,
    /dev/shm/test/dir/dst_file, RENAME_WHITEOUT) # ext4_add_entry in
    ext4_rename will return ENOSPC!!
  cd /dev/shm/ && umount test/ && mount img test/ && ls -li test/dir/file1
  We will get the output:
  "ls: cannot access 'test/dir/file1': Structure needs cleaning"
  and the dmesg show:
  "EXT4-fs error (device loop0): ext4_lookup:1626: inode #2049: comm ls:
  deleted inode referenced: 139"

ext4_rename will create a special inode for whiteout and use this 'ino'
to replace the source file's dir entry 'ino'. Once error happens
latter(the error above was the ENOSPC return from ext4_add_entry in
ext4_rename since all space has been consumed), the cleanup do drop the
nlink for whiteout, but forget to restore 'ino' with source file. This
will trigger the bug describle as above.

Signed-off-by: yangerkun <yangerkun@huawei.com>
---
 fs/ext4/namei.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index b17a082b7db1..90f7ebeb69c8 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -3593,9 +3593,6 @@ static int ext4_setent(handle_t *handle, struct ext4_renament *ent,
 			return retval2;
 		}
 	}
-	brelse(ent->bh);
-	ent->bh = NULL;
-
 	return retval;
 }
 
@@ -3794,6 +3791,7 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
 		}
 	}
 
+	old_file_type = old.de->file_type;
 	if (IS_DIRSYNC(old.dir) || IS_DIRSYNC(new.dir))
 		ext4_handle_sync(handle);
 
@@ -3821,7 +3819,6 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
 	force_reread = (new.dir->i_ino == old.dir->i_ino &&
 			ext4_test_inode_flag(new.dir, EXT4_INODE_INLINE_DATA));
 
-	old_file_type = old.de->file_type;
 	if (whiteout) {
 		/*
 		 * Do this before adding a new entry, so the old entry is sure
@@ -3919,15 +3916,19 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
 	retval = 0;
 
 end_rename:
-	brelse(old.dir_bh);
-	brelse(old.bh);
-	brelse(new.bh);
 	if (whiteout) {
-		if (retval)
+		if (retval) {
+			ext4_setent(handle, &old,
+				old.inode->i_ino, old_file_type);
 			drop_nlink(whiteout);
+		}
 		unlock_new_inode(whiteout);
 		iput(whiteout);
+
 	}
+	brelse(old.dir_bh);
+	brelse(old.bh);
+	brelse(new.bh);
 	if (handle)
 		ext4_journal_stop(handle);
 	return retval;
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] ext4: fix bug for rename with RENAME_WHITEOUT
  2021-01-05  6:28 [PATCH v3] ext4: fix bug for rename with RENAME_WHITEOUT yangerkun
@ 2021-01-05 14:27 ` Jan Kara
  2021-01-14  3:51 ` Theodore Ts'o
  1 sibling, 0 replies; 8+ messages in thread
From: Jan Kara @ 2021-01-05 14:27 UTC (permalink / raw)
  To: yangerkun
  Cc: linux-ext4, tytso, adilger.kernel, jack, yi.zhang, lihaotian9,
	lutianxiong, linfeilong

On Tue 05-01-21 14:28:57, yangerkun wrote:
> We got a "deleted inode referenced" warning cross our fsstress test. The
> bug can be reproduced easily with following steps:
> 
>   cd /dev/shm
>   mkdir test/
>   fallocate -l 128M img
>   mkfs.ext4 -b 1024 img
>   mount img test/
>   dd if=/dev/zero of=test/foo bs=1M count=128
>   mkdir test/dir/ && cd test/dir/
>   for ((i=0;i<1000;i++)); do touch file$i; done # consume all block
>   cd ~ && renameat2(AT_FDCWD, /dev/shm/test/dir/file1, AT_FDCWD,
>     /dev/shm/test/dir/dst_file, RENAME_WHITEOUT) # ext4_add_entry in
>     ext4_rename will return ENOSPC!!
>   cd /dev/shm/ && umount test/ && mount img test/ && ls -li test/dir/file1
>   We will get the output:
>   "ls: cannot access 'test/dir/file1': Structure needs cleaning"
>   and the dmesg show:
>   "EXT4-fs error (device loop0): ext4_lookup:1626: inode #2049: comm ls:
>   deleted inode referenced: 139"
> 
> ext4_rename will create a special inode for whiteout and use this 'ino'
> to replace the source file's dir entry 'ino'. Once error happens
> latter(the error above was the ENOSPC return from ext4_add_entry in
> ext4_rename since all space has been consumed), the cleanup do drop the
> nlink for whiteout, but forget to restore 'ino' with source file. This
> will trigger the bug describle as above.
> 
> Signed-off-by: yangerkun <yangerkun@huawei.com>

Thanks! The patch looks good to me now. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/namei.c | 17 +++++++++--------
>  1 file changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index b17a082b7db1..90f7ebeb69c8 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -3593,9 +3593,6 @@ static int ext4_setent(handle_t *handle, struct ext4_renament *ent,
>  			return retval2;
>  		}
>  	}
> -	brelse(ent->bh);
> -	ent->bh = NULL;
> -
>  	return retval;
>  }
>  
> @@ -3794,6 +3791,7 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
>  		}
>  	}
>  
> +	old_file_type = old.de->file_type;
>  	if (IS_DIRSYNC(old.dir) || IS_DIRSYNC(new.dir))
>  		ext4_handle_sync(handle);
>  
> @@ -3821,7 +3819,6 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
>  	force_reread = (new.dir->i_ino == old.dir->i_ino &&
>  			ext4_test_inode_flag(new.dir, EXT4_INODE_INLINE_DATA));
>  
> -	old_file_type = old.de->file_type;
>  	if (whiteout) {
>  		/*
>  		 * Do this before adding a new entry, so the old entry is sure
> @@ -3919,15 +3916,19 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
>  	retval = 0;
>  
>  end_rename:
> -	brelse(old.dir_bh);
> -	brelse(old.bh);
> -	brelse(new.bh);
>  	if (whiteout) {
> -		if (retval)
> +		if (retval) {
> +			ext4_setent(handle, &old,
> +				old.inode->i_ino, old_file_type);
>  			drop_nlink(whiteout);
> +		}
>  		unlock_new_inode(whiteout);
>  		iput(whiteout);
> +
>  	}
> +	brelse(old.dir_bh);
> +	brelse(old.bh);
> +	brelse(new.bh);
>  	if (handle)
>  		ext4_journal_stop(handle);
>  	return retval;
> -- 
> 2.25.4
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] ext4: fix bug for rename with RENAME_WHITEOUT
  2021-01-05  6:28 [PATCH v3] ext4: fix bug for rename with RENAME_WHITEOUT yangerkun
  2021-01-05 14:27 ` Jan Kara
@ 2021-01-14  3:51 ` Theodore Ts'o
  2021-01-20  6:57   ` Amir Goldstein
  1 sibling, 1 reply; 8+ messages in thread
From: Theodore Ts'o @ 2021-01-14  3:51 UTC (permalink / raw)
  To: yangerkun
  Cc: linux-ext4, adilger.kernel, jack, yi.zhang, lihaotian9,
	lutianxiong, linfeilong

On Tue, Jan 05, 2021 at 02:28:57PM +0800, yangerkun wrote:
> We got a "deleted inode referenced" warning cross our fsstress test. The
> bug can be reproduced easily with following steps:
> 
>   cd /dev/shm
>   mkdir test/
>   fallocate -l 128M img
>   mkfs.ext4 -b 1024 img
>   mount img test/
>   dd if=/dev/zero of=test/foo bs=1M count=128
>   mkdir test/dir/ && cd test/dir/
>   for ((i=0;i<1000;i++)); do touch file$i; done # consume all block
>   cd ~ && renameat2(AT_FDCWD, /dev/shm/test/dir/file1, AT_FDCWD,
>     /dev/shm/test/dir/dst_file, RENAME_WHITEOUT) # ext4_add_entry in
>     ext4_rename will return ENOSPC!!
>   cd /dev/shm/ && umount test/ && mount img test/ && ls -li test/dir/file1
>   We will get the output:
>   "ls: cannot access 'test/dir/file1': Structure needs cleaning"
>   and the dmesg show:
>   "EXT4-fs error (device loop0): ext4_lookup:1626: inode #2049: comm ls:
>   deleted inode referenced: 139"
> 
> ext4_rename will create a special inode for whiteout and use this 'ino'
> to replace the source file's dir entry 'ino'. Once error happens
> latter(the error above was the ENOSPC return from ext4_add_entry in
> ext4_rename since all space has been consumed), the cleanup do drop the
> nlink for whiteout, but forget to restore 'ino' with source file. This
> will trigger the bug describle as above.
> 
> Signed-off-by: yangerkun <yangerkun@huawei.com>

Thanks, replied.

					- Ted

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] ext4: fix bug for rename with RENAME_WHITEOUT
  2021-01-14  3:51 ` Theodore Ts'o
@ 2021-01-20  6:57   ` Amir Goldstein
  2021-01-20  8:42     ` Miklos Szeredi
  0 siblings, 1 reply; 8+ messages in thread
From: Amir Goldstein @ 2021-01-20  6:57 UTC (permalink / raw)
  To: Theodore Ts'o, harshad shirwadkar
  Cc: yangerkun, Ext4, Andreas Dilger, Jan Kara, zhangyi (F),
	lihaotian9, lutianxiong, linfeilong, fstests, Miklos Szeredi,
	Vijaychidambaram Velayudhan Pillai

On Thu, Jan 14, 2021 at 5:53 AM Theodore Ts'o <tytso@mit.edu> wrote:
>
> On Tue, Jan 05, 2021 at 02:28:57PM +0800, yangerkun wrote:
> > We got a "deleted inode referenced" warning cross our fsstress test. The
> > bug can be reproduced easily with following steps:
> >
> >   cd /dev/shm
> >   mkdir test/
> >   fallocate -l 128M img
> >   mkfs.ext4 -b 1024 img
> >   mount img test/
> >   dd if=/dev/zero of=test/foo bs=1M count=128
> >   mkdir test/dir/ && cd test/dir/
> >   for ((i=0;i<1000;i++)); do touch file$i; done # consume all block
> >   cd ~ && renameat2(AT_FDCWD, /dev/shm/test/dir/file1, AT_FDCWD,
> >     /dev/shm/test/dir/dst_file, RENAME_WHITEOUT) # ext4_add_entry in
> >     ext4_rename will return ENOSPC!!
> >   cd /dev/shm/ && umount test/ && mount img test/ && ls -li test/dir/file1
> >   We will get the output:
> >   "ls: cannot access 'test/dir/file1': Structure needs cleaning"
> >   and the dmesg show:
> >   "EXT4-fs error (device loop0): ext4_lookup:1626: inode #2049: comm ls:
> >   deleted inode referenced: 139"
> >
> > ext4_rename will create a special inode for whiteout and use this 'ino'
> > to replace the source file's dir entry 'ino'. Once error happens
> > latter(the error above was the ENOSPC return from ext4_add_entry in
> > ext4_rename since all space has been consumed), the cleanup do drop the
> > nlink for whiteout, but forget to restore 'ino' with source file. This
> > will trigger the bug describle as above.
> >
> > Signed-off-by: yangerkun <yangerkun@huawei.com>
>

Apropos RENAME_WHITEOUT, it seems to be missing __ext4_fc_track_link().
I guess test coverage of RENAME_WHITEOUT in fstests is not much.
I have been seeing trickles of bug fixes for RENAME_WHITEOUT for almost
every filesystem that supports it.

But I must say it would have been very hard to catch missing ext4_fc_track_*
without specialized fs fuzzer such as the CrashMonkey generated tests.

And as long as I am ranting, I'd like to point out that it is a shame
that whiteout
was not implemented as a special (constant) inode whose nlink is irrelevant
(or a special dirent with d_ino 0 and d_type DT_WHT for that matter).
It would have been a rather small RO_COMPAT on-disk change for ext4.
It could also be implemented in slightly more backward compat manner by
maintaining a valid nlink and postpone setting the RO_COMPAT flag until
EXT4_LINK_MAX is reached.

As things stand now, overlayfs makes an effort to maintain a singleton
hardlinked whiteout inode, without being able to use it with RENAME_WHITEOUT
and filesystems have to take special care to journal the metadata of all
individual whiteout inodes, without any added value to the only user
(overlayfs).

But I guess that train has left the station long ago...

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] ext4: fix bug for rename with RENAME_WHITEOUT
  2021-01-20  6:57   ` Amir Goldstein
@ 2021-01-20  8:42     ` Miklos Szeredi
  2021-01-22 19:20       ` harshad shirwadkar
  0 siblings, 1 reply; 8+ messages in thread
From: Miklos Szeredi @ 2021-01-20  8:42 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Theodore Ts'o, harshad shirwadkar, yangerkun, Ext4,
	Andreas Dilger, Jan Kara, zhangyi (F),
	lihaotian, lutianxiong, linfeilong, fstests,
	Vijaychidambaram Velayudhan Pillai

On Wed, Jan 20, 2021 at 7:57 AM Amir Goldstein <amir73il@gmail.com> wrote:

> And as long as I am ranting, I'd like to point out that it is a shame
> that whiteout
> was not implemented as a special (constant) inode whose nlink is irrelevant
> (or a special dirent with d_ino 0 and d_type DT_WHT for that matter).
> It would have been a rather small RO_COMPAT on-disk change for ext4.
> It could also be implemented in slightly more backward compat manner by
> maintaining a valid nlink and postpone setting the RO_COMPAT flag until
> EXT4_LINK_MAX is reached.
>
> As things stand now, overlayfs makes an effort to maintain a singleton
> hardlinked whiteout inode, without being able to use it with RENAME_WHITEOUT
> and filesystems have to take special care to journal the metadata of all
> individual whiteout inodes, without any added value to the only user
> (overlayfs).
>
> But I guess that train has left the station long ago...

Not so, I believe.  Kernel internal interfaces are easy to change, and
adding support for DT_WHT to overlayfs would mostly be a trivial
undertaking.

The big issue (as always) is userspace API's and not introducing
DT_WHT there was a very deliberate choice.  Adding a translation layer
from an internal whiteout representation to the userspace API also
does not seem to be a very complex problem, but I haven't looked into
that deeply.

So AFAICS there's really nothing preventing the addition of whiteout
objects to filesystems, other than developer dedication.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] ext4: fix bug for rename with RENAME_WHITEOUT
  2021-01-20  8:42     ` Miklos Szeredi
@ 2021-01-22 19:20       ` harshad shirwadkar
  2021-01-22 20:32         ` Amir Goldstein
  2021-03-09  7:30         ` Amir Goldstein
  0 siblings, 2 replies; 8+ messages in thread
From: harshad shirwadkar @ 2021-01-22 19:20 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Amir Goldstein, Theodore Ts'o, yangerkun, Ext4,
	Andreas Dilger, Jan Kara, zhangyi (F),
	lihaotian, lutianxiong, linfeilong, fstests,
	Vijaychidambaram Velayudhan Pillai

Thanks Amir for pointing that out. Yes we are missing fast commit
tracking in whiteout. I'll send out a fix for that.

> But I must say it would have been very hard to catch missing ext4_fc_track_*
> without specialized fs fuzzer such as the CrashMonkey generated tests.

I agree, it's been on my to-do list to run CrashMonkey tests with fast
commits. I'm curious what kind of CrashMonkey tests you ran that
helped you catch this? Were you running Overlayfs on top of Ext4 with
fast commits?

Thanks,
Harshad

On Wed, Jan 20, 2021 at 12:42 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Wed, Jan 20, 2021 at 7:57 AM Amir Goldstein <amir73il@gmail.com> wrote:
>
> > And as long as I am ranting, I'd like to point out that it is a shame
> > that whiteout
> > was not implemented as a special (constant) inode whose nlink is irrelevant
> > (or a special dirent with d_ino 0 and d_type DT_WHT for that matter).
> > It would have been a rather small RO_COMPAT on-disk change for ext4.
> > It could also be implemented in slightly more backward compat manner by
> > maintaining a valid nlink and postpone setting the RO_COMPAT flag until
> > EXT4_LINK_MAX is reached.
> >
> > As things stand now, overlayfs makes an effort to maintain a singleton
> > hardlinked whiteout inode, without being able to use it with RENAME_WHITEOUT
> > and filesystems have to take special care to journal the metadata of all
> > individual whiteout inodes, without any added value to the only user
> > (overlayfs).
> >
> > But I guess that train has left the station long ago...
>
> Not so, I believe.  Kernel internal interfaces are easy to change, and
> adding support for DT_WHT to overlayfs would mostly be a trivial
> undertaking.
>
> The big issue (as always) is userspace API's and not introducing
> DT_WHT there was a very deliberate choice.  Adding a translation layer
> from an internal whiteout representation to the userspace API also
> does not seem to be a very complex problem, but I haven't looked into
> that deeply.
>
> So AFAICS there's really nothing preventing the addition of whiteout
> objects to filesystems, other than developer dedication.
>
> Thanks,
> Miklos

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] ext4: fix bug for rename with RENAME_WHITEOUT
  2021-01-22 19:20       ` harshad shirwadkar
@ 2021-01-22 20:32         ` Amir Goldstein
  2021-03-09  7:30         ` Amir Goldstein
  1 sibling, 0 replies; 8+ messages in thread
From: Amir Goldstein @ 2021-01-22 20:32 UTC (permalink / raw)
  To: harshad shirwadkar
  Cc: Miklos Szeredi, Theodore Ts'o, yangerkun, Ext4,
	Andreas Dilger, Jan Kara, zhangyi (F),
	lihaotian, lutianxiong, linfeilong, fstests,
	Vijaychidambaram Velayudhan Pillai

On Fri, Jan 22, 2021 at 9:21 PM harshad shirwadkar
<harshadshirwadkar@gmail.com> wrote:
>
> Thanks Amir for pointing that out. Yes we are missing fast commit
> tracking in whiteout. I'll send out a fix for that.
>
> > But I must say it would have been very hard to catch missing ext4_fc_track_*
> > without specialized fs fuzzer such as the CrashMonkey generated tests.
>
> I agree, it's been on my to-do list to run CrashMonkey tests with fast
> commits. I'm curious what kind of CrashMonkey tests you ran that
> helped you catch this? Were you running Overlayfs on top of Ext4 with
> fast commits?
>

Neither. I just guessed RENAME_WHITEOUT might be missed as
developers are rarely aware of it.
I never ran CrashMonkey tests myself.
I found a few crash consistency bugs using xfstest generic/455.
I suggest that you run it with fast commits
and try using NUM_OPS and NUM_FILES larger than the test defaults
to let the test run for a longer time.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] ext4: fix bug for rename with RENAME_WHITEOUT
  2021-01-22 19:20       ` harshad shirwadkar
  2021-01-22 20:32         ` Amir Goldstein
@ 2021-03-09  7:30         ` Amir Goldstein
  1 sibling, 0 replies; 8+ messages in thread
From: Amir Goldstein @ 2021-03-09  7:30 UTC (permalink / raw)
  To: harshad shirwadkar
  Cc: Miklos Szeredi, Theodore Ts'o, yangerkun, Ext4,
	Andreas Dilger, Jan Kara, zhangyi (F),
	lihaotian, lutianxiong, linfeilong, fstests,
	Vijaychidambaram Velayudhan Pillai

On Fri, Jan 22, 2021 at 9:21 PM harshad shirwadkar
<harshadshirwadkar@gmail.com> wrote:
>
> Thanks Amir for pointing that out. Yes we are missing fast commit
> tracking in whiteout. I'll send out a fix for that.
>

Ping.

Harshad,

Did you forget or did I miss the patch?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-03-09  7:31 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-05  6:28 [PATCH v3] ext4: fix bug for rename with RENAME_WHITEOUT yangerkun
2021-01-05 14:27 ` Jan Kara
2021-01-14  3:51 ` Theodore Ts'o
2021-01-20  6:57   ` Amir Goldstein
2021-01-20  8:42     ` Miklos Szeredi
2021-01-22 19:20       ` harshad shirwadkar
2021-01-22 20:32         ` Amir Goldstein
2021-03-09  7:30         ` Amir Goldstein

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).