All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] ext4: fix memory leak in ext4_fill_super
@ 2021-04-28 17:28 Pavel Skripkin
  2021-04-29 10:01 ` Vegard Nossum
  0 siblings, 1 reply; 21+ messages in thread
From: Pavel Skripkin @ 2021-04-28 17:28 UTC (permalink / raw)
  To: tytso, adilger.kernel
  Cc: linux-ext4, linux-kernel, Pavel Skripkin, syzbot+d9e482e303930fa4f6ff

syzbot reported memory leak in ext4 subsyetem.
The problem appears, when thread_stop() call happens
before wake_up_process().

Normally, this data will be freed by
created thread, but if kthread_stop()
returned -EINTR, this data should be freed manually

Reported-by: syzbot+d9e482e303930fa4f6ff@syzkaller.appspotmail.com
Tested-by: syzbot+d9e482e303930fa4f6ff@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
---
 fs/ext4/super.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index b9693680463a..9c33e97bd5c5 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5156,8 +5156,10 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 failed_mount3:
 	flush_work(&sbi->s_error_work);
 	del_timer_sync(&sbi->s_err_report);
-	if (sbi->s_mmp_tsk)
-		kthread_stop(sbi->s_mmp_tsk);
+	if (sbi->s_mmp_tsk) {
+		if (kthread_stop(sbi->s_mmp_tsk) == -EINTR)
+			kfree(kthread_data(sbi->s_mmp_tsk));
+	}
 failed_mount2:
 	rcu_read_lock();
 	group_desc = rcu_dereference(sbi->s_group_desc);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread
* [PATCH] ext4: fix memory leak in ext4_fill_super
@ 2021-04-28 22:19 Alexey Makhalov
  2021-05-21  4:43 ` Theodore Y. Ts'o
  0 siblings, 1 reply; 21+ messages in thread
From: Alexey Makhalov @ 2021-04-28 22:19 UTC (permalink / raw)
  To: linux-ext4; +Cc: Alexey Makhalov, stable, Theodore Ts'o, Andreas Dilger

I've recently discovered that doing infinite loop of
  systemctl start <ext4_on_lvm>.mount, and
  systemctl stop <ext4_on_lvm>.mount
linearly increases percpu allocator memory consumption.
In several hours, it might lead to system instability by
consuming most of the memory.

Bug is not reproducible when the ext4 filesystem is on
physical partition, but it is persistent when ext4 is on
logical volume.

During debugging it was found that most of active percpu
allocations are from /system.slice/<ext4_on_lvm>.mount
memory cgroups (created by systemd for each mount). All
of these cgroups are in dying state with refcount equal
to 2. And most interesting that each mount/umount itera-
tion creates exactly one dying memory cgroup.

Tracking down the remaining refcounts showed that it was
charged from ext4_fill_super(). And the page is always
0 index in the page cache mapping.

The issue was hidden behind initial super block read using
logical blocksize from bdev and adjusting blocksize later
after reading actual block size from superblock.
If blocksizes differ, sb_set_blocksize() will kill current
buffers and page cache by using kill_bdev(). And then super
block will be reread again but using correct blocksize this
time. sb_set_blocksize() didn't fully free superblock page
and buffers as buffer pointed by bh variable remained busy.
So buffer and its page remains in the memory (leak). Super
block reread logic does not happen when ext4 filesystem is
on physical partition as blocksize is correct for initial
superblock read.

brelse(bh), where bh is a buffer head of superblock page,
must be called and bh references must be released before
kill_bdev(). kill_bdev() subfunctions (see callstack below)
won't be able to free not released buffer (even if it's
clean) and superblock page won't be freed as well.

callstack:
kill_bdev()
->truncate_inode_pages()
  ->truncate_inode_pages_range()
    ->truncate_cleanup_page()
      ->do_invalidatepage
        ->block_invalidatepage()
	  ->try_to_release_page() == fail to release
	    ->try_to_free_buffers() == fail to free
	      ->drop_buffers()
	        ->buffer_busy() == yes

Incorrect order of brelse() and kill_bdev() in ext4_fill_super()
was introduced by commit ce40733ce93d ("ext4: Check for return
value from sb_set_blocksize") 13 years ago! Thanks to memory
hungry percpu, it was easy to detect this issue now.

Fix this by moving the brelse() before sb_set_blocksize() and
add a comment about the dependency.

In addition, fix similar issue under failed_mount: label (in
the same function) about incorrect order of ext4_blkdev_remove()
vs brelse() introduced by commit ac27a0ec112a ("ext4: initial
copy of files from ext3")

Signed-off-by: Alexey Makhalov <amakhalov@vmware.com>
Cc: stable@vger.kernel.org
Fixes: ce40733ce93d ("ext4: Check for return value from sb_set_blocksize")
Fixes: ac27a0ec112a ("ext4: initial copy of files from ext3")
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
---
 fs/ext4/super.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index b9693680463a..6c8f68309834 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4451,14 +4451,20 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 	}
 
 	if (sb->s_blocksize != blocksize) {
+		/*
+		 * bh must be released before kill_bdev(), otherwise
+		 * it won't be freed and its page also. kill_bdev()
+		 * is called by sb_set_blocksize().
+		 */
+		brelse(bh);
 		/* Validate the filesystem blocksize */
 		if (!sb_set_blocksize(sb, blocksize)) {
 			ext4_msg(sb, KERN_ERR, "bad block size %d",
 					blocksize);
+			bh = NULL;
 			goto failed_mount;
 		}
 
-		brelse(bh);
 		logical_sb_block = sb_block * EXT4_MIN_BLOCK_SIZE;
 		offset = do_div(logical_sb_block, blocksize);
 		bh = ext4_sb_bread_unmovable(sb, logical_sb_block);
@@ -5178,8 +5184,9 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 		kfree(get_qf_name(sb, sbi, i));
 #endif
 	fscrypt_free_dummy_policy(&sbi->s_dummy_enc_policy);
-	ext4_blkdev_remove(sbi);
+	/* ext4_blkdev_remove() calls kill_bdev(), release bh before it. */
 	brelse(bh);
+	ext4_blkdev_remove(sbi);
 out_fail:
 	sb->s_fs_info = NULL;
 	kfree(sbi->s_blockgroup_lock);
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread
* FAILED: patch "[PATCH] ext4: fix memory leak in ext4_fill_super" failed to apply to 5.4-stable tree
@ 2021-06-08 12:23 gregkh
  2021-06-08 21:02 ` [PATCH] ext4: fix memory leak in ext4_fill_super Alexey Makhalov
  0 siblings, 1 reply; 21+ messages in thread
From: gregkh @ 2021-06-08 12:23 UTC (permalink / raw)
  To: amakhalov, tytso; +Cc: stable


The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.

thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

From afd09b617db3786b6ef3dc43e28fe728cfea84df Mon Sep 17 00:00:00 2001
From: Alexey Makhalov <amakhalov@vmware.com>
Date: Fri, 21 May 2021 07:55:33 +0000
Subject: [PATCH] ext4: fix memory leak in ext4_fill_super

Buffer head references must be released before calling kill_bdev();
otherwise the buffer head (and its page referenced by b_data) will not
be freed by kill_bdev, and subsequently that bh will be leaked.

If blocksizes differ, sb_set_blocksize() will kill current buffers and
page cache by using kill_bdev(). And then super block will be reread
again but using correct blocksize this time. sb_set_blocksize() didn't
fully free superblock page and buffer head, and being busy, they were
not freed and instead leaked.

This can easily be reproduced by calling an infinite loop of:

  systemctl start <ext4_on_lvm>.mount, and
  systemctl stop <ext4_on_lvm>.mount

... since systemd creates a cgroup for each slice which it mounts, and
the bh leak get amplified by a dying memory cgroup that also never
gets freed, and memory consumption is much more easily noticed.

Fixes: ce40733ce93d ("ext4: Check for return value from sb_set_blocksize")
Fixes: ac27a0ec112a ("ext4: initial copy of files from ext3")
Link: https://lore.kernel.org/r/20210521075533.95732-1-amakhalov@vmware.com
Signed-off-by: Alexey Makhalov <amakhalov@vmware.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 886e0d518668..f66c7301b53a 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4462,14 +4462,20 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 	}
 
 	if (sb->s_blocksize != blocksize) {
+		/*
+		 * bh must be released before kill_bdev(), otherwise
+		 * it won't be freed and its page also. kill_bdev()
+		 * is called by sb_set_blocksize().
+		 */
+		brelse(bh);
 		/* Validate the filesystem blocksize */
 		if (!sb_set_blocksize(sb, blocksize)) {
 			ext4_msg(sb, KERN_ERR, "bad block size %d",
 					blocksize);
+			bh = NULL;
 			goto failed_mount;
 		}
 
-		brelse(bh);
 		logical_sb_block = sb_block * EXT4_MIN_BLOCK_SIZE;
 		offset = do_div(logical_sb_block, blocksize);
 		bh = ext4_sb_bread_unmovable(sb, logical_sb_block);
@@ -5202,8 +5208,9 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 		kfree(get_qf_name(sb, sbi, i));
 #endif
 	fscrypt_free_dummy_policy(&sbi->s_dummy_enc_policy);
-	ext4_blkdev_remove(sbi);
+	/* ext4_blkdev_remove() calls kill_bdev(), release bh before it. */
 	brelse(bh);
+	ext4_blkdev_remove(sbi);
 out_fail:
 	sb->s_fs_info = NULL;
 	kfree(sbi->s_blockgroup_lock);


^ permalink raw reply related	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2021-06-17  1:15 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-28 17:28 [PATCH] ext4: fix memory leak in ext4_fill_super Pavel Skripkin
2021-04-29 10:01 ` Vegard Nossum
2021-04-29 11:08   ` Pavel Skripkin
2021-04-29 11:33   ` Pavel Skripkin
2021-04-29 17:05     ` Theodore Ts'o
2021-04-29 19:20       ` Pavel Skripkin
2021-04-29 20:09       ` Pavel Skripkin
2021-04-29 21:41         ` Theodore Ts'o
2021-04-29 22:05           ` Pavel Skripkin
2021-04-30  3:44             ` Theodore Ts'o
2021-04-30 18:50               ` [PATCH v2] " Pavel Skripkin
2021-05-17 13:40                 ` Pavel Skripkin
2021-05-17 18:34                   ` Pavel Skripkin
2021-06-05 12:52                     ` [RESEND PATCH " Pavel Skripkin
2021-06-17  1:15                 ` [PATCH " Theodore Ts'o
2021-04-28 22:19 [PATCH] " Alexey Makhalov
2021-05-21  4:43 ` Theodore Y. Ts'o
2021-05-21  7:43   ` Alexey Makhalov
2021-05-21 14:29     ` Theodore Y. Ts'o
2021-05-21 16:12       ` Alexey Makhalov
2021-06-08 12:23 FAILED: patch "[PATCH] ext4: fix memory leak in ext4_fill_super" failed to apply to 5.4-stable tree gregkh
2021-06-08 21:02 ` [PATCH] ext4: fix memory leak in ext4_fill_super Alexey Makhalov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.