From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Omar Sandoval <osandov@fb.com>,
	David Sterba <dsterba@suse.com>, Sasha Levin <sashal@kernel.org>
Subject: [PATCH 4.19 67/67] Btrfs: fix missing delayed iputs on unmount
Date: Thu, 20 Dec 2018 10:19:19 +0100
Message-ID: <20181220085906.183047324@linuxfoundation.org>
In-Reply-To: <20181220085903.562090333@linuxfoundation.org>

4.19-stable review patch.  If anyone has any objections, please let me know.

------------------

[ Upstream commit d6fd0ae25c6495674dc5a41a8d16bc8e0073276d ]

There's a race between close_ctree() and cleaner_kthread().
close_ctree() sets the BTRFS_FS_CLOSING_START bit (which btrfs_fs_closing()
reports), and the cleaner stops when it sees it set, but this is racy:
the cleaner might have already checked the bit and could still be doing
cleanup work. In particular, if it deletes unused block groups, it will
create delayed iputs for the free space cache inodes. As of "btrfs:
don't run delayed_iputs in commit", we no longer run delayed iputs after
a commit. Therefore, if the cleaner creates more delayed iputs after
delayed iputs are run in btrfs_commit_super(), we will leak inodes on
unmount and get a busy inode crash from the VFS.

Fix it by parking the cleaner before we actually close anything. Then,
any remaining delayed iputs will always be handled in
btrfs_commit_super(). This also ensures that the commit in close_ctree()
is really the last commit, so we can get rid of the commit in
cleaner_kthread().

Running fstests generic/475 followed by generic/476 can trigger a crash
that manifests as slab corruption, caused by a wake-up function
accessing the freed kthread structure. Sample trace:

[ 5657.077612] BUG: unable to handle kernel NULL pointer dereference at 00000000000000cc
[ 5657.079432] PGD 1c57a067 P4D 1c57a067 PUD da10067 PMD 0
[ 5657.080661] Oops: 0000 [#1] PREEMPT SMP
[ 5657.081592] CPU: 1 PID: 5157 Comm: fsstress Tainted: G        W         4.19.0-rc8-default+ #323
[ 5657.083703] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014
[ 5657.086577] RIP: 0010:shrink_page_list+0x2f9/0xe90
[ 5657.091937] RSP: 0018:ffffb5c745c8f728 EFLAGS: 00010287
[ 5657.092953] RAX: 0000000000000074 RBX: ffffb5c745c8f830 RCX: 0000000000000000
[ 5657.094590] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9a8747fdf3d0
[ 5657.095987] RBP: ffffb5c745c8f9e0 R08: 0000000000000000 R09: 0000000000000000
[ 5657.097159] R10: ffff9a8747fdf5e8 R11: 0000000000000000 R12: ffffb5c745c8f788
[ 5657.098513] R13: ffff9a877f6ff2c0 R14: ffff9a877f6ff2c8 R15: dead000000000200
[ 5657.099689] FS:  00007f948d853b80(0000) GS:ffff9a877d600000(0000) knlGS:0000000000000000
[ 5657.101032] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5657.101953] CR2: 00000000000000cc CR3: 00000000684bd000 CR4: 00000000000006e0
[ 5657.103159] Call Trace:
[ 5657.103776]  shrink_inactive_list+0x194/0x410
[ 5657.104671]  shrink_node_memcg.constprop.84+0x39a/0x6a0
[ 5657.105750]  shrink_node+0x62/0x1c0
[ 5657.106529]  try_to_free_pages+0x1a4/0x500
[ 5657.107408]  __alloc_pages_slowpath+0x2c9/0xb20
[ 5657.108418]  __alloc_pages_nodemask+0x268/0x2b0
[ 5657.109348]  kmalloc_large_node+0x37/0x90
[ 5657.110205]  __kmalloc_node+0x236/0x310
[ 5657.111014]  kvmalloc_node+0x3e/0x70

Fixes: 30928e9baac2 ("btrfs: don't run delayed_iputs in commit")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add trace ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/disk-io.c | 51 ++++++++++++++--------------------------------
 1 file changed, 15 insertions(+), 36 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 834a3f5ef642..d4a7f7ca4145 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1656,9 +1656,8 @@ static int cleaner_kthread(void *arg)
 	struct btrfs_root *root = arg;
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	int again;
-	struct btrfs_trans_handle *trans;
 
-	do {
+	while (1) {
 		again = 0;
 
 		/* Make the cleaner go to sleep early. */
@@ -1707,42 +1706,16 @@ static int cleaner_kthread(void *arg)
 		 */
 		btrfs_delete_unused_bgs(fs_info);
 sleep:
+		if (kthread_should_park())
+			kthread_parkme();
+		if (kthread_should_stop())
+			return 0;
 		if (!again) {
 			set_current_state(TASK_INTERRUPTIBLE);
-			if (!kthread_should_stop())
-				schedule();
+			schedule();
 			__set_current_state(TASK_RUNNING);
 		}
-	} while (!kthread_should_stop());
-
-	/*
-	 * Transaction kthread is stopped before us and wakes us up.
-	 * However we might have started a new transaction and COWed some
-	 * tree blocks when deleting unused block groups for example. So
-	 * make sure we commit the transaction we started to have a clean
-	 * shutdown when evicting the btree inode - if it has dirty pages
-	 * when we do the final iput() on it, eviction will trigger a
-	 * writeback for it which will fail with null pointer dereferences
-	 * since work queues and other resources were already released and
-	 * destroyed by the time the iput/eviction/writeback is made.
-	 */
-	trans = btrfs_attach_transaction(root);
-	if (IS_ERR(trans)) {
-		if (PTR_ERR(trans) != -ENOENT)
-			btrfs_err(fs_info,
-				  "cleaner transaction attach returned %ld",
-				  PTR_ERR(trans));
-	} else {
-		int ret;
-
-		ret = btrfs_commit_transaction(trans);
-		if (ret)
-			btrfs_err(fs_info,
-				  "cleaner open transaction commit returned %d",
-				  ret);
 	}
-
-	return 0;
 }
 
 static int transaction_kthread(void *arg)
@@ -3923,6 +3896,13 @@ void close_ctree(struct btrfs_fs_info *fs_info)
 	int ret;
 
 	set_bit(BTRFS_FS_CLOSING_START, &fs_info->flags);
+	/*
+	 * We don't want the cleaner to start new transactions, add more delayed
+	 * iputs, etc. while we're closing. We can't use kthread_stop() yet
+	 * because that frees the task_struct, and the transaction kthread might
+	 * still try to wake up the cleaner.
+	 */
+	kthread_park(fs_info->cleaner_kthread);
 
 	/* wait for the qgroup rescan worker to stop */
 	btrfs_qgroup_wait_for_completion(fs_info, false);
@@ -3950,9 +3930,8 @@ void close_ctree(struct btrfs_fs_info *fs_info)
 
 	if (!sb_rdonly(fs_info->sb)) {
 		/*
-		 * If the cleaner thread is stopped and there are
-		 * block groups queued for removal, the deletion will be
-		 * skipped when we quit the cleaner thread.
+		 * The cleaner kthread is stopped, so do one final pass over
+		 * unused block groups.
 		 */
 		btrfs_delete_unused_bgs(fs_info);
 
-- 
2.19.1




