From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Dave Chinner <dchinner@redhat.com>,
	Brian Foster <bfoster@redhat.com>,
	"Darrick J . Wong" <darrick.wong@oracle.com>,
	Sasha Levin <sashal@kernel.org>,
	linux-xfs@vger.kernel.org
Subject: [PATCH AUTOSEL 5.7 31/61] xfs: Don't allow logging of XFS_ISTALE inodes
Date: Fri, 21 Aug 2020 12:15:15 -0400
Message-ID: <20200821161545.347622-31-sashal@kernel.org>
In-Reply-To: <20200821161545.347622-1-sashal@kernel.org>

From: Dave Chinner <dchinner@redhat.com>

[ Upstream commit 96355d5a1f0ee6dcc182c37db4894ec0c29f1692 ]

In tracking down a problem in this patchset, I discovered we are
reclaiming dirty stale inodes. This went unnoticed until inodes were
always attached to the cluster buffer, at which point the RCU callback
that frees inodes started failing an assert because the inode still
had an active pointer to the cluster buffer after it had been reclaimed.

Debugging the issue indicated that this was a pre-existing issue
resulting from the way the inodes are handled in xfs_inactive_ifree.
When we free a cluster buffer from xfs_ifree_cluster, all the inodes
in cache are marked XFS_ISTALE. Those that are clean have nothing
else done to them and so eventually get cleaned up by background
reclaim. i.e. it is assumed we'll never dirty/relog an inode marked
XFS_ISTALE.

On journal commit, dirty stale inodes are handled by both the buffer
and inode log items: they are either run through xfs_istale_done() and
removed from the AIL (buffer log item commit), or the inode log item
simply unpins them because the buffer log item will clean them. What happens
to any specific inode is entirely dependent on which log item wins
the commit race, but the result is the same - stale inodes are
clean, not attached to the cluster buffer, and not in the AIL. Hence
inode reclaim can just free these inodes without further care.

However, if the stale inode is relogged, it gets dirtied again and
relogged into the CIL. Most of the time this isn't an issue, because
relogging simply changes the inode's location in the current
checkpoint. Problems arise, however, when the CIL checkpoints
between two transactions in the xfs_inactive_ifree() deferops
processing. This results in the XFS_ISTALE inode being redirtied
and inserted into the CIL without any of the other stale cluster
buffer infrastructure being in place.

Hence on journal commit, it simply gets unpinned, so it remains
dirty in memory. Everything in inode writeback avoids XFS_ISTALE
inodes so it can't be written back, and it is not tracked in the AIL
so there's not even a trigger to attempt to clean the inode. Hence
the inode just sits dirty in memory until inode reclaim comes along,
sees that it is XFS_ISTALE, and goes to reclaim it. This reclaiming
of a dirty inode caused use after free, list corruptions and other
nasty issues later in this patchset.
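
The sequence above can be sketched as a toy state machine. This is
only an illustration, not kernel code: struct toy_inode and every
toy_* function are hypothetical stand-ins, and journal commit is
reduced to a single flag saying whether the stale cluster buffer
infrastructure was part of the checkpoint. The two reclaim predicates
mirror the old and new conditions in the xfs_reclaim_inode() hunk of
this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for an in-core inode; not the real struct xfs_inode. */
struct toy_inode {
	bool stale;	/* models XFS_ISTALE */
	bool dirty;	/* models logged changes that were never cleaned */
	bool in_ail;	/* models AIL tracking */
};

/* Models xfs_ifree_cluster() marking a cached inode stale. */
static void toy_mark_stale(struct toy_inode *ip)
{
	ip->stale = true;
}

/* Models logging the inode in a transaction, which dirties it. */
static void toy_log_inode(struct toy_inode *ip)
{
	ip->dirty = true;
}

/*
 * Models journal commit. When the stale cluster buffer is part of the
 * checkpoint the inode ends up clean and out of the AIL; without it the
 * inode is merely unpinned and stays dirty.
 */
static void toy_journal_commit(struct toy_inode *ip, bool with_stale_buffer)
{
	if (with_stale_buffer) {
		ip->dirty = false;
		ip->in_ail = false;
	}
}

/* Old fast-path test: stale inodes were reclaimed even when dirty. */
static bool toy_can_reclaim_old(const struct toy_inode *ip)
{
	return ip->stale || !ip->dirty;
}

/* New fast-path test: only clean inodes are reclaimed. */
static bool toy_can_reclaim_new(const struct toy_inode *ip)
{
	return !ip->dirty;
}

/*
 * Replays the problem case: the CIL checkpoints between two
 * transactions, and the stale inode is relogged in the second one.
 * Returns true if the chosen reclaim test would free a dirty inode.
 */
static bool toy_relog_after_checkpoint(bool use_new_check)
{
	struct toy_inode ip = { false, false, false };
	bool reclaimable;

	toy_log_inode(&ip);
	toy_mark_stale(&ip);
	toy_journal_commit(&ip, true);	/* first checkpoint cleans it */
	toy_log_inode(&ip);		/* relogged after the roll */
	toy_journal_commit(&ip, false);	/* second checkpoint only unpins */

	reclaimable = use_new_check ? toy_can_reclaim_new(&ip)
				    : toy_can_reclaim_old(&ip);
	return reclaimable && ip.dirty;
}
```

In this model toy_relog_after_checkpoint(false) reports that a dirty
inode would be freed, while toy_relog_after_checkpoint(true) does not.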

Hence this patch addresses a violation of the "never log XFS_ISTALE
inodes" rule caused by the deferops processing rolling a transaction
and relogging a stale inode in xfs_inactive_ifree(). It also adds a
bunch of asserts to catch this problem in debug kernels so that
we don't reintroduce this problem in future.
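
A rough model of why passing XFS_ILOCK_EXCL to xfs_trans_ijoin()
avoids the relog, assuming (as a simplification of the deferops roll)
that an inode joined with lock flags of 0 is carried across each roll
and relogged, while an inode joined with lock flags is unlocked and
dropped at the first commit. All toy_* names and TOY_ILOCK_EXCL are
hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_ILOCK_EXCL	0x1u	/* stand-in for XFS_ILOCK_EXCL */

struct toy_inode {
	bool locked;
	bool stale;
	int times_logged;
};

struct toy_trans {
	struct toy_inode *ip;		/* joined inode, if any */
	unsigned int lock_flags;	/* flags passed at join time */
};

/* Models xfs_trans_ijoin(): remember the inode and its lock flags. */
static void toy_ijoin(struct toy_trans *tp, struct toy_inode *ip,
		      unsigned int lock_flags)
{
	tp->ip = ip;
	tp->lock_flags = lock_flags;
}

/* Models logging the joined inode in the current transaction. */
static void toy_log(struct toy_trans *tp)
{
	if (tp->ip)
		tp->ip->times_logged++;
}

/*
 * Models one transaction roll. An inode joined with lock_flags == 0
 * stays joined and is relogged into the next transaction; otherwise
 * the commit unlocks it and the next transaction never sees it.
 */
static void toy_roll(struct toy_trans *tp)
{
	if (!tp->ip)
		return;
	if (tp->lock_flags == 0) {
		tp->ip->times_logged++;
	} else {
		tp->ip->locked = false;
		tp->ip = NULL;
	}
}

/* Counts how often the inode is relogged after it has gone stale. */
static int toy_stale_relogs(unsigned int lock_flags)
{
	struct toy_inode ip = { true, false, 0 };
	struct toy_trans tp = { NULL, 0 };
	int before;

	toy_ijoin(&tp, &ip, lock_flags);
	toy_log(&tp);		/* the free itself logs the inode */
	ip.stale = true;	/* cluster buffer freed, inode now stale */
	before = ip.times_logged;

	toy_roll(&tp);		/* deferops processing rolls twice here */
	toy_roll(&tp);
	return ip.times_logged - before;
}
```

With lock flags of 0 the stale inode is relogged on every roll in this
model; with TOY_ILOCK_EXCL it is never relogged after going stale.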

Reproducer for this issue was generic/558 on a v4 filesystem.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/xfs/libxfs/xfs_trans_inode.c |  2 ++
 fs/xfs/xfs_icache.c             |  3 ++-
 fs/xfs/xfs_inode.c              | 25 ++++++++++++++++++++++---
 3 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_trans_inode.c b/fs/xfs/libxfs/xfs_trans_inode.c
index 2b8ccb5b975df..e59507a24a839 100644
--- a/fs/xfs/libxfs/xfs_trans_inode.c
+++ b/fs/xfs/libxfs/xfs_trans_inode.c
@@ -36,6 +36,7 @@ xfs_trans_ijoin(
 
 	ASSERT(iip->ili_lock_flags == 0);
 	iip->ili_lock_flags = lock_flags;
+	ASSERT(!xfs_iflags_test(ip, XFS_ISTALE));
 
 	/*
 	 * Get a log_item_desc to point at the new item.
@@ -89,6 +90,7 @@ xfs_trans_log_inode(
 
 	ASSERT(ip->i_itemp != NULL);
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
+	ASSERT(!xfs_iflags_test(ip, XFS_ISTALE));
 
 	/*
 	 * Don't bother with i_lock for the I_DIRTY_TIME check here, as races
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 8bf1d15be3f6a..67c232283a171 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1136,7 +1136,7 @@ xfs_reclaim_inode(
 			goto out_ifunlock;
 		xfs_iunpin_wait(ip);
 	}
-	if (xfs_iflags_test(ip, XFS_ISTALE) || xfs_inode_clean(ip)) {
+	if (xfs_inode_clean(ip)) {
 		xfs_ifunlock(ip);
 		goto reclaim;
 	}
@@ -1223,6 +1223,7 @@ xfs_reclaim_inode(
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	xfs_qm_dqdetach(ip);
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	ASSERT(xfs_inode_clean(ip));
 
 	__xfs_inode_free(ip);
 	return error;
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 8845faa8161a9..e38dd625e914b 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1744,10 +1744,31 @@ xfs_inactive_ifree(
 		return error;
 	}
 
+	/*
+	 * We do not hold the inode locked across the entire rolling transaction
+	 * here. We only need to hold it for the first transaction that
+	 * xfs_ifree() builds, which may mark the inode XFS_ISTALE if the
+	 * underlying cluster buffer is freed. Relogging an XFS_ISTALE inode
+	 * here breaks the relationship between cluster buffer invalidation and
+	 * stale inode invalidation on cluster buffer item journal commit
+	 * completion, and can result in leaving dirty stale inodes hanging
+	 * around in memory.
+	 *
+	 * We have no need for serialising this inode operation against other
+	 * operations - we freed the inode and hence reallocation is required
+	 * and that will serialise on reallocating the space the deferops need
+	 * to free. Hence we can unlock the inode on the first commit of
+	 * the transaction rather than roll it right through the deferops. This
+	 * avoids relogging the XFS_ISTALE inode.
+	 *
+	 * We check that xfs_ifree() hasn't grown an internal transaction roll
+	 * by asserting that the inode is still locked when it returns.
+	 */
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
-	xfs_trans_ijoin(tp, ip, 0);
+	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
 
 	error = xfs_ifree(tp, ip);
+	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
 	if (error) {
 		/*
 		 * If we fail to free the inode, shut down.  The cancel
@@ -1760,7 +1781,6 @@ xfs_inactive_ifree(
 			xfs_force_shutdown(mp, SHUTDOWN_META_IO_ERROR);
 		}
 		xfs_trans_cancel(tp);
-		xfs_iunlock(ip, XFS_ILOCK_EXCL);
 		return error;
 	}
 
@@ -1778,7 +1798,6 @@ xfs_inactive_ifree(
 		xfs_notice(mp, "%s: xfs_trans_commit returned error %d",
 			__func__, error);
 
-	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	return 0;
 }
 
-- 
2.25.1