All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Brian Foster <bfoster@redhat.com>, Christoph Hellwig <hch@lst.de>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Chandan Babu R <chandan.babu@oracle.com>
Subject: [PATCH 5.4 38/74] xfs: use ordered buffers to initialize dquot buffers during quotacheck
Date: Tue,  8 Nov 2022 14:39:06 +0100	[thread overview]
Message-ID: <20221108133335.284786255@linuxfoundation.org> (raw)
In-Reply-To: <20221108133333.659601604@linuxfoundation.org>

From: "Darrick J. Wong" <darrick.wong@oracle.com>

commit 78bba5c812cc651cee51b64b786be926ab7fe2a9 upstream.

While QAing the new xfs_repair quotacheck code, I uncovered a quota
corruption bug resulting from a bad interaction between dquot buffer
initialization and quotacheck.  The bug can be reproduced with the
following sequence:

# mkfs.xfs -f /dev/sdf
# mount /dev/sdf /opt -o usrquota
# su nobody -s /bin/bash -c 'touch /opt/barf'
# sync
# xfs_quota -x -c 'report -ahi' /opt
User quota on /opt (/dev/sdf)
                        Inodes
User ID      Used   Soft   Hard Warn/Grace
---------- ---------------------------------
root            3      0      0  00 [------]
nobody          1      0      0  00 [------]

# xfs_io -x -c 'shutdown' /opt
# umount /opt
# mount /dev/sdf /opt -o usrquota
# touch /opt/man2
# xfs_quota -x -c 'report -ahi' /opt
User quota on /opt (/dev/sdf)
                        Inodes
User ID      Used   Soft   Hard Warn/Grace
---------- ---------------------------------
root            1      0      0  00 [------]
nobody          1      0      0  00 [------]

# umount /opt

Notice how the initial quotacheck set the root dquot icount to 3
(rootino, rbmino, rsumino), but after shutdown -> remount -> recovery,
xfs_quota reports that the root dquot has only 1 icount.  We haven't
deleted anything from the filesystem, which means that quota is now
under-counting.  This behavior is not limited to icount or the root
dquot, but this is the shortest reproducer.

I traced the cause of this discrepancy to the way that we handle ondisk
dquot updates during quotacheck vs. regular fs activity.  Normally, when
we allocate a disk block for a dquot, we log the buffer as a regular
(dquot) buffer.  Subsequent updates to the dquots backed by that block
are done via separate dquot log item updates, which means that they
depend on the logged buffer update being written to disk before the
dquot items.  Because individual dquots have their own LSN fields, that
initial dquot buffer must always be recovered.

However, the story changes for quotacheck, which can cause dquot block
allocations but persists the final dquot counter values via a delwri
list.  Because recovery doesn't gate dquot buffer replay on an LSN, this
means that the initial dquot buffer can be replayed over the (newer)
contents that were delwritten at the end of quotacheck.  In effect, this
re-initializes the dquot counters after they've been updated.  If the
log does not contain any other dquot items to recover, the obsolete
dquot contents will not be corrected by log recovery.

Because quotacheck uses a transaction to log the setting of the CHKD
flags in the superblock, we skip quotacheck during the second mount
call, which allows the incorrect icount to remain.

Fix this by changing the ondisk dquot initialization function to use
ordered buffers to write out fresh dquot blocks if it detects that we're
running quotacheck.  If the system goes down before quotacheck can
complete, the CHKD flags will not be set in the superblock and the next
mount will run quotacheck again, which can fix uninitialized dquot
buffers.  This requires amending the defer code to maintaine ordered
buffer state across defer rolls for the sake of the dquot allocation
code.

For regular operations we preserve the current behavior since the dquot
items require properly initialized ondisk dquot records.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/xfs/libxfs/xfs_defer.c |   10 +++++++-
 fs/xfs/xfs_dquot.c        |   56 +++++++++++++++++++++++++++++++++++-----------
 2 files changed, 52 insertions(+), 14 deletions(-)

--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -234,10 +234,13 @@ xfs_defer_trans_roll(
 	struct xfs_log_item		*lip;
 	struct xfs_buf			*bplist[XFS_DEFER_OPS_NR_BUFS];
 	struct xfs_inode		*iplist[XFS_DEFER_OPS_NR_INODES];
+	unsigned int			ordered = 0; /* bitmap */
 	int				bpcount = 0, ipcount = 0;
 	int				i;
 	int				error;
 
+	BUILD_BUG_ON(NBBY * sizeof(ordered) < XFS_DEFER_OPS_NR_BUFS);
+
 	list_for_each_entry(lip, &tp->t_items, li_trans) {
 		switch (lip->li_type) {
 		case XFS_LI_BUF:
@@ -248,7 +251,10 @@ xfs_defer_trans_roll(
 					ASSERT(0);
 					return -EFSCORRUPTED;
 				}
-				xfs_trans_dirty_buf(tp, bli->bli_buf);
+				if (bli->bli_flags & XFS_BLI_ORDERED)
+					ordered |= (1U << bpcount);
+				else
+					xfs_trans_dirty_buf(tp, bli->bli_buf);
 				bplist[bpcount++] = bli->bli_buf;
 			}
 			break;
@@ -289,6 +295,8 @@ xfs_defer_trans_roll(
 	/* Rejoin the buffers and dirty them so the log moves forward. */
 	for (i = 0; i < bpcount; i++) {
 		xfs_trans_bjoin(tp, bplist[i]);
+		if (ordered & (1U << i))
+			xfs_trans_ordered_buf(tp, bplist[i]);
 		xfs_trans_bhold(tp, bplist[i]);
 	}
 
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -205,16 +205,18 @@ xfs_qm_adjust_dqtimers(
  */
 STATIC void
 xfs_qm_init_dquot_blk(
-	xfs_trans_t	*tp,
-	xfs_mount_t	*mp,
-	xfs_dqid_t	id,
-	uint		type,
-	xfs_buf_t	*bp)
+	struct xfs_trans	*tp,
+	struct xfs_mount	*mp,
+	xfs_dqid_t		id,
+	uint			type,
+	struct xfs_buf		*bp)
 {
 	struct xfs_quotainfo	*q = mp->m_quotainfo;
-	xfs_dqblk_t	*d;
-	xfs_dqid_t	curid;
-	int		i;
+	struct xfs_dqblk	*d;
+	xfs_dqid_t		curid;
+	unsigned int		qflag;
+	unsigned int		blftype;
+	int			i;
 
 	ASSERT(tp);
 	ASSERT(xfs_buf_islocked(bp));
@@ -238,11 +240,39 @@ xfs_qm_init_dquot_blk(
 		}
 	}
 
-	xfs_trans_dquot_buf(tp, bp,
-			    (type & XFS_DQ_USER ? XFS_BLF_UDQUOT_BUF :
-			    ((type & XFS_DQ_PROJ) ? XFS_BLF_PDQUOT_BUF :
-			     XFS_BLF_GDQUOT_BUF)));
-	xfs_trans_log_buf(tp, bp, 0, BBTOB(q->qi_dqchunklen) - 1);
+	if (type & XFS_DQ_USER) {
+		qflag = XFS_UQUOTA_CHKD;
+		blftype = XFS_BLF_UDQUOT_BUF;
+	} else if (type & XFS_DQ_PROJ) {
+		qflag = XFS_PQUOTA_CHKD;
+		blftype = XFS_BLF_PDQUOT_BUF;
+	} else {
+		qflag = XFS_GQUOTA_CHKD;
+		blftype = XFS_BLF_GDQUOT_BUF;
+	}
+
+	xfs_trans_dquot_buf(tp, bp, blftype);
+
+	/*
+	 * quotacheck uses delayed writes to update all the dquots on disk in an
+	 * efficient manner instead of logging the individual dquot changes as
+	 * they are made. However if we log the buffer allocated here and crash
+	 * after quotacheck while the logged initialisation is still in the
+	 * active region of the log, log recovery can replay the dquot buffer
+	 * initialisation over the top of the checked dquots and corrupt quota
+	 * accounting.
+	 *
+	 * To avoid this problem, quotacheck cannot log the initialised buffer.
+	 * We must still dirty the buffer and write it back before the
+	 * allocation transaction clears the log. Therefore, mark the buffer as
+	 * ordered instead of logging it directly. This is safe for quotacheck
+	 * because it detects and repairs allocated but initialized dquot blocks
+	 * in the quota inodes.
+	 */
+	if (!(mp->m_qflags & qflag))
+		xfs_trans_ordered_buf(tp, bp);
+	else
+		xfs_trans_log_buf(tp, bp, 0, BBTOB(q->qi_dqchunklen) - 1);
 }
 
 /*



  parent reply	other threads:[~2022-11-08 13:49 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-08 13:38 [PATCH 5.4 00/74] 5.4.224-rc1 review Greg Kroah-Hartman
2022-11-08 13:38 ` [PATCH 5.4 01/74] RDMA/cma: Use output interface for net_dev check Greg Kroah-Hartman
2022-11-08 13:38 ` [PATCH 5.4 02/74] IB/hfi1: Correctly move list in sc_disable() Greg Kroah-Hartman
2022-11-08 13:38 ` [PATCH 5.4 03/74] NFSv4.1: Handle RECLAIM_COMPLETE trunking errors Greg Kroah-Hartman
2022-11-08 13:38 ` [PATCH 5.4 04/74] NFSv4.1: We must always send RECLAIM_COMPLETE after a reboot Greg Kroah-Hartman
2022-11-08 13:38 ` [PATCH 5.4 05/74] nfs4: Fix kmemleak when allocate slot failed Greg Kroah-Hartman
2022-11-08 13:38 ` [PATCH 5.4 06/74] net: dsa: Fix possible memory leaks in dsa_loop_init() Greg Kroah-Hartman
2022-11-08 13:38 ` [PATCH 5.4 07/74] RDMA/core: Fix null-ptr-deref in ib_core_cleanup() Greg Kroah-Hartman
2022-11-08 13:38 ` [PATCH 5.4 08/74] RDMA/qedr: clean up work queue on failure in qedr_alloc_resources() Greg Kroah-Hartman
2022-11-08 13:38 ` [PATCH 5.4 09/74] nfc: s3fwrn5: Fix potential memory leak in s3fwrn5_nci_send() Greg Kroah-Hartman
2022-11-08 13:38 ` [PATCH 5.4 10/74] nfc: nfcmrvl: Fix potential memory leak in nfcmrvl_i2c_nci_send() Greg Kroah-Hartman
2022-11-08 13:38 ` [PATCH 5.4 11/74] net: fec: fix improper use of NETDEV_TX_BUSY Greg Kroah-Hartman
2022-11-08 13:38 ` [PATCH 5.4 12/74] ata: pata_legacy: fix pdc20230_set_piomode() Greg Kroah-Hartman
2022-11-08 13:38 ` [PATCH 5.4 13/74] net: sched: Fix use after free in red_enqueue() Greg Kroah-Hartman
2022-11-08 13:38 ` [PATCH 5.4 14/74] net: tun: fix bugs for oversize packet when napi frags enabled Greg Kroah-Hartman
2022-11-08 13:38 ` [PATCH 5.4 15/74] netfilter: nf_tables: release flow rule object from commit path Greg Kroah-Hartman
2022-11-08 13:38 ` [PATCH 5.4 16/74] ipvs: use explicitly signed chars Greg Kroah-Hartman
2022-11-08 13:38 ` [PATCH 5.4 17/74] ipvs: fix WARNING in __ip_vs_cleanup_batch() Greg Kroah-Hartman
2022-11-08 13:38 ` [PATCH 5.4 18/74] ipvs: fix WARNING in ip_vs_app_net_cleanup() Greg Kroah-Hartman
2022-11-08 13:38 ` [PATCH 5.4 19/74] rose: Fix NULL pointer dereference in rose_send_frame() Greg Kroah-Hartman
2022-11-08 13:38 ` [PATCH 5.4 20/74] mISDN: fix possible memory leak in mISDN_register_device() Greg Kroah-Hartman
2022-11-08 13:38 ` [PATCH 5.4 21/74] isdn: mISDN: netjet: fix wrong check of device registration Greg Kroah-Hartman
2022-11-08 13:38 ` [PATCH 5.4 22/74] btrfs: fix inode list leak during backref walking at resolve_indirect_refs() Greg Kroah-Hartman
2022-11-08 13:38 ` [PATCH 5.4 23/74] btrfs: fix inode list leak during backref walking at find_parent_nodes() Greg Kroah-Hartman
2022-11-08 13:38 ` [PATCH 5.4 24/74] btrfs: fix ulist leaks in error paths of qgroup self tests Greg Kroah-Hartman
2022-11-08 13:38 ` [PATCH 5.4 25/74] Bluetooth: L2CAP: Fix use-after-free caused by l2cap_reassemble_sdu Greg Kroah-Hartman
2022-11-08 13:38 ` [PATCH 5.4 26/74] Bluetooth: L2CAP: fix use-after-free in l2cap_conn_del() Greg Kroah-Hartman
2022-11-08 13:38 ` [PATCH 5.4 27/74] net: mdio: fix undefined behavior in bit shift for __mdiobus_register Greg Kroah-Hartman
2022-11-08 13:38 ` [PATCH 5.4 28/74] net, neigh: Fix null-ptr-deref in neigh_table_clear() Greg Kroah-Hartman
2022-11-08 13:38 ` [PATCH 5.4 29/74] ipv6: fix WARNING in ip6_route_net_exit_late() Greg Kroah-Hartman
2022-11-08 13:38 ` [PATCH 5.4 30/74] media: s5p_cec: limit msg.len to CEC_MAX_MSG_SIZE Greg Kroah-Hartman
2022-11-08 13:38 ` [PATCH 5.4 31/74] media: cros-ec-cec: " Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 32/74] media: dvb-frontends/drxk: initialize err to 0 Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 33/74] media: meson: vdec: fix possible refcount leak in vdec_probe() Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 34/74] scsi: core: Restrict legal sdev_state transitions via sysfs Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 35/74] HID: saitek: add madcatz variant of MMO7 mouse device ID Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 36/74] i2c: xiic: Add platform module alias Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 37/74] xfs: dont fail verifier on empty attr3 leaf block Greg Kroah-Hartman
2022-11-08 13:39 ` Greg Kroah-Hartman [this message]
2022-11-08 13:39 ` [PATCH 5.4 39/74] xfs: gut error handling in xfs_trans_unreserve_and_mod_sb() Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 40/74] xfs: group quota should return EDQUOT when prj quota enabled Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 41/74] xfs: dont fail unwritten extent conversion on writeback due to edquot Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 42/74] xfs: Add the missed xfs_perag_put() for xfs_ifree_cluster() Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 43/74] Bluetooth: L2CAP: Fix attempting to access uninitialized memory Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 44/74] block, bfq: protect bfqd->queued by bfqd->lock Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 45/74] tcp/udp: Fix memory leak in ipv6_renew_options() Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 46/74] memcg: enable accounting of ipc resources Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 47/74] binder: fix UAF of alloc->vma in race with munmap() Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 48/74] btrfs: fix type of parameter generation in btrfs_get_dentry Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 49/74] tcp/udp: Make early_demux back namespacified Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 50/74] kprobe: reverse kp->flags when arm_kprobe failed Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 51/74] tools/nolibc/string: Fix memcmp() implementation Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 52/74] tracing/histogram: Update document for KEYS_MAX size Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 53/74] capabilities: fix potential memleak on error path from vfs_getxattr_alloc() Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 54/74] fuse: add file_modified() to fallocate Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 55/74] efi: random: reduce seed size to 32 bytes Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 56/74] perf/x86/intel: Fix pebs event constraints for ICL Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 57/74] perf/x86/intel: Add Cooper Lake stepping to isolation_ucodes[] Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 58/74] ALSA: usb-audio: Add quirks for MacroSilicon MS2100/MS2106 devices Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 59/74] parisc: Make 8250_gsc driver dependend on CONFIG_PARISC Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 60/74] parisc: Export iosapic_serial_irq() symbol for serial port driver Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 61/74] parisc: Avoid printing the hardware path twice Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 62/74] ext4: fix warning in ext4_da_release_space Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 63/74] ext4: fix BUG_ON() when directory entry has invalid rec_len Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 64/74] KVM: x86: Mask off reserved bits in CPUID.8000001AH Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 65/74] KVM: x86: Mask off reserved bits in CPUID.80000008H Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 66/74] KVM: x86: emulator: em_sysexit should update ctxt->mode Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 67/74] KVM: x86: emulator: introduce emulator_recalc_and_set_mode Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 68/74] KVM: x86: emulator: update the emulation mode after CR0 write Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 69/74] mtd: rawnand: gpmi: Set WAIT_FOR_READY timeout based on program/erase times Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 70/74] drm/rockchip: dsi: Force synchronous probe Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 71/74] drm/i915/sdvo: Filter out invalid outputs more sensibly Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 72/74] drm/i915/sdvo: Setup DDC fully before output init Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 73/74] wifi: brcmfmac: Fix potential buffer overflow in brcmf_fweh_event_worker() Greg Kroah-Hartman
2022-11-08 13:39 ` [PATCH 5.4 74/74] ipc: remove memcg accounting for sops objects in do_semtimedop() Greg Kroah-Hartman
2022-11-08 19:02 ` [PATCH 5.4 00/74] 5.4.224-rc1 review Florian Fainelli
2022-11-09  2:57 ` Guenter Roeck
2022-11-09 10:47 ` Jon Hunter
2022-11-09 13:31 ` Naresh Kamboju
2022-11-10  1:55 ` Shuah Khan
2022-11-10 10:49 ` Sudip Mukherjee
2022-11-11  1:19 ` zhouzhixiu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221108133335.284786255@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=bfoster@redhat.com \
    --cc=chandan.babu@oracle.com \
    --cc=darrick.wong@oracle.com \
    --cc=djwong@kernel.org \
    --cc=hch@lst.de \
    --cc=patches@lists.linux.dev \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.