linux-kernel.vger.kernel.org archive mirror
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Junxiao Bi <junxiao.bi@oracle.com>,
	Joseph Qi <joseph.qi@linux.alibaba.com>,
	Mark Fasheh <mark@fasheh.com>, Joel Becker <jlbec@evilplan.org>,
	Changwei Ge <gechangwei@live.cn>, Gang He <ghe@suse.com>,
	Jun Piao <piaojun@huawei.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 5.10 11/67] ocfs2: issue zeroout to EOF blocks
Date: Mon,  2 Aug 2021 15:44:34 +0200
Message-ID: <20210802134339.402337965@linuxfoundation.org>
In-Reply-To: <20210802134339.023067817@linuxfoundation.org>

From: Junxiao Bi <junxiao.bi@oracle.com>

commit 9449ad33be8480f538b11a593e2dda2fb33ca06d upstream.

When punching holes in EOF blocks, fallocate used buffered writes to zero
the EOF blocks in the last cluster.  But since ->writepage ignores EOF
pages, those zeros were never flushed.

This "looks" OK, since commit 6bba4471f0cc ("ocfs2: fix data corruption by
fallocate") zeroes the EOF blocks when the file size is extended, but it
isn't.  The problem lies in those EOF pages: before writeback, the pages
had the DIRTY flag set, and so did every buffer_head in them.  When
writeback ran via write_cache_pages(), the DIRTY flag on the page was
cleared, but the DIRTY flag on the buffer_heads was not.

When the next write hit those EOF pages, the buffer_heads already had the
DIRTY flag set, so the page was not marked DIRTY again.  That made
writeback ignore them forever, which causes data corruption.  Even a
direct I/O write does not help: it fails when trying to drop the page
cache before the direct I/O, because the buffer_heads for those pages
still have the DIRTY flag set, and it then falls back to buffered I/O.
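
The flag interplay described above can be illustrated with a small
user-space toy model.  This is only a sketch: struct toy_page and the
toy_*() functions are hypothetical names, one page is assumed to hold a
single buffer, and none of this is kernel code.

```c
#include <stdbool.h>

/* Toy model: one page backed by one buffer, each with its own dirty flag. */
struct toy_page {
	bool page_dirty;
	bool buffer_dirty;
	bool on_disk;		/* true once the latest data was written back */
};

/*
 * Buffered write: the page is marked dirty only when the buffer goes
 * from clean to dirty, as the commit message describes.
 */
static void toy_write(struct toy_page *p)
{
	p->on_disk = false;
	if (!p->buffer_dirty) {
		p->buffer_dirty = true;
		p->page_dirty = true;
	}
}

/*
 * Writeback pass: the page flag is cleared first; a page beyond EOF is
 * then skipped without touching the buffer flag.
 */
static void toy_writeback(struct toy_page *p, bool beyond_eof)
{
	if (!p->page_dirty)
		return;
	p->page_dirty = false;
	if (beyond_eof)
		return;		/* ignored: buffer stays dirty, nothing written */
	p->buffer_dirty = false;
	p->on_disk = true;
}

/* Replays the sequence from the commit message and returns the end state. */
static struct toy_page toy_scenario(void)
{
	struct toy_page p = { false, false, false };

	toy_write(&p);			/* EOF page zeroed via buffered write */
	toy_writeback(&p, true);	/* page is beyond EOF: skipped */
	toy_write(&p);			/* file extended, same page written again */
	toy_writeback(&p, false);	/* not EOF any more, but page is "clean" */
	return p;
}
```

In this model toy_scenario() ends with page_dirty false, buffer_dirty
true and on_disk false: the second write never reaches the disk.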

To summarize the issue: because writeback ignores EOF pages, once an EOF
page is generated, any write to it only goes to the page cache and is
never flushed to disk, even after the file size is extended and the page
is no longer an EOF page.  The fix is to avoid zeroing EOF blocks with
buffered writes.
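
Instead, the patch zeroes the blocks between i_size and the end of the
last cluster directly on disk via ocfs2_zeroout_partial_cluster().  The
block-range arithmetic of that helper can be sketched in user space as
follows.  The toy_* names are hypothetical, the round-up behaviour of the
ocfs2 helpers is assumed from the "upper block aligned" comment in the
code, and the logical-to-physical mapping done by ocfs2_get_clusters()
is left out.

```c
#include <stdint.h>

/* Hypothetical geometry: log2 of the block and cluster sizes in bytes. */
struct toy_geom {
	unsigned int block_bits;
	unsigned int cluster_bits;
};

/* Logical (file-relative) block range that would be zeroed. */
struct toy_range {
	uint64_t first_block;
	uint64_t nr_blocks;
};

/* Round v up to the next multiple of 2^bits. */
static uint64_t toy_round_up(uint64_t v, unsigned int bits)
{
	uint64_t mask = ((uint64_t)1 << bits) - 1;

	return (v + mask) & ~mask;
}

static struct toy_range toy_zeroout_range(struct toy_geom g,
					  uint64_t start, uint64_t len)
{
	struct toy_range r;
	/* Default end: the cluster boundary at or after start. */
	uint64_t end = toy_round_up(start, g.cluster_bits);

	/* Trim to the requested length if that ends earlier. */
	if (start + len < end)
		end = start + len;

	/* Both ends are rounded up to whole blocks (assumed behaviour). */
	r.first_block = toy_round_up(start, g.block_bits) >> g.block_bits;
	r.nr_blocks = (toy_round_up(end, g.block_bits) >> g.block_bits)
		      - r.first_block;
	return r;
}
```

With 4 KiB blocks and 1 MiB clusters (assumed values), a start of 1050000
and a length reaching past the cluster end gives first_block 257 and
nr_blocks 255.  A cluster-aligned start yields nr_blocks 0, which
corresponds to the early "return 0" in the helper.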

The following system call sequence from qemu-img could trigger the corruption.

  656   open("6b3711ae-3306-4bdd-823c-cf1c0060a095.conv.2", O_RDWR|O_DIRECT|O_CLOEXEC) = 11
  ...
  660   fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2275868672, 327680 <unfinished ...>
  660   fallocate(11, 0, 2275868672, 327680) = 0
  658   pwrite64(11, "

Link: https://lkml.kernel.org/r/20210722054923.24389-2-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/ocfs2/file.c |   99 +++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 60 insertions(+), 39 deletions(-)

--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1529,6 +1529,45 @@ static void ocfs2_truncate_cluster_pages
 	}
 }
 
+/*
+ * zero out partial blocks of one cluster.
+ *
+ * start: file offset where zero starts, will be made upper block aligned.
+ * len: it will be trimmed to the end of current cluster if "start + len"
+ *      is bigger than it.
+ */
+static int ocfs2_zeroout_partial_cluster(struct inode *inode,
+					u64 start, u64 len)
+{
+	int ret;
+	u64 start_block, end_block, nr_blocks;
+	u64 p_block, offset;
+	u32 cluster, p_cluster, nr_clusters;
+	struct super_block *sb = inode->i_sb;
+	u64 end = ocfs2_align_bytes_to_clusters(sb, start);
+
+	if (start + len < end)
+		end = start + len;
+
+	start_block = ocfs2_blocks_for_bytes(sb, start);
+	end_block = ocfs2_blocks_for_bytes(sb, end);
+	nr_blocks = end_block - start_block;
+	if (!nr_blocks)
+		return 0;
+
+	cluster = ocfs2_bytes_to_clusters(sb, start);
+	ret = ocfs2_get_clusters(inode, cluster, &p_cluster,
+				&nr_clusters, NULL);
+	if (ret)
+		return ret;
+	if (!p_cluster)
+		return 0;
+
+	offset = start_block - ocfs2_clusters_to_blocks(sb, cluster);
+	p_block = ocfs2_clusters_to_blocks(sb, p_cluster) + offset;
+	return sb_issue_zeroout(sb, p_block, nr_blocks, GFP_NOFS);
+}
+
 static int ocfs2_zero_partial_clusters(struct inode *inode,
 				       u64 start, u64 len)
 {
@@ -1538,6 +1577,7 @@ static int ocfs2_zero_partial_clusters(s
 	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
 	unsigned int csize = osb->s_clustersize;
 	handle_t *handle;
+	loff_t isize = i_size_read(inode);
 
 	/*
 	 * The "start" and "end" values are NOT necessarily part of
@@ -1558,6 +1598,26 @@ static int ocfs2_zero_partial_clusters(s
 	if ((start & (csize - 1)) == 0 && (end & (csize - 1)) == 0)
 		goto out;
 
+	/* No page cache for EOF blocks, issue zero out to disk. */
+	if (end > isize) {
+		/*
+		 * zeroout eof blocks in last cluster starting from
+		 * "isize" even "start" > "isize" because it is
+		 * complicated to zeroout just at "start" as "start"
+		 * may be not aligned with block size, buffer write
+		 * would be required to do that, but out of eof buffer
+		 * write is not supported.
+		 */
+		ret = ocfs2_zeroout_partial_cluster(inode, isize,
+					end - isize);
+		if (ret) {
+			mlog_errno(ret);
+			goto out;
+		}
+		if (start >= isize)
+			goto out;
+		end = isize;
+	}
 	handle = ocfs2_start_trans(osb, OCFS2_INODE_UPDATE_CREDITS);
 	if (IS_ERR(handle)) {
 		ret = PTR_ERR(handle);
@@ -1856,45 +1916,6 @@ out:
 }
 
 /*
- * zero out partial blocks of one cluster.
- *
- * start: file offset where zero starts, will be made upper block aligned.
- * len: it will be trimmed to the end of current cluster if "start + len"
- *      is bigger than it.
- */
-static int ocfs2_zeroout_partial_cluster(struct inode *inode,
-					u64 start, u64 len)
-{
-	int ret;
-	u64 start_block, end_block, nr_blocks;
-	u64 p_block, offset;
-	u32 cluster, p_cluster, nr_clusters;
-	struct super_block *sb = inode->i_sb;
-	u64 end = ocfs2_align_bytes_to_clusters(sb, start);
-
-	if (start + len < end)
-		end = start + len;
-
-	start_block = ocfs2_blocks_for_bytes(sb, start);
-	end_block = ocfs2_blocks_for_bytes(sb, end);
-	nr_blocks = end_block - start_block;
-	if (!nr_blocks)
-		return 0;
-
-	cluster = ocfs2_bytes_to_clusters(sb, start);
-	ret = ocfs2_get_clusters(inode, cluster, &p_cluster,
-				&nr_clusters, NULL);
-	if (ret)
-		return ret;
-	if (!p_cluster)
-		return 0;
-
-	offset = start_block - ocfs2_clusters_to_blocks(sb, cluster);
-	p_block = ocfs2_clusters_to_blocks(sb, p_cluster) + offset;
-	return sb_issue_zeroout(sb, p_block, nr_blocks, GFP_NOFS);
-}
-
-/*
  * Parts of this function taken from xfs_change_file_space()
  */
 static int __ocfs2_change_file_space(struct file *file, struct inode *inode,



