Linux-Fsdevel Archive on lore.kernel.org
 help / color / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: linux-fsdevel@vger.kernel.org
Cc: linux-xfs@vger.kernel.org
Subject: [PATCH v2] iomap: zero cached page over unwritten extent on truncate page
Date: Wed, 21 Oct 2020 09:33:29 -0400
Message-ID: <20201021133329.1337689-1-bfoster@redhat.com> (raw)

iomap_truncate_page() relies on zero range and zero range
unconditionally skips unwritten mappings. This is normally not a
problem as most users synchronize in-core state to the underlying
block mapping by flushing pagecache prior to calling into iomap.
This is not the case for iomap_truncate_page(), however. XFS calls
iomap_truncate_page() on truncate down before flushing the new EOF
page of the file. This means that if the new EOF block is unwritten
but covered by a dirty or writeback page (i.e. awaiting unwritten
conversion after writeback), iomap fails to zero that page. The
subsequent truncate_setsize() call does perform page zeroing, but
doesn't dirty the page. Therefore if the new EOF page is written
back after calling into iomap but before the pagecache truncate, the
post-EOF zeroing is lost on page reclaim. This exposes stale
post-EOF data on mapped reads.

Rework iomap_truncate_page() to check pagecache state before calling
into iomap_apply() and use that info to determine whether we can
safely skip zeroing unwritten extents. The filesystem has locked out
concurrent I/O and mapped operations at this point but is not
serialized against writeback, unwritten extent conversion (I/O
completion) or page reclaim. Therefore if a page does not exist
before we acquire the mapping, we can be certain that an unwritten
extent cannot be converted before we return and thus it is safe to
skip. If a page does exist over an unwritten block, it could be in
the dirty or writeback states, convert the underlying mapping at any
time, and thus should be explicitly written to avoid racing with
writeback. Finally, since iomap_truncate_page() only targets the new
EOF block and must now pass additional state to the actor, open code
the zeroing actor instead of plumbing through zero range.

This does have the tradeoff that an existing clean page is dirtied
and causes unwritten conversion, but this is analogous to historical
behavior implemented by block_truncate_page(). This patch restores
historical behavior to address the data exposure problem and leaves
filtering out the clean page case for a separate patch.

Fixes: 68a9f5e7007c ("xfs: implement iomap based buffered write path")
Signed-off-by: Brian Foster <bfoster@redhat.com>
---

v2:
- Rework to check for cached page explicitly and avoid use of seek data.
v1: https://lore.kernel.org/linux-fsdevel/20201012140350.950064-1-bfoster@redhat.com/

 fs/iomap/buffered-io.c | 41 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index bcfc288dba3f..2cdfcff02307 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1000,17 +1000,56 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 }
 EXPORT_SYMBOL_GPL(iomap_zero_range);
 
+struct iomap_trunc_priv {
+	bool *did_zero;
+	bool has_page;
+};
+
+static loff_t
+iomap_truncate_page_actor(struct inode *inode, loff_t pos, loff_t count,
+		void *data, struct iomap *iomap, struct iomap *srcmap)
+{
+	struct iomap_trunc_priv	*priv = data;
+	unsigned offset;
+	int status;
+
+	if (srcmap->type == IOMAP_HOLE)
+		return count;
+	if (srcmap->type == IOMAP_UNWRITTEN && !priv->has_page)
+		return count;
+
+	offset = offset_in_page(pos);
+	if (IS_DAX(inode))
+		status = dax_iomap_zero(pos, offset, count, iomap);
+	else
+		status = iomap_zero(inode, pos, offset, count, iomap, srcmap);
+	if (status < 0)
+		return status;
+
+	if (priv->did_zero)
+		*priv->did_zero = true;
+	return count;
+}
+
 int
 iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
 		const struct iomap_ops *ops)
 {
+	struct iomap_trunc_priv priv = { .did_zero = did_zero };
 	unsigned int blocksize = i_blocksize(inode);
 	unsigned int off = pos & (blocksize - 1);
+	loff_t ret;
 
 	/* Block boundary? Nothing to do */
 	if (!off)
 		return 0;
-	return iomap_zero_range(inode, pos, blocksize - off, did_zero, ops);
+
+	priv.has_page = filemap_range_has_page(inode->i_mapping, pos, pos);
+	ret = iomap_apply(inode, pos, blocksize - off, IOMAP_ZERO, ops, &priv,
+			  iomap_truncate_page_actor);
+	if (ret <= 0)
+		return ret;
+	return 0;
 }
 EXPORT_SYMBOL_GPL(iomap_truncate_page);
 
-- 
2.25.4


             reply index

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-21 13:33 Brian Foster [this message]
2020-10-21 16:25 ` Darrick J. Wong
2020-10-21 16:59   ` Brian Foster
2020-10-22 15:38     ` Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201021133329.1337689-1-bfoster@redhat.com \
    --to=bfoster@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org
	public-inbox-index linux-fsdevel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git