Linux-Fsdevel Archive on lore.kernel.org
 help / color / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/2] iomap: zero cached pages over unwritten extents on zero range
Date: Tue, 20 Oct 2020 12:21:50 -0400
Message-ID: <20201020162150.GB1272590@bfoster> (raw)
In-Reply-To: <20201019180144.GC1232435@bfoster>

On Mon, Oct 19, 2020 at 02:01:44PM -0400, Brian Foster wrote:
> On Mon, Oct 19, 2020 at 12:55:19PM -0400, Brian Foster wrote:
> > On Thu, Oct 15, 2020 at 10:49:01AM +0100, Christoph Hellwig wrote:
> > > > +iomap_zero_range_skip_uncached(struct inode *inode, loff_t *pos,
> > > > +		loff_t *count, loff_t *written)
> > > > +{
> > > > +	unsigned dirty_offset, bytes = 0;
> > > > +
> > > > +	dirty_offset = page_cache_seek_hole_data(inode, *pos, *count,
> > > > +				SEEK_DATA);
> > > > +	if (dirty_offset == -ENOENT)
> > > > +		bytes = *count;
> > > > +	else if (dirty_offset > *pos)
> > > > +		bytes = dirty_offset - *pos;
> > > > +
> > > > +	if (bytes) {
> > > > +		*pos += bytes;
> > > > +		*count -= bytes;
> > > > +		*written += bytes;
> > > > +	}
> > > 
> > > I find the calling conventions weird.  why not return bytes and
> > > keep the increments/decrements of the three variables in the caller?
> > > 
> > 
> > No particular reason. IIRC I had it both ways and just landed on this.
> > I'd change it, but as mentioned in the patch 1 thread I don't think this
> > patch is sufficient (with or without patch 1) anyways because the page
> > can also have been reclaimed before we get here.
> > 
> 
> Christoph,
> 
> What do you think about introducing behavior specific to
> iomap_truncate_page() to unconditionally write zeroes over unwritten
> extents? AFAICT that addresses the race and was historical XFS behavior
> (via block_truncate_page()) before iomap, so is not without precedent.
> What I'd probably do is bury the caller's did_zero parameter into a new
> internal struct iomap_zero_data to pass down into
> iomap_zero_range_actor(), then extend that structure with a
> 'zero_unwritten' field such that iomap_zero_range_actor() can do this:
> 

Ugh, so the above doesn't quite describe historical behavior.
block_truncate_page() converts an unwritten block if a page exists
(dirty or not), but bails out if a page doesn't exist. We could still do
the above, but if we wanted something more intelligent I think we need
to check for a page before we get the mapping to know whether we can
safely skip an unwritten block or need to write over it. Otherwise if we
check for a page within the actor, we have no way of knowing whether
there was a (possibly dirty) page that had been written back and/or
reclaimed since ->iomap_begin(). If we check for the page first, I think
that the iolock/mmaplock in the truncate path ensures that a page can't
be added before we complete. We might be able to take that further and
check for a dirty || writeback page, but that might be safer as a
separate patch. See the (compile tested only) diff below for an idea of
what I was thinking.

Brian

--- 8< ---

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index bcfc288dba3f..2cdfcff02307 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1000,17 +1000,56 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 }
 EXPORT_SYMBOL_GPL(iomap_zero_range);
 
+struct iomap_trunc_priv {
+	bool *did_zero;
+	bool has_page;
+};
+
+static loff_t
+iomap_truncate_page_actor(struct inode *inode, loff_t pos, loff_t count,
+		void *data, struct iomap *iomap, struct iomap *srcmap)
+{
+	struct iomap_trunc_priv	*priv = data;
+	unsigned offset;
+	int status;
+
+	if (srcmap->type == IOMAP_HOLE)
+		return count;
+	if (srcmap->type == IOMAP_UNWRITTEN && !priv->has_page)
+		return count;
+
+	offset = offset_in_page(pos);
+	if (IS_DAX(inode))
+		status = dax_iomap_zero(pos, offset, count, iomap);
+	else
+		status = iomap_zero(inode, pos, offset, count, iomap, srcmap);
+	if (status < 0)
+		return status;
+
+	if (priv->did_zero)
+		*priv->did_zero = true;
+	return count;
+}
+
 int
 iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
 		const struct iomap_ops *ops)
 {
+	struct iomap_trunc_priv priv = { .did_zero = did_zero };
 	unsigned int blocksize = i_blocksize(inode);
 	unsigned int off = pos & (blocksize - 1);
+	loff_t ret;
 
 	/* Block boundary? Nothing to do */
 	if (!off)
 		return 0;
-	return iomap_zero_range(inode, pos, blocksize - off, did_zero, ops);
+
+	priv.has_page = filemap_range_has_page(inode->i_mapping, pos, pos);
+	ret = iomap_apply(inode, pos, blocksize - off, IOMAP_ZERO, ops, &priv,
+			  iomap_truncate_page_actor);
+	if (ret <= 0)
+		return ret;
+	return 0;
 }
 EXPORT_SYMBOL_GPL(iomap_truncate_page);
 


  reply index

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-12 14:03 [PATCH 0/2] iomap: zero dirty pages over unwritten extents Brian Foster
2020-10-12 14:03 ` [PATCH 1/2] iomap: use page dirty state to seek data " Brian Foster
2020-10-13 12:30   ` Brian Foster
2020-10-13 22:53   ` Dave Chinner
2020-10-14 12:59     ` Brian Foster
2020-10-14 22:37       ` Dave Chinner
2020-10-15  9:47   ` Christoph Hellwig
2020-10-19 16:55     ` Brian Foster
2020-10-27 18:07       ` Christoph Hellwig
2020-10-28 11:31         ` Brian Foster
2020-10-12 14:03 ` [PATCH 2/2] iomap: zero cached pages over unwritten extents on zero range Brian Foster
2020-10-15  9:49   ` Christoph Hellwig
2020-10-19 16:55     ` Brian Foster
2020-10-19 18:01       ` Brian Foster
2020-10-20 16:21         ` Brian Foster [this message]
2020-10-27 18:15           ` Christoph Hellwig
2020-10-28 11:31             ` Brian Foster
2020-10-23  1:02   ` [iomap] 11b5156248: xfstests.xfs.310.fail kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201020162150.GB1272590@bfoster \
    --to=bfoster@redhat.com \
    --cc=hch@infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org
	public-inbox-index linux-fsdevel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git