All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 3/7] xfs: use byte ranges for write cleanup ranges
Date: Mon, 7 Nov 2022 15:53:20 -0800	[thread overview]
Message-ID: <Y2macHjWzrwaMQXG@magnolia> (raw)
In-Reply-To: <20221104054015.GL3600936@dread.disaster.area>

On Fri, Nov 04, 2022 at 04:40:15PM +1100, Dave Chinner wrote:
> On Wed, Nov 02, 2022 at 09:32:53AM -0700, Darrick J. Wong wrote:
> > On Tue, Nov 01, 2022 at 11:34:08AM +1100, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > xfs_buffered_write_iomap_end() currently converts the byte ranges
> > > passed to it to filesystem blocks to pass them to the bmap code to
> > > punch out delalloc blocks, but then has to convert filesytem
> > > blocks back to byte ranges for page cache truncate.
> > > 
> > > We're about to make the page cache truncate go away and replace it
> > > with a page cache walk, so having to convert everything to/from/to
> > > filesystem blocks is messy and error-prone. It is much easier to
> > > pass around byte ranges and convert to page indexes and/or
> > > filesystem blocks only where those units are needed.
> > > 
> > > In preparation for the page cache walk being added, add a helper
> > > that converts byte ranges to filesystem blocks and calls
> > > xfs_bmap_punch_delalloc_range() and convert
> > > xfs_buffered_write_iomap_end() to calculate limits in byte ranges.
> > > 
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > ---
> > >  fs/xfs/xfs_iomap.c | 40 +++++++++++++++++++++++++---------------
> > >  1 file changed, 25 insertions(+), 15 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > > index a2e45ea1b0cb..7bb55dbc19d3 100644
> > > --- a/fs/xfs/xfs_iomap.c
> > > +++ b/fs/xfs/xfs_iomap.c
> > > @@ -1120,6 +1120,20 @@ xfs_buffered_write_iomap_begin(
> > >  	return error;
> > >  }
> > >  
> > > +static int
> > > +xfs_buffered_write_delalloc_punch(
> > > +	struct inode		*inode,
> > > +	loff_t			start_byte,
> > > +	loff_t			end_byte)
> > > +{
> > > +	struct xfs_mount	*mp = XFS_M(inode->i_sb);
> > > +	xfs_fileoff_t		start_fsb = XFS_B_TO_FSBT(mp, start_byte);
> > > +	xfs_fileoff_t		end_fsb = XFS_B_TO_FSB(mp, end_byte);
> > > +
> > > +	return xfs_bmap_punch_delalloc_range(XFS_I(inode), start_fsb,
> > > +				end_fsb - start_fsb);
> > > +}
> > 
> > /me echoes hch's comment that the other callers of
> > xfs_bmap_punch_delalloc_range do this byte->block conversion too.
> > 
> > > +
> > >  static int
> > >  xfs_buffered_write_iomap_end(
> > >  	struct inode		*inode,
> > > @@ -1129,10 +1143,9 @@ xfs_buffered_write_iomap_end(
> > >  	unsigned		flags,
> > >  	struct iomap		*iomap)
> > >  {
> > > -	struct xfs_inode	*ip = XFS_I(inode);
> > > -	struct xfs_mount	*mp = ip->i_mount;
> > > -	xfs_fileoff_t		start_fsb;
> > > -	xfs_fileoff_t		end_fsb;
> > > +	struct xfs_mount	*mp = XFS_M(inode->i_sb);
> > > +	loff_t			start_byte;
> > > +	loff_t			end_byte;
> > >  	int			error = 0;
> > >  
> > >  	if (iomap->type != IOMAP_DELALLOC)
> > > @@ -1157,13 +1170,13 @@ xfs_buffered_write_iomap_end(
> > >  	 * the range.
> > >  	 */
> > >  	if (unlikely(!written))
> > > -		start_fsb = XFS_B_TO_FSBT(mp, offset);
> > > +		start_byte = round_down(offset, mp->m_sb.sb_blocksize);
> > >  	else
> >  -		start_fsb = XFS_B_TO_FSB(mp, offset + written);
> > > -	end_fsb = XFS_B_TO_FSB(mp, offset + length);
> > > +		start_byte = round_up(offset + written, mp->m_sb.sb_blocksize);
> > > +	end_byte = round_up(offset + length, mp->m_sb.sb_blocksize);
> > 
> > Technically this is the byte where we should *stop* processing, right?
> > 
> > If we are told to write 1000 bytes at pos 0 and the whole thing fails,
> > the end pos of the range is 1023 and we must stop at pos 1024.  Right?
> 
> Yes, the interval definition being used here is open-ended i.e.
> [start_byte, end_byte) because it makes iterative interval
> operations really easy as the value for the start of the next
> interval is the same as the value for the end of the current
> interval.

(I was never good at remembering which symbol included the number and
which one excluded it.]

> That's the way we've traditionally encoded ranges in XFS
> because there's a much lower risk of off-by-one errors in
> calculations as we iterate through extents. i.e. finding the
> start and end of ranges is as simple as round_down/round_up, there's
> no magic "+ 1" or "- 1" arithmetic needed anywhere to move from one
> interval to the next, etc.

<nod> I've been slowly moving my brain towards "next_pos/block/etc",
more because 'end' is ambiguous enough in my head that my own code
confuses me. :P

(That said I often just look at the variable definition.  As long as the
variable defining the loop invariant doesn't change, it doesn't cause me
any recognition problems.)

> > (The only reason I ask is that Linus ranted about XFS naming these
> > variables incorrectly in the iomap code and the (at the time only) user
> > of it.)
> 
> I don't find that a convincing argument.  What some random dude that
> has never touched the XFS or iomap code thinks about how we define
> intervals or the notations we use that makes the code _easier for
> us to understand_ is just not relevant.

Wellp I woke up to this story[1] this morning.  At this point I say fmeh
and withdraw the question.

[1] https://lore.kernel.org/lkml/20221105222012.4226-1-Jason@zx2c4.com/

> > >  	/* Nothing to do if we've written the entire delalloc extent */
> > > -	if (start_fsb >= end_fsb)
> > > +	if (start_byte >= end_byte)
> > >  		return 0;
> > >  
> > >  	/*
> > > @@ -1173,15 +1186,12 @@ xfs_buffered_write_iomap_end(
> > >  	 * leave dirty pages with no space reservation in the cache.
> > >  	 */
> > >  	filemap_invalidate_lock(inode->i_mapping);
> > > -	truncate_pagecache_range(VFS_I(ip), XFS_FSB_TO_B(mp, start_fsb),
> > > -				 XFS_FSB_TO_B(mp, end_fsb) - 1);
> > > -
> > > -	error = xfs_bmap_punch_delalloc_range(ip, start_fsb,
> > > -				       end_fsb - start_fsb);
> > > +	truncate_pagecache_range(inode, start_byte, end_byte - 1);
> > 
> > ...because the expression "end_byte - 1" looks a little funny when it's
> > used to compute the "lend" argument to truncate_pagecache_range.
> 
> Yup, truncate_pagecache_range() uses a [] (closed) interval to
> define the range, so we need a "- 1" when passing that open-ended
> interval into a closed interval API.
> 
> But that truncate_pagecache_range() call is going away in the next
> patch, so this whole issue is moot, yes?

Oh, it does.  Heh.  Never mind then.

> > > +	error = xfs_buffered_write_delalloc_punch(inode, start_byte, end_byte);
> > >  	filemap_invalidate_unlock(inode->i_mapping);
> > >  	if (error && !xfs_is_shutdown(mp)) {
> > > -		xfs_alert(mp, "%s: unable to clean up ino %lld",
> > > -			__func__, ip->i_ino);
> > > +		xfs_alert(mp, "%s: unable to clean up ino 0x%llx",
> > > +			__func__, XFS_I(inode)->i_ino);
> > 
> > Oh, you did fix the ino 0x%llx format thing.  Previous comment
> > withdrawn.
> > 
> > With s/end_byte/next_byte/ and the delalloc punch thing sorted out,
> 
> I don't know what you want me to do here, because I don't think this
> code is wrong and changing it to closed intervals and next/stop as
> variable names makes little sense in the context of the code....

Comment withdrawn, per above.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

  reply	other threads:[~2022-11-07 23:53 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-01  0:34 xfs, iomap: fix data corrupton due to stale cached iomaps Dave Chinner
2022-11-01  0:34 ` [PATCH 1/7] xfs: write page faults in iomap are not buffered writes Dave Chinner
2022-11-02  7:17   ` Christoph Hellwig
2022-11-02 16:12   ` Darrick J. Wong
2022-11-02 21:11     ` Dave Chinner
2022-11-01  0:34 ` [PATCH 2/7] xfs: punching delalloc extents on write failure is racy Dave Chinner
2022-11-02  7:18   ` Christoph Hellwig
2022-11-02 16:22   ` Darrick J. Wong
2022-11-01  0:34 ` [PATCH 3/7] xfs: use byte ranges for write cleanup ranges Dave Chinner
2022-11-02  7:20   ` Christoph Hellwig
2022-11-02 16:32   ` Darrick J. Wong
2022-11-04  5:40     ` Dave Chinner
2022-11-07 23:53       ` Darrick J. Wong [this message]
2022-11-01  0:34 ` [PATCH 4/7] xfs: buffered write failure should not truncate the page cache Dave Chinner
2022-11-01 11:57   ` kernel test robot
2022-11-02  7:24   ` Christoph Hellwig
2022-11-02 20:57     ` Dave Chinner
2022-11-02 16:41   ` Darrick J. Wong
2022-11-02 21:04     ` Dave Chinner
2022-11-02 22:26       ` Darrick J. Wong
2022-11-04  8:08   ` Christoph Hellwig
2022-11-04 23:10     ` Dave Chinner
2022-11-07 23:48       ` Darrick J. Wong
2022-11-01  0:34 ` [PATCH 5/7] iomap: write iomap validity checks Dave Chinner
2022-11-02  8:36   ` Christoph Hellwig
2022-11-02 16:43     ` Darrick J. Wong
2022-11-02 16:58       ` Darrick J. Wong
2022-11-03  0:35         ` Dave Chinner
2022-11-04  8:12           ` Christoph Hellwig
2022-11-02 16:57   ` Darrick J. Wong
2022-11-01  0:34 ` [PATCH 6/7] xfs: use iomap_valid method to detect stale cached iomaps Dave Chinner
2022-11-01  9:15   ` kernel test robot
2022-11-02  8:41   ` Christoph Hellwig
2022-11-02 21:39     ` Dave Chinner
2022-11-04  8:14       ` Christoph Hellwig
2022-11-02 17:19   ` Darrick J. Wong
2022-11-02 22:36     ` Dave Chinner
2022-11-08  0:00       ` Darrick J. Wong
2022-11-01  0:34 ` [PATCH 7/7] xfs: drop write error injection is unfixable, remove it Dave Chinner
2022-11-01  3:39 ` xfs, iomap: fix data corrupton due to stale cached iomaps Darrick J. Wong
2022-11-01  4:21   ` Dave Chinner
2022-11-02 17:23     ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y2macHjWzrwaMQXG@magnolia \
    --to=djwong@kernel.org \
    --cc=david@fromorbit.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.