All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] xfs: fix off-by-one-block in xfs_discard_folio()
@ 2023-03-01  0:17 Dave Chinner
  2023-03-01  0:47 ` Darrick J. Wong
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2023-03-01  0:17 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

The recent writeback corruption fixes changed the code in
xfs_discard_folio() to calculate a byte range to for punching
delalloc extents. A mistake was made in using round_up(pos) for the
end offset, because when pos points at the first byte of a block, it
does not get rounded up to point to the end byte of the block. hence
the punch range is short, and this leads to unexpected behaviour in
certain cases in xfs_bmap_punch_delalloc_range.

e.g. pos = 0 means we call xfs_bmap_punch_delalloc_range(0,0), so
there is no previous extent and it rounds up the punch to the end of
the delalloc extent it found at offset 0, not the end of the range
given to xfs_bmap_punch_delalloc_range().

Fix this by handling the zero block offset case correctly.

Fixes: 7348b322332d ("xfs: xfs_bmap_punch_delalloc_range() should take a byte range")
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Found-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_aops.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 41734202796f..429f63cfd7d4 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -466,6 +466,7 @@ xfs_discard_folio(
 {
 	struct xfs_inode	*ip = XFS_I(folio->mapping->host);
 	struct xfs_mount	*mp = ip->i_mount;
+	xfs_off_t		end_off;
 	int			error;
 
 	if (xfs_is_shutdown(mp))
@@ -475,8 +476,17 @@ xfs_discard_folio(
 		"page discard on page "PTR_FMT", inode 0x%llx, pos %llu.",
 			folio, ip->i_ino, pos);
 
-	error = xfs_bmap_punch_delalloc_range(ip, pos,
-			round_up(pos, folio_size(folio)));
+	/*
+	 * Need to be careful with the case where the pos passed in points to
+	 * the first byte of the folio - rounding up won't change the value,
+	 * but in all cases here we need to end offset to point to the start
+	 * of the next folio.
+	 */
+	if (pos == folio_pos(folio))
+		end_off = pos + folio_size(folio);
+	else
+		end_off = round_up(pos, folio_size(folio));
+	error = xfs_bmap_punch_delalloc_range(ip, pos, end_off);
 
 	if (error && !xfs_is_shutdown(mp))
 		xfs_alert(mp, "page discard unable to remove delalloc mapping.");
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] xfs: fix off-by-one-block in xfs_discard_folio()
  2023-03-01  0:17 [PATCH] xfs: fix off-by-one-block in xfs_discard_folio() Dave Chinner
@ 2023-03-01  0:47 ` Darrick J. Wong
  2023-03-01  1:04   ` Dave Chinner
  0 siblings, 1 reply; 7+ messages in thread
From: Darrick J. Wong @ 2023-03-01  0:47 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Mar 01, 2023 at 11:17:06AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The recent writeback corruption fixes changed the code in
> xfs_discard_folio() to calculate a byte range to for punching
> delalloc extents. A mistake was made in using round_up(pos) for the
> end offset, because when pos points at the first byte of a block, it
> does not get rounded up to point to the end byte of the block. hence
> the punch range is short, and this leads to unexpected behaviour in
> certain cases in xfs_bmap_punch_delalloc_range.
> 
> e.g. pos = 0 means we call xfs_bmap_punch_delalloc_range(0,0), so
> there is no previous extent and it rounds up the punch to the end of
> the delalloc extent it found at offset 0, not the end of the range
> given to xfs_bmap_punch_delalloc_range().
> 
> Fix this by handling the zero block offset case correctly.
> 
> Fixes: 7348b322332d ("xfs: xfs_bmap_punch_delalloc_range() should take a byte range")
> Reported-by: Pengfei Xu <pengfei.xu@intel.com>
> Found-by: Brian Foster <bfoster@redhat.com>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_aops.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 41734202796f..429f63cfd7d4 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -466,6 +466,7 @@ xfs_discard_folio(
>  {
>  	struct xfs_inode	*ip = XFS_I(folio->mapping->host);
>  	struct xfs_mount	*mp = ip->i_mount;
> +	xfs_off_t		end_off;
>  	int			error;
>  
>  	if (xfs_is_shutdown(mp))
> @@ -475,8 +476,17 @@ xfs_discard_folio(
>  		"page discard on page "PTR_FMT", inode 0x%llx, pos %llu.",
>  			folio, ip->i_ino, pos);
>  
> -	error = xfs_bmap_punch_delalloc_range(ip, pos,
> -			round_up(pos, folio_size(folio)));
> +	/*
> +	 * Need to be careful with the case where the pos passed in points to
> +	 * the first byte of the folio - rounding up won't change the value,
> +	 * but in all cases here we need to end offset to point to the start
> +	 * of the next folio.
> +	 */
> +	if (pos == folio_pos(folio))
> +		end_off = pos + folio_size(folio);
> +	else
> +		end_off = round_up(pos, folio_size(folio));

Can this construct be simplified to:

	end_off = round_up(pos + 1, folio_size(folio));

If pos is the first byte of the folio, it'll round end_off to the start
of the next folio.  If pos is (somehow) the last byte of the folio, the
first argument to round_up is already the first byte of the next folio,
and rounding won't change it.

I'll run this through fstests since it's getting late already.

--D

> +	error = xfs_bmap_punch_delalloc_range(ip, pos, end_off);
>  
>  	if (error && !xfs_is_shutdown(mp))
>  		xfs_alert(mp, "page discard unable to remove delalloc mapping.");
> -- 
> 2.39.2
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xfs: fix off-by-one-block in xfs_discard_folio()
  2023-03-01  0:47 ` Darrick J. Wong
@ 2023-03-01  1:04   ` Dave Chinner
  2023-03-01  1:08     ` Darrick J. Wong
  2023-03-01 13:41     ` Andreas Grünbacher
  0 siblings, 2 replies; 7+ messages in thread
From: Dave Chinner @ 2023-03-01  1:04 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Feb 28, 2023 at 04:47:01PM -0800, Darrick J. Wong wrote:
> On Wed, Mar 01, 2023 at 11:17:06AM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > The recent writeback corruption fixes changed the code in
> > xfs_discard_folio() to calculate a byte range to for punching
> > delalloc extents. A mistake was made in using round_up(pos) for the
> > end offset, because when pos points at the first byte of a block, it
> > does not get rounded up to point to the end byte of the block. hence
> > the punch range is short, and this leads to unexpected behaviour in
> > certain cases in xfs_bmap_punch_delalloc_range.
> > 
> > e.g. pos = 0 means we call xfs_bmap_punch_delalloc_range(0,0), so
> > there is no previous extent and it rounds up the punch to the end of
> > the delalloc extent it found at offset 0, not the end of the range
> > given to xfs_bmap_punch_delalloc_range().
> > 
> > Fix this by handling the zero block offset case correctly.
> > 
> > Fixes: 7348b322332d ("xfs: xfs_bmap_punch_delalloc_range() should take a byte range")
> > Reported-by: Pengfei Xu <pengfei.xu@intel.com>
> > Found-by: Brian Foster <bfoster@redhat.com>
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/xfs_aops.c | 14 ++++++++++++--
> >  1 file changed, 12 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > index 41734202796f..429f63cfd7d4 100644
> > --- a/fs/xfs/xfs_aops.c
> > +++ b/fs/xfs/xfs_aops.c
> > @@ -466,6 +466,7 @@ xfs_discard_folio(
> >  {
> >  	struct xfs_inode	*ip = XFS_I(folio->mapping->host);
> >  	struct xfs_mount	*mp = ip->i_mount;
> > +	xfs_off_t		end_off;
> >  	int			error;
> >  
> >  	if (xfs_is_shutdown(mp))
> > @@ -475,8 +476,17 @@ xfs_discard_folio(
> >  		"page discard on page "PTR_FMT", inode 0x%llx, pos %llu.",
> >  			folio, ip->i_ino, pos);
> >  
> > -	error = xfs_bmap_punch_delalloc_range(ip, pos,
> > -			round_up(pos, folio_size(folio)));
> > +	/*
> > +	 * Need to be careful with the case where the pos passed in points to
> > +	 * the first byte of the folio - rounding up won't change the value,
> > +	 * but in all cases here we need to end offset to point to the start
> > +	 * of the next folio.
> > +	 */
> > +	if (pos == folio_pos(folio))
> > +		end_off = pos + folio_size(folio);
> > +	else
> > +		end_off = round_up(pos, folio_size(folio));
> 
> Can this construct be simplified to:
> 
> 	end_off = round_up(pos + 1, folio_size(folio));

I thought about that first, but I really, really dislike sprinkling
magic "+ 1" corrections into the code to address non-obvious
unexplained off-by-one problems.


> If pos is the first byte of the folio, it'll round end_off to the start
> of the next folio.  If pos is (somehow) the last byte of the folio, the
> first argument to round_up is already the first byte of the next folio,
> and rounding won't change it.

Yup, and that's exactly the problem I had with doing this - it
relies on the implicit behaviour that by moving last byte of a block
to the first byte of the next block, round_up() won't change the end
offset.  i.e. the correct functioning of the code is just as
non-obvious with a magic "+ 1" as the incorrect functioning was
without it.

Look at it this way: I didn't realise it was wrong when I wrote the
code, and I couldn't find the bug round_up() introduced when reading
the code even after the problem had been bisected to this exact
change. The code I wrote is bad, and adding a magic "+ 1" to fix the
bug doesn't make the code any better.

Given this is a slow path, so I see no point in optimising the code
for efficiency. IMO, clarity of the logic and calculation being made
is far more important - obviously correct logic is better than
relying on the effect of a magic "+ 1" on some other function to
acheive the same thing....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xfs: fix off-by-one-block in xfs_discard_folio()
  2023-03-01  1:04   ` Dave Chinner
@ 2023-03-01  1:08     ` Darrick J. Wong
  2023-03-01  1:21       ` Dave Chinner
  2023-03-01 13:41     ` Andreas Grünbacher
  1 sibling, 1 reply; 7+ messages in thread
From: Darrick J. Wong @ 2023-03-01  1:08 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Mar 01, 2023 at 12:04:17PM +1100, Dave Chinner wrote:
> On Tue, Feb 28, 2023 at 04:47:01PM -0800, Darrick J. Wong wrote:
> > On Wed, Mar 01, 2023 at 11:17:06AM +1100, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > The recent writeback corruption fixes changed the code in
> > > xfs_discard_folio() to calculate a byte range to for punching
> > > delalloc extents. A mistake was made in using round_up(pos) for the
> > > end offset, because when pos points at the first byte of a block, it
> > > does not get rounded up to point to the end byte of the block. hence
> > > the punch range is short, and this leads to unexpected behaviour in
> > > certain cases in xfs_bmap_punch_delalloc_range.
> > > 
> > > e.g. pos = 0 means we call xfs_bmap_punch_delalloc_range(0,0), so
> > > there is no previous extent and it rounds up the punch to the end of
> > > the delalloc extent it found at offset 0, not the end of the range
> > > given to xfs_bmap_punch_delalloc_range().
> > > 
> > > Fix this by handling the zero block offset case correctly.
> > > 
> > > Fixes: 7348b322332d ("xfs: xfs_bmap_punch_delalloc_range() should take a byte range")
> > > Reported-by: Pengfei Xu <pengfei.xu@intel.com>
> > > Found-by: Brian Foster <bfoster@redhat.com>
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > ---
> > >  fs/xfs/xfs_aops.c | 14 ++++++++++++--
> > >  1 file changed, 12 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > > index 41734202796f..429f63cfd7d4 100644
> > > --- a/fs/xfs/xfs_aops.c
> > > +++ b/fs/xfs/xfs_aops.c
> > > @@ -466,6 +466,7 @@ xfs_discard_folio(
> > >  {
> > >  	struct xfs_inode	*ip = XFS_I(folio->mapping->host);
> > >  	struct xfs_mount	*mp = ip->i_mount;
> > > +	xfs_off_t		end_off;
> > >  	int			error;
> > >  
> > >  	if (xfs_is_shutdown(mp))
> > > @@ -475,8 +476,17 @@ xfs_discard_folio(
> > >  		"page discard on page "PTR_FMT", inode 0x%llx, pos %llu.",
> > >  			folio, ip->i_ino, pos);
> > >  
> > > -	error = xfs_bmap_punch_delalloc_range(ip, pos,
> > > -			round_up(pos, folio_size(folio)));
> > > +	/*
> > > +	 * Need to be careful with the case where the pos passed in points to
> > > +	 * the first byte of the folio - rounding up won't change the value,
> > > +	 * but in all cases here we need to end offset to point to the start
> > > +	 * of the next folio.
> > > +	 */
> > > +	if (pos == folio_pos(folio))
> > > +		end_off = pos + folio_size(folio);
> > > +	else
> > > +		end_off = round_up(pos, folio_size(folio));
> > 
> > Can this construct be simplified to:
> > 
> > 	end_off = round_up(pos + 1, folio_size(folio));
> 
> I thought about that first, but I really, really dislike sprinkling
> magic "+ 1" corrections into the code to address non-obvious
> unexplained off-by-one problems.
> 
> 
> > If pos is the first byte of the folio, it'll round end_off to the start
> > of the next folio.  If pos is (somehow) the last byte of the folio, the
> > first argument to round_up is already the first byte of the next folio,
> > and rounding won't change it.
> 
> Yup, and that's exactly the problem I had with doing this - it
> relies on the implicit behaviour that by moving last byte of a block
> to the first byte of the next block, round_up() won't change the end
> offset.  i.e. the correct functioning of the code is just as
> non-obvious with a magic "+ 1" as the incorrect functioning was
> without it.
> 
> Look at it this way: I didn't realise it was wrong when I wrote the
> code, and I couldn't find the bug round_up() introduced when reading
> the code even after the problem had been bisected to this exact
> change. The code I wrote is bad, and adding a magic "+ 1" to fix the
> bug doesn't make the code any better.
> 
> Given this is a slow path, so I see no point in optimising the code
> for efficiency. IMO, clarity of the logic and calculation being made
> is far more important - obviously correct logic is better than
> relying on the effect of a magic "+ 1" on some other function to
> acheive the same thing....

<nod> Just making sure I wasn't missing something.

By the way, was this reported to the list?

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xfs: fix off-by-one-block in xfs_discard_folio()
  2023-03-01  1:08     ` Darrick J. Wong
@ 2023-03-01  1:21       ` Dave Chinner
  0 siblings, 0 replies; 7+ messages in thread
From: Dave Chinner @ 2023-03-01  1:21 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Feb 28, 2023 at 05:08:25PM -0800, Darrick J. Wong wrote:
> On Wed, Mar 01, 2023 at 12:04:17PM +1100, Dave Chinner wrote:
> > On Tue, Feb 28, 2023 at 04:47:01PM -0800, Darrick J. Wong wrote:
> > > On Wed, Mar 01, 2023 at 11:17:06AM +1100, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@redhat.com>
> > > > 
> > > > The recent writeback corruption fixes changed the code in
> > > > xfs_discard_folio() to calculate a byte range to for punching
> > > > delalloc extents. A mistake was made in using round_up(pos) for the
> > > > end offset, because when pos points at the first byte of a block, it
> > > > does not get rounded up to point to the end byte of the block. hence
> > > > the punch range is short, and this leads to unexpected behaviour in
> > > > certain cases in xfs_bmap_punch_delalloc_range.
> > > > 
> > > > e.g. pos = 0 means we call xfs_bmap_punch_delalloc_range(0,0), so
> > > > there is no previous extent and it rounds up the punch to the end of
> > > > the delalloc extent it found at offset 0, not the end of the range
> > > > given to xfs_bmap_punch_delalloc_range().
> > > > 
> > > > Fix this by handling the zero block offset case correctly.
> > > > 
> > > > Fixes: 7348b322332d ("xfs: xfs_bmap_punch_delalloc_range() should take a byte range")
> > > > Reported-by: Pengfei Xu <pengfei.xu@intel.com>
> > > > Found-by: Brian Foster <bfoster@redhat.com>
> > > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > > ---
> > > >  fs/xfs/xfs_aops.c | 14 ++++++++++++--
> > > >  1 file changed, 12 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > > > index 41734202796f..429f63cfd7d4 100644
> > > > --- a/fs/xfs/xfs_aops.c
> > > > +++ b/fs/xfs/xfs_aops.c
> > > > @@ -466,6 +466,7 @@ xfs_discard_folio(
> > > >  {
> > > >  	struct xfs_inode	*ip = XFS_I(folio->mapping->host);
> > > >  	struct xfs_mount	*mp = ip->i_mount;
> > > > +	xfs_off_t		end_off;
> > > >  	int			error;
> > > >  
> > > >  	if (xfs_is_shutdown(mp))
> > > > @@ -475,8 +476,17 @@ xfs_discard_folio(
> > > >  		"page discard on page "PTR_FMT", inode 0x%llx, pos %llu.",
> > > >  			folio, ip->i_ino, pos);
> > > >  
> > > > -	error = xfs_bmap_punch_delalloc_range(ip, pos,
> > > > -			round_up(pos, folio_size(folio)));
> > > > +	/*
> > > > +	 * Need to be careful with the case where the pos passed in points to
> > > > +	 * the first byte of the folio - rounding up won't change the value,
> > > > +	 * but in all cases here we need to end offset to point to the start
> > > > +	 * of the next folio.
> > > > +	 */
> > > > +	if (pos == folio_pos(folio))
> > > > +		end_off = pos + folio_size(folio);
> > > > +	else
> > > > +		end_off = round_up(pos, folio_size(folio));
> > > 
> > > Can this construct be simplified to:
> > > 
> > > 	end_off = round_up(pos + 1, folio_size(folio));
> > 
> > I thought about that first, but I really, really dislike sprinkling
> > magic "+ 1" corrections into the code to address non-obvious
> > unexplained off-by-one problems.
> > 
> > 
> > > If pos is the first byte of the folio, it'll round end_off to the start
> > > of the next folio.  If pos is (somehow) the last byte of the folio, the
> > > first argument to round_up is already the first byte of the next folio,
> > > and rounding won't change it.
> > 
> > Yup, and that's exactly the problem I had with doing this - it
> > relies on the implicit behaviour that by moving last byte of a block
> > to the first byte of the next block, round_up() won't change the end
> > offset.  i.e. the correct functioning of the code is just as
> > non-obvious with a magic "+ 1" as the incorrect functioning was
> > without it.
> > 
> > Look at it this way: I didn't realise it was wrong when I wrote the
> > code, and I couldn't find the bug round_up() introduced when reading
> > the code even after the problem had been bisected to this exact
> > change. The code I wrote is bad, and adding a magic "+ 1" to fix the
> > bug doesn't make the code any better.
> > 
> > Given this is a slow path, so I see no point in optimising the code
> > for efficiency. IMO, clarity of the logic and calculation being made
> > is far more important - obviously correct logic is better than
> > relying on the effect of a magic "+ 1" on some other function to
> > acheive the same thing....
> 
> <nod> Just making sure I wasn't missing something.
> 
> By the way, was this reported to the list?

The original report is here:

https://bugzilla.kernel.org/show_bug.cgi?id=217030 

And in discussion of the initial fix I sent out, Brian found the
off-by-one here:

https://lore.kernel.org/linux-xfs/Y+vOfaxIWX1c%2Fyy9@bfoster/

I still plan to fix up the patchset that removes ->discard_folio and
retain pages dirty in memory after writeback mapping failures, but I
figured getting the bug fix out was more important...

-Dave.

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xfs: fix off-by-one-block in xfs_discard_folio()
  2023-03-01  1:04   ` Dave Chinner
  2023-03-01  1:08     ` Darrick J. Wong
@ 2023-03-01 13:41     ` Andreas Grünbacher
  2023-03-01 22:12       ` Dave Chinner
  1 sibling, 1 reply; 7+ messages in thread
From: Andreas Grünbacher @ 2023-03-01 13:41 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Darrick J. Wong, linux-xfs

Am Mi., 1. März 2023 um 02:07 Uhr schrieb Dave Chinner <david@fromorbit.com>:
> On Tue, Feb 28, 2023 at 04:47:01PM -0800, Darrick J. Wong wrote:
> > On Wed, Mar 01, 2023 at 11:17:06AM +1100, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > >
> > > The recent writeback corruption fixes changed the code in
> > > xfs_discard_folio() to calculate a byte range to for punching
> > > delalloc extents. A mistake was made in using round_up(pos) for the
> > > end offset, because when pos points at the first byte of a block, it
> > > does not get rounded up to point to the end byte of the block. hence
> > > the punch range is short, and this leads to unexpected behaviour in
> > > certain cases in xfs_bmap_punch_delalloc_range.
> > >
> > > e.g. pos = 0 means we call xfs_bmap_punch_delalloc_range(0,0), so
> > > there is no previous extent and it rounds up the punch to the end of
> > > the delalloc extent it found at offset 0, not the end of the range
> > > given to xfs_bmap_punch_delalloc_range().
> > >
> > > Fix this by handling the zero block offset case correctly.
> > >
> > > Fixes: 7348b322332d ("xfs: xfs_bmap_punch_delalloc_range() should take a byte range")
> > > Reported-by: Pengfei Xu <pengfei.xu@intel.com>
> > > Found-by: Brian Foster <bfoster@redhat.com>
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > ---
> > >  fs/xfs/xfs_aops.c | 14 ++++++++++++--
> > >  1 file changed, 12 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > > index 41734202796f..429f63cfd7d4 100644
> > > --- a/fs/xfs/xfs_aops.c
> > > +++ b/fs/xfs/xfs_aops.c
> > > @@ -466,6 +466,7 @@ xfs_discard_folio(
> > >  {
> > >     struct xfs_inode        *ip = XFS_I(folio->mapping->host);
> > >     struct xfs_mount        *mp = ip->i_mount;
> > > +   xfs_off_t               end_off;
> > >     int                     error;
> > >
> > >     if (xfs_is_shutdown(mp))
> > > @@ -475,8 +476,17 @@ xfs_discard_folio(
> > >             "page discard on page "PTR_FMT", inode 0x%llx, pos %llu.",
> > >                     folio, ip->i_ino, pos);
> > >
> > > -   error = xfs_bmap_punch_delalloc_range(ip, pos,
> > > -                   round_up(pos, folio_size(folio)));
> > > +   /*
> > > +    * Need to be careful with the case where the pos passed in points to
> > > +    * the first byte of the folio - rounding up won't change the value,
> > > +    * but in all cases here we need to end offset to point to the start
> > > +    * of the next folio.
> > > +    */
> > > +   if (pos == folio_pos(folio))
> > > +           end_off = pos + folio_size(folio);
> > > +   else
> > > +           end_off = round_up(pos, folio_size(folio));
> >
> > Can this construct be simplified to:
> >
> >       end_off = round_up(pos + 1, folio_size(folio));
>
> I thought about that first, but I really, really dislike sprinkling
> magic "+ 1" corrections into the code to address non-obvious
> unexplained off-by-one problems.
>
>
> > If pos is the first byte of the folio, it'll round end_off to the start
> > of the next folio.  If pos is (somehow) the last byte of the folio, the
> > first argument to round_up is already the first byte of the next folio,
> > and rounding won't change it.
>
> Yup, and that's exactly the problem I had with doing this - it
> relies on the implicit behaviour that by moving last byte of a block
> to the first byte of the next block, round_up() won't change the end
> offset.  i.e. the correct functioning of the code is just as
> non-obvious with a magic "+ 1" as the incorrect functioning was
> without it.

Hmm. On the other hand, it's not immediately obvious that the if
statement only does an addition with rounding; it might as well do
something more complex. Darrick's version avoids making things more
complicated than they need to be.

Other ways of doing the same thing would be:

end_off = round_down(pos, folio_size(folio)) + folio_size(folio);
end_off = folio_pos(folio) + folio_size(folio);

> Look at it this way: I didn't realise it was wrong when I wrote the
> code, and I couldn't find the bug round_up() introduced when reading
> the code even after the problem had been bisected to this exact
> change. The code I wrote is bad, and adding a magic "+ 1" to fix the
> bug doesn't make the code any better.
>
> Given this is a slow path, so I see no point in optimising the code
> for efficiency. IMO, clarity of the logic and calculation being made
> is far more important - obviously correct logic is better than
> relying on the effect of a magic "+ 1" on some other function to
> acheive the same thing....
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

Cheers,
Andreas

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xfs: fix off-by-one-block in xfs_discard_folio()
  2023-03-01 13:41     ` Andreas Grünbacher
@ 2023-03-01 22:12       ` Dave Chinner
  0 siblings, 0 replies; 7+ messages in thread
From: Dave Chinner @ 2023-03-01 22:12 UTC (permalink / raw)
  To: Andreas Grünbacher; +Cc: Darrick J. Wong, linux-xfs

On Wed, Mar 01, 2023 at 02:41:53PM +0100, Andreas Grünbacher wrote:
> Am Mi., 1. März 2023 um 02:07 Uhr schrieb Dave Chinner <david@fromorbit.com>:
> > On Tue, Feb 28, 2023 at 04:47:01PM -0800, Darrick J. Wong wrote:
> > > On Wed, Mar 01, 2023 at 11:17:06AM +1100, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@redhat.com>
> > > >
> > > > The recent writeback corruption fixes changed the code in
> > > > xfs_discard_folio() to calculate a byte range to for punching
> > > > delalloc extents. A mistake was made in using round_up(pos) for the
> > > > end offset, because when pos points at the first byte of a block, it
> > > > does not get rounded up to point to the end byte of the block. hence
> > > > the punch range is short, and this leads to unexpected behaviour in
> > > > certain cases in xfs_bmap_punch_delalloc_range.
> > > >
> > > > e.g. pos = 0 means we call xfs_bmap_punch_delalloc_range(0,0), so
> > > > there is no previous extent and it rounds up the punch to the end of
> > > > the delalloc extent it found at offset 0, not the end of the range
> > > > given to xfs_bmap_punch_delalloc_range().
> > > >
> > > > Fix this by handling the zero block offset case correctly.
> > > >
> > > > Fixes: 7348b322332d ("xfs: xfs_bmap_punch_delalloc_range() should take a byte range")
> > > > Reported-by: Pengfei Xu <pengfei.xu@intel.com>
> > > > Found-by: Brian Foster <bfoster@redhat.com>
> > > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > > ---
> > > >  fs/xfs/xfs_aops.c | 14 ++++++++++++--
> > > >  1 file changed, 12 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > > > index 41734202796f..429f63cfd7d4 100644
> > > > --- a/fs/xfs/xfs_aops.c
> > > > +++ b/fs/xfs/xfs_aops.c
> > > > @@ -466,6 +466,7 @@ xfs_discard_folio(
> > > >  {
> > > >     struct xfs_inode        *ip = XFS_I(folio->mapping->host);
> > > >     struct xfs_mount        *mp = ip->i_mount;
> > > > +   xfs_off_t               end_off;
> > > >     int                     error;
> > > >
> > > >     if (xfs_is_shutdown(mp))
> > > > @@ -475,8 +476,17 @@ xfs_discard_folio(
> > > >             "page discard on page "PTR_FMT", inode 0x%llx, pos %llu.",
> > > >                     folio, ip->i_ino, pos);
> > > >
> > > > -   error = xfs_bmap_punch_delalloc_range(ip, pos,
> > > > -                   round_up(pos, folio_size(folio)));
> > > > +   /*
> > > > +    * Need to be careful with the case where the pos passed in points to
> > > > +    * the first byte of the folio - rounding up won't change the value,
> > > > +    * but in all cases here we need to end offset to point to the start
> > > > +    * of the next folio.
> > > > +    */
> > > > +   if (pos == folio_pos(folio))
> > > > +           end_off = pos + folio_size(folio);
> > > > +   else
> > > > +           end_off = round_up(pos, folio_size(folio));
> > >
> > > Can this construct be simplified to:
> > >
> > >       end_off = round_up(pos + 1, folio_size(folio));
> >
> > I thought about that first, but I really, really dislike sprinkling
> > magic "+ 1" corrections into the code to address non-obvious
> > unexplained off-by-one problems.
> >
> >
> > > If pos is the first byte of the folio, it'll round end_off to the start
> > > of the next folio.  If pos is (somehow) the last byte of the folio, the
> > > first argument to round_up is already the first byte of the next folio,
> > > and rounding won't change it.
> >
> > Yup, and that's exactly the problem I had with doing this - it
> > relies on the implicit behaviour that by moving last byte of a block
> > to the first byte of the next block, round_up() won't change the end
> > offset.  i.e. the correct functioning of the code is just as
> > non-obvious with a magic "+ 1" as the incorrect functioning was
> > without it.
> 
> Hmm. On the other hand, it's not immediately obvious that the if
> statement only does an addition with rounding; it might as well do
> something more complex.

I don't really understand what you are trying to say here.

> Darrick's version avoids making things more
> complicated than they need to be.

At the expense having magic "+ 1"s in the code. Those always lead to
confusion and future off-by-one bugs.

Following the fundamental principle of obvious correctness: If we
are adding a "+ 1" to fix a bug, then the original code was bad and
needs to be reworked to remove the possibility of future, difficult
to detect off-by-one mistakes in the code.

> Other ways of doing the same thing would be:
> 
> end_off = round_down(pos, folio_size(folio)) + folio_size(folio);
> end_off = folio_pos(folio) + folio_size(folio);

That last one is the best suggestion I've heard, because the end
offset is always the first byte of the next folio. That's an
explicit, unambiguous encoding of the exact requirement we need to
fulfil. It does not require rounding, magic "+ 1" values to be
added, etc.

Thanks, Andreas, I'll change it to that.

-Dave.

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-03-01 22:12 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-01  0:17 [PATCH] xfs: fix off-by-one-block in xfs_discard_folio() Dave Chinner
2023-03-01  0:47 ` Darrick J. Wong
2023-03-01  1:04   ` Dave Chinner
2023-03-01  1:08     ` Darrick J. Wong
2023-03-01  1:21       ` Dave Chinner
2023-03-01 13:41     ` Andreas Grünbacher
2023-03-01 22:12       ` Dave Chinner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.