All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC] xfs: use iomap new flag for newly allocated delalloc blocks
@ 2017-03-01 14:36 Brian Foster
  2017-03-01 23:56 ` Christoph Hellwig
  0 siblings, 1 reply; 4+ messages in thread
From: Brian Foster @ 2017-03-01 14:36 UTC (permalink / raw)
  To: linux-xfs; +Cc: Xiong Zhou, Christoph Hellwig

Commit fa7f138 ("xfs: clear delalloc and cache on buffered write
failure") fixed one regression in the iomap error handling code and
exposed another. The fundamental problem is that if a buffered write
is a rewrite of preexisting delalloc blocks and the write fails, the
failure handling code can punch out preexisting blocks with valid
file data.

This was reproduced directly by sub-block writes in the LTP
kernel/syscalls/write/write03 test. A first 100 byte write allocates
a single block in a file. A subsequent 100 byte write fails and
punches out the block, including the data successfully written by
the previous write.

To address this problem, update the ->iomap_begin() handler to
distinguish newly allocated delalloc blocks from preexisting
delalloc blocks via the IOMAP_F_NEW flag. Use this flag in the
->iomap_end() handler to decide when a failed or short write should
punch out delalloc blocks.

This introduces the subtle requirement that ->iomap_begin() should
never combine newly allocated delalloc blocks with existing blocks
in the resulting iomap descriptor. This can occur when a new
delalloc reservation merges with a neighboring extent that is part
of the current write, for example. Therefore, update
xfs_bmapi_reserve_delalloc() to conditionally return the merged
extent vs. the allocated extent and map the latter into the iomap.
This ensures that preexisting delalloc blocks are always handled as
"found" blocks and not punched out on a failed rewrite.

Reported-by: Xiong Zhou <xzhou@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
---

This is a stab at fixing the regression discussed in this[1] thread
based on what Christoph mentioned regarding use of the IOMAP_F_NEW flag.
I decided to co-opt the XFS_BMAPI_ENTIRE flag for the delalloc res bit
since it seems logically equivalent, but we could define a new flag too.
I considered as such to preserve default behavior of
_reserve_delalloc(), but otoh there is only one other caller. Otherwise,
this passes all of my testing so far. Thoughts?

Brian

[1] http://www.spinics.net/lists/linux-xfs/msg04500.html

 fs/xfs/libxfs/xfs_bmap.c | 11 +++++++----
 fs/xfs/libxfs/xfs_bmap.h |  3 ++-
 fs/xfs/xfs_iomap.c       | 31 ++++++++++++++++++++++---------
 fs/xfs/xfs_reflink.c     |  3 ++-
 4 files changed, 33 insertions(+), 15 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index a9c66d4..f44ebc4 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4159,7 +4159,8 @@ xfs_bmapi_reserve_delalloc(
 	xfs_filblks_t		prealloc,
 	struct xfs_bmbt_irec	*got,
 	xfs_extnum_t		*lastx,
-	int			eof)
+	int			eof,
+	int			flags)
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
@@ -4239,10 +4240,12 @@ xfs_bmapi_reserve_delalloc(
 	xfs_bmap_add_extent_hole_delay(ip, whichfork, lastx, got);
 
 	/*
-	 * Update our extent pointer, given that xfs_bmap_add_extent_hole_delay
-	 * might have merged it into one of the neighbouring ones.
+	 * Update our extent pointer if the caller asked for entire extents.
+	 * xfs_bmap_add_extent_hole_delay() might have merged it into one of
+	 * the neighbouring ones.
 	 */
-	xfs_bmbt_get_all(xfs_iext_get_ext(ifp, *lastx), got);
+	if (flags & XFS_BMAPI_ENTIRE)
+		xfs_bmbt_get_all(xfs_iext_get_ext(ifp, *lastx), got);
 
 	/*
 	 * Tag the inode if blocks were preallocated. Note that COW fork
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index cdef87d..459ba6b 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -243,7 +243,8 @@ int	xfs_bmap_shift_extents(struct xfs_trans *tp, struct xfs_inode *ip,
 int	xfs_bmap_split_extent(struct xfs_inode *ip, xfs_fileoff_t split_offset);
 int	xfs_bmapi_reserve_delalloc(struct xfs_inode *ip, int whichfork,
 		xfs_fileoff_t off, xfs_filblks_t len, xfs_filblks_t prealloc,
-		struct xfs_bmbt_irec *got, xfs_extnum_t *lastx, int eof);
+		struct xfs_bmbt_irec *got, xfs_extnum_t *lastx, int eof,
+		int flags);
 
 enum xfs_bmap_intent_type {
 	XFS_BMAP_MAP = 1,
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 41662fb..4a94895 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -612,8 +612,15 @@ xfs_file_iomap_begin_delay(
 	}
 
 retry:
+	/*
+	 * In order to distinguish newly allocated delalloc blocks from
+	 * preexisting delalloc blocks via IOMAP_F_NEW, it is required that got
+	 * returns _only_ blocks allocated by this write. Hence, do not pass
+	 * BMAPI_ENTIRE to the reserve call.
+	 */
 	error = xfs_bmapi_reserve_delalloc(ip, XFS_DATA_FORK, offset_fsb,
-			end_fsb - offset_fsb, prealloc_blocks, &got, &idx, eof);
+			end_fsb - offset_fsb, prealloc_blocks, &got, &idx,
+			eof, 0);
 	switch (error) {
 	case 0:
 		break;
@@ -629,7 +636,7 @@ xfs_file_iomap_begin_delay(
 	default:
 		goto out_unlock;
 	}
-
+	iomap->flags = IOMAP_F_NEW;
 	trace_xfs_iomap_alloc(ip, offset, count, 0, &got);
 done:
 	if (isnullstartblock(got.br_startblock))
@@ -1071,16 +1078,22 @@ xfs_file_iomap_end_delalloc(
 	struct xfs_inode	*ip,
 	loff_t			offset,
 	loff_t			length,
-	ssize_t			written)
+	ssize_t			written,
+	struct iomap		*iomap)
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	xfs_fileoff_t		start_fsb;
 	xfs_fileoff_t		end_fsb;
 	int			error = 0;
 
-	/* behave as if the write failed if drop writes is enabled */
-	if (xfs_mp_drop_writes(mp))
+	/*
+	 * Behave as if the write failed if drop writes is enabled. Set the NEW
+	 * flag to force delalloc cleanup.
+	 */
+	if (xfs_mp_drop_writes(mp)) {
+		iomap->flags |= IOMAP_F_NEW;
 		written = 0;
+	}
 
 	/*
 	 * start_fsb refers to the first unused block after a short write. If
@@ -1094,14 +1107,14 @@ xfs_file_iomap_end_delalloc(
 	end_fsb = XFS_B_TO_FSB(mp, offset + length);
 
 	/*
-	 * Trim back delalloc blocks if we didn't manage to write the whole
-	 * range reserved.
+	 * Trim delalloc blocks if they were allocated by this write and we
+	 * didn't manage to write the whole range.
 	 *
 	 * We don't need to care about racing delalloc as we hold i_mutex
 	 * across the reserve/allocate/unreserve calls. If there are delalloc
 	 * blocks in the range, they are ours.
 	 */
-	if (start_fsb < end_fsb) {
+	if (iomap->flags & IOMAP_F_NEW && start_fsb < end_fsb) {
 		truncate_pagecache_range(VFS_I(ip), XFS_FSB_TO_B(mp, start_fsb),
 					 XFS_FSB_TO_B(mp, end_fsb) - 1);
 
@@ -1131,7 +1144,7 @@ xfs_file_iomap_end(
 {
 	if ((flags & IOMAP_WRITE) && iomap->type == IOMAP_DELALLOC)
 		return xfs_file_iomap_end_delalloc(XFS_I(inode), offset,
-				length, written);
+				length, written, iomap);
 	return 0;
 }
 
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index da6d08f..d76a2b6 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -313,7 +313,8 @@ xfs_reflink_reserve_cow(
 		return error;
 
 	error = xfs_bmapi_reserve_delalloc(ip, XFS_COW_FORK, imap->br_startoff,
-			imap->br_blockcount, 0, &got, &idx, eof);
+			imap->br_blockcount, 0, &got, &idx, eof,
+			XFS_BMAPI_ENTIRE);
 	if (error == -ENOSPC || error == -EDQUOT)
 		trace_xfs_reflink_cow_enospc(ip, imap);
 	if (error)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH RFC] xfs: use iomap new flag for newly allocated delalloc blocks
  2017-03-01 14:36 [PATCH RFC] xfs: use iomap new flag for newly allocated delalloc blocks Brian Foster
@ 2017-03-01 23:56 ` Christoph Hellwig
  2017-03-02 12:43   ` Brian Foster
  0 siblings, 1 reply; 4+ messages in thread
From: Christoph Hellwig @ 2017-03-01 23:56 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs, Xiong Zhou, Christoph Hellwig

On Wed, Mar 01, 2017 at 09:36:49AM -0500, Brian Foster wrote:
> This is a stab at fixing the regression discussed in this[1] thread
> based on what Christoph mentioned regarding use of the IOMAP_F_NEW flag.
> I decided to co-opt the XFS_BMAPI_ENTIRE flag for the delalloc res bit
> since it seems logically equivalent, but we could define a new flag too.
> I considered as such to preserve default behavior of
> _reserve_delalloc(), but otoh there is only one other caller. Otherwise,
> this passes all of my testing so far. Thoughts?

I don't like reusing the flag that much, but I think instead of passing
the flag we could trivially just remove the xfs_bmbt_get_all in
xfs_bmapi_reserve_delalloc and let the caller handle it after
xfs_bmapi_reserve_delalloc returned.  That being said I see no good
reason why the COW would care to see the merged extent, so
unconditionally removing it should be fine as well.  Or did I miss
something?

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH RFC] xfs: use iomap new flag for newly allocated delalloc blocks
  2017-03-01 23:56 ` Christoph Hellwig
@ 2017-03-02 12:43   ` Brian Foster
  2017-03-02 16:36     ` Darrick J. Wong
  0 siblings, 1 reply; 4+ messages in thread
From: Brian Foster @ 2017-03-02 12:43 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, Xiong Zhou

On Wed, Mar 01, 2017 at 03:56:24PM -0800, Christoph Hellwig wrote:
> On Wed, Mar 01, 2017 at 09:36:49AM -0500, Brian Foster wrote:
> > This is a stab at fixing the regression discussed in this[1] thread
> > based on what Christoph mentioned regarding use of the IOMAP_F_NEW flag.
> > I decided to co-opt the XFS_BMAPI_ENTIRE flag for the delalloc res bit
> > since it seems logically equivalent, but we could define a new flag too.
> > I considered as such to preserve default behavior of
> > _reserve_delalloc(), but otoh there is only one other caller. Otherwise,
> > this passes all of my testing so far. Thoughts?
> 
> I don't like reusing the flag that much, but I think instead of passing
> the flag we could trivially just remove the xfs_bmbt_get_all in
> xfs_bmapi_reserve_delalloc and let the caller handle it after
> xfs_bmapi_reserve_delalloc returned.  That being said I see no good
> reason why the COW would care to see the merged extent, so
> unconditionally removing it should be fine as well.  Or did I miss
> something?

I think that's correct. The reflink code just feeds got into a
tracepoint and otherwise doesn't seem to care what's returned.

My only concern is that the behavior of not updating got seems
inconsistent with the rest of the bmap code. I also realize now that
this case may have broken the subsequent ASSERT() calls in terms of
their effectiveness (i.e., I'm not reproducing failures, but they sort
of become no-ops).

We could rename the got param and/or update a local record structure to
feed into the asserts, but that kind of confuses the input param
requirements for got (which should be filled in based on a previous
lookup), so I don't like that too much either. Hmm, what about instead
of adding flags, we just add a new, optional output param that returns
the allocated range? E.g.:

int
xfs_bmapi_reserve_delalloc(
        struct xfs_inode        *ip,
        int                     whichfork,
        xfs_fileoff_t           off,
        xfs_filblks_t           len,
        xfs_filblks_t           prealloc,
        struct xfs_bmbt_irec    *got,		/* full rec after alloc */
        xfs_extnum_t            *lastx,
        int                     eof,
        struct xfs_bmbt_irec	*arec)		/* allocation performed */
{
	...
	if (arec)
		arec = ...;
	...
}

The reflink code could just pass NULL, the iomap code will use arec
explicitly to document the requirement, and the function behavior should
be clear enough to avoid landmines from any future callers. Thoughts?

Brian

> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH RFC] xfs: use iomap new flag for newly allocated delalloc blocks
  2017-03-02 12:43   ` Brian Foster
@ 2017-03-02 16:36     ` Darrick J. Wong
  0 siblings, 0 replies; 4+ messages in thread
From: Darrick J. Wong @ 2017-03-02 16:36 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-xfs, Xiong Zhou

On Thu, Mar 02, 2017 at 07:43:53AM -0500, Brian Foster wrote:
> On Wed, Mar 01, 2017 at 03:56:24PM -0800, Christoph Hellwig wrote:
> > On Wed, Mar 01, 2017 at 09:36:49AM -0500, Brian Foster wrote:
> > > This is a stab at fixing the regression discussed in this[1] thread
> > > based on what Christoph mentioned regarding use of the IOMAP_F_NEW flag.
> > > I decided to co-opt the XFS_BMAPI_ENTIRE flag for the delalloc res bit
> > > since it seems logically equivalent, but we could define a new flag too.
> > > I considered as such to preserve default behavior of
> > > _reserve_delalloc(), but otoh there is only one other caller. Otherwise,
> > > this passes all of my testing so far. Thoughts?
> > 
> > I don't like reusing the flag that much, but I think instead of passing
> > the flag we could trivially just remove the xfs_bmbt_get_all in
> > xfs_bmapi_reserve_delalloc and let the caller handle it after
> > xfs_bmapi_reserve_delalloc returned.  That being said I see no good
> > reason why the COW would care to see the merged extent, so
> > unconditionally removing it should be fine as well.  Or did I miss
> > something?
> 
> I think that's correct. The reflink code just feeds got into a
> tracepoint and otherwise doesn't seem to care what's returned.

Yes.  It doesn't need to know the exact results of what happened, so
long as the reservation gets made.  The tracepoint exists so that I
could check that manually while debugging.

> My only concern is that the behavior of not updating got seems
> inconsistent with the rest of the bmap code. I also realize now that
> this case may have broken the subsequent ASSERT() calls in terms of
> their effectiveness (i.e., I'm not reproducing failures, but they sort
> of become no-ops).
> 
> We could rename the got param and/or update a local record structure to
> feed into the asserts, but that kind of confuses the input param
> requirements for got (which should be filled in based on a previous
> lookup), so I don't like that too much either. Hmm, what about instead
> of adding flags, we just add a new, optional output param that returns
> the allocated range? E.g.:
> 
> int
> xfs_bmapi_reserve_delalloc(
>         struct xfs_inode        *ip,
>         int                     whichfork,
>         xfs_fileoff_t           off,
>         xfs_filblks_t           len,
>         xfs_filblks_t           prealloc,
>         struct xfs_bmbt_irec    *got,		/* full rec after alloc */
>         xfs_extnum_t            *lastx,
>         int                     eof,
>         struct xfs_bmbt_irec	*arec)		/* allocation performed */
> {
> 	...
> 	if (arec)
> 		arec = ...;
> 	...
> }
> 
> The reflink code could just pass NULL, the iomap code will use arec
> explicitly to document the requirement, and the function behavior should
> be clear enough to avoid landmines from any future callers. Thoughts?

That seems fine to me too.

--D

> 
> Brian
> 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2017-03-02 17:50 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-01 14:36 [PATCH RFC] xfs: use iomap new flag for newly allocated delalloc blocks Brian Foster
2017-03-01 23:56 ` Christoph Hellwig
2017-03-02 12:43   ` Brian Foster
2017-03-02 16:36     ` Darrick J. Wong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.