All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] xfs: use iomap new flag for newly allocated delalloc blocks
@ 2017-03-03 17:23 Brian Foster
  2017-03-06 18:00 ` Darrick J. Wong
  0 siblings, 1 reply; 3+ messages in thread
From: Brian Foster @ 2017-03-03 17:23 UTC (permalink / raw)
  To: linux-xfs

Commit fa7f138 ("xfs: clear delalloc and cache on buffered write
failure") fixed one regression in the iomap error handling code and
exposed another. The fundamental problem is that if a buffered write
is a rewrite of preexisting delalloc blocks and the write fails, the
failure handling code can punch out preexisting blocks with valid
file data.

This was reproduced directly by sub-block writes in the LTP
kernel/syscalls/write/write03 test. A first 100 byte write allocates
a single block in a file. A subsequent 100 byte write fails and
punches out the block, including the data successfully written by
the previous write.

To address this problem, update the ->iomap_begin() handler to
distinguish newly allocated delalloc blocks from preexisting
delalloc blocks via the IOMAP_F_NEW flag. Use this flag in the
->iomap_end() handler to decide when a failed or short write should
punch out delalloc blocks.

This introduces the subtle requirement that ->iomap_begin() should
never combine newly allocated delalloc blocks with existing blocks
in the resulting iomap descriptor. This can occur when a new
delalloc reservation merges with a neighboring extent that is part
of the current write, for example. Therefore, update
xfs_bmapi_reserve_delalloc() to conditionally return the allocated
record along with the updated 'got' record and map the former into
the iomap.  This ensures that preexisting delalloc blocks are always
handled as "found" blocks and not punched out on a failed rewrite.

Reported-by: Xiong Zhou <xzhou@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
---

This version simply changes the interface to
xfs_bmapi_reserve_delalloc() for using the allocated range rather than
the fully up-to-date record after the allocation. Thoughts? If this is
still too ugly I suppose we could just do as Christoph suggested
originally and drop the update of got entirely. I'm just not that fond
of the landmine that leaves around should some future caller expect
'got' to be updated, or worse, for somebody to re-add it to
bmapi_reserve_delalloc() without us remembering the requirement for
IOMAP_F_NEW and quietly breaking write failure handling again.

Brian

v2:
- Use arec param rather than bmapi flag in xfs_bmapi_reserve_delalloc().
v1: http://www.spinics.net/lists/linux-xfs/msg04604.html

 fs/xfs/libxfs/xfs_bmap.c | 11 ++++++++---
 fs/xfs/libxfs/xfs_bmap.h |  3 ++-
 fs/xfs/xfs_iomap.c       | 34 +++++++++++++++++++++++++---------
 fs/xfs/xfs_reflink.c     |  2 +-
 4 files changed, 36 insertions(+), 14 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index a9c66d4..dcc6c13 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4159,7 +4159,8 @@ xfs_bmapi_reserve_delalloc(
 	xfs_filblks_t		prealloc,
 	struct xfs_bmbt_irec	*got,
 	xfs_extnum_t		*lastx,
-	int			eof)
+	int			eof,
+	struct xfs_bmbt_irec	*arec)
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
@@ -4236,11 +4237,15 @@ xfs_bmapi_reserve_delalloc(
 	got->br_startblock = nullstartblock(indlen);
 	got->br_blockcount = alen;
 	got->br_state = XFS_EXT_NORM;
+	if (arec)
+		*arec = *got;
+
 	xfs_bmap_add_extent_hole_delay(ip, whichfork, lastx, got);
 
 	/*
-	 * Update our extent pointer, given that xfs_bmap_add_extent_hole_delay
-	 * might have merged it into one of the neighbouring ones.
+	 * Update our extent pointer if the caller asked for entire extents.
+	 * xfs_bmap_add_extent_hole_delay() might have merged it into one of
+	 * the neighbouring ones.
 	 */
 	xfs_bmbt_get_all(xfs_iext_get_ext(ifp, *lastx), got);
 
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index cdef87d..c61f2a7 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -243,7 +243,8 @@ int	xfs_bmap_shift_extents(struct xfs_trans *tp, struct xfs_inode *ip,
 int	xfs_bmap_split_extent(struct xfs_inode *ip, xfs_fileoff_t split_offset);
 int	xfs_bmapi_reserve_delalloc(struct xfs_inode *ip, int whichfork,
 		xfs_fileoff_t off, xfs_filblks_t len, xfs_filblks_t prealloc,
-		struct xfs_bmbt_irec *got, xfs_extnum_t *lastx, int eof);
+		struct xfs_bmbt_irec *got, xfs_extnum_t *lastx, int eof,
+		struct xfs_bmbt_irec *arec);
 
 enum xfs_bmap_intent_type {
 	XFS_BMAP_MAP = 1,
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 41662fb..bdbab0f 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -531,7 +531,7 @@ xfs_file_iomap_begin_delay(
 		XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
 	xfs_fileoff_t		end_fsb;
 	int			error = 0, eof = 0;
-	struct xfs_bmbt_irec	got;
+	struct xfs_bmbt_irec	got, arec;
 	xfs_extnum_t		idx;
 	xfs_fsblock_t		prealloc_blocks = 0;
 
@@ -613,7 +613,8 @@ xfs_file_iomap_begin_delay(
 
 retry:
 	error = xfs_bmapi_reserve_delalloc(ip, XFS_DATA_FORK, offset_fsb,
-			end_fsb - offset_fsb, prealloc_blocks, &got, &idx, eof);
+			end_fsb - offset_fsb, prealloc_blocks, &got, &idx,
+			eof, &arec);
 	switch (error) {
 	case 0:
 		break;
@@ -630,6 +631,15 @@ xfs_file_iomap_begin_delay(
 		goto out_unlock;
 	}
 
+	/*
+	 * IOMAP_F_NEW controls whether we punch out delalloc blocks if the
+	 * write happens to fail, which means we can't combine newly allocated
+	 * blocks with preexisting delalloc blocks in the same iomap. got may
+	 * have been merged with contiguous extents, so use arec to map only the
+	 * newly allocated blocks.
+	 */
+	got = arec;
+	iomap->flags = IOMAP_F_NEW;
 	trace_xfs_iomap_alloc(ip, offset, count, 0, &got);
 done:
 	if (isnullstartblock(got.br_startblock))
@@ -1071,16 +1081,22 @@ xfs_file_iomap_end_delalloc(
 	struct xfs_inode	*ip,
 	loff_t			offset,
 	loff_t			length,
-	ssize_t			written)
+	ssize_t			written,
+	struct iomap		*iomap)
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	xfs_fileoff_t		start_fsb;
 	xfs_fileoff_t		end_fsb;
 	int			error = 0;
 
-	/* behave as if the write failed if drop writes is enabled */
-	if (xfs_mp_drop_writes(mp))
+	/*
+	 * Behave as if the write failed if drop writes is enabled. Set the NEW
+	 * flag to force delalloc cleanup.
+	 */
+	if (xfs_mp_drop_writes(mp)) {
+		iomap->flags |= IOMAP_F_NEW;
 		written = 0;
+	}
 
 	/*
 	 * start_fsb refers to the first unused block after a short write. If
@@ -1094,14 +1110,14 @@ xfs_file_iomap_end_delalloc(
 	end_fsb = XFS_B_TO_FSB(mp, offset + length);
 
 	/*
-	 * Trim back delalloc blocks if we didn't manage to write the whole
-	 * range reserved.
+	 * Trim delalloc blocks if they were allocated by this write and we
+	 * didn't manage to write the whole range.
 	 *
 	 * We don't need to care about racing delalloc as we hold i_mutex
 	 * across the reserve/allocate/unreserve calls. If there are delalloc
 	 * blocks in the range, they are ours.
 	 */
-	if (start_fsb < end_fsb) {
+	if (iomap->flags & IOMAP_F_NEW && start_fsb < end_fsb) {
 		truncate_pagecache_range(VFS_I(ip), XFS_FSB_TO_B(mp, start_fsb),
 					 XFS_FSB_TO_B(mp, end_fsb) - 1);
 
@@ -1131,7 +1147,7 @@ xfs_file_iomap_end(
 {
 	if ((flags & IOMAP_WRITE) && iomap->type == IOMAP_DELALLOC)
 		return xfs_file_iomap_end_delalloc(XFS_I(inode), offset,
-				length, written);
+				length, written, iomap);
 	return 0;
 }
 
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index da6d08f..c29cc08 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -313,7 +313,7 @@ xfs_reflink_reserve_cow(
 		return error;
 
 	error = xfs_bmapi_reserve_delalloc(ip, XFS_COW_FORK, imap->br_startoff,
-			imap->br_blockcount, 0, &got, &idx, eof);
+			imap->br_blockcount, 0, &got, &idx, eof, NULL);
 	if (error == -ENOSPC || error == -EDQUOT)
 		trace_xfs_reflink_cow_enospc(ip, imap);
 	if (error)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] xfs: use iomap new flag for newly allocated delalloc blocks
  2017-03-03 17:23 [PATCH v2] xfs: use iomap new flag for newly allocated delalloc blocks Brian Foster
@ 2017-03-06 18:00 ` Darrick J. Wong
  2017-03-06 18:23   ` Brian Foster
  0 siblings, 1 reply; 3+ messages in thread
From: Darrick J. Wong @ 2017-03-06 18:00 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Mar 03, 2017 at 12:23:18PM -0500, Brian Foster wrote:
> Commit fa7f138 ("xfs: clear delalloc and cache on buffered write
> failure") fixed one regression in the iomap error handling code and
> exposed another. The fundamental problem is that if a buffered write
> is a rewrite of preexisting delalloc blocks and the write fails, the
> failure handling code can punch out preexisting blocks with valid
> file data.
> 
> This was reproduced directly by sub-block writes in the LTP
> kernel/syscalls/write/write03 test. A first 100 byte write allocates
> a single block in a file. A subsequent 100 byte write fails and
> punches out the block, including the data successfully written by
> the previous write.
> 
> To address this problem, update the ->iomap_begin() handler to
> distinguish newly allocated delalloc blocks from preexisting
> delalloc blocks via the IOMAP_F_NEW flag. Use this flag in the
> ->iomap_end() handler to decide when a failed or short write should
> punch out delalloc blocks.
> 
> This introduces the subtle requirement that ->iomap_begin() should
> never combine newly allocated delalloc blocks with existing blocks
> in the resulting iomap descriptor. This can occur when a new
> delalloc reservation merges with a neighboring extent that is part
> of the current write, for example. Therefore, update
> xfs_bmapi_reserve_delalloc() to conditionally return the allocated
> record along with the updated 'got' record and map the former into
> the iomap.  This ensures that preexisting delalloc blocks are always
> handled as "found" blocks and not punched out on a failed rewrite.
> 
> Reported-by: Xiong Zhou <xzhou@redhat.com>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
> 
> This version simply changes the interface to
> xfs_bmapi_reserve_delalloc() for using the allocated range rather than
> the fully up-to-date record after the allocation. Thoughts? If this is
> still too ugly I suppose we could just do as Christoph suggested
> originally and drop the update of got entirely. I'm just not that fond
> of the landmine that leaves around should some future caller expect
> 'got' to be updated, or worse, for somebody to re-add it to
> bmapi_reserve_delalloc() without us remembering the requirement for
> IOMAP_F_NEW and quietly breaking write failure handling again.

Could you add a comment above xfs_bmapi_reserve_delalloc summarizing
what the function is supposed to do and capturing the justification for
for arec?  I concede it's a little funny to mention iomap requirements
in libxfs, but I'd rather it get written down than left as a potential
landmine.

> Brian
> 
> v2:
> - Use arec param rather than bmapi flag in xfs_bmapi_reserve_delalloc().
> v1: http://www.spinics.net/lists/linux-xfs/msg04604.html
> 
>  fs/xfs/libxfs/xfs_bmap.c | 11 ++++++++---
>  fs/xfs/libxfs/xfs_bmap.h |  3 ++-
>  fs/xfs/xfs_iomap.c       | 34 +++++++++++++++++++++++++---------
>  fs/xfs/xfs_reflink.c     |  2 +-
>  4 files changed, 36 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index a9c66d4..dcc6c13 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -4159,7 +4159,8 @@ xfs_bmapi_reserve_delalloc(
>  	xfs_filblks_t		prealloc,
>  	struct xfs_bmbt_irec	*got,
>  	xfs_extnum_t		*lastx,
> -	int			eof)
> +	int			eof,
> +	struct xfs_bmbt_irec	*arec)
>  {
>  	struct xfs_mount	*mp = ip->i_mount;
>  	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
> @@ -4236,11 +4237,15 @@ xfs_bmapi_reserve_delalloc(
>  	got->br_startblock = nullstartblock(indlen);
>  	got->br_blockcount = alen;
>  	got->br_state = XFS_EXT_NORM;
> +	if (arec)
> +		*arec = *got;
> +
>  	xfs_bmap_add_extent_hole_delay(ip, whichfork, lastx, got);
>  
>  	/*
> -	 * Update our extent pointer, given that xfs_bmap_add_extent_hole_delay
> -	 * might have merged it into one of the neighbouring ones.
> +	 * Update our extent pointer if the caller asked for entire extents.
> +	 * xfs_bmap_add_extent_hole_delay() might have merged it into one of
> +	 * the neighbouring ones.
>  	 */
>  	xfs_bmbt_get_all(xfs_iext_get_ext(ifp, *lastx), got);
>  
> diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
> index cdef87d..c61f2a7 100644
> --- a/fs/xfs/libxfs/xfs_bmap.h
> +++ b/fs/xfs/libxfs/xfs_bmap.h
> @@ -243,7 +243,8 @@ int	xfs_bmap_shift_extents(struct xfs_trans *tp, struct xfs_inode *ip,
>  int	xfs_bmap_split_extent(struct xfs_inode *ip, xfs_fileoff_t split_offset);
>  int	xfs_bmapi_reserve_delalloc(struct xfs_inode *ip, int whichfork,
>  		xfs_fileoff_t off, xfs_filblks_t len, xfs_filblks_t prealloc,
> -		struct xfs_bmbt_irec *got, xfs_extnum_t *lastx, int eof);
> +		struct xfs_bmbt_irec *got, xfs_extnum_t *lastx, int eof,
> +		struct xfs_bmbt_irec *arec);
>  
>  enum xfs_bmap_intent_type {
>  	XFS_BMAP_MAP = 1,
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 41662fb..bdbab0f 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -531,7 +531,7 @@ xfs_file_iomap_begin_delay(
>  		XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
>  	xfs_fileoff_t		end_fsb;
>  	int			error = 0, eof = 0;
> -	struct xfs_bmbt_irec	got;
> +	struct xfs_bmbt_irec	got, arec;
>  	xfs_extnum_t		idx;
>  	xfs_fsblock_t		prealloc_blocks = 0;
>  
> @@ -613,7 +613,8 @@ xfs_file_iomap_begin_delay(
>  
>  retry:
>  	error = xfs_bmapi_reserve_delalloc(ip, XFS_DATA_FORK, offset_fsb,
> -			end_fsb - offset_fsb, prealloc_blocks, &got, &idx, eof);
> +			end_fsb - offset_fsb, prealloc_blocks, &got, &idx,
> +			eof, &arec);
>  	switch (error) {
>  	case 0:
>  		break;
> @@ -630,6 +631,15 @@ xfs_file_iomap_begin_delay(
>  		goto out_unlock;
>  	}
>  
> +	/*
> +	 * IOMAP_F_NEW controls whether we punch out delalloc blocks if the
> +	 * write happens to fail, which means we can't combine newly allocated
> +	 * blocks with preexisting delalloc blocks in the same iomap. got may
> +	 * have been merged with contiguous extents, so use arec to map only the
> +	 * newly allocated blocks.
> +	 */
> +	got = arec;
> +	iomap->flags = IOMAP_F_NEW;
>  	trace_xfs_iomap_alloc(ip, offset, count, 0, &got);
>  done:
>  	if (isnullstartblock(got.br_startblock))
> @@ -1071,16 +1081,22 @@ xfs_file_iomap_end_delalloc(
>  	struct xfs_inode	*ip,
>  	loff_t			offset,
>  	loff_t			length,
> -	ssize_t			written)
> +	ssize_t			written,
> +	struct iomap		*iomap)
>  {
>  	struct xfs_mount	*mp = ip->i_mount;
>  	xfs_fileoff_t		start_fsb;
>  	xfs_fileoff_t		end_fsb;
>  	int			error = 0;
>  
> -	/* behave as if the write failed if drop writes is enabled */
> -	if (xfs_mp_drop_writes(mp))
> +	/*
> +	 * Behave as if the write failed if drop writes is enabled. Set the NEW
> +	 * flag to force delalloc cleanup.
> +	 */
> +	if (xfs_mp_drop_writes(mp)) {
> +		iomap->flags |= IOMAP_F_NEW;
>  		written = 0;
> +	}
>  
>  	/*
>  	 * start_fsb refers to the first unused block after a short write. If
> @@ -1094,14 +1110,14 @@ xfs_file_iomap_end_delalloc(
>  	end_fsb = XFS_B_TO_FSB(mp, offset + length);
>  
>  	/*
> -	 * Trim back delalloc blocks if we didn't manage to write the whole
> -	 * range reserved.
> +	 * Trim delalloc blocks if they were allocated by this write and we
> +	 * didn't manage to write the whole range.
>  	 *
>  	 * We don't need to care about racing delalloc as we hold i_mutex
>  	 * across the reserve/allocate/unreserve calls. If there are delalloc
>  	 * blocks in the range, they are ours.
>  	 */
> -	if (start_fsb < end_fsb) {
> +	if (iomap->flags & IOMAP_F_NEW && start_fsb < end_fsb) {

Doesn't the IOMAP_F_NEW check here need parentheses?

--D

>  		truncate_pagecache_range(VFS_I(ip), XFS_FSB_TO_B(mp, start_fsb),
>  					 XFS_FSB_TO_B(mp, end_fsb) - 1);
>  
> @@ -1131,7 +1147,7 @@ xfs_file_iomap_end(
>  {
>  	if ((flags & IOMAP_WRITE) && iomap->type == IOMAP_DELALLOC)
>  		return xfs_file_iomap_end_delalloc(XFS_I(inode), offset,
> -				length, written);
> +				length, written, iomap);
>  	return 0;
>  }
>  
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index da6d08f..c29cc08 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -313,7 +313,7 @@ xfs_reflink_reserve_cow(
>  		return error;
>  
>  	error = xfs_bmapi_reserve_delalloc(ip, XFS_COW_FORK, imap->br_startoff,
> -			imap->br_blockcount, 0, &got, &idx, eof);
> +			imap->br_blockcount, 0, &got, &idx, eof, NULL);
>  	if (error == -ENOSPC || error == -EDQUOT)
>  		trace_xfs_reflink_cow_enospc(ip, imap);
>  	if (error)
> -- 
> 2.7.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] xfs: use iomap new flag for newly allocated delalloc blocks
  2017-03-06 18:00 ` Darrick J. Wong
@ 2017-03-06 18:23   ` Brian Foster
  0 siblings, 0 replies; 3+ messages in thread
From: Brian Foster @ 2017-03-06 18:23 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Mar 06, 2017 at 10:00:44AM -0800, Darrick J. Wong wrote:
> On Fri, Mar 03, 2017 at 12:23:18PM -0500, Brian Foster wrote:
> > Commit fa7f138 ("xfs: clear delalloc and cache on buffered write
> > failure") fixed one regression in the iomap error handling code and
> > exposed another. The fundamental problem is that if a buffered write
> > is a rewrite of preexisting delalloc blocks and the write fails, the
> > failure handling code can punch out preexisting blocks with valid
> > file data.
> > 
> > This was reproduced directly by sub-block writes in the LTP
> > kernel/syscalls/write/write03 test. A first 100 byte write allocates
> > a single block in a file. A subsequent 100 byte write fails and
> > punches out the block, including the data successfully written by
> > the previous write.
> > 
> > To address this problem, update the ->iomap_begin() handler to
> > distinguish newly allocated delalloc blocks from preexisting
> > delalloc blocks via the IOMAP_F_NEW flag. Use this flag in the
> > ->iomap_end() handler to decide when a failed or short write should
> > punch out delalloc blocks.
> > 
> > This introduces the subtle requirement that ->iomap_begin() should
> > never combine newly allocated delalloc blocks with existing blocks
> > in the resulting iomap descriptor. This can occur when a new
> > delalloc reservation merges with a neighboring extent that is part
> > of the current write, for example. Therefore, update
> > xfs_bmapi_reserve_delalloc() to conditionally return the allocated
> > record along with the updated 'got' record and map the former into
> > the iomap.  This ensures that preexisting delalloc blocks are always
> > handled as "found" blocks and not punched out on a failed rewrite.
> > 
> > Reported-by: Xiong Zhou <xzhou@redhat.com>
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> > 
> > This version simply changes the interface to
> > xfs_bmapi_reserve_delalloc() for using the allocated range rather than
> > the fully up-to-date record after the allocation. Thoughts? If this is
> > still too ugly I suppose we could just do as Christoph suggested
> > originally and drop the update of got entirely. I'm just not that fond
> > of the landmine that leaves around should some future caller expect
> > 'got' to be updated, or worse, for somebody to re-add it to
> > bmapi_reserve_delalloc() without us remembering the requirement for
> > IOMAP_F_NEW and quietly breaking write failure handling again.
> 
> Could you add a comment above xfs_bmapi_reserve_delalloc summarizing
> what the function is supposed to do and capturing the justification for
> for arec?  I concede it's a little funny to mention iomap requirements
> in libxfs, but I'd rather it get written down than left as a potential
> landmine.
> 

Sure... I'm not sure we need to describe the iomap case here (the
comment at the callsite should explain what's going on there), but we
can certainly describe got and arec here.

> > Brian
> > 
> > v2:
> > - Use arec param rather than bmapi flag in xfs_bmapi_reserve_delalloc().
> > v1: http://www.spinics.net/lists/linux-xfs/msg04604.html
> > 
> >  fs/xfs/libxfs/xfs_bmap.c | 11 ++++++++---
> >  fs/xfs/libxfs/xfs_bmap.h |  3 ++-
> >  fs/xfs/xfs_iomap.c       | 34 +++++++++++++++++++++++++---------
> >  fs/xfs/xfs_reflink.c     |  2 +-
> >  4 files changed, 36 insertions(+), 14 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> > index a9c66d4..dcc6c13 100644
> > --- a/fs/xfs/libxfs/xfs_bmap.c
> > +++ b/fs/xfs/libxfs/xfs_bmap.c
> > @@ -4159,7 +4159,8 @@ xfs_bmapi_reserve_delalloc(
> >  	xfs_filblks_t		prealloc,
> >  	struct xfs_bmbt_irec	*got,
> >  	xfs_extnum_t		*lastx,
> > -	int			eof)
> > +	int			eof,
> > +	struct xfs_bmbt_irec	*arec)
> >  {
> >  	struct xfs_mount	*mp = ip->i_mount;
> >  	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
> > @@ -4236,11 +4237,15 @@ xfs_bmapi_reserve_delalloc(
> >  	got->br_startblock = nullstartblock(indlen);
> >  	got->br_blockcount = alen;
> >  	got->br_state = XFS_EXT_NORM;
> > +	if (arec)
> > +		*arec = *got;
> > +
> >  	xfs_bmap_add_extent_hole_delay(ip, whichfork, lastx, got);
> >  
> >  	/*
> > -	 * Update our extent pointer, given that xfs_bmap_add_extent_hole_delay
> > -	 * might have merged it into one of the neighbouring ones.
> > +	 * Update our extent pointer if the caller asked for entire extents.
> > +	 * xfs_bmap_add_extent_hole_delay() might have merged it into one of
> > +	 * the neighbouring ones.
> >  	 */
> >  	xfs_bmbt_get_all(xfs_iext_get_ext(ifp, *lastx), got);
> >  
> > diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
> > index cdef87d..c61f2a7 100644
> > --- a/fs/xfs/libxfs/xfs_bmap.h
> > +++ b/fs/xfs/libxfs/xfs_bmap.h
> > @@ -243,7 +243,8 @@ int	xfs_bmap_shift_extents(struct xfs_trans *tp, struct xfs_inode *ip,
> >  int	xfs_bmap_split_extent(struct xfs_inode *ip, xfs_fileoff_t split_offset);
> >  int	xfs_bmapi_reserve_delalloc(struct xfs_inode *ip, int whichfork,
> >  		xfs_fileoff_t off, xfs_filblks_t len, xfs_filblks_t prealloc,
> > -		struct xfs_bmbt_irec *got, xfs_extnum_t *lastx, int eof);
> > +		struct xfs_bmbt_irec *got, xfs_extnum_t *lastx, int eof,
> > +		struct xfs_bmbt_irec *arec);
> >  
> >  enum xfs_bmap_intent_type {
> >  	XFS_BMAP_MAP = 1,
> > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > index 41662fb..bdbab0f 100644
> > --- a/fs/xfs/xfs_iomap.c
> > +++ b/fs/xfs/xfs_iomap.c
> > @@ -531,7 +531,7 @@ xfs_file_iomap_begin_delay(
> >  		XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
> >  	xfs_fileoff_t		end_fsb;
> >  	int			error = 0, eof = 0;
> > -	struct xfs_bmbt_irec	got;
> > +	struct xfs_bmbt_irec	got, arec;
> >  	xfs_extnum_t		idx;
> >  	xfs_fsblock_t		prealloc_blocks = 0;
> >  
> > @@ -613,7 +613,8 @@ xfs_file_iomap_begin_delay(
> >  
> >  retry:
> >  	error = xfs_bmapi_reserve_delalloc(ip, XFS_DATA_FORK, offset_fsb,
> > -			end_fsb - offset_fsb, prealloc_blocks, &got, &idx, eof);
> > +			end_fsb - offset_fsb, prealloc_blocks, &got, &idx,
> > +			eof, &arec);
> >  	switch (error) {
> >  	case 0:
> >  		break;
> > @@ -630,6 +631,15 @@ xfs_file_iomap_begin_delay(
> >  		goto out_unlock;
> >  	}
> >  
> > +	/*
> > +	 * IOMAP_F_NEW controls whether we punch out delalloc blocks if the
> > +	 * write happens to fail, which means we can't combine newly allocated
> > +	 * blocks with preexisting delalloc blocks in the same iomap. got may
> > +	 * have been merged with contiguous extents, so use arec to map only the
> > +	 * newly allocated blocks.
> > +	 */
> > +	got = arec;
> > +	iomap->flags = IOMAP_F_NEW;
> >  	trace_xfs_iomap_alloc(ip, offset, count, 0, &got);
> >  done:
> >  	if (isnullstartblock(got.br_startblock))
> > @@ -1071,16 +1081,22 @@ xfs_file_iomap_end_delalloc(
> >  	struct xfs_inode	*ip,
> >  	loff_t			offset,
> >  	loff_t			length,
> > -	ssize_t			written)
> > +	ssize_t			written,
> > +	struct iomap		*iomap)
> >  {
> >  	struct xfs_mount	*mp = ip->i_mount;
> >  	xfs_fileoff_t		start_fsb;
> >  	xfs_fileoff_t		end_fsb;
> >  	int			error = 0;
> >  
> > -	/* behave as if the write failed if drop writes is enabled */
> > -	if (xfs_mp_drop_writes(mp))
> > +	/*
> > +	 * Behave as if the write failed if drop writes is enabled. Set the NEW
> > +	 * flag to force delalloc cleanup.
> > +	 */
> > +	if (xfs_mp_drop_writes(mp)) {
> > +		iomap->flags |= IOMAP_F_NEW;
> >  		written = 0;
> > +	}
> >  
> >  	/*
> >  	 * start_fsb refers to the first unused block after a short write. If
> > @@ -1094,14 +1110,14 @@ xfs_file_iomap_end_delalloc(
> >  	end_fsb = XFS_B_TO_FSB(mp, offset + length);
> >  
> >  	/*
> > -	 * Trim back delalloc blocks if we didn't manage to write the whole
> > -	 * range reserved.
> > +	 * Trim delalloc blocks if they were allocated by this write and we
> > +	 * didn't manage to write the whole range.
> >  	 *
> >  	 * We don't need to care about racing delalloc as we hold i_mutex
> >  	 * across the reserve/allocate/unreserve calls. If there are delalloc
> >  	 * blocks in the range, they are ours.
> >  	 */
> > -	if (start_fsb < end_fsb) {
> > +	if (iomap->flags & IOMAP_F_NEW && start_fsb < end_fsb) {
> 
> Doesn't the IOMAP_F_NEW check here need parentheses?
> 

I didn't think so, but I can add parens for clarity (which I prefer
anyways, not sure why I didn't use them here) if nothing else. Thanks.

Brian

> --D
> 
> >  		truncate_pagecache_range(VFS_I(ip), XFS_FSB_TO_B(mp, start_fsb),
> >  					 XFS_FSB_TO_B(mp, end_fsb) - 1);
> >  
> > @@ -1131,7 +1147,7 @@ xfs_file_iomap_end(
> >  {
> >  	if ((flags & IOMAP_WRITE) && iomap->type == IOMAP_DELALLOC)
> >  		return xfs_file_iomap_end_delalloc(XFS_I(inode), offset,
> > -				length, written);
> > +				length, written, iomap);
> >  	return 0;
> >  }
> >  
> > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> > index da6d08f..c29cc08 100644
> > --- a/fs/xfs/xfs_reflink.c
> > +++ b/fs/xfs/xfs_reflink.c
> > @@ -313,7 +313,7 @@ xfs_reflink_reserve_cow(
> >  		return error;
> >  
> >  	error = xfs_bmapi_reserve_delalloc(ip, XFS_COW_FORK, imap->br_startoff,
> > -			imap->br_blockcount, 0, &got, &idx, eof);
> > +			imap->br_blockcount, 0, &got, &idx, eof, NULL);
> >  	if (error == -ENOSPC || error == -EDQUOT)
> >  		trace_xfs_reflink_cow_enospc(ip, imap);
> >  	if (error)
> > -- 
> > 2.7.4
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2017-03-06 20:12 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-03 17:23 [PATCH v2] xfs: use iomap new flag for newly allocated delalloc blocks Brian Foster
2017-03-06 18:00 ` Darrick J. Wong
2017-03-06 18:23   ` Brian Foster

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.