All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] xfs: xfs_release don't free eofblocks with extsize hint
@ 2012-06-11 19:29 Ben Myers
  2012-06-12  1:11 ` Dave Chinner
  0 siblings, 1 reply; 3+ messages in thread
From: Ben Myers @ 2012-06-11 19:29 UTC (permalink / raw)
  To: xfs

Testing has shown that there is a regression when using extent size
hints with NFS.  There are many more extents created than expected.

This is the workload, from a single NFS client:
# for e in `seq 1 8`;
do
	mkdir -p $e; (cd $e; dd if=/dev/zero of=file bs=4096k count=100) &
done

# xfs_io -c extsize .
[16777216] .

Here is the extent count for each file without this patch:

385 387 387 387 400 398 379 334

Here is the extent count with commit aff3a9ed reverted (using delayed
allocation):

3 4 2 3 4 4 3 3

In a comparison of traces, one with the above workload run locally and
the other with the above workload over NFS, there were occasional calls
to xfs_free_eofblocks in NFS but not on local workloads.  These turned
out to be from xfs_release.

Not calling xfs_free_eofblocks when using the extent size hint resolves this
issue when using extsize hints and NFS.  I suspect that delayed allocation
mitigated the effect of xfs_free_eofblocks until commit aff3a9ed.

Here is the extent count with this patch:

6 3 9 12 3 3 8 10

Signed-off-by: Ben Myers <bpm@sgi.com>

Index: xfs/fs/xfs/xfs_vnodeops.c
===================================================================
--- xfs.orig/fs/xfs/xfs_vnodeops.c
+++ xfs/fs/xfs/xfs_vnodeops.c
@@ -547,8 +547,9 @@ xfs_release(
 	if ((S_ISREG(ip->i_d.di_mode) &&
 	     (VFS_I(ip)->i_size > 0 ||
 	      (VN_CACHED(VFS_I(ip)) > 0 || ip->i_delayed_blks > 0)) &&
-	     (ip->i_df.if_flags & XFS_IFEXTENTS))  &&
-	    (!(ip->i_d.di_flags & (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND)))) {
+	     (ip->i_df.if_flags & XFS_IFEXTENTS)) &&
+	    (!(ip->i_d.di_flags & (XFS_DIFLAG_PREALLOC|XFS_DIFLAG_APPEND))) &&
+	    (!xfs_get_extsz_hint(ip))) {
 
 		/*
 		 * If we can't get the iolock just skip truncating the blocks
@@ -570,6 +571,11 @@ xfs_release(
 		 * release. Hence on the first dirty close we will still remove
 		 * the speculative allocation, but after that we will leave it
 		 * in place.
+		 *
+		 * Additionally, do not free eofblocks for files with the
+		 * extent size hint set.  This can cause excessive
+		 * fragmentation which is inconsistent with the intent
+		 * of the extsize feature.
 		 */
 		if (xfs_iflags_test(ip, XFS_IDIRTY_RELEASE))
 			return 0;

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] xfs: xfs_release don't free eofblocks with extsize hint
  2012-06-11 19:29 [PATCH] xfs: xfs_release don't free eofblocks with extsize hint Ben Myers
@ 2012-06-12  1:11 ` Dave Chinner
  2012-06-19  7:48   ` Christoph Hellwig
  0 siblings, 1 reply; 3+ messages in thread
From: Dave Chinner @ 2012-06-12  1:11 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On Mon, Jun 11, 2012 at 02:29:48PM -0500, Ben Myers wrote:
> Testing has shown that there is a regression when using extent size
> hints with NFS.  There are many more extents created than expected.
> 
> This is the workload, from a single NFS client:
> # for e in `seq 1 8`;
> do
> 	mkdir -p $e; (cd $e; dd if=/dev/zero of=file bs=4096k count=100) &
> done
> 
> # xfs_io -c extsize .
> [16777216] .
> 
> Here is the extent count for each file without this patch:
> 
> 385 387 387 387 400 398 379 334

So it is a 400MB file with roughly 400 1MB extents? IOWs, you're
getting one extent per 1MB NFS write over the wire?

Why aren't the extents being allocated adjacently? The allocator is
supposed to taka the last block of the previous adjacent extent as
the hint for the location of the new extent. This result implies
that either the allocator is broken or you are crossing the streams.
How many AGs do you have in your filesystem?

I just ran that test with a filesystem that has more than 8 AGs, and
the result it a single extent per file. When I have less than 8 AGs,
there is contention at the AG level for allocation, and the result
is allocation interleaving giving the above fragmentation levels.

So the allocator is working just fine, it's just that your
configuration is causing the streams to interleave, just like
delalloc used to before the speculative prealloc beyond EOF changes
in 2.6.38. (i.e. the I_DIRTY_RELEASE flag)

> In a comparison of traces, one with the above workload run locally and
> the other with the above workload over NFS, there were occasional calls
> to xfs_free_eofblocks in NFS but not on local workloads.  These turned
> out to be from xfs_release.

The reason that xfs_release() is doing this for NFS is that the
I_DIRTY_RELEASE mitigation code triggers off delayed allocation
blocks on the inode, and preallocation does not result in delalloc
being present.


> Not calling xfs_free_eofblocks when using the extent size hint resolves this
> issue when using extsize hints and NFS.  I suspect that delayed allocation
> mitigated the effect of xfs_free_eofblocks until commit aff3a9ed.

No, the I_DIRTY_RELEASE code is what mitigated the effect for
delayed allocation.  Using preallocation for extsize hints does not
result in delalloc being present, hence this stopped working for
such files.

So this could be fixed simply by checking for dirty pages whenteh
extentsize hint is set after eof truncation, just like we do for
delalloc. i.e:

	if (ip->i_delayed_blks ||
	    ((ip->i_d.di_flags & XFS_DIFLAG_EXTSIZE) &&
	     mapping_tagged(VFS_I(ip)->i_mapping, PAGECACHE_TAG_DIRTY)))
		xfs_iflags_set(ip, XFS_IDIRTY_RELEASE);

Alternatively, (and I think a better solution) is to treat files
with extent size hints on them exactly the same as files with
preallocation on them. i.e. in xfs_set_diflags() we set
XFS_DIFLAG_PREALLOC at the same time we set the XFS_DIFLAG_EXTSIZE,
and then all our truncate logic just works as expected for
physical preallocation beyond EOF on extsize hint files without
changing anything else....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] xfs: xfs_release don't free eofblocks with extsize hint
  2012-06-12  1:11 ` Dave Chinner
@ 2012-06-19  7:48   ` Christoph Hellwig
  0 siblings, 0 replies; 3+ messages in thread
From: Christoph Hellwig @ 2012-06-19  7:48 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Ben Myers, xfs

On Tue, Jun 12, 2012 at 11:11:02AM +1000, Dave Chinner wrote:
> Alternatively, (and I think a better solution) is to treat files
> with extent size hints on them exactly the same as files with
> preallocation on them. i.e. in xfs_set_diflags() we set
> XFS_DIFLAG_PREALLOC at the same time we set the XFS_DIFLAG_EXTSIZE,
> and then all our truncate logic just works as expected for
> physical preallocation beyond EOF on extsize hint files without
> changing anything else....

I think that is a much better approach.  The user explicitly asked to
do allocation in the specified unit, so we should not truncate it down.

On something slightly related:  Currently the calls to
xfs_free_eofblocks in xfs_release and in xfs_inactive are guarded by
similar but not quite the same checks on the inode type/flags/allocated
blocks, etc.  It would be really good to factor the common parts into
a helper, and then document why the other checks need to be different.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2012-06-19  7:48 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-06-11 19:29 [PATCH] xfs: xfs_release don't free eofblocks with extsize hint Ben Myers
2012-06-12  1:11 ` Dave Chinner
2012-06-19  7:48   ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.