* [Cluster-devel] [GFS2 PATCH] GFS2: Block reservation doubling scheme
From: Bob Peterson @ 2014-10-08 13:29 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

Steve Whitehouse wrote:
| I'd much prefer to see an algorithm that is adaptive, rather than simply
| bumping up the default here. We do need to be able to cope with cases
| where the files are much smaller, and without adaptive sizing, we might
| land up creating small holes in the allocations, which would then cause
| problems for later allocations,

I took this to heart and came up with a new design. The idea is not unlike
the block reservation doubling schemes in other file systems. The results
are fantastic: much better than those gained by hard-coding 64 blocks.
This is the lowest level of fragmentation I've ever achieved with this app:

EXTENT COUNT FOR OUTPUT FILES =  310103
EXTENT COUNT FOR OUTPUT FILES =  343990
EXTENT COUNT FOR OUTPUT FILES =  332818
EXTENT COUNT FOR OUTPUT FILES =  336852
EXTENT COUNT FOR OUTPUT FILES =  334820

Compare these results to counts without the patch:

EXTENT COUNT FOR OUTPUT FILES =  951813
EXTENT COUNT FOR OUTPUT FILES =  966978
EXTENT COUNT FOR OUTPUT FILES =  1065481

The only downside I see is that it makes the gfs2 inode structure bigger.
I also thought about changing the minimum reservation size to 16 blocks
rather than 32, since it's now adaptive, but before I did that, I'd have
to run some performance tests. What do you think?

Regards,

Bob Peterson
Red Hat File Systems
--------------------------------------------------------------------
Patch text:

This patch introduces a new block reservation doubling scheme. If we
get to the end of a multi-block reservation, we probably did not
reserve enough blocks. So we double the size of the reservation for
next time. If we can't find any rgrps that match, we go back to the
default 32 blocks.

Signed-off-by: Bob Peterson <rpeterso@redhat.com> 
---
 fs/gfs2/incore.h | 1 +
 fs/gfs2/main.c   | 2 ++
 fs/gfs2/rgrp.c   | 7 ++++++-
 fs/gfs2/rgrp.h   | 9 ++-------
 4 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 39e7e99..f98fa37 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -398,6 +398,7 @@ struct gfs2_inode {
 	u32 i_diskflags;
 	u8 i_height;
 	u8 i_depth;
+	u32 i_rsrv_minblks; /* minimum blocks per reservation */
 };
 
 /*
diff --git a/fs/gfs2/main.c b/fs/gfs2/main.c
index 82b6ac8..2be2f98 100644
--- a/fs/gfs2/main.c
+++ b/fs/gfs2/main.c
@@ -29,6 +29,7 @@
 #include "glock.h"
 #include "quota.h"
 #include "recovery.h"
+#include "rgrp.h"
 #include "dir.h"
 
 struct workqueue_struct *gfs2_control_wq;
@@ -42,6 +43,7 @@ static void gfs2_init_inode_once(void *foo)
 	INIT_LIST_HEAD(&ip->i_trunc_list);
 	ip->i_res = NULL;
 	ip->i_hash_cache = NULL;
+	ip->i_rsrv_minblks = RGRP_RSRV_MINBLKS;
 }
 
 static void gfs2_init_glock_once(void *foo)
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 7474c41..986c33f 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1465,7 +1465,7 @@ static void rg_mblk_search(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip,
 		extlen = 1;
 	else {
 		extlen = max_t(u32, atomic_read(&rs->rs_sizehint), ap->target);
-		extlen = clamp(extlen, RGRP_RSRV_MINBLKS, free_blocks);
+		extlen = clamp(extlen, ip->i_rsrv_minblks, free_blocks);
 	}
 	if ((rgd->rd_free_clone < rgd->rd_reserved) || (free_blocks < extlen))
 		return;
@@ -2000,6 +2000,7 @@ next_rgrp:
 		 * then this checks for some less likely conditions before
 		 * trying again.
 		 */
+		ip->i_rsrv_minblks = RGRP_RSRV_MINBLKS;
 		loops++;
 		/* Check that fs hasn't grown if writing to rindex */
 		if (ip == GFS2_I(sdp->sd_rindex) && !sdp->sd_rindex_uptodate) {
@@ -2195,6 +2196,10 @@ static void gfs2_adjust_reservation(struct gfs2_inode *ip,
 			trace_gfs2_rs(rs, TRACE_RS_CLAIM);
 			if (rs->rs_free && !ret)
 				goto out;
+			/* We used up our block reservation, so double the
+			   minimum reservation size for the next write. */
+			if (ip->i_rsrv_minblks < RGRP_RSRV_MAXBLKS)
+				ip->i_rsrv_minblks <<= 1;
 		}
 		__rs_deltree(rs);
 	}
diff --git a/fs/gfs2/rgrp.h b/fs/gfs2/rgrp.h
index 5d8f085..d081cac 100644
--- a/fs/gfs2/rgrp.h
+++ b/fs/gfs2/rgrp.h
@@ -13,13 +13,8 @@
 #include <linux/slab.h>
 #include <linux/uaccess.h>
 
-/* Since each block in the file system is represented by two bits in the
- * bitmap, one 64-bit word in the bitmap will represent 32 blocks.
- * By reserving 32 blocks at a time, we can optimize / shortcut how we search
- * through the bitmaps by looking a word at a time.
- */
-#define RGRP_RSRV_MINBYTES 8
-#define RGRP_RSRV_MINBLKS ((u32)(RGRP_RSRV_MINBYTES * GFS2_NBBY))
+#define RGRP_RSRV_MINBLKS 32
+#define RGRP_RSRV_MAXBLKS 512
 
 struct gfs2_rgrpd;
 struct gfs2_sbd;
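
To summarize the new behaviour in one place, here's a rough userspace model
of the sizing logic (an illustration only, not the kernel code; the constants
mirror the patch, everything else is simplified):

/*
 * Rough userspace model of the adaptive sizing above; illustration only.
 */
#include <stdint.h>
#include <stdio.h>

#define RGRP_RSRV_MINBLKS 32u
#define RGRP_RSRV_MAXBLKS 512u

static uint32_t rsrv_minblks = RGRP_RSRV_MINBLKS;

/* A write ran off the end of its multi-block reservation. */
static void reservation_exhausted(void)
{
	if (rsrv_minblks < RGRP_RSRV_MAXBLKS)
		rsrv_minblks <<= 1;		/* 32 -> 64 -> 128 -> 256 -> 512 */
}

/* No resource group could satisfy the current minimum. */
static void no_matching_rgrp(void)
{
	rsrv_minblks = RGRP_RSRV_MINBLKS;	/* fall back to the default */
}

int main(void)
{
	int i;

	for (i = 0; i < 5; i++) {
		printf("reservation minimum: %u blocks\n", rsrv_minblks);
		reservation_exhausted();
	}
	no_matching_rgrp();
	printf("after fallback: %u blocks\n", rsrv_minblks);
	return 0;
}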




* [Cluster-devel] [GFS2 PATCH] GFS2: Block reservation doubling scheme
From: Bob Peterson @ 2014-10-08 15:03 UTC (permalink / raw)
  To: cluster-devel.redhat.com

----- Original Message -----
> This patch introduces a new block reservation doubling scheme. If we

Maybe I sent this patch out prematurely. Instead of doubling the
reservation, maybe I should experiment with making it grow additively.
IOW, instead of 32-64-128-256-512, I should use: 32-64-96-128-160-192-224-etc...
I know other file systems use doubling schemes, but I'm concerned
about it being too aggressive.
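
For comparison, a tiny userspace sketch of how the two growth curves would
look, assuming both are capped at the 512-block maximum from the doubling
patch (the helper is invented for illustration):

/* Illustration only: the two growth policies, both capped at 512 blocks. */
#include <stdint.h>
#include <stdio.h>

#define RGRP_RSRV_MINBLKS 32u
#define RGRP_RSRV_MAXBLKS 512u

static uint32_t grow(uint32_t cur, int additive)
{
	uint32_t next = additive ? cur + RGRP_RSRV_MINBLKS : cur << 1;

	return next > RGRP_RSRV_MAXBLKS ? RGRP_RSRV_MAXBLKS : next;
}

int main(void)
{
	int additive;

	for (additive = 0; additive <= 1; additive++) {
		uint32_t r = RGRP_RSRV_MINBLKS;

		printf("%s:", additive ? "additive" : "doubling");
		while (r < RGRP_RSRV_MAXBLKS) {
			printf(" %u", r);
			r = grow(r, additive);
		}
		printf(" %u\n", r);
	}
	return 0;
}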

Regards,

Bob Peterson
Red Hat File Systems




* [Cluster-devel] [GFS2 PATCH] GFS2: Block reservation doubling scheme
From: Bob Peterson @ 2014-10-10  3:39 UTC (permalink / raw)
  To: cluster-devel.redhat.com

----- Original Message -----
> ----- Original Message -----
> > This patch introduces a new block reservation doubling scheme. If we
> 
> Maybe I sent this patch out prematurely. Instead of doubling the
> reservation, maybe I should experiment with making it grow additively.
> IOW, instead of 32-64-128-256-512, I should use:
> 32-64-96-128-160-192-224-etc...
> I know other file systems use doubling schemes, but I'm concerned
> about it being too aggressive.

I tried an additive reservations algorithm. I basically changed the
previous patch from doubling the reservation to adding 32 blocks.
In other words, I replaced:

+				ip->i_rsrv_minblks <<= 1;
with this:
+				ip->i_rsrv_minblks += RGRP_RSRV_MINBLKS;

The results were not as good, but still very impressive, and maybe
acceptable:

Reservation doubling scheme:
EXTENT COUNT FOR OUTPUT FILES =  310103
EXTENT COUNT FOR OUTPUT FILES =  343990
EXTENT COUNT FOR OUTPUT FILES =  332818
EXTENT COUNT FOR OUTPUT FILES =  336852
EXTENT COUNT FOR OUTPUT FILES =  334820

Reservation additive scheme (32 blocks):
EXTENT COUNT FOR OUTPUT FILES =  322406
EXTENT COUNT FOR OUTPUT FILES =  341665
EXTENT COUNT FOR OUTPUT FILES =  341769
EXTENT COUNT FOR OUTPUT FILES =  348676
EXTENT COUNT FOR OUTPUT FILES =  348079

So I'm looking for opinions:
(a) Stick with the original reservation doubling patch, or
(b) Go with the additive version.
(c) Any other ideas?

Regards,

Bob Peterson
Red Hat File Systems




* [Cluster-devel] [GFS2 PATCH] GFS2: Block reservation doubling scheme
From: Steven Whitehouse @ 2014-10-10  9:07 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

On 10/10/14 04:39, Bob Peterson wrote:
> ----- Original Message -----
>> ----- Original Message -----
>>> This patch introduces a new block reservation doubling scheme. If we
>> Maybe I sent this patch out prematurely. Instead of doubling the
>> reservation, maybe I should experiment with making it grow additively.
>> IOW, instead of 32-64-128-256-512, I should use:
>> 32-64-96-128-160-192-224-etc...
>> I know other file systems use doubling schemes, but I'm concerned
>> about it being too aggressive.
> I tried an additive reservations algorithm. I basically changed the
> previous patch from doubling the reservation to adding 32 blocks.
> In other words, I replaced:
>
> +				ip->i_rsrv_minblks <<= 1;
> with this:
> +				ip->i_rsrv_minblks += RGRP_RSRV_MINBLKS;
>
> The results were not as good, but still very impressive, and maybe
> acceptable:
>
> Reservation doubling scheme:
> EXTENT COUNT FOR OUTPUT FILES =  310103
> EXTENT COUNT FOR OUTPUT FILES =  343990
> EXTENT COUNT FOR OUTPUT FILES =  332818
> EXTENT COUNT FOR OUTPUT FILES =  336852
> EXTENT COUNT FOR OUTPUT FILES =  334820
>
> Reservation additive scheme (32 blocks):
> EXTENT COUNT FOR OUTPUT FILES =  322406
> EXTENT COUNT FOR OUTPUT FILES =  341665
> EXTENT COUNT FOR OUTPUT FILES =  341769
> EXTENT COUNT FOR OUTPUT FILES =  348676
> EXTENT COUNT FOR OUTPUT FILES =  348079
>
> So I'm looking for opinions:
> (a) Stick with the original reservation doubling patch, or
> (b) Go with the additive version.
> (c) Any other ideas?
>
> Regards,
>
> Bob Peterson
> Red Hat File Systems

I think you are very much along the right lines. The issue is to ensure 
that all the evidence that is available is taken into account in 
figuring out how large a reservation to make. There are various clues, 
such as the time between writes, the size of the writes, whether the 
file gets closed between writes, whether the writes are contiguous and 
so forth.

Some of those things are taken into account already; however, we can 
probably do better. We may also be able to take some hints from things 
like calls to fsync (should we drop reservations that are small at that 
point, since an fsync likely signifies a significant point in the file?), 
or even detect well-known non-linear write patterns, 
e.g. backwards stride patterns or large matrix access patterns (by row 
or column).

The struct file is really the best place to store this context 
information, since if there are multiple writers to the same inode, then 
there is a fair chance that they'll have separate struct files. Does 
this happen in your test workload?
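
Purely as a hypothetical sketch (none of these fields exist in GFS2 today,
and the names are invented), per-open-file context might look something like:

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-struct-file write context; field names invented. */
struct gfs2_file_wr_ctx {
	uint64_t prev_write_end;   /* byte offset where the last write ended */
	uint32_t contig_writes;    /* consecutive contiguous writes observed */
	uint32_t hint_blocks;      /* reservation size hint derived so far */
	bool     looks_random;     /* pattern appears non-sequential */
};

/* Feed one write into the context; grow the hint only while sequential. */
static inline void wr_ctx_update(struct gfs2_file_wr_ctx *ctx,
				 uint64_t offset, uint64_t len,
				 uint32_t blksize)
{
	if (offset == ctx->prev_write_end) {
		ctx->contig_writes++;
		ctx->hint_blocks += (uint32_t)((len + blksize - 1) / blksize);
	} else {
		ctx->contig_writes = 0;
		ctx->looks_random = true;
	}
	ctx->prev_write_end = offset + len;
}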

The readahead code can already detect some common read patterns, and it 
also turns itself off if the reads are random. The readahead problem is 
actually very much the same problem in that it tries to estimate which 
reads are coming next based on the context that has been seen already, 
so there may well be some lessons to be learned from that too.

I think it's important to look at the statistics of lots of different 
workloads, and to check them off against your candidate algorithm(s), to 
ensure that the widest range of potential access patterns is taken into 
account.

Steve.




* [Cluster-devel] [GFS2 PATCH] GFS2: Block reservation doubling scheme
From: Bob Peterson @ 2014-10-14 13:44 UTC (permalink / raw)
  To: cluster-devel.redhat.com

----- Original Message -----
> >>> This patch introduces a new block reservation doubling scheme. If we
> >> Maybe I sent this patch out prematurely. Instead of doubling the
> >> reservation, maybe I should experiment with making it grow additively.
> >> IOW, instead of 32-64-128-256-512, I should use:
> >> 32-64-96-128-160-192-224-etc...
> >> I know other file systems use doubling schemes, but I'm concerned
> >> about it being too aggressive.
> > I tried an additive reservations algorithm. I basically changed the
> > previous patch from doubling the reservation to adding 32 blocks.
> > In other words, I replaced:
> >
> > +				ip->i_rsrv_minblks <<= 1;
> > with this:
> > +				ip->i_rsrv_minblks += RGRP_RSRV_MINBLKS;
> >
> > The results were not as good, but still very impressive, and maybe
> > acceptable:
(snip)
> I think you are very much along the right lines. The issue is to ensure
> that all the evidence that is available is taken into account in
> figuring out how large a reservation to make. There are various clues,
> such as the time between writes, the size of the writes, whether the
> file gets closed between writes, whether the writes are contiguous and
> so forth.
> 
> Some of those things are taken into account already; however, we can
> probably do better. We may also be able to take some hints from things
> like calls to fsync (should we drop reservations that are small at that
> point, since an fsync likely signifies a significant point in the file?),
> or even detect well-known non-linear write patterns,
> e.g. backwards stride patterns or large matrix access patterns (by row
> or column).
> 
> The struct file is really the best place to store this context
> information, since if there are multiple writers to the same inode, then
> there is a fair chance that they'll have separate struct files. Does
> this happen in your test workload?
> 
> The readahead code can already detect some common read patterns, and it
> also turns itself off if the reads are random. The readahead problem is
> actually very much the same problem in that it tries to estimate which
> reads are coming next based on the context that has been seen already,
> so there may well be some lessons to be learned from that too.
> 
> I think it's important to look at the statistics of lots of different
> workloads, and to check them off against your candidate algorithm(s), to
> ensure that the widest range of potential access patterns is taken into
> account.
> 
> Steve.

Hi Steve,

Sorry it's taken me a bit to respond. I've been giving this a lot of thought
and doing a lot of experiments and tests.

I see multiple issues/problems, and my patches have been trying to address
them or solve them separately. You make some very good points here, so
I want to address them individually in the light of my latest findings.
I basically see three main performance problems:

1. Inter-node contention for resource groups. In the past, this was solved
   with "try" locks, which ended in a chaos of block assignments.
   In RHEL7 and up, we eliminated them, but the contention came back and
   performance suffered. I posted a patch for this issue that allows each
   node in the cluster to "prefer" a unique set of resource groups
   (a rough sketch of the idea follows this list). It greatly reduced
   inter-node contention and greatly improved performance. It was called
   "GFS2: Set of distributed preferences for rgrps", posted on October 8.
2. We need to predict the size of multi-block reservations more accurately.
   This is the issue you talk about here, and it's one I haven't
   addressed yet.
3. We need a way to adjust those predictions if they're found to be
   inadequate. That's the problem I was addressing with the reservation
   doubling scheme or additive reservation scheme.
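
As a rough sketch of the idea behind item 1 (selection by node index is a
simplification; the posted patch chooses its preferences differently):

#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified model of per-node resource group preferences: each node
 * "prefers" the rgrps whose index maps to its own slot, so nodes normally
 * allocate from disjoint sets and only contend on the fallback path.
 */
bool rgrp_preferred(uint32_t rgrp_index, uint32_t node_index,
		    uint32_t num_nodes)
{
	if (num_nodes == 0)
		return true;	/* guard against a zero divisor */
	return (rgrp_index % num_nodes) == node_index;
}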

Issues 2 and 3 might be treated as one issue: we could have a
self-adjusting reservation size system, based on a number of factors,
and I'm in the process of reworking how we do it. I've been doing lots of
experiments and running lots of tests against different workloads. You're
right that #2 is necessary, and I've verified that without it, some
workloads get faster while others get slower (although there's an overall
improvement).

Here are some thoughts:

1. Today, reservations are based on write size, which, as you say, is
   not a very good predictor. We can do better.
2. My reservation doubling scheme helps, and reduces fragmentation, but
   we need a more sophisticated scheme.
3. I don't think the time between writes should affect the reservation
   because different applications have different dynamics.
4. The size of the writes is already taken into account. However, the way
   we do it now is kind of bogus. With every write, we adjust the size
   hint. But if the application is doing rewrites, it shouldn't matter.
   If it's writing backwards or at random locations, it might matter.
   Last night I experimented with a new scheme that only adjusts the size
   hint if block allocations are necessary (see the sketch after this
   list). That way, applications that do a long sequence of large appends
   followed by small rewrites don't get their "append" size hint whacked
   by the small rewrites. This didn't help the customer application I'm
   testing, but it did help some of the other benchmarks I ran yesterday.
5. I don't like the idea of adjusting the reservation at fsync time.
   Similarly, I don't like the idea of adjusting the reservation at
   file close time. I think it makes the most sense to keep the info
   associated with the inode as we do today. My next iteration will
   hopefully not add any fields to the inode.
6. I like the idea of adjusting the reservation for non-linear writes,
   such as backwards writes, but I may have to do more testing. For
   example, if I do multiple writes to a file at: 2MB, 1MB, 500KB, etc.,
   is it better to reserve X blocks which will be assigned in reverse
   order? Or is it better to just reserve them as needed and have them
   more scattered but possibly more linear? Maybe testing will show.
7. Regarding storing the context information in the struct file:
   it depends on what information. Today, there is only one reservation
   structure, and one reservation size, per inode, whereas there can be
   many struct files for many writers to the inode. The question of
   whether a reservation is adequate is not so much "will this reservation
   be adequate for this writer?" Rather, it's "will this reservation be
   adequate for our most demanding writer?" All the rewriters in the
   world shouldn't affect the outcome of a single aggressive appender,
   for example.
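
Here is a rough userspace sketch of the experiment described in item 4:
only a write that actually needed new allocations replaces the size hint
(names invented for illustration):

#include <stdint.h>
#include <stdio.h>

static uint32_t size_hint_blocks;	/* models the per-inode size hint */

/* Called after each write with its size and whether it allocated blocks. */
static void update_size_hint(uint32_t write_blocks, int needed_allocation)
{
	/* A rewrite of already-allocated data leaves the hint alone. */
	if (needed_allocation)
		size_hint_blocks = write_blocks;
}

int main(void)
{
	update_size_hint(256, 1);	/* large append: hint becomes 256 */
	update_size_hint(4, 0);		/* small rewrite: hint stays at 256 */
	printf("size hint: %u blocks\n", size_hint_blocks);
	return 0;
}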

To answer your question: I'd wager that yes, there are multiple writers
to at least some of the files, but I'm not sure how extensive it is.
The workload seems to have a good variety of linear and non-linear writes
as well. At least now I'm starting to use multiple benchmarks for my
tests.

Regards,

Bob Peterson
Red Hat File Systems




