* [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement.v2
@ 2009-01-15 21:58 Tao Ma
  2009-01-15 22:00 ` [Ocfs2-devel] [PATCH 1/3] ocfs2: Optimize inode allocation by remembering last group Tao Ma
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Tao Ma @ 2009-01-15 21:58 UTC (permalink / raw)
  To: ocfs2-devel

Changelog from V1 to V2:
1. Modify some code according to Mark's advice.
2. Attach some test statistics in the commit log of patch 3 and also in
this e-mail. See below.

Hi all,
	In ocfs2, when we create a fresh file system and create inodes in it, 
they are contiguous and good for readdir+stat. But if we delete all 
the inodes and create them again, the new inodes will get spread out, and 
that isn't what we need. The core problem here is that the inode block 
search looks for the "emptiest" inode group to allocate from. So if an 
inode alloc file has many equally (or almost equally) empty groups, new 
inodes will tend to get spread out amongst them, which in turn can put 
them all over the disk. This is undesirable because directory operations 
on conceptually "nearby" inodes force a large number of seeks. For more 
details, please see 
http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy. 

So this patch set tries to fix this problem.
patch 1: Optimize inode allocation by remembering last group.
We add ip_last_used_group to the in-core directory inode, which records
the last used allocation group. Another field named ip_last_used_slot
is also added in case inode stealing happens. When claiming a new inode,
we pass in the directory's inode so that the allocation can use this
information.

patch 2: let inode group allocations use the global bitmap directly.

patch 3: we add osb_inode_alloc_group in ocfs2_super to record the last
used allocation group so that we can make inode groups contiguous enough.
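
Put together, patches 1 and 3 simply remember where the allocator last
succeeded and start the next search there, instead of always hunting for the
"emptiest" group. Below is a minimal userspace C model of that decision, not
code from the patches: the names ip_last_used_group, ip_last_used_slot and the
parent-group fallback mirror ocfs2_init_inode_ac_group() in patch 1, while
struct dir_hint, pick_group() and the sample block numbers are made up for
illustration.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the two fields patch 1 adds to the in-core
 * directory inode (struct ocfs2_inode_info). */
struct dir_hint {
	uint64_t ip_last_used_group;	/* group we last allocated from */
	uint32_t ip_last_used_slot;	/* slot that group belongs to */
};

/*
 * Decide which block group to try first.  Returns 0 when no hint applies,
 * which here stands for "fall back to the old emptiest-group search".
 * parent_group/parent_slot stand for the group and slot of the parent
 * directory's own inode (derived via ocfs2_which_suballoc_group() in the
 * real code).
 */
static uint64_t pick_group(const struct dir_hint *dir, uint32_t alloc_slot,
			   uint64_t parent_group, uint32_t parent_slot)
{
	/* Reuse the directory's last group, but only if it lives in the
	 * slot we are allocating from (it may not, after inode stealing). */
	if (dir->ip_last_used_group && dir->ip_last_used_slot == alloc_slot)
		return dir->ip_last_used_group;

	/* Otherwise start near the parent directory's own inode. */
	if (parent_slot == alloc_slot)
		return parent_group;

	return 0;
}

int main(void)
{
	struct dir_hint dir = { .ip_last_used_group = 4096,
				.ip_last_used_slot = 0 };

	/* Same slot: the per-directory hint wins (prints 4096). */
	printf("hint = %llu\n",
	       (unsigned long long)pick_group(&dir, 0, 8192, 0));
	/* Different slot (e.g. after stealing): no hint applies (prints 0). */
	printf("hint = %llu\n",
	       (unsigned long long)pick_group(&dir, 1, 8192, 0));
	return 0;
}

A zero return means "no hint", in which case the allocator keeps its old
emptiest-group behaviour; patch 3 applies the same caching one level up via
osb_inode_alloc_group, for when a whole new inode group has to be carved out
of the global bitmap.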

I have done some basic tests and the results are cool.
1. single node test:
The first column is the result without the inode allocation patches, and the
second one is with the inode allocation patches enabled. You can see we get a
great improvement with the second "ls -lR".

echo 'y'|mkfs.ocfs2 -b 4K -C 4K -M local /dev/sda11

mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
time tar jxvf /home/taoma/linux-2.6.28.tar.bz2 -C /mnt/ocfs2/ 1>/dev/null

real	0m20.548s 0m20.106s

umount /mnt/ocfs2/
echo 2 > /proc/sys/vm/drop_caches
mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
time ls -lR /mnt/ocfs2/ 1>/dev/null

real	0m13.965s 0m13.766s

umount /mnt/ocfs2/
echo 2 > /proc/sys/vm/drop_caches
mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
time rm /mnt/ocfs2/linux-2.6.28/ -rf

real	0m13.198s 0m13.091s

umount /mnt/ocfs2/
echo 2 > /proc/sys/vm/drop_caches
mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
time tar jxvf /home/taoma/linux-2.6.28.tar.bz2 -C /mnt/ocfs2/ 1>/dev/null

real	0m23.022s 0m21.360s

umount /mnt/ocfs2/
echo 2 > /proc/sys/vm/drop_caches
mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
time ls -lR /mnt/ocfs2/ 1>/dev/null

real	2m45.189s 0m15.019s 
yes, that is it. ;) I didn't know we could improve so much when I started.

2. Tested with 4 nodes (megabit switch for both cross-node
communication and iscsi), with the same command sequence (using
openmpi to run the commands simultaneously). Although we spend
a lot of time in cross-node communication, we still see some
performance improvement.

the 1st tar:
real	356.22s  357.70s

the 1st ls -lR:
real	187.33s  187.32s

the rm:
real	260.68s  262.42s

the 2nd tar:
real	371.92s  358.47s

the 2nd ls:
real	197.16s  188.36s

Regards,
Tao


* [Ocfs2-devel] [PATCH 1/3] ocfs2: Optimize inode allocation by remembering last group.
  2009-01-15 21:58 [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement.v2 Tao Ma
@ 2009-01-15 22:00 ` Tao Ma
  2009-01-15 22:00 ` [Ocfs2-devel] [PATCH 2/3] ocfs2: Allocate inode groups from global_bitmap Tao Ma
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Tao Ma @ 2009-01-15 22:00 UTC (permalink / raw)
  To: ocfs2-devel

In ocfs2, the inode block search looks for the "emptiest" inode
group to allocate from. So if an inode alloc file has many equally
(or almost equally) empty groups, new inodes will tend to get
spread out amongst them, which in turn can put them all over the
disk. This is undesirable because directory operations on conceptually
"nearby" inodes force a large number of seeks.

So we add ip_last_used_group to the in-core directory inode, which records
the last used allocation group. Another field named ip_last_used_slot
is also added in case inode stealing happens. When claiming a new inode,
we pass in the directory's inode so that the allocation can use this
information.
For more details, please see
http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
---
 fs/ocfs2/inode.c    |    2 ++
 fs/ocfs2/inode.h    |    4 ++++
 fs/ocfs2/namei.c    |    4 ++--
 fs/ocfs2/suballoc.c |   36 ++++++++++++++++++++++++++++++++++++
 fs/ocfs2/suballoc.h |    2 ++
 5 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
index 229e707..0435000 100644
--- a/fs/ocfs2/inode.c
+++ b/fs/ocfs2/inode.c
@@ -351,6 +351,8 @@ void ocfs2_populate_inode(struct inode *inode, struct ocfs2_dinode *fe,
 
 	ocfs2_set_inode_flags(inode);
 
+	OCFS2_I(inode)->ip_last_used_slot = 0;
+	OCFS2_I(inode)->ip_last_used_group = 0;
 	mlog_exit_void();
 }
 
diff --git a/fs/ocfs2/inode.h b/fs/ocfs2/inode.h
index eb3c302..e1978ac 100644
--- a/fs/ocfs2/inode.h
+++ b/fs/ocfs2/inode.h
@@ -72,6 +72,10 @@ struct ocfs2_inode_info
 
 	struct inode			vfs_inode;
 	struct jbd2_inode		ip_jinode;
+
+	/* Only valid if the inode is the dir. */
+	u32				ip_last_used_slot;
+	u64				ip_last_used_group;
 };
 
 /*
diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
index 084aba8..9372b23 100644
--- a/fs/ocfs2/namei.c
+++ b/fs/ocfs2/namei.c
@@ -469,8 +469,8 @@ static int ocfs2_mknod_locked(struct ocfs2_super *osb,
 
 	*new_fe_bh = NULL;
 
-	status = ocfs2_claim_new_inode(osb, handle, inode_ac, &suballoc_bit,
-				       &fe_blkno);
+	status = ocfs2_claim_new_inode(osb, handle, dir, parent_fe_bh,
+				       inode_ac, &suballoc_bit, &fe_blkno);
 	if (status < 0) {
 		mlog_errno(status);
 		goto leave;
diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index a696286..487f00c 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -1618,8 +1618,41 @@ bail:
 	return status;
 }
 
+static void ocfs2_init_inode_ac_group(struct inode *dir,
+				      struct buffer_head *parent_fe_bh,
+				      struct ocfs2_alloc_context *ac)
+{
+	struct ocfs2_dinode *fe = (struct ocfs2_dinode *)parent_fe_bh->b_data;
+	/*
+	 * Try to allocate inodes from some specific group.
+	 *
+	 * If the parent dir has recorded the last group used in allocation,
+	 * cool, use it. Otherwise if we try to allocate new inode from the
+	 * same slot the parent dir belongs to, use the same chunk.
+	 *
+	 * We are very careful here to avoid the mistake of setting
+	 * ac_last_group to a group descriptor from a different (unlocked) slot.
+	 */
+	if (OCFS2_I(dir)->ip_last_used_group &&
+	    OCFS2_I(dir)->ip_last_used_slot == ac->ac_alloc_slot)
+		ac->ac_last_group = OCFS2_I(dir)->ip_last_used_group;
+	else if (le16_to_cpu(fe->i_suballoc_slot) == ac->ac_alloc_slot)
+		ac->ac_last_group = ocfs2_which_suballoc_group(
+					le64_to_cpu(fe->i_blkno),
+					le16_to_cpu(fe->i_suballoc_bit));
+}
+
+static inline void ocfs2_save_inode_ac_group(struct inode *dir,
+					     struct ocfs2_alloc_context *ac)
+{
+	OCFS2_I(dir)->ip_last_used_group = ac->ac_last_group;
+	OCFS2_I(dir)->ip_last_used_slot = ac->ac_alloc_slot;
+}
+
 int ocfs2_claim_new_inode(struct ocfs2_super *osb,
 			  handle_t *handle,
+			  struct inode *dir,
+			  struct buffer_head *parent_fe_bh,
 			  struct ocfs2_alloc_context *ac,
 			  u16 *suballoc_bit,
 			  u64 *fe_blkno)
@@ -1635,6 +1668,8 @@ int ocfs2_claim_new_inode(struct ocfs2_super *osb,
 	BUG_ON(ac->ac_bits_wanted != 1);
 	BUG_ON(ac->ac_which != OCFS2_AC_USE_INODE);
 
+	ocfs2_init_inode_ac_group(dir, parent_fe_bh, ac);
+
 	status = ocfs2_claim_suballoc_bits(osb,
 					   ac,
 					   handle,
@@ -1653,6 +1688,7 @@ int ocfs2_claim_new_inode(struct ocfs2_super *osb,
 
 	*fe_blkno = bg_blkno + (u64) (*suballoc_bit);
 	ac->ac_bits_given++;
+	ocfs2_save_inode_ac_group(dir, ac);
 	status = 0;
 bail:
 	mlog_exit(status);
diff --git a/fs/ocfs2/suballoc.h b/fs/ocfs2/suballoc.h
index e3c13c7..ea85a4c 100644
--- a/fs/ocfs2/suballoc.h
+++ b/fs/ocfs2/suballoc.h
@@ -88,6 +88,8 @@ int ocfs2_claim_metadata(struct ocfs2_super *osb,
 			 u64 *blkno_start);
 int ocfs2_claim_new_inode(struct ocfs2_super *osb,
 			  handle_t *handle,
+			  struct inode *dir,
+			  struct buffer_head *parent_fe_bh,
 			  struct ocfs2_alloc_context *ac,
 			  u16 *suballoc_bit,
 			  u64 *fe_blkno);
-- 
1.5.5


* [Ocfs2-devel] [PATCH 2/3] ocfs2: Allocate inode groups from global_bitmap.
  2009-01-15 21:58 [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement.v2 Tao Ma
  2009-01-15 22:00 ` [Ocfs2-devel] [PATCH 1/3] ocfs2: Optimize inode allocation by remembering last group Tao Ma
@ 2009-01-15 22:00 ` Tao Ma
  2009-01-15 22:00 ` [Ocfs2-devel] [PATCH 3/3] ocfs2: Optimize inode group allocation by recording last used group Tao Ma
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Tao Ma @ 2009-01-15 22:00 UTC (permalink / raw)
  To: ocfs2-devel

Inode groups used to be allocated from the local alloc file,
but since we want all inodes to be contiguous enough, we
will try to allocate them directly from the global_bitmap.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
---
 fs/ocfs2/suballoc.c |   29 +++++++++++++++++++----------
 1 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index 487f00c..b7a065e 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -48,7 +48,8 @@
 #include "buffer_head_io.h"
 
 #define NOT_ALLOC_NEW_GROUP		0
-#define ALLOC_NEW_GROUP			1
+#define ALLOC_NEW_GROUP			0x1
+#define ALLOC_GROUPS_FROM_GLOBAL	0x2
 
 #define OCFS2_MAX_INODES_TO_STEAL	1024
 
@@ -64,7 +65,8 @@ static int ocfs2_block_group_fill(handle_t *handle,
 static int ocfs2_block_group_alloc(struct ocfs2_super *osb,
 				   struct inode *alloc_inode,
 				   struct buffer_head *bh,
-				   u64 max_block);
+				   u64 max_block,
+				   int flags);
 
 static int ocfs2_cluster_group_search(struct inode *inode,
 				      struct buffer_head *group_bh,
@@ -116,6 +118,7 @@ static inline void ocfs2_block_to_cluster_group(struct inode *inode,
 						u16 *bg_bit_off);
 static int ocfs2_reserve_clusters_with_limit(struct ocfs2_super *osb,
 					     u32 bits_wanted, u64 max_block,
+					     int flags,
 					     struct ocfs2_alloc_context **ac);
 
 void ocfs2_free_ac_resource(struct ocfs2_alloc_context *ac)
@@ -403,7 +406,8 @@ static inline u16 ocfs2_find_smallest_chain(struct ocfs2_chain_list *cl)
 static int ocfs2_block_group_alloc(struct ocfs2_super *osb,
 				   struct inode *alloc_inode,
 				   struct buffer_head *bh,
-				   u64 max_block)
+				   u64 max_block,
+				   int flags)
 {
 	int status, credits;
 	struct ocfs2_dinode *fe = (struct ocfs2_dinode *) bh->b_data;
@@ -423,7 +427,7 @@ static int ocfs2_block_group_alloc(struct ocfs2_super *osb,
 	cl = &fe->id2.i_chain;
 	status = ocfs2_reserve_clusters_with_limit(osb,
 						   le16_to_cpu(cl->cl_cpg),
-						   max_block, &ac);
+						   max_block, flags, &ac);
 	if (status < 0) {
 		if (status != -ENOSPC)
 			mlog_errno(status);
@@ -531,7 +535,7 @@ static int ocfs2_reserve_suballoc_bits(struct ocfs2_super *osb,
 				       struct ocfs2_alloc_context *ac,
 				       int type,
 				       u32 slot,
-				       int alloc_new_group)
+				       int flags)
 {
 	int status;
 	u32 bits_wanted = ac->ac_bits_wanted;
@@ -587,7 +591,7 @@ static int ocfs2_reserve_suballoc_bits(struct ocfs2_super *osb,
 			goto bail;
 		}
 
-		if (alloc_new_group != ALLOC_NEW_GROUP) {
+		if (!(flags & ALLOC_NEW_GROUP)) {
 			mlog(0, "Alloc File %u Full: wanted=%u, free_bits=%u, "
 			     "and we don't alloc a new group for it.\n",
 			     slot, bits_wanted, free_bits);
@@ -596,7 +600,7 @@ static int ocfs2_reserve_suballoc_bits(struct ocfs2_super *osb,
 		}
 
 		status = ocfs2_block_group_alloc(osb, alloc_inode, bh,
-						 ac->ac_max_block);
+						 ac->ac_max_block, flags);
 		if (status < 0) {
 			if (status != -ENOSPC)
 				mlog_errno(status);
@@ -740,7 +744,9 @@ int ocfs2_reserve_new_inode(struct ocfs2_super *osb,
 	atomic_set(&osb->s_num_inodes_stolen, 0);
 	status = ocfs2_reserve_suballoc_bits(osb, *ac,
 					     INODE_ALLOC_SYSTEM_INODE,
-					     osb->slot_num, ALLOC_NEW_GROUP);
+					     osb->slot_num,
+					     ALLOC_NEW_GROUP |
+					     ALLOC_GROUPS_FROM_GLOBAL);
 	if (status >= 0) {
 		status = 0;
 
@@ -806,6 +812,7 @@ bail:
  * things a bit. */
 static int ocfs2_reserve_clusters_with_limit(struct ocfs2_super *osb,
 					     u32 bits_wanted, u64 max_block,
+					     int flags,
 					     struct ocfs2_alloc_context **ac)
 {
 	int status;
@@ -823,7 +830,8 @@ static int ocfs2_reserve_clusters_with_limit(struct ocfs2_super *osb,
 	(*ac)->ac_max_block = max_block;
 
 	status = -ENOSPC;
-	if (ocfs2_alloc_should_use_local(osb, bits_wanted)) {
+	if (!(flags & ALLOC_GROUPS_FROM_GLOBAL) &&
+	    ocfs2_alloc_should_use_local(osb, bits_wanted)) {
 		status = ocfs2_reserve_local_alloc_bits(osb,
 							bits_wanted,
 							*ac);
@@ -861,7 +869,8 @@ int ocfs2_reserve_clusters(struct ocfs2_super *osb,
 			   u32 bits_wanted,
 			   struct ocfs2_alloc_context **ac)
 {
-	return ocfs2_reserve_clusters_with_limit(osb, bits_wanted, 0, ac);
+	return ocfs2_reserve_clusters_with_limit(osb, bits_wanted, 0,
+						 ALLOC_NEW_GROUP, ac);
 }
 
 /*
-- 
1.5.5


* [Ocfs2-devel] [PATCH 3/3] ocfs2: Optimize inode group allocation by recording last used group.
  2009-01-15 21:58 [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement.v2 Tao Ma
  2009-01-15 22:00 ` [Ocfs2-devel] [PATCH 1/3] ocfs2: Optimize inode allocation by remembering last group Tao Ma
  2009-01-15 22:00 ` [Ocfs2-devel] [PATCH 2/3] ocfs2: Allocate inode groups from global_bitmap Tao Ma
@ 2009-01-15 22:00 ` Tao Ma
  2009-01-16  8:05 ` [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement.v2 tristan.ye
  2009-02-13  2:42 ` tristan.ye
  4 siblings, 0 replies; 17+ messages in thread
From: Tao Ma @ 2009-01-15 22:00 UTC (permalink / raw)
  To: ocfs2-devel

In ocfs2, the block group search looks for the "emptiest" group
to allocate from. So if the allocator has many equally (or almost
equally) empty groups, new block groups will tend to get spread
out amongst them.

So we add osb_inode_alloc_group in ocfs2_super to record the last
used inode allocation group.
For more details, please see
http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy.

I have done some basic tests and the results are cool.
1. single node test:
The first column is the result without the inode allocation patches, and the
second one is with the inode allocation patches enabled. You can see we get a
great improvement with the second "ls -lR".

echo 'y'|mkfs.ocfs2 -b 4K -C 4K -M local /dev/sda11

mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
time tar jxvf /home/taoma/linux-2.6.28.tar.bz2 -C /mnt/ocfs2/ 1>/dev/null

real	0m20.548s 0m20.106s

umount /mnt/ocfs2/
echo 2 > /proc/sys/vm/drop_caches
mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
time ls -lR /mnt/ocfs2/ 1>/dev/null

real	0m13.965s 0m13.766s

umount /mnt/ocfs2/
echo 2 > /proc/sys/vm/drop_caches
mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
time rm /mnt/ocfs2/linux-2.6.28/ -rf

real	0m13.198s 0m13.091s

umount /mnt/ocfs2/
echo 2 > /proc/sys/vm/drop_caches
mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
time tar jxvf /home/taoma/linux-2.6.28.tar.bz2 -C /mnt/ocfs2/ 1>/dev/null

real	0m23.022s 0m21.360s

umount /mnt/ocfs2/
echo 2 > /proc/sys/vm/drop_caches
mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
time ls -lR /mnt/ocfs2/ 1>/dev/null

real	2m45.189s 0m15.019s (yes, that is it. :) )

2. Tested with 4 nodes (megabit switch for both cross-node
communication and iscsi), with the same command sequence (using
openmpi to run the commands simultaneously). Although we spend
a lot of time in cross-node communication, we still see some
performance improvement.

the 1st tar:
real	356.22s  357.70s

the 1st ls -lR:
real	187.33s  187.32s

the rm:
real	260.68s  262.42s

the 2nd tar:
real	371.92s  358.47s

the 2nd ls:
real	197.16s  188.36s

Signed-off-by: Tao Ma <tao.ma@oracle.com>
---
 fs/ocfs2/ocfs2.h    |    3 +++
 fs/ocfs2/suballoc.c |   32 ++++++++++++++++++++++++++++----
 2 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index ad5c24a..f0377bd 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -335,6 +335,9 @@ struct ocfs2_super
 	struct ocfs2_node_map		osb_recovering_orphan_dirs;
 	unsigned int			*osb_orphan_wipes;
 	wait_queue_head_t		osb_wipe_event;
+
+	/* the group we used to allocate inodes. */
+	u64				osb_inode_alloc_group;
 };
 
 #define OCFS2_SB(sb)	    ((struct ocfs2_super *)(sb)->s_fs_info)
diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index b7a065e..4c1399c 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -66,6 +66,7 @@ static int ocfs2_block_group_alloc(struct ocfs2_super *osb,
 				   struct inode *alloc_inode,
 				   struct buffer_head *bh,
 				   u64 max_block,
+				   u64 *last_alloc_group,
 				   int flags);
 
 static int ocfs2_cluster_group_search(struct inode *inode,
@@ -407,6 +408,7 @@ static int ocfs2_block_group_alloc(struct ocfs2_super *osb,
 				   struct inode *alloc_inode,
 				   struct buffer_head *bh,
 				   u64 max_block,
+				   u64 *last_alloc_group,
 				   int flags)
 {
 	int status, credits;
@@ -444,6 +446,11 @@ static int ocfs2_block_group_alloc(struct ocfs2_super *osb,
 		goto bail;
 	}
 
+	if (last_alloc_group && *last_alloc_group != 0) {
+		mlog(0, "use old allocation group %llu for block group alloc\n",
+		     (unsigned long long)*last_alloc_group);
+		ac->ac_last_group = *last_alloc_group;
+	}
 	status = ocfs2_claim_clusters(osb,
 				      handle,
 				      ac,
@@ -518,6 +525,11 @@ static int ocfs2_block_group_alloc(struct ocfs2_super *osb,
 	alloc_inode->i_blocks = ocfs2_inode_sector_count(alloc_inode);
 
 	status = 0;
+
+	/* save the new last alloc group so that the caller can cache it. */
+	if (last_alloc_group)
+		*last_alloc_group = ac->ac_last_group;
+
 bail:
 	if (handle)
 		ocfs2_commit_trans(osb, handle);
@@ -535,6 +547,7 @@ static int ocfs2_reserve_suballoc_bits(struct ocfs2_super *osb,
 				       struct ocfs2_alloc_context *ac,
 				       int type,
 				       u32 slot,
+				       u64 *last_alloc_group,
 				       int flags)
 {
 	int status;
@@ -600,7 +613,8 @@ static int ocfs2_reserve_suballoc_bits(struct ocfs2_super *osb,
 		}
 
 		status = ocfs2_block_group_alloc(osb, alloc_inode, bh,
-						 ac->ac_max_block, flags);
+						 ac->ac_max_block,
+						 last_alloc_group, flags);
 		if (status < 0) {
 			if (status != -ENOSPC)
 				mlog_errno(status);
@@ -644,7 +658,7 @@ int ocfs2_reserve_new_metadata_blocks(struct ocfs2_super *osb,
 
 	status = ocfs2_reserve_suballoc_bits(osb, (*ac),
 					     EXTENT_ALLOC_SYSTEM_INODE,
-					     slot, ALLOC_NEW_GROUP);
+					     slot, NULL, ALLOC_NEW_GROUP);
 	if (status < 0) {
 		if (status != -ENOSPC)
 			mlog_errno(status);
@@ -690,7 +704,8 @@ static int ocfs2_steal_inode_from_other_nodes(struct ocfs2_super *osb,
 
 		status = ocfs2_reserve_suballoc_bits(osb, ac,
 						     INODE_ALLOC_SYSTEM_INODE,
-						     slot, NOT_ALLOC_NEW_GROUP);
+						     slot, NULL,
+						     NOT_ALLOC_NEW_GROUP);
 		if (status >= 0) {
 			ocfs2_set_inode_steal_slot(osb, slot);
 			break;
@@ -707,6 +722,7 @@ int ocfs2_reserve_new_inode(struct ocfs2_super *osb,
 {
 	int status;
 	s16 slot = ocfs2_get_inode_steal_slot(osb);
+	u64 alloc_group;
 
 	*ac = kzalloc(sizeof(struct ocfs2_alloc_context), GFP_KERNEL);
 	if (!(*ac)) {
@@ -742,14 +758,22 @@ int ocfs2_reserve_new_inode(struct ocfs2_super *osb,
 		goto inode_steal;
 
 	atomic_set(&osb->s_num_inodes_stolen, 0);
+	alloc_group = osb->osb_inode_alloc_group;
 	status = ocfs2_reserve_suballoc_bits(osb, *ac,
 					     INODE_ALLOC_SYSTEM_INODE,
 					     osb->slot_num,
+					     &alloc_group,
 					     ALLOC_NEW_GROUP |
 					     ALLOC_GROUPS_FROM_GLOBAL);
 	if (status >= 0) {
 		status = 0;
 
+		spin_lock(&osb->osb_lock);
+		osb->osb_inode_alloc_group = alloc_group;
+		spin_unlock(&osb->osb_lock);
+		mlog(0, "after reservation, new allocation group is "
+		     "%llu\n", (unsigned long long)alloc_group);
+
 		/*
 		 * Some inodes must be freed by us, so try to allocate
 		 * from our own next time.
@@ -796,7 +820,7 @@ int ocfs2_reserve_cluster_bitmap_bits(struct ocfs2_super *osb,
 
 	status = ocfs2_reserve_suballoc_bits(osb, ac,
 					     GLOBAL_BITMAP_SYSTEM_INODE,
-					     OCFS2_INVALID_SLOT,
+					     OCFS2_INVALID_SLOT, NULL,
 					     ALLOC_NEW_GROUP);
 	if (status < 0 && status != -ENOSPC) {
 		mlog_errno(status);
-- 
1.5.5


* [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement.v2
  2009-01-15 21:58 [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement.v2 Tao Ma
                   ` (2 preceding siblings ...)
  2009-01-15 22:00 ` [Ocfs2-devel] [PATCH 3/3] ocfs2: Optimize inode group allocation by recording last used group Tao Ma
@ 2009-01-16  8:05 ` tristan.ye
  2009-01-16  8:16   ` Tao Ma
  2009-02-13  2:42 ` tristan.ye
  4 siblings, 1 reply; 17+ messages in thread
From: tristan.ye @ 2009-01-16  8:05 UTC (permalink / raw)
  To: ocfs2-devel

On Fri, 2009-01-16 at 05:58 +0800, Tao Ma wrote:
> Changelog from V1 to V2:
> 1. Modify some codes according to Mark's advice.
> 2. Attach some test statistics in the commit log of patch 3 and in
> this e-mail also. See below.
> 
> Hi all,
> 	In ocfs2, when we create a fresh file system and create inodes in it, 
> they are contiguous and good for readdir+stat. While if we delete all 
> the inodes and created again, the new inodes will get spread out and 
> that isn't what we need. The core problem here is that the inode block 
> search looks for the "emptiest" inode group to allocate from. So if an 
> inode alloc file has many equally (or almost equally) empty groups, new 
> inodes will tend to get spread out amongst them, which in turn can put 
> them all over the disk. This is undesirable because directory operations 
> on conceptually "nearby" inodes force a large number of seeks. For more 
> details, please see 
> http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy. 
> 
> So this patch set try to fix this problem.
> patch 1: Optimize inode allocation by remembering last group.
> We add ip_last_used_group in core directory inodes which records
> the last used allocation group. Another field named ip_last_used_slot
> is also added in case inode stealing happens. When claiming new inode,
> we passed in directory's inode so that the allocation can use this
> information.
> 
> patch 2: let the Inode group allocs use the global bitmap directly.
> 
> patch 3: we add osb_last_alloc_group in ocfs2_super to record the last
> used allocation group so that we can make inode groups contiguous enough.
> 
> I have done some basic test and the results are cool.
> 1. single node test:
> first column is the result without inode allocation patches, and the
> second one with inode allocation patched enabled. You see we have
> great improvement with the second "ls -lR".
> 
> echo 'y'|mkfs.ocfs2 -b 4K -C 4K -M local /dev/sda11
> 
> mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
> time tar jxvf /home/taoma/linux-2.6.28.tar.bz2 -C /mnt/ocfs2/ 1>/dev/null
> 
> real	0m20.548s 0m20.106s
> 
> umount /mnt/ocfs2/
> echo 2 > /proc/sys/vm/drop_caches
> mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
> time ls -lR /mnt/ocfs2/ 1>/dev/null
> 
> real	0m13.965s 0m13.766s
> 
> umount /mnt/ocfs2/
> echo 2 > /proc/sys/vm/drop_caches
> mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
> time rm /mnt/ocfs2/linux-2.6.28/ -rf
> 
> real	0m13.198s 0m13.091s
> 
> umount /mnt/ocfs2/
> echo 2 > /proc/sys/vm/drop_caches
> mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
> time tar jxvf /home/taoma/linux-2.6.28.tar.bz2 -C /mnt/ocfs2/ 1>/dev/null
> 
> real	0m23.022s 0m21.360s
> 
> umount /mnt/ocfs2/
> echo 2 > /proc/sys/vm/drop_caches
> mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
> time ls -lR /mnt/ocfs2/ 1>/dev/null
> 
> real	2m45.189s 0m15.019s 
> yes, that is it. ;) I don't know we can improve so much when I start up.

Tao,

I'm wondering why the 1st 'ls -lR' did not show us such a huge
enhancement. Was the system load (by uptime) similar when doing your 2nd
'ls -lR' contrast tests? If so, that's a really significant
gain!!!! :-) Great congrats!

To get more persuasive testing results, I suggest you run the same tests
a considerable number of times and report the averaged statistics; that would
be more convincing to us :-), and it would also minimize the influence of
exceptional system loads. :-)

Tristan


> 
> 2. Tested with 4 nodes(megabyte switch for both cross-node
> communication and iscsi), with the same command sequence(using
> openmpi to run the command simultaneously). Although we spend
> a lot of time in cross-node communication, we still have some
> performance improvement.
> 
> the 1st tar:
> real	356.22s  357.70s
> 
> the 1st ls -lR:
> real	187.33s  187.32s
> 
> the rm:
> real	260.68s  262.42s
> 
> the 2nd tar:
> real	371.92s  358.47s
> 
> the 2nd ls:
> real	197.16s  188.36s
> 
> Regards,
> Tao
> 
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-devel


* [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement.v2
  2009-01-16  8:05 ` [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement.v2 tristan.ye
@ 2009-01-16  8:16   ` Tao Ma
  2009-01-16 16:08     ` tristan.ye
  0 siblings, 1 reply; 17+ messages in thread
From: Tao Ma @ 2009-01-16  8:16 UTC (permalink / raw)
  To: ocfs2-devel



tristan.ye wrote:
> On Fri, 2009-01-16 at 05:58 +0800, Tao Ma wrote:
>> Changelog from V1 to V2:
>> 1. Modify some codes according to Mark's advice.
>> 2. Attach some test statistics in the commit log of patch 3 and in
>> this e-mail also. See below.
>>
>> Hi all,
>> 	In ocfs2, when we create a fresh file system and create inodes in it, 
>> they are contiguous and good for readdir+stat. While if we delete all 
>> the inodes and created again, the new inodes will get spread out and 
>> that isn't what we need. The core problem here is that the inode block 
>> search looks for the "emptiest" inode group to allocate from. So if an 
>> inode alloc file has many equally (or almost equally) empty groups, new 
>> inodes will tend to get spread out amongst them, which in turn can put 
>> them all over the disk. This is undesirable because directory operations 
>> on conceptually "nearby" inodes force a large number of seeks. For more 
>> details, please see 
>> http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy. 
<snip>
>> echo 'y'|mkfs.ocfs2 -b 4K -C 4K -M local /dev/sda11
>>
>> mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
>> time tar jxvf /home/taoma/linux-2.6.28.tar.bz2 -C /mnt/ocfs2/ 1>/dev/null
>>
>> real	0m20.548s 0m20.106s
>>
>> umount /mnt/ocfs2/
>> echo 2 > /proc/sys/vm/drop_caches
>> mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
>> time ls -lR /mnt/ocfs2/ 1>/dev/null
>>
>> real	0m13.965s 0m13.766s
>>
>> umount /mnt/ocfs2/
>> echo 2 > /proc/sys/vm/drop_caches
>> mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
>> time rm /mnt/ocfs2/linux-2.6.28/ -rf
>>
>> real	0m13.198s 0m13.091s
>>
>> umount /mnt/ocfs2/
>> echo 2 > /proc/sys/vm/drop_caches
>> mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
>> time tar jxvf /home/taoma/linux-2.6.28.tar.bz2 -C /mnt/ocfs2/ 1>/dev/null
>>
>> real	0m23.022s 0m21.360s
>>
>> umount /mnt/ocfs2/
>> echo 2 > /proc/sys/vm/drop_caches
>> mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
>> time ls -lR /mnt/ocfs2/ 1>/dev/null
>>
>> real	2m45.189s 0m15.019s 
>> yes, that is it. ;) I don't know we can improve so much when I start up.
> 
> Tao,
> 
> I'm wondering why the 1st 'ls -lR' did not show us such a huge
> enhancement, are the system load(by uptime) simliar when doing your 2rd
> 'ls -lR' contrast tests? if so, that's a really significant
> gain!!!!:-),great congs!
Because when we do the 1st 'ls -lR', the inodes are almost contiguous,
so the read is very fast. But with the 2nd 'ls -lR', because the 2nd tar
(under the old allocator) spread the inodes out, we get poor performance. See
http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy
for more details.
> 
> To get more persuasive testing results, i suggest you do the same tests
> by considerable times,and then a average statistic results should be
> more attractive to us:-), and it also minimize the influence of some
> exceptional system loads:-)
I don't have that much time to do a large number of tests. ;) Actually
I only ran my test cases about 2~3 times and gave the average time. Btw,
I have left the test env there; if you are interested, you can run it as you
wish and give us a complete test result. :)

Regards,
Tao


* [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement.v2
  2009-01-16  8:16   ` Tao Ma
@ 2009-01-16 16:08     ` tristan.ye
  2009-01-18  8:58       ` Tao Ma
  0 siblings, 1 reply; 17+ messages in thread
From: tristan.ye @ 2009-01-16 16:08 UTC (permalink / raw)
  To: ocfs2-devel

On Fri, 2009-01-16 at 16:16 +0800, Tao Ma wrote:
> 
> tristan.ye wrote:
> > On Fri, 2009-01-16 at 05:58 +0800, Tao Ma wrote:
> >> Changelog from V1 to V2:
> >> 1. Modify some codes according to Mark's advice.
> >> 2. Attach some test statistics in the commit log of patch 3 and in
> >> this e-mail also. See below.
> >>
> >> Hi all,
> >> 	In ocfs2, when we create a fresh file system and create inodes in it, 
> >> they are contiguous and good for readdir+stat. While if we delete all 
> >> the inodes and created again, the new inodes will get spread out and 
> >> that isn't what we need. The core problem here is that the inode block 
> >> search looks for the "emptiest" inode group to allocate from. So if an 
> >> inode alloc file has many equally (or almost equally) empty groups, new 
> >> inodes will tend to get spread out amongst them, which in turn can put 
> >> them all over the disk. This is undesirable because directory operations 
> >> on conceptually "nearby" inodes force a large number of seeks. For more 
> >> details, please see 
> >> http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy. 
> <snip>
> >> echo 'y'|mkfs.ocfs2 -b 4K -C 4K -M local /dev/sda11
> >>
> >> mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
> >> time tar jxvf /home/taoma/linux-2.6.28.tar.bz2 -C /mnt/ocfs2/ 1>/dev/null
> >>
> >> real	0m20.548s 0m20.106s
> >>
> >> umount /mnt/ocfs2/
> >> echo 2 > /proc/sys/vm/drop_caches
> >> mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
> >> time ls -lR /mnt/ocfs2/ 1>/dev/null
> >>
> >> real	0m13.965s 0m13.766s
> >>
> >> umount /mnt/ocfs2/
> >> echo 2 > /proc/sys/vm/drop_caches
> >> mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
> >> time rm /mnt/ocfs2/linux-2.6.28/ -rf
> >>
> >> real	0m13.198s 0m13.091s
> >>
> >> umount /mnt/ocfs2/
> >> echo 2 > /proc/sys/vm/drop_caches
> >> mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
> >> time tar jxvf /home/taoma/linux-2.6.28.tar.bz2 -C /mnt/ocfs2/ 1>/dev/null
> >>
> >> real	0m23.022s 0m21.360s
> >>
> >> umount /mnt/ocfs2/
> >> echo 2 > /proc/sys/vm/drop_caches
> >> mount -t ocfs2 /dev/sda11 /mnt/ocfs2/
> >> time ls -lR /mnt/ocfs2/ 1>/dev/null
> >>
> >> real	2m45.189s 0m15.019s 
> >> yes, that is it. ;) I don't know we can improve so much when I start up.
> > 
> > Tao,
> > 
> > I'm wondering why the 1st 'ls -lR' did not show us such a huge
> > enhancement, are the system load(by uptime) simliar when doing your 2rd
> > 'ls -lR' contrast tests? if so, that's a really significant
> > gain!!!!:-),great congs!
> Because when we do the 1st 'ls -lR', the inodes are almost contiguous. 
> So the read is very fast. But with the 2nd 'ls -lR' because the old '2nd 
> tar' spread inodes, so we have a poor performance. See
> http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy 
> for more details.
> > 
> > To get more persuasive testing results, i suggest you do the same tests
> > by considerable times,and then a average statistic results should be
> > more attractive to us:-), and it also minimize the influence of some
> > exceptional system loads:-)
> I don't have that many times to do a large number of tests. ;) Actually 
> I only run my test cases about 2~3 times and give the average time. btw, 
> I have left test env there, if you are interested, you can run it as you 
> wish and give us a complete test result. :)
Tao,

I've run the single-node test case 10 times repeatedly; the following
is the averaged statistics report.
=============== Tests with 10 times iteration================ 

1st 'Tar xjvf' result:

Average real time with 10 times: 
Original kernel                            kernel with enhanced patches
 0m 43.578s                                       0m 49.355s

1st 'ls -lR' result:
Average real time with 10 times: 
Original kernel                            kernel with enhanced patches
 0m 23.622s                                        0m 23.508s

1st 'rm -rf' result:
Average real time with 10 times: 
Original kernel                            kernel with enhanced patches
 0m 57.039s                                       0m 58.612s

2nd 'Tar xjvf' result:
Average real time with 10 times: 
Original kernel                            kernel with enhanced patches
 0m 49.550s                                       0m 52.214s

2nd 'ls -lR' result:
Average real time with 10 times: 
Original kernel                            kernel with enhanced patches

0m 23.591s                                       0m 23.487s

===============Tests end============================ 


From the above tests, we really have had a speed-up when
traversing files by 'ls -lR' against a kernel tree :-), but it seems we also
encountered a performance loss when populating the files by 'tar xvjf',
according to the contrast tests.


Regards,

Tristan

> 
> Regards,
> Tao


* [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement.v2
  2009-01-16 16:08     ` tristan.ye
@ 2009-01-18  8:58       ` Tao Ma
  2009-01-18 15:17         ` Sunil Mushran
  2009-01-18 15:18         ` Sunil Mushran
  0 siblings, 2 replies; 17+ messages in thread
From: Tao Ma @ 2009-01-18  8:58 UTC (permalink / raw)
  To: ocfs2-devel



tristan.ye wrote:
> On Fri, 2009-01-16 at 16:16 +0800, Tao Ma wrote:
>   
>> tristan.ye wrote:
>>
> Tao,
>
> I've done 10 times tests with single-node testcase repeatly, following
> is a average statistic reports
> =============== Tests with 10 times iteration================ 
>
> 1st 'Tar xjvf' result:
>
> Average real time with 10 times: 
> Original kernel                            kernel with enhanced patches
>  0m 43.578s                                       0m 49.355s
>
> 1st 'ls -lR' result:
> Average real time with 10 times: 
> Original kernel                            kernel with enhanced patches
>  0m 23.622s                                        0m 23.508s
>
> 1st 'rm -rf' result:
> Average real time with 10 times: 
> Original kernel                            kernel with enhanced patches
>  0m 57.039s                                       0m 58.612s
>
> 2rd 'Tar xjvf' result:
> Average real time with 10 times: 
> Original kernel                            kernel with enhanced patches
>  0m 49.550s                                       0m 52.214s
>
> 2rd 'ls -lR' result:
> Average real time with 10 times: 
> Original kernel                            kernel with enhanced patches
>
> 0m 23.591s                                       0m 23.487s
>
> ===============Tests end============================ 
>
>
> >From above tests, we really have had a speed-up performance gain when
> traversing files  by 'ls -lR' against a kernel tree:),but seems also
> encountered a performance lose when populating the files by 'tar xvjf'
> according to the contrast tests.
>   
I am just a little confused by your test result, especially the last one.
From the statistics, it looks like there is almost no performance gain,
comparing 0m23.591s with 0m23.487s.
But I see >2 mins every time. So are you sure of it?
Anyway, thanks for your test and I will discuss it with you later.

Regards,
Tao


* [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement.v2
  2009-01-18  8:58       ` Tao Ma
@ 2009-01-18 15:17         ` Sunil Mushran
  2009-01-19  7:07           ` tristan.ye
  2009-01-18 15:18         ` Sunil Mushran
  1 sibling, 1 reply; 17+ messages in thread
From: Sunil Mushran @ 2009-01-18 15:17 UTC (permalink / raw)
  To: ocfs2-devel

How big is this disk? Maybe one kernel tree untar is not enough to
expose the original issue. Also, use ls -i and/or debugfs to see if
the inodes have some locality.

On Jan 18, 2009, at 12:58 AM, Tao Ma <tao.ma@oracle.com> wrote:

>
>
> tristan.ye wrote:
>> On Fri, 2009-01-16 at 16:16 +0800, Tao Ma wrote:
>>
>>> tristan.ye wrote:
>>>
>> Tao,
>>
>> I've done 10 times tests with single-node testcase repeatly,  
>> following
>> is a average statistic reports
>> =============== Tests with 10 times iteration================
>>
>> 1st 'Tar xjvf' result:
>>
>> Average real time with 10 times:
>> Original kernel                            kernel with enhanced  
>> patches
>> 0m 43.578s                                       0m 49.355s
>>
>> 1st 'ls -lR' result:
>> Average real time with 10 times:
>> Original kernel                            kernel with enhanced  
>> patches
>> 0m 23.622s                                        0m 23.508s
>>
>> 1st 'rm -rf' result:
>> Average real time with 10 times:
>> Original kernel                            kernel with enhanced  
>> patches
>> 0m 57.039s                                       0m 58.612s
>>
>> 2rd 'Tar xjvf' result:
>> Average real time with 10 times:
>> Original kernel                            kernel with enhanced  
>> patches
>> 0m 49.550s                                       0m 52.214s
>>
>> 2rd 'ls -lR' result:
>> Average real time with 10 times:
>> Original kernel                            kernel with enhanced  
>> patches
>>
>> 0m 23.591s                                       0m 23.487s
>>
>> ===============Tests end============================
>>
>>
>>> From above tests, we really have had a speed-up performance gain  
>>> when
>> traversing files  by 'ls -lR' against a kernel tree:),but seems also
>> encountered a performance lose when populating the files by 'tar  
>> xvjf'
>> according to the contrast tests.
>>
> I am just a little confused with your test result. Especially the  
> last one.
> from the statistics, it looks that there is almost no performance gain
> comparing 0m23.591s with 0m 23.487s.
> But I see >2mins every time. So are you sure of it?
> anyway, thanks for your test and I will discuss it with you later.
>
> Regards,
> Tao
>
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-devel


* [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement.v2
  2009-01-18  8:58       ` Tao Ma
  2009-01-18 15:17         ` Sunil Mushran
@ 2009-01-18 15:18         ` Sunil Mushran
  1 sibling, 0 replies; 17+ messages in thread
From: Sunil Mushran @ 2009-01-18 15:18 UTC (permalink / raw)
  To: ocfs2-devel

Oh btw, we have Monday off in hq.

On Jan 18, 2009, at 12:58 AM, Tao Ma <tao.ma@oracle.com> wrote:

>
>
> tristan.ye wrote:
>> On Fri, 2009-01-16 at 16:16 +0800, Tao Ma wrote:
>>
>>> tristan.ye wrote:
>>>
>> Tao,
>>
>> I've done 10 times tests with single-node testcase repeatly,  
>> following
>> is a average statistic reports
>> =============== Tests with 10 times iteration================
>>
>> 1st 'Tar xjvf' result:
>>
>> Average real time with 10 times:
>> Original kernel                            kernel with enhanced  
>> patches
>> 0m 43.578s                                       0m 49.355s
>>
>> 1st 'ls -lR' result:
>> Average real time with 10 times:
>> Original kernel                            kernel with enhanced  
>> patches
>> 0m 23.622s                                        0m 23.508s
>>
>> 1st 'rm -rf' result:
>> Average real time with 10 times:
>> Original kernel                            kernel with enhanced  
>> patches
>> 0m 57.039s                                       0m 58.612s
>>
>> 2rd 'Tar xjvf' result:
>> Average real time with 10 times:
>> Original kernel                            kernel with enhanced  
>> patches
>> 0m 49.550s                                       0m 52.214s
>>
>> 2rd 'ls -lR' result:
>> Average real time with 10 times:
>> Original kernel                            kernel with enhanced  
>> patches
>>
>> 0m 23.591s                                       0m 23.487s
>>
>> ===============Tests end============================
>>
>>
>>> From above tests, we really have had a speed-up performance gain  
>>> when
>> traversing files  by 'ls -lR' against a kernel tree:),but seems also
>> encountered a performance lose when populating the files by 'tar  
>> xvjf'
>> according to the contrast tests.
>>
> I am just a little confused with your test result. Especially the  
> last one.
> from the statistics, it looks that there is almost no performance gain
> comparing 0m23.591s with 0m 23.487s.
> But I see >2mins every time. So are you sure of it?
> anyway, thanks for your test and I will discuss it with you later.
>
> Regards,
> Tao
>
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-devel


* [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement.v2
  2009-01-18 15:17         ` Sunil Mushran
@ 2009-01-19  7:07           ` tristan.ye
  2009-01-19  7:15             ` Tao Ma
  0 siblings, 1 reply; 17+ messages in thread
From: tristan.ye @ 2009-01-19  7:07 UTC (permalink / raw)
  To: ocfs2-devel

On Sun, 2009-01-18 at 07:17 -0800, Sunil Mushran wrote:
> How big is this disk? Maybe one kernel tree untar is not be enough to  
> expose the original issue. Also, use ls -i and/or debugfs to see if  
> the inodes have some locality.

Tao, sunil

I've tested it out with more attempts; it may have something to do with
the disk size. But the root cause of why I did not get an exciting speed-up
was the iscsi-target (iscsi server) cache. The operation in your test case
that was meant for cache dropping only works on the client side (an
'echo 2 > /proc/sys/vm/drop_caches' only flushes the fs cache on the
iscsi initiator (iscsi client)).

Tao,

You can verify this by pausing the tests at the right point before we
start the 2nd 'ls -lR', then flushing the iscsi-target's cache with 'service
iscsi-target restart' (there may be a more graceful way to do this). After
this is done, resume the tests, and you'll find the real time it
consumes goes up to around 3 mins. :-)

Btw, I really saw lots of locality with 'ls -li' for inodes under the same
dir. Take /mnt/ocfs2/linux-2.6.28/include/linux for instance: almost all
of its inodes are contiguous, one after another, with regard to their inode numbers.


Regards,
Tristan

> 
> On Jan 18, 2009, at 12:58 AM, Tao Ma <tao.ma@oracle.com> wrote:
> 
> >
> >
> > tristan.ye wrote:
> >> On Fri, 2009-01-16 at 16:16 +0800, Tao Ma wrote:
> >>
> >>> tristan.ye wrote:
> >>>
> >> Tao,
> >>
> >> I've done 10 times tests with single-node testcase repeatly,  
> >> following
> >> is a average statistic reports
> >> =============== Tests with 10 times iteration================
> >>
> >> 1st 'Tar xjvf' result:
> >>
> >> Average real time with 10 times:
> >> Original kernel                            kernel with enhanced  
> >> patches
> >> 0m 43.578s                                       0m 49.355s
> >>
> >> 1st 'ls -lR' result:
> >> Average real time with 10 times:
> >> Original kernel                            kernel with enhanced  
> >> patches
> >> 0m 23.622s                                        0m 23.508s
> >>
> >> 1st 'rm -rf' result:
> >> Average real time with 10 times:
> >> Original kernel                            kernel with enhanced  
> >> patches
> >> 0m 57.039s                                       0m 58.612s
> >>
> >> 2rd 'Tar xjvf' result:
> >> Average real time with 10 times:
> >> Original kernel                            kernel with enhanced  
> >> patches
> >> 0m 49.550s                                       0m 52.214s
> >>
> >> 2rd 'ls -lR' result:
> >> Average real time with 10 times:
> >> Original kernel                            kernel with enhanced  
> >> patches
> >>
> >> 0m 23.591s                                       0m 23.487s
> >>
> >> ===============Tests end============================
> >>
> >>
> >>> From above tests, we really have had a speed-up performance gain  
> >>> when
> >> traversing files  by 'ls -lR' against a kernel tree:),but seems also
> >> encountered a performance lose when populating the files by 'tar  
> >> xvjf'
> >> according to the contrast tests.
> >>
> > I am just a little confused with your test result. Especially the  
> > last one.
> > from the statistics, it looks that there is almost no performance gain
> > comparing 0m23.591s with 0m 23.487s.
> > But I see >2mins every time. So are you sure of it?
> > anyway, thanks for your test and I will discuss it with you later.
> >
> > Regards,
> > Tao
> >
> > _______________________________________________
> > Ocfs2-devel mailing list
> > Ocfs2-devel at oss.oracle.com
> > http://oss.oracle.com/mailman/listinfo/ocfs2-devel


* [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement.v2
  2009-01-19  7:07           ` tristan.ye
@ 2009-01-19  7:15             ` Tao Ma
  2009-01-19 12:57               ` tristan.ye
  0 siblings, 1 reply; 17+ messages in thread
From: Tao Ma @ 2009-01-19  7:15 UTC (permalink / raw)
  To: ocfs2-devel



tristan.ye wrote:
> On Sun, 2009-01-18 at 07:17 -0800, Sunil Mushran wrote:
>> How big is this disk? Maybe one kernel tree untar is not be enough to  
>> expose the original issue. Also, use ls -i and/or debugfs to see if  
>> the inodes have some locality.
> 
> Tao, sunil
> 
> I've tested it out with more attempts, it may have something to do with
> the disk size. but the root pause why i did not get a excited speed-up
> performance gain was due to the iscsi-target(iscsi server) cache. the
> operation in your testcase which was meant for cache dropping is aimed
> at client side(a 'echo 2>/proc/sys/vm/drop_cache' only flush the fs's
> cache for iscsi initor(isci client)). 
> 
> Tao,
> 
> You can verify this by pausing the tests at the right point before we
> start '2rd ls -lR', then flush the iscsi-target's cache by 'service
> iscsi-target restart'(there may be a more graceful way to do this),after
> this done, resume the tests, then you'll find the the realtime it
> consumed will be up to 3 mins around:)
cool, so I am very glad that you got the same result as mine. ;)
> 
> Btw, i really saw lots of locality by 'ls -li' for inodes under a same
> dir, take /mnt/ocfs2/linux-2.6.28/include/linux for instance, almost all
> of its inodes are contiguous one by one regarding to its inode number.
yeah, that is the desired behaviour with my 3 patches. :)

Then do you have any updated test statistics?

Regards,
Tao


* [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement.v2
  2009-01-19  7:15             ` Tao Ma
@ 2009-01-19 12:57               ` tristan.ye
  2009-01-19 18:35                 ` Sunil Mushran
  0 siblings, 1 reply; 17+ messages in thread
From: tristan.ye @ 2009-01-19 12:57 UTC (permalink / raw)
  To: ocfs2-devel

On Mon, 2009-01-19 at 15:15 +0800, Tao Ma wrote:
> 
> tristan.ye wrote:
> > On Sun, 2009-01-18 at 07:17 -0800, Sunil Mushran wrote:
> >> How big is this disk? Maybe one kernel tree untar is not be enough to  
> >> expose the original issue. Also, use ls -i and/or debugfs to see if  
> >> the inodes have some locality.
> > 
> > Tao, sunil
> > 
> > I've tested it out with more attempts, it may have something to do with
> > the disk size. but the root pause why i did not get a excited speed-up
> > performance gain was due to the iscsi-target(iscsi server) cache. the
> > operation in your testcase which was meant for cache dropping is aimed
> > at client side(a 'echo 2>/proc/sys/vm/drop_cache' only flush the fs's
> > cache for iscsi initor(isci client)). 
> > 
> > Tao,
> > 
> > You can verify this by pausing the tests at the right point before we
> > start '2rd ls -lR', then flush the iscsi-target's cache by 'service
> > iscsi-target restart'(there may be a more graceful way to do this),after
> > this done, resume the tests, then you'll find the the realtime it
> > consumed will be up to 3 mins around:)
> cool, so I am very glad that you got the same result as mine. ;)
> > 
> > Btw, i really saw lots of locality by 'ls -li' for inodes under a same
> > dir, take /mnt/ocfs2/linux-2.6.28/include/linux for instance, almost all
> > of its inodes are contiguous one by one regarding to its inode number.
> yeah, that is the desired behaviour with my 3 patches. :)
> 
> Then do you have any updated test statistics?

Tao,

With the iscsi-target cache flushed every time before the tests get started,
the following is the updated testing result:

Testing node:test7
Testing volume: iscsi sdd1

=============== Tests with 10 times iteration================ 

1st 'Tar xjvf' result:

Average real time with 10 times: 
Original kernel                            kernel with enhanced patches
 0m 22.468s                                       0m 23.472s

1st 'ls -lR' result:
Average real time with 10 times: 
Original kernel                            kernel with enhanced patches
 0m 30.682s                                        0m 30.414s

1st 'rm -rf' result:
Average real time with 10 times: 
Original kernel                            kernel with enhanced patches
 1m5.715s                                         1m3.835s

2nd 'Tar xjvf' result:
Average real time with 10 times: 
Original kernel                            kernel with enhanced patches
 0m 31.550s                                       0m 28.726s

2nd 'ls -lR' result:
Average real time with 10 times: 
Original kernel                            kernel with enhanced patches

3m5.772s                                          0m 30.274s

===============Tests end============================ 

Glad to see your guys' patches have greatly improved the performance of
inode traversal. :-)

Unfortunately, the 1st tar test case still gets a performance
penalty (around 1s) every time.

I've kept the testing env on test7 for your verification.


Regards,

Tristan

> 
> Regards,
> Tao


* [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement.v2
  2009-01-19 12:57               ` tristan.ye
@ 2009-01-19 18:35                 ` Sunil Mushran
  2009-01-20  2:34                   ` tristan.ye
  0 siblings, 1 reply; 17+ messages in thread
From: Sunil Mushran @ 2009-01-19 18:35 UTC (permalink / raw)
  To: ocfs2-devel

Good numbers. Did you do a 2nd rm -rf too? It should show the largest
improvement of all.

On Jan 19, 2009, at 4:57 AM, "tristan.ye" <tristan.ye@oracle.com> wrote:

> On Mon, 2009-01-19 at 15:15 +0800, Tao Ma wrote:
>>
>> tristan.ye wrote:
>>> On Sun, 2009-01-18 at 07:17 -0800, Sunil Mushran wrote:
>>>> How big is this disk? Maybe one kernel tree untar is not be  
>>>> enough to
>>>> expose the original issue. Also, use ls -i and/or debugfs to see if
>>>> the inodes have some locality.
>>>
>>> Tao, sunil
>>>
>>> I've tested it out with more attempts, it may have something to do  
>>> with
>>> the disk size. but the root pause why i did not get a excited  
>>> speed-up
>>> performance gain was due to the iscsi-target(iscsi server) cache.  
>>> the
>>> operation in your testcase which was meant for cache dropping is  
>>> aimed
>>> at client side(a 'echo 2>/proc/sys/vm/drop_cache' only flush the  
>>> fs's
>>> cache for iscsi initor(isci client)).
>>>
>>> Tao,
>>>
>>> You can verify this by pausing the tests at the right point before  
>>> we
>>> start '2rd ls -lR', then flush the iscsi-target's cache by 'service
>>> iscsi-target restart'(there may be a more graceful way to do  
>>> this),after
>>> this done, resume the tests, then you'll find the the realtime it
>>> consumed will be up to 3 mins around:)
>> cool, so I am very glad that you got the same result as mine. ;)
>>>
>>> Btw, i really saw lots of locality by 'ls -li' for inodes under a  
>>> same
>>> dir, take /mnt/ocfs2/linux-2.6.28/include/linux for instance,  
>>> almost all
>>> of its inodes are contiguous one by one regarding to its inode  
>>> number.
>> yeah, that is the desired behaviour with my 3 patches. :)
>>
>> Then do you have any updated test statistics?
>
> Tao,
>
> With iscsi-target cache flushed everytime before tests getting  
> started,
> following is the updated testing result:
>
> Testing node:test7
> Testing volume: iscsi sdd1
>
> =============== Tests with 10 times iteration================
>
> 1st 'Tar xjvf' result:
>
> Average real time with 10 times:
> Original kernel                            kernel with enhanced  
> patches
> 0m 22.468s                                       0m 23.472s
>
> 1st 'ls -lR' result:
> Average real time with 10 times:
> Original kernel                            kernel with enhanced  
> patches
> 0m 30.682s                                        0m 30.414s
>
> 1st 'rm -rf' result:
> Average real time with 10 times:
> Original kernel                            kernel with enhanced  
> patches
> 0m 1m5.715s                                       0m 1m3.835s
>
> 2rd 'Tar xjvf' result:
> Average real time with 10 times:
> Original kernel                            kernel with enhanced  
> patches
> 0m 31.550s                                       0m 28.726s
>
> 2rd 'ls -lR' result:
> Average real time with 10 times:
> Original kernel                            kernel with enhanced  
> patches
>
> 0m 3m5.772s                                       0m 30.274s
>
> ===============Tests end============================
>
> Glad to see your guy's patch has greatly improved the performance of
> inodes traversing:),
>
> Unfortunately, the 1st Tar testcase still get a performance
> penality(around 1s) everytime.
>
> I've kept the testing env on test7 for your verification.
>
>
> Regards,
>
> Tristan
>
>>
>> Regards,
>> Tao
>


* [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement.v2
  2009-01-19 18:35                 ` Sunil Mushran
@ 2009-01-20  2:34                   ` tristan.ye
  2009-01-21  2:37                     ` Sunil Mushran
  0 siblings, 1 reply; 17+ messages in thread
From: tristan.ye @ 2009-01-20  2:34 UTC (permalink / raw)
  To: ocfs2-devel

On Mon, 2009-01-19 at 10:35 -0800, Sunil Mushran wrote:
> Good numbers. Did you a 2nd rm -rf too? It should show the largest  
> improvement of all.

Yes, fairly cool, we also got up to a 3-minute performance gain during the 2nd
'rm -rf' test:

=============== Tests with 10 times iteration================
 2nd 'rm -rf' result:

Average real time with 10 times: 
Original kernel                            kernel with enhanced patches
 4m3.425s                                       0m59.423s




Regards,
Tristan

> 
> On Jan 19, 2009, at 4:57 AM, "tristan.ye" <tristan.ye@oracle.com> wrote:
> 
> > On Mon, 2009-01-19 at 15:15 +0800, Tao Ma wrote:
> >>
> >> tristan.ye wrote:
> >>> On Sun, 2009-01-18 at 07:17 -0800, Sunil Mushran wrote:
> >>>> How big is this disk? Maybe one kernel tree untar is not be  
> >>>> enough to
> >>>> expose the original issue. Also, use ls -i and/or debugfs to see if
> >>>> the inodes have some locality.
> >>>
> >>> Tao, Sunil,
> >>>
> >>> I've tested it out with more attempts; it may have something to do with
> >>> the disk size. But the root cause why I did not get an exciting speed-up
> >>> was the iscsi-target (iscsi server) cache. The operation in your test
> >>> case that was meant for cache dropping only works on the client side
> >>> (an 'echo 2 > /proc/sys/vm/drop_caches' only flushes the fs cache on the
> >>> iscsi initiator, i.e. the iscsi client).
> >>>
> >>> Tao,
> >>>
> >>> You can verify this by pausing the tests at the right point before we
> >>> start the 2nd 'ls -lR', then flushing the iscsi-target's cache with
> >>> 'service iscsi-target restart' (there may be a more graceful way to do
> >>> this). After this is done, resume the tests, and you'll find the real
> >>> time it consumes goes up to around 3 minutes :)
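
A minimal sketch of that flush-and-resume sequence (assuming the sdd1
iscsi volume and /mnt/ocfs2 mount point used in these tests; the
target-side restart has to run on the iscsi server, and restarting the
service is the blunt method mentioned above, not necessarily the best
one):

# on the test node: drop the initiator-side (client) caches
umount /mnt/ocfs2/
sync
echo 2 > /proc/sys/vm/drop_caches    # only flushes the client-side fs cache

# on the iscsi server: drop the target-side cache
service iscsi-target restart

# back on the test node: remount and resume with the 2nd 'ls -lR'
mount -t ocfs2 /dev/sdd1 /mnt/ocfs2/
time ls -lR /mnt/ocfs2/ 1>/dev/null
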
> >> cool, so I am very glad that you got the same result as mine. ;)
> >>>
> >>> Btw, I really saw lots of locality with 'ls -li' for inodes under the
> >>> same dir; take /mnt/ocfs2/linux-2.6.28/include/linux for instance,
> >>> almost all of its inodes are contiguous in terms of inode number.
> >> yeah, that is the desired behaviour with my 3 patches. :)
> >>
> >> Then do you have any updated test statistics?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement.v2
  2009-01-20  2:34                   ` tristan.ye
@ 2009-01-21  2:37                     ` Sunil Mushran
  0 siblings, 0 replies; 17+ messages in thread
From: Sunil Mushran @ 2009-01-21  2:37 UTC (permalink / raw)
  To: ocfs2-devel

:)

tristan.ye wrote:
> On Mon, 2009-01-19 at 10:35 -0800, Sunil Mushran wrote:
>   
>> Good numbers. Did you do a 2nd rm -rf too? It should show the largest
>> improvement of all.
>>     
>
> Yes, fairly cool, we also got up to a 3-minute performance gain in the
> 2nd rm -rf test:
>
> =============== Tests with 10 iterations ================
> 2nd 'rm -rf' result:
>
> Average real time over 10 runs:
> Original kernel                            Kernel with enhanced patches
> 4m3.425s                                   0m59.423s
>
>
>
>
> Regards,
> Tristan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement.v2
  2009-01-15 21:58 [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement.v2 Tao Ma
                   ` (3 preceding siblings ...)
  2009-01-16  8:05 ` [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement.v2 tristan.ye
@ 2009-02-13  2:42 ` tristan.ye
  4 siblings, 0 replies; 17+ messages in thread
From: tristan.ye @ 2009-02-13  2:42 UTC (permalink / raw)
  To: ocfs2-devel


Tao, Mark,

I've done a series of stricter tests with a much heavier workload to
demonstrate the performance gain from Tao's patches.

Following are the testing steps (a rough sketch of one full iteration
appears after the list):

1st Tar: untar files into a freshly mkfsed, empty fs, with enough
iterations to fill the whole disk up (here we use a 100G volume)

1st Ls:  traverse all inodes in the fs recursively

1st Rm:  remove all inodes from the fs

2nd Tar: untar the files again into the now-empty fs

2nd Ls:  the same as the 1st Ls

2nd Rm:  the same as the 1st Rm
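
A rough sketch of one full iteration of this sequence (the mount point,
tarball path and copy count below are placeholders rather than the exact
test7 configuration, and the cache flushing discussed earlier in the
thread goes in between the steps):

#!/bin/bash
# sketch only: fill the volume with repeated kernel-tree untars, then
# walk, remove and repeat, timing each phase
MNT=/mnt/ocfs2
TARBALL=/path/to/linux-2.6.28.tar.bz2    # placeholder
COPIES=40    # placeholder; choose enough copies to fill the 100G volume

run_tar() {
    for i in $(seq 1 $COPIES); do
        mkdir -p $MNT/tree-$i
        tar jxf $TARBALL -C $MNT/tree-$i
    done
}

echo "1st Tar"; time run_tar
echo "1st Ls";  time ls -lR $MNT 1>/dev/null
echo "1st Rm";  time rm -rf $MNT/tree-*

echo "2nd Tar"; time run_tar
echo "2nd Ls";  time ls -lR $MNT 1>/dev/null
echo "2nd Rm";  time rm -rf $MNT/tree-*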

We used the same testing steps to do a comparison between the patched
kernel and the original kernel.

From the above tests, we expected to see a performance gain in the 2nd Ls
and 2nd Rm, since the patched kernel provides better inode locality when
the inodes are recreated by the 2nd Tar, while the original kernel goes
round-robin with the inode allocator, which makes for poor locality. And
I'd like to say the results of the real tests were awesome and
encouraging. Following are the testing reports.

1. Single node test.

========Time Consumed Statistics (2 iterations)========
            [Patched kernel]    [Original kernel]
1st Tar:        1745.17s            1751.86s
1st Ls:         2128.81s            2262.13s
1st Rm:         1760.66s            1857.06s
2nd Tar:        1924.77s            1917.75s
2nd Ls:         2313.11s            8196.51s
2nd Rm:         1925.14s            2372.10s
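
In relative terms (computed here from the table above, not figures
reported in the thread), the headline single-node numbers work out to
roughly:

awk 'BEGIN { printf "2nd Ls speed-up: %.1fx (8196.51s -> 2313.11s)\n", 8196.51 / 2313.11
             printf "2nd Rm speed-up: %.2fx (2372.10s -> 1925.14s)\n", 2372.10 / 1925.14 }'
# prints approximately 3.5x for the 2nd Ls and 1.23x for the 2nd Rm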



2. Multiple-node tests.

1) From node 1: test5

========Time Consumed Statistics (2 iterations)========
            [Patched kernel]    [Original kernel]
1st Tar:        3528.36s            3422.23s
1st Ls:         3035.17s            6009.16s
1st Rm:         2436.65s            2307.37s
2nd Tar:        3131.00s            3521.21s
2nd Ls:         2949.31s            4002.07s
2nd Rm:         2425.09s            3365.42s

2) From node 2: test12

========Time Consumed Statistics (2 iterations)========
            [Patched kernel]    [Original kernel]
1st Tar:        3470.28s            3876.46s
1st Ls:         2972.58s            6743.32s
1st Rm:         2413.23s            2572.18s
2nd Tar:        3848.56s            3521.21s
2nd Ls:         2887.13s            8259.07s
2nd Rm:         2478.70s            4152.42s


The statistics from the above tests are persuasive; this patch set really
behaved well in these performance comparison tests :), and it should be
the right time to get the patches committed.


Regards,
Tristan



^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2009-02-13  2:42 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-01-15 21:58 [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement.v2 Tao Ma
2009-01-15 22:00 ` [Ocfs2-devel] [PATCH 1/3] ocfs2: Optimize inode allocation by remembering last group Tao Ma
2009-01-15 22:00 ` [Ocfs2-devel] [PATCH 2/3] ocfs2: Allocate inode groups from global_bitmap Tao Ma
2009-01-15 22:00 ` [Ocfs2-devel] [PATCH 3/3] ocfs2: Optimize inode group allocation by recording last used group Tao Ma
2009-01-16  8:05 ` [Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement.v2 tristan.ye
2009-01-16  8:16   ` Tao Ma
2009-01-16 16:08     ` tristan.ye
2009-01-18  8:58       ` Tao Ma
2009-01-18 15:17         ` Sunil Mushran
2009-01-19  7:07           ` tristan.ye
2009-01-19  7:15             ` Tao Ma
2009-01-19 12:57               ` tristan.ye
2009-01-19 18:35                 ` Sunil Mushran
2009-01-20  2:34                   ` tristan.ye
2009-01-21  2:37                     ` Sunil Mushran
2009-01-18 15:18         ` Sunil Mushran
2009-02-13  2:42 ` tristan.ye
