All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH] Decrease meta fragments by using a caterpillar band Method (Ver. 2)
@ 2012-05-29 16:24 WeiFeng Liu
  2012-05-29 17:32 ` Goffredo Baroncelli
  0 siblings, 1 reply; 2+ messages in thread
From: WeiFeng Liu @ 2012-05-29 16:24 UTC (permalink / raw)
  To: linux-btrfs

This is a several bugs fixed version since my first patch commit, and added
patch of btrfs-prog


Introduction and brief speculate of values and penalties:

	When a tree block need to be created, we offer, say, 2 or 3 blocks for it, 
then pick one from the continuous blocks. If this tree block needs a cow,
another free block from these continuous blocks can be grabbed, and the old one
is freed for next cow.

	In the most ideal condition only 2 continuous blocks are kept for any times
of cowing a tree block -- image a caterpillar band by which my method is named.

	Given a scene that there are two file systems: one with COW and the other
NOCOW, each has 1GB space for metadata with it's first 100MB has been used, and
let me focus only on ops of modifying metadata and ignore deleting metadata for
simple reason. 

	As we can image that the NOCOW fs would likely keep the layout of its 100MB
meta to most neat: changes can be overwritten in the original places and leave
the rest 900MB untouched. But it is lack of the excellent feature to assure data
integrity which owned only by COW fs.

	However, only modifying metadata though, the COW fs would make holes in 
the first 100MB and write COWed meta into the rest 900MB space, in the extreme
condition, the whole 1GB space would be fragmented finally and scattered by that
100MB metadata. I don't think btrfs will be easily trap into such bad state, as
I understood we have extent, chunk, cluster and maybe other methods(tell me
please) to slow fragmenting, but it seems that there are no dedicate method
(fix me) to help COW getting rid of this type of fragments which NOCOW fs does
not incline to make.

	I introduce the caterpillar band method as a compromise. It use 300MB for
meta to avoid such fragments and without lost of COW feature, in the instance,
that means three countinues blocks are used for a single tree block, the tree
block will be circularly updated(cowed) within its three countinues blocks.

	Penalties? Yes there are, thanks to Arne Jansen and Liu Bo. As Arne Jansen
indicated, the most disadvantage of the method will be that this will basically
limit us to 1/3th of platter speed to write meta when using spinning drives and
to 1/4th if using four countinues blocks for a tree block.

	About readahead, which will be also down to the 1/3th of NOCOW fs rate, but
I would discreetly think it as an advantage rather than penalty comparing worse
condition which COW would get -- nearly since the first COW, the new tree blocks
cowed would be 50MB far away from their original neighbor blocks normally, and
after frequent random modify ops, would the worst conditition be that every
dozen of tree blocks newly cowed are more than 50MB far away from their original
neighbor blocks if in equal probability?

	So permit me to think readahead is only usefull for NOCOW fs in this
scenario, because it always keeps its original 100MB continued, and my way would
keep 1/3 readahead rate vs maybe-zero by pure COW if worstly.

	Of course, both penalties and values are only for metadata and will not
affect user date read/write, my patch is only applied for cow tree blocks. 
	But if there are large number of small files(size<4k), values and penalties
will also affect those small user data R/W.

	I have not made tests for my patch by now, it still need some time to get
more check for both strategy and code in patch and fix possible bugs before
test, any comments are welcome.


Thanks


signed-off-by WeiFeng Liu
523f28f9b3d9c710cacc31dbba644efb1678cf62

---

diff -uprN a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
--- a/fs/btrfs/ctree.c	2012-05-21 18:42:51.000000000 +0000
+++ b/fs/btrfs/ctree.c	2012-05-29 23:08:19.000000000 +0000
@@ -444,9 +444,21 @@ static noinline int __btrfs_cow_block(st
 	} else
 		parent_start = 0;
 
-	cow = btrfs_alloc_free_block(trans, root, buf->len, parent_start,
-				     root->root_key.objectid, &disk_key,
-				     level, search_start, empty_size, 1);
+ 	if (root->fs_info->cater_factor > 1) {	
+		if (btrfs_cater_factor(btrfs_header_cater(buf)) > 1)			
+			cow = btrfs_grab_cater_block(trans, root, buf, parent_start,
+							root->root_key.objectid, &disk_key,
+							level, search_start, empty_size, 1);
+		else			
+			cow = btrfs_alloc_free_block_cater(trans, root, buf->len, 
+							parent_start,
+							root->root_key.objectid, &disk_key,
+							level, search_start, empty_size, 1);
+	} else {
+		cow = btrfs_alloc_free_block(trans, root, buf->len, parent_start,
+ 				     root->root_key.objectid, &disk_key,
+ 				     level, search_start, empty_size, 1);
+	}
 	if (IS_ERR(cow))
 		return PTR_ERR(cow);
 
diff -uprN a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
--- a/fs/btrfs/ctree.h	2012-05-21 18:42:51.000000000 +0000
+++ b/fs/btrfs/ctree.h	2012-05-29 23:08:19.000000000 +0000
@@ -341,6 +341,7 @@ struct btrfs_header {
 	__le64 owner;
 	__le32 nritems;
 	u8 level;
+	u8 cater_index_factor;
 } __attribute__ ((__packed__));
 
 #define BTRFS_NODEPTRS_PER_BLOCK(r) (((r)->nodesize - \
@@ -544,6 +545,7 @@ struct btrfs_extent_item {
 	__le64 refs;
 	__le64 generation;
 	__le64 flags;
+	u8 cater_index_factor;
 } __attribute__ ((__packed__));
 
 struct btrfs_extent_item_v0 {
@@ -1222,6 +1224,7 @@ struct btrfs_fs_info {
 
 	unsigned data_chunk_allocations;
 	unsigned metadata_ratio;
+	unsigned cater_factor;
 
 	void *bdev_holder;
 
@@ -1761,6 +1764,8 @@ BTRFS_SETGET_FUNCS(extent_refs, struct b
 BTRFS_SETGET_FUNCS(extent_generation, struct btrfs_extent_item,
 		   generation, 64);
 BTRFS_SETGET_FUNCS(extent_flags, struct btrfs_extent_item, flags, 64);
+BTRFS_SETGET_FUNCS(extent_cater, struct btrfs_extent_item,
+			cater_index_factor, 8);
 
 BTRFS_SETGET_FUNCS(extent_refs_v0, struct btrfs_extent_item_v0, refs, 32);
 
@@ -1781,6 +1786,16 @@ static inline void btrfs_set_tree_block_
 	write_eb_member(eb, item, struct btrfs_tree_block_info, key, key);
 }
 
+static inline u8 btrfs_cater_factor(u8 cater_index_factor)
+{
+	return (cater_index_factor & 15);
+}
+
+static inline u8 btrfs_cater_index(u8 cater_index_factor)
+{
+	return (cater_index_factor >> 4);
+}
+
 BTRFS_SETGET_FUNCS(extent_data_ref_root, struct btrfs_extent_data_ref,
 		   root, 64);
 BTRFS_SETGET_FUNCS(extent_data_ref_objectid, struct btrfs_extent_data_ref,
@@ -2042,6 +2057,8 @@ BTRFS_SETGET_HEADER_FUNCS(header_owner,
 BTRFS_SETGET_HEADER_FUNCS(header_nritems, struct btrfs_header, nritems, 32);
 BTRFS_SETGET_HEADER_FUNCS(header_flags, struct btrfs_header, flags, 64);
 BTRFS_SETGET_HEADER_FUNCS(header_level, struct btrfs_header, level, 8);
+BTRFS_SETGET_HEADER_FUNCS(header_cater, struct btrfs_header,
+			cater_index_factor, 8);
 
 static inline int btrfs_header_flag(struct extent_buffer *eb, u64 flag)
 {
@@ -2445,7 +2462,20 @@ struct extent_buffer *btrfs_alloc_free_b
 					struct btrfs_root *root, u32 blocksize,
 					u64 parent, u64 root_objectid,
 					struct btrfs_disk_key *key, int level,
+					u64 hint, u64 empty_size, int for_cow);					
+int btrfs_cater_blocks_free(struct btrfs_root *root, u64 start, u64 len,
+					u8 factor, u8 index);
+struct extent_buffer *btrfs_grab_cater_block(struct btrfs_trans_handle *trans,
+					struct btrfs_root *root, struct extent_buffer *buf,
+					u64 parent, u64 root_objectid,
+					struct btrfs_disk_key *key, int level,
 					u64 hint, u64 empty_size, int for_cow);
+struct extent_buffer *btrfs_alloc_free_block_cater(
+					struct btrfs_trans_handle *trans,
+					struct btrfs_root *root, u32 blocksize,
+					u64 parent, u64 root_objectid,
+					struct btrfs_disk_key *key, int level,
+ 					u64 hint, u64 empty_size, int for_cow);
 void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
 			   struct btrfs_root *root,
 			   struct extent_buffer *buf,
@@ -2453,7 +2483,7 @@ void btrfs_free_tree_block(struct btrfs_
 struct extent_buffer *btrfs_init_new_buffer(struct btrfs_trans_handle *trans,
 					    struct btrfs_root *root,
 					    u64 bytenr, u32 blocksize,
-					    int level);
+					    int level, u8 cater);
 int btrfs_alloc_reserved_file_extent(struct btrfs_trans_handle *trans,
 				     struct btrfs_root *root,
 				     u64 root_objectid, u64 owner,
diff -uprN a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
--- a/fs/btrfs/delayed-ref.h	2012-05-21 18:42:51.000000000 +0000
+++ b/fs/btrfs/delayed-ref.h	2012-05-29 23:08:19.000000000 +0000
@@ -63,6 +63,7 @@ struct btrfs_delayed_extent_op {
 	unsigned int update_key:1;
 	unsigned int update_flags:1;
 	unsigned int is_data:1;
+	u8 cater_index_factor;
 };
 
 /*
diff -uprN a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
--- a/fs/btrfs/extent-tree.c	2012-05-21 18:42:51.000000000 +0000
+++ b/fs/btrfs/extent-tree.c	2012-05-29 23:08:19.000000000 +0000
@@ -87,10 +87,15 @@ static int alloc_reserved_file_extent(st
 				      u64 flags, u64 owner, u64 offset,
 				      struct btrfs_key *ins, int ref_mod);
 static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans,
-				     struct btrfs_root *root,
-				     u64 parent, u64 root_objectid,
-				     u64 flags, struct btrfs_disk_key *key,
-				     int level, struct btrfs_key *ins);
+						struct btrfs_root *root,
+						u64 parent, u64 root_objectid,
+						u64 flags, struct btrfs_disk_key *key,
+						int level, struct btrfs_key *ins);
+static int alloc_reserved_tree_block_cater(struct btrfs_trans_handle *trans,
+						struct btrfs_root *root,
+						u64 parent, u64 root_objectid,
+						u64 flags, struct btrfs_disk_key *key,
+						int level, struct btrfs_key *ins, u8 cater);
 static int do_chunk_alloc(struct btrfs_trans_handle *trans,
 			  struct btrfs_root *extent_root, u64 alloc_bytes,
 			  u64 flags, int force);
@@ -1979,9 +1984,9 @@ static int run_delayed_data_ref(struct b
 			flags |= extent_op->flags_to_set;
 		}
 		ret = alloc_reserved_file_extent(trans, root,
-						 parent, ref_root, flags,
-						 ref->objectid, ref->offset,
-						 &ins, node->ref_mod);
+						parent, ref_root, flags,
+						ref->objectid, ref->offset,
+						&ins, node->ref_mod);
 	} else if (node->action == BTRFS_ADD_DELAYED_REF) {
 		ret = __btrfs_inc_extent_ref(trans, root, node->bytenr,
 					     node->num_bytes, parent,
@@ -2101,12 +2106,19 @@ static int run_delayed_tree_ref(struct b
 	BUG_ON(node->ref_mod != 1);
 	if (node->action == BTRFS_ADD_DELAYED_REF && insert_reserved) {
 		BUG_ON(!extent_op || !extent_op->update_flags ||
-		       !extent_op->update_key);
-		ret = alloc_reserved_tree_block(trans, root,
+		       !extent_op->update_key);						
+		if (btrfs_cater_factor(extent_op->cater_index_factor < 2))
+			ret = alloc_reserved_tree_block(trans, root,
+ 						parent, ref_root,
+ 						extent_op->flags_to_set,
+ 						&extent_op->key,
+ 						ref->level, &ins);
+		else
+			ret = alloc_reserved_tree_block_cater(trans, root,
 						parent, ref_root,
 						extent_op->flags_to_set,
 						&extent_op->key,
-						ref->level, &ins);
+						ref->level, &ins, extent_op->cater_index_factor);
 	} else if (node->action == BTRFS_ADD_DELAYED_REF) {
 		ret = __btrfs_inc_extent_ref(trans, root, node->bytenr,
 					     node->num_bytes, parent, ref_root,
@@ -4860,6 +4872,8 @@ static int __btrfs_free_extent(struct bt
 	int num_to_del = 1;
 	u32 item_size;
 	u64 refs;
+	u8 factor = btrfs_cater_factor(extent_op->cater_index_factor);
+	u8 index = btrfs_cater_index(extent_op->cater_index_factor);
 
 	path = btrfs_alloc_path();
 	if (!path)
@@ -5024,6 +5038,10 @@ static int __btrfs_free_extent(struct bt
 			     (bytenr + num_bytes - 1) >> PAGE_CACHE_SHIFT);
 		}
 
+		if (btrfs_cater_blocks_free(root, bytenr, num_bytes, factor, index)) {
+			bytenr = bytenr - num_bytes * index;
+			num_bytes = num_bytes * index;
+		}
 		ret = update_block_group(trans, root, bytenr, num_bytes, 0);
 		BUG_ON(ret);
 	}
@@ -5926,6 +5944,7 @@ static int alloc_reserved_file_extent(st
 	btrfs_set_extent_generation(leaf, extent_item, trans->transid);
 	btrfs_set_extent_flags(leaf, extent_item,
 			       flags | BTRFS_EXTENT_FLAG_DATA);
+	btrfs_set_extent_cater(leaf, extent_item, 0);
 
 	iref = (struct btrfs_extent_inline_ref *)(extent_item + 1);
 	btrfs_set_extent_inline_ref_type(leaf, iref, type);
@@ -5987,6 +6006,7 @@ static int alloc_reserved_tree_block(str
 	btrfs_set_extent_generation(leaf, extent_item, trans->transid);
 	btrfs_set_extent_flags(leaf, extent_item,
 			       flags | BTRFS_EXTENT_FLAG_TREE_BLOCK);
+	btrfs_set_extent_cater(leaf, extent_item, 0);
 	block_info = (struct btrfs_tree_block_info *)(extent_item + 1);
 
 	btrfs_set_tree_block_key(leaf, block_info, key);
@@ -6017,6 +6037,77 @@ static int alloc_reserved_tree_block(str
 	return ret;
 }
 
+static int alloc_reserved_tree_block_cater(struct btrfs_trans_handle *trans,
+				     struct btrfs_root *root,
+				     u64 parent, u64 root_objectid,
+				     u64 flags, struct btrfs_disk_key *key,
+				     int level, struct btrfs_key *ins, u8 cater)
+{
+	int ret;
+	struct btrfs_fs_info *fs_info = root->fs_info;
+	struct btrfs_extent_item *extent_item;
+	struct btrfs_tree_block_info *block_info;
+	struct btrfs_extent_inline_ref *iref;
+	struct btrfs_path *path;
+	struct extent_buffer *leaf;
+	u32 size = sizeof(*extent_item) + sizeof(*block_info) + sizeof(*iref);
+	u8 factor = btrfs_cater_factor(cater);
+	u8 index = btrfs_cater_index(cater);
+	u64 bytenr = ins->objectid;
+	u64 num_bytes = ins->offset;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	path->leave_spinning = 1;
+	ret = btrfs_insert_empty_item(trans, fs_info->extent_root, path,
+				      ins, size);
+	BUG_ON(ret);
+
+	leaf = path->nodes[0];
+	extent_item = btrfs_item_ptr(leaf, path->slots[0],
+				     struct btrfs_extent_item);
+	btrfs_set_extent_refs(leaf, extent_item, 1);
+	btrfs_set_extent_generation(leaf, extent_item, trans->transid);
+	btrfs_set_extent_flags(leaf, extent_item,
+			       flags | BTRFS_EXTENT_FLAG_TREE_BLOCK);
+	btrfs_set_extent_cater(leaf, extent_item, cater);
+	
+	block_info = (struct btrfs_tree_block_info *)(extent_item + 1);
+
+	btrfs_set_tree_block_key(leaf, block_info, key);
+	btrfs_set_tree_block_level(leaf, block_info, level);
+
+	iref = (struct btrfs_extent_inline_ref *)(block_info + 1);
+	if (parent > 0) {
+		BUG_ON(!(flags & BTRFS_BLOCK_FLAG_FULL_BACKREF));
+		btrfs_set_extent_inline_ref_type(leaf, iref,
+						 BTRFS_SHARED_BLOCK_REF_KEY);
+		btrfs_set_extent_inline_ref_offset(leaf, iref, parent);
+	} else {
+		btrfs_set_extent_inline_ref_type(leaf, iref,
+						 BTRFS_TREE_BLOCK_REF_KEY);
+		btrfs_set_extent_inline_ref_offset(leaf, iref, root_objectid);
+	}
+
+	btrfs_mark_buffer_dirty(leaf);
+	btrfs_free_path(path);
+
+	if (btrfs_cater_blocks_free(root, bytenr, num_bytes, factor, index)) {
+		bytenr = bytenr - num_bytes * index;
+		num_bytes = num_bytes * index;
+	}
+	ret = update_block_group(trans, root, bytenr, num_bytes, 1);
+	if (ret) {
+		printk(KERN_ERR "btrfs update block group failed for %llu "
+		       "%llu\n", (unsigned long long)ins->objectid,
+		       (unsigned long long)ins->offset);
+		BUG();
+	}
+	return ret;
+}
+
 int btrfs_alloc_reserved_file_extent(struct btrfs_trans_handle *trans,
 				     struct btrfs_root *root,
 				     u64 root_objectid, u64 owner,
@@ -6096,7 +6187,7 @@ int btrfs_alloc_logged_file_extent(struc
 struct extent_buffer *btrfs_init_new_buffer(struct btrfs_trans_handle *trans,
 					    struct btrfs_root *root,
 					    u64 bytenr, u32 blocksize,
-					    int level)
+					    int level, u8 cater)
 {
 	struct extent_buffer *buf;
 
@@ -6104,6 +6195,7 @@ struct extent_buffer *btrfs_init_new_buf
 	if (!buf)
 		return ERR_PTR(-ENOMEM);
 	btrfs_set_header_generation(buf, trans->transid);
+	btrfs_set_header_cater(buf, cater);
 	btrfs_set_buffer_lockdep_class(root->root_key.objectid, buf, level);
 	btrfs_tree_lock(buf);
 	clean_tree_block(trans, root, buf);
@@ -6221,7 +6313,201 @@ struct extent_buffer *btrfs_alloc_free_b
 	}
 
 	buf = btrfs_init_new_buffer(trans, root, ins.objectid,
-				    blocksize, level);
+				    blocksize, level, 0);
+	BUG_ON(IS_ERR(buf));
+
+	if (root_objectid == BTRFS_TREE_RELOC_OBJECTID) {
+		if (parent == 0)
+			parent = ins.objectid;
+		flags |= BTRFS_BLOCK_FLAG_FULL_BACKREF;
+	} else
+		BUG_ON(parent > 0);
+
+	if (root_objectid != BTRFS_TREE_LOG_OBJECTID) {
+		struct btrfs_delayed_extent_op *extent_op;
+		extent_op = kmalloc(sizeof(*extent_op), GFP_NOFS);
+		BUG_ON(!extent_op);
+		if (key)
+			memcpy(&extent_op->key, key, sizeof(extent_op->key));
+		else
+			memset(&extent_op->key, 0, sizeof(extent_op->key));
+		extent_op->flags_to_set = flags;
+		extent_op->update_key = 1;
+		extent_op->update_flags = 1;
+		extent_op->is_data = 0;
+		extent_op->cater_index_factor = 0;
+
+		ret = btrfs_add_delayed_tree_ref(root->fs_info, trans,
+					ins.objectid,
+					ins.offset, parent, root_objectid,
+					level, BTRFS_ADD_DELAYED_EXTENT,
+					extent_op, for_cow);
+		BUG_ON(ret);
+	}
+	return buf;
+}
+
+int btrfs_cater_blocks_free(struct btrfs_root *root, u64 start, u64 len,
+				u8 factor, u8 index)
+{
+	u8 i;
+	int ret;
+	u64 bytenr, cur;
+	struct extent_io_tree *tree;
+	struct extent_buffer *eb;
+
+	if (factor < 2 )
+		return 0;
+
+	bytenr = start - len * index;
+	tree = &((BTRFS_I(root->fs_info->btree_inode))->io_tree);
+
+	for (i = 0; i < factor; i++) {
+		if (i == index)
+			continue;
+		cur = bytenr + len * index;
+		ret = btrfs_lookup_extent(root, cur, len);
+		BUG_ON(ret);
+		if (ret == 0)
+			return 0;
+
+		rcu_read_lock();
+		eb = radix_tree_lookup(&tree->buffer, cur >> PAGE_CACHE_SHIFT);
+		if (eb)
+			return 0;
+		rcu_read_unlock();
+	}
+	
+	return 1;
+}
+
+static noinline struct extent_buffer *__btrfs_grab_cater_block(
+					struct btrfs_trans_handle *trans,
+					struct btrfs_root *root,
+					u64 blockprt, u32 blocksize,
+					u64 parent, u64 root_objectid,
+					struct btrfs_disk_key *key, int level,
+					int for_cow, u8 index, u8 factor)
+{
+	struct btrfs_key ins;
+	struct btrfs_block_rsv *block_rsv;
+	struct extent_buffer *buf;
+	u64 flags = 0;
+	int ret;
+	u8 cater;
+
+	if (factor < 2)
+		return NULL;
+
+	cater = (index << 4 | factor);
+	ins.objectid = blockprt - blocksize * index;
+	ins.offset = blocksize;
+	
+	/* 
+	 * Here reserving metadata is needed, and unreserving somewhere after 
+	 * commit transaction is also needed, it should be fixed in the future
+	 * - WeiFeng Liu. 
+	 */
+	buf = btrfs_init_new_buffer(trans, root, ins.objectid,
+				    blocksize, level, cater);
+	BUG_ON(IS_ERR(buf));
+
+	if (root_objectid == BTRFS_TREE_RELOC_OBJECTID) {
+		if (parent == 0)
+			parent = ins.objectid;
+		flags |= BTRFS_BLOCK_FLAG_FULL_BACKREF;
+	} else
+		BUG_ON(parent > 0);
+
+	if (root_objectid != BTRFS_TREE_LOG_OBJECTID) {
+		struct btrfs_delayed_extent_op *extent_op;
+		extent_op = kmalloc(sizeof(*extent_op), GFP_NOFS);
+		BUG_ON(!extent_op);
+		if (key)
+			memcpy(&extent_op->key, key, sizeof(extent_op->key));
+		else
+			memset(&extent_op->key, 0, sizeof(extent_op->key));
+		extent_op->flags_to_set = flags;
+		extent_op->update_key = 1;
+		extent_op->update_flags = 1;
+		extent_op->is_data = 0;
+		extent_op->cater_index_factor = ((index << 4) | factor);
+
+		ret = btrfs_add_delayed_tree_ref(root->fs_info, trans,
+					ins.objectid,
+					ins.offset, parent, root_objectid,
+					level, BTRFS_ADD_DELAYED_EXTENT,
+					extent_op, for_cow);
+		BUG_ON(ret);
+	}
+	return buf;
+}
+
+struct extent_buffer *btrfs_grab_cater_block(struct btrfs_trans_handle *trans,
+					struct btrfs_root *root, struct extent_buffer *buf,
+					u64 parent, u64 root_objectid,
+					struct btrfs_disk_key *key, int level,
+					u64 hint, u64 empty_size, int for_cow)
+{
+	u8 factor, index, i;
+	int ret;
+	u64 bytenr, cur;
+	struct extent_buffer *eb = NULL;
+	
+	index = btrfs_cater_index(btrfs_header_cater(buf));
+	factor = btrfs_cater_factor(btrfs_header_cater(buf));
+	if (factor < 2)
+		return NULL;
+
+	bytenr = buf->start - buf->len * index;
+	for (i = 0; i < factor; i++) {
+		if (i == index)
+			continue;
+	cur = bytenr + buf->len * i;
+	ret = btrfs_lookup_extent(root, cur, buf->len);	
+	BUG_ON(ret);
+	if (ret == 0)
+			continue;
+	eb = __btrfs_grab_cater_block(trans, root, cur, buf->len,
+					parent, root_objectid,
+					key, level, for_cow, i, factor);
+	if (eb)
+		break;
+	}
+	
+	return eb;
+}
+
+struct extent_buffer *btrfs_alloc_free_block_cater(struct btrfs_trans_handle *trans,
+					struct btrfs_root *root, u32 blocksize,
+					u64 parent, u64 root_objectid,
+					struct btrfs_disk_key *key, int level,
+					u64 hint, u64 empty_size, int for_cow)
+{
+	struct btrfs_key ins;
+	struct btrfs_block_rsv *block_rsv;
+	struct extent_buffer *buf;
+	u64 flags = 0;
+	int ret;
+	u8 factor = root->fs_info->cater_factor;
+	
+
+	if (factor > 1)
+		blocksize = blocksize * factor;
+
+	block_rsv = use_block_rsv(trans, root, blocksize);
+	if (IS_ERR(block_rsv))
+		return ERR_CAST(block_rsv);
+
+	ret = btrfs_reserve_extent(trans, root, blocksize, blocksize,
+				   empty_size, hint, (u64)-1, &ins, 0);
+	if (ret) {
+		unuse_block_rsv(root->fs_info, block_rsv, blocksize);
+		return ERR_PTR(ret);
+	}
+
+	buf = btrfs_init_new_buffer(trans, root, ins.objectid,
+				    blocksize, level, factor);
 	BUG_ON(IS_ERR(buf));
 
 	if (root_objectid == BTRFS_TREE_RELOC_OBJECTID) {
@@ -6243,6 +6529,7 @@ struct extent_buffer *btrfs_alloc_free_b
 		extent_op->update_key = 1;
 		extent_op->update_flags = 1;
 		extent_op->is_data = 0;
+		extent_op->cater_index_factor = factor;
 
 		ret = btrfs_add_delayed_tree_ref(root->fs_info, trans,
 					ins.objectid,
diff -uprN a/fs/btrfs/super.c b/fs/btrfs/super.c
--- a/fs/btrfs/super.c	2012-05-21 18:42:51.000000000 +0000
+++ b/fs/btrfs/super.c	2012-05-29 23:08:19.000000000 +0000
@@ -166,7 +166,7 @@ enum {
 	Opt_enospc_debug, Opt_subvolrootid, Opt_defrag, Opt_inode_cache,
 	Opt_no_space_cache, Opt_recovery, Opt_skip_balance,
 	Opt_check_integrity, Opt_check_integrity_including_extent_data,
-	Opt_check_integrity_print_mask,
+	Opt_check_integrity_print_mask, Opt_cater_factor,
 	Opt_err,
 };
 
@@ -206,6 +206,7 @@ static match_table_t tokens = {
 	{Opt_check_integrity, "check_int"},
 	{Opt_check_integrity_including_extent_data, "check_int_data"},
 	{Opt_check_integrity_print_mask, "check_int_print_mask=%d"},
+	{Opt_cater_factor, "cater_factor=%d"},
 	{Opt_err, NULL},
 };
 
@@ -407,6 +408,15 @@ int btrfs_parse_options(struct btrfs_roo
 		case Opt_skip_balance:
 			btrfs_set_opt(info->mount_opt, SKIP_BALANCE);
 			break;
+		case Opt_cater_factor:
+			match_int(&args[0], &intarg);
+			if (intarg > 16 || intarg < 1)
+				info->cater_factor = 1;
+			else
+				info->cater_factor = intarg;
+			printk(KERN_INFO "btrfs: cater_factor is %d\n",
+						(unsigned char)info->cater_factor);
+			break;
 #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
 		case Opt_check_integrity_including_extent_data:
 			printk(KERN_INFO "btrfs: enabling check integrity"


^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [RFC PATCH] Decrease meta fragments by using a caterpillar band Method (Ver. 2)
  2012-05-29 16:24 [RFC PATCH] Decrease meta fragments by using a caterpillar band Method (Ver. 2) WeiFeng Liu
@ 2012-05-29 17:32 ` Goffredo Baroncelli
  0 siblings, 0 replies; 2+ messages in thread
From: Goffredo Baroncelli @ 2012-05-29 17:32 UTC (permalink / raw)
  To: WeiFeng Liu; +Cc: linux-btrfs

Hi Liu,

On 05/29/2012 06:24 PM, WeiFeng Liu wrote:
> This is a several bugs fixed version since my first patch commit, and added
> patch of btrfs-prog
> 
> 
> Introduction and brief speculate of values and penalties:
> 
> 	When a tree block need to be created, we offer, say, 2 or 3 blocks for it, 
> then pick one from the continuous blocks. If this tree block needs a cow,
> another free block from these continuous blocks can be grabbed, and the old one
> is freed for next cow.


What happens if the block is not COW-ed *and freed* but COW-ed only
(think about a snapshot updated) ? I.e, what happens if the user makes
5-6 snapshot and the caterpillar-size is 3 ?

> 
> 	In the most ideal condition only 2 continuous blocks are kept for any times
> of cowing a tree block -- image a caterpillar band by which my method is named.

Apart my doubt above, I am very interested on the performances. However
I have some doubts about the space efficiency.
I have the impression that today BTRFS consumes a lot of space for the
meta-data.

On my linux box:

	ghigo@venice:~$ sudo btrfs fi df /
	Data: total=19.01GB, used=14.10GB
	System, DUP: total=8.00MB, used=4.00KB
	System: total=4.00MB, used=0.00
	Metadata, DUP: total=2.00GB, used=933.55MB
	Metadata: total=8.00MB, used=0.00

So basically the metadata are bout the 6% of the data (0.9GB / 14.1GB ).

But with your idea, BTRFS should reserve 3 blocks for every metadata block.
This means that the BTRFS ratio metadata/data will increase to 18%
(6%*3). Which is not a so negligible value.

GB

> 
> 	Given a scene that there are two file systems: one with COW and the other
> NOCOW, each has 1GB space for metadata with it's first 100MB has been used, and
> let me focus only on ops of modifying metadata and ignore deleting metadata for
> simple reason. 
> 
> 	As we can image that the NOCOW fs would likely keep the layout of its 100MB
> meta to most neat: changes can be overwritten in the original places and leave
> the rest 900MB untouched. But it is lack of the excellent feature to assure data
> integrity which owned only by COW fs.
> 
> 	However, only modifying metadata though, the COW fs would make holes in 
> the first 100MB and write COWed meta into the rest 900MB space, in the extreme
> condition, the whole 1GB space would be fragmented finally and scattered by that
> 100MB metadata. I don't think btrfs will be easily trap into such bad state, as
> I understood we have extent, chunk, cluster and maybe other methods(tell me
> please) to slow fragmenting, but it seems that there are no dedicate method
> (fix me) to help COW getting rid of this type of fragments which NOCOW fs does
> not incline to make.
> 
> 	I introduce the caterpillar band method as a compromise. It use 300MB for
> meta to avoid such fragments and without lost of COW feature, in the instance,
> that means three countinues blocks are used for a single tree block, the tree
> block will be circularly updated(cowed) within its three countinues blocks.
> 
> 	Penalties? Yes there are, thanks to Arne Jansen and Liu Bo. As Arne Jansen
> indicated, the most disadvantage of the method will be that this will basically
> limit us to 1/3th of platter speed to write meta when using spinning drives and
> to 1/4th if using four countinues blocks for a tree block.
> 
> 	About readahead, which will be also down to the 1/3th of NOCOW fs rate, but
> I would discreetly think it as an advantage rather than penalty comparing worse
> condition which COW would get -- nearly since the first COW, the new tree blocks
> cowed would be 50MB far away from their original neighbor blocks normally, and
> after frequent random modify ops, would the worst conditition be that every
> dozen of tree blocks newly cowed are more than 50MB far away from their original
> neighbor blocks if in equal probability?
> 
> 	So permit me to think readahead is only usefull for NOCOW fs in this
> scenario, because it always keeps its original 100MB continued, and my way would
> keep 1/3 readahead rate vs maybe-zero by pure COW if worstly.
> 
> 	Of course, both penalties and values are only for metadata and will not
> affect user date read/write, my patch is only applied for cow tree blocks. 
> 	But if there are large number of small files(size<4k), values and penalties
> will also affect those small user data R/W.
> 
> 	I have not made tests for my patch by now, it still need some time to get
> more check for both strategy and code in patch and fix possible bugs before
> test, any comments are welcome.
> 
> 
> Thanks
> 
> 
> signed-off-by WeiFeng Liu
> 523f28f9b3d9c710cacc31dbba644efb1678cf62
> 
> ---
> 
> diff -uprN a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> --- a/fs/btrfs/ctree.c	2012-05-21 18:42:51.000000000 +0000
> +++ b/fs/btrfs/ctree.c	2012-05-29 23:08:19.000000000 +0000
> @@ -444,9 +444,21 @@ static noinline int __btrfs_cow_block(st
>  	} else
>  		parent_start = 0;
>  
> -	cow = btrfs_alloc_free_block(trans, root, buf->len, parent_start,
> -				     root->root_key.objectid, &disk_key,
> -				     level, search_start, empty_size, 1);
> + 	if (root->fs_info->cater_factor > 1) {	
> +		if (btrfs_cater_factor(btrfs_header_cater(buf)) > 1)			
> +			cow = btrfs_grab_cater_block(trans, root, buf, parent_start,
> +							root->root_key.objectid, &disk_key,
> +							level, search_start, empty_size, 1);
> +		else			
> +			cow = btrfs_alloc_free_block_cater(trans, root, buf->len, 
> +							parent_start,
> +							root->root_key.objectid, &disk_key,
> +							level, search_start, empty_size, 1);
> +	} else {
> +		cow = btrfs_alloc_free_block(trans, root, buf->len, parent_start,
> + 				     root->root_key.objectid, &disk_key,
> + 				     level, search_start, empty_size, 1);
> +	}
>  	if (IS_ERR(cow))
>  		return PTR_ERR(cow);
>  
> diff -uprN a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> --- a/fs/btrfs/ctree.h	2012-05-21 18:42:51.000000000 +0000
> +++ b/fs/btrfs/ctree.h	2012-05-29 23:08:19.000000000 +0000
> @@ -341,6 +341,7 @@ struct btrfs_header {
>  	__le64 owner;
>  	__le32 nritems;
>  	u8 level;
> +	u8 cater_index_factor;
>  } __attribute__ ((__packed__));
>  
>  #define BTRFS_NODEPTRS_PER_BLOCK(r) (((r)->nodesize - \
> @@ -544,6 +545,7 @@ struct btrfs_extent_item {
>  	__le64 refs;
>  	__le64 generation;
>  	__le64 flags;
> +	u8 cater_index_factor;
>  } __attribute__ ((__packed__));
>  
>  struct btrfs_extent_item_v0 {
> @@ -1222,6 +1224,7 @@ struct btrfs_fs_info {
>  
>  	unsigned data_chunk_allocations;
>  	unsigned metadata_ratio;
> +	unsigned cater_factor;
>  
>  	void *bdev_holder;
>  
> @@ -1761,6 +1764,8 @@ BTRFS_SETGET_FUNCS(extent_refs, struct b
>  BTRFS_SETGET_FUNCS(extent_generation, struct btrfs_extent_item,
>  		   generation, 64);
>  BTRFS_SETGET_FUNCS(extent_flags, struct btrfs_extent_item, flags, 64);
> +BTRFS_SETGET_FUNCS(extent_cater, struct btrfs_extent_item,
> +			cater_index_factor, 8);
>  
>  BTRFS_SETGET_FUNCS(extent_refs_v0, struct btrfs_extent_item_v0, refs, 32);
>  
> @@ -1781,6 +1786,16 @@ static inline void btrfs_set_tree_block_
>  	write_eb_member(eb, item, struct btrfs_tree_block_info, key, key);
>  }
>  
> +static inline u8 btrfs_cater_factor(u8 cater_index_factor)
> +{
> +	return (cater_index_factor & 15);
> +}
> +
> +static inline u8 btrfs_cater_index(u8 cater_index_factor)
> +{
> +	return (cater_index_factor >> 4);
> +}
> +
>  BTRFS_SETGET_FUNCS(extent_data_ref_root, struct btrfs_extent_data_ref,
>  		   root, 64);
>  BTRFS_SETGET_FUNCS(extent_data_ref_objectid, struct btrfs_extent_data_ref,
> @@ -2042,6 +2057,8 @@ BTRFS_SETGET_HEADER_FUNCS(header_owner,
>  BTRFS_SETGET_HEADER_FUNCS(header_nritems, struct btrfs_header, nritems, 32);
>  BTRFS_SETGET_HEADER_FUNCS(header_flags, struct btrfs_header, flags, 64);
>  BTRFS_SETGET_HEADER_FUNCS(header_level, struct btrfs_header, level, 8);
> +BTRFS_SETGET_HEADER_FUNCS(header_cater, struct btrfs_header,
> +			cater_index_factor, 8);
>  
>  static inline int btrfs_header_flag(struct extent_buffer *eb, u64 flag)
>  {
> @@ -2445,7 +2462,20 @@ struct extent_buffer *btrfs_alloc_free_b
>  					struct btrfs_root *root, u32 blocksize,
>  					u64 parent, u64 root_objectid,
>  					struct btrfs_disk_key *key, int level,
> +					u64 hint, u64 empty_size, int for_cow);					
> +int btrfs_cater_blocks_free(struct btrfs_root *root, u64 start, u64 len,
> +					u8 factor, u8 index);
> +struct extent_buffer *btrfs_grab_cater_block(struct btrfs_trans_handle *trans,
> +					struct btrfs_root *root, struct extent_buffer *buf,
> +					u64 parent, u64 root_objectid,
> +					struct btrfs_disk_key *key, int level,
>  					u64 hint, u64 empty_size, int for_cow);
> +struct extent_buffer *btrfs_alloc_free_block_cater(
> +					struct btrfs_trans_handle *trans,
> +					struct btrfs_root *root, u32 blocksize,
> +					u64 parent, u64 root_objectid,
> +					struct btrfs_disk_key *key, int level,
> + 					u64 hint, u64 empty_size, int for_cow);
>  void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
>  			   struct btrfs_root *root,
>  			   struct extent_buffer *buf,
> @@ -2453,7 +2483,7 @@ void btrfs_free_tree_block(struct btrfs_
>  struct extent_buffer *btrfs_init_new_buffer(struct btrfs_trans_handle *trans,
>  					    struct btrfs_root *root,
>  					    u64 bytenr, u32 blocksize,
> -					    int level);
> +					    int level, u8 cater);
>  int btrfs_alloc_reserved_file_extent(struct btrfs_trans_handle *trans,
>  				     struct btrfs_root *root,
>  				     u64 root_objectid, u64 owner,
> diff -uprN a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
> --- a/fs/btrfs/delayed-ref.h	2012-05-21 18:42:51.000000000 +0000
> +++ b/fs/btrfs/delayed-ref.h	2012-05-29 23:08:19.000000000 +0000
> @@ -63,6 +63,7 @@ struct btrfs_delayed_extent_op {
>  	unsigned int update_key:1;
>  	unsigned int update_flags:1;
>  	unsigned int is_data:1;
> +	u8 cater_index_factor;
>  };
>  
>  /*
> diff -uprN a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> --- a/fs/btrfs/extent-tree.c	2012-05-21 18:42:51.000000000 +0000
> +++ b/fs/btrfs/extent-tree.c	2012-05-29 23:08:19.000000000 +0000
> @@ -87,10 +87,15 @@ static int alloc_reserved_file_extent(st
>  				      u64 flags, u64 owner, u64 offset,
>  				      struct btrfs_key *ins, int ref_mod);
>  static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans,
> -				     struct btrfs_root *root,
> -				     u64 parent, u64 root_objectid,
> -				     u64 flags, struct btrfs_disk_key *key,
> -				     int level, struct btrfs_key *ins);
> +						struct btrfs_root *root,
> +						u64 parent, u64 root_objectid,
> +						u64 flags, struct btrfs_disk_key *key,
> +						int level, struct btrfs_key *ins);
> +static int alloc_reserved_tree_block_cater(struct btrfs_trans_handle *trans,
> +						struct btrfs_root *root,
> +						u64 parent, u64 root_objectid,
> +						u64 flags, struct btrfs_disk_key *key,
> +						int level, struct btrfs_key *ins, u8 cater);
>  static int do_chunk_alloc(struct btrfs_trans_handle *trans,
>  			  struct btrfs_root *extent_root, u64 alloc_bytes,
>  			  u64 flags, int force);
> @@ -1979,9 +1984,9 @@ static int run_delayed_data_ref(struct b
>  			flags |= extent_op->flags_to_set;
>  		}
>  		ret = alloc_reserved_file_extent(trans, root,
> -						 parent, ref_root, flags,
> -						 ref->objectid, ref->offset,
> -						 &ins, node->ref_mod);
> +						parent, ref_root, flags,
> +						ref->objectid, ref->offset,
> +						&ins, node->ref_mod);
>  	} else if (node->action == BTRFS_ADD_DELAYED_REF) {
>  		ret = __btrfs_inc_extent_ref(trans, root, node->bytenr,
>  					     node->num_bytes, parent,
> @@ -2101,12 +2106,19 @@ static int run_delayed_tree_ref(struct b
>  	BUG_ON(node->ref_mod != 1);
>  	if (node->action == BTRFS_ADD_DELAYED_REF && insert_reserved) {
>  		BUG_ON(!extent_op || !extent_op->update_flags ||
> -		       !extent_op->update_key);
> -		ret = alloc_reserved_tree_block(trans, root,
> +		       !extent_op->update_key);						
> +		if (btrfs_cater_factor(extent_op->cater_index_factor < 2))
> +			ret = alloc_reserved_tree_block(trans, root,
> + 						parent, ref_root,
> + 						extent_op->flags_to_set,
> + 						&extent_op->key,
> + 						ref->level, &ins);
> +		else
> +			ret = alloc_reserved_tree_block_cater(trans, root,
>  						parent, ref_root,
>  						extent_op->flags_to_set,
>  						&extent_op->key,
> -						ref->level, &ins);
> +						ref->level, &ins, extent_op->cater_index_factor);
>  	} else if (node->action == BTRFS_ADD_DELAYED_REF) {
>  		ret = __btrfs_inc_extent_ref(trans, root, node->bytenr,
>  					     node->num_bytes, parent, ref_root,
> @@ -4860,6 +4872,8 @@ static int __btrfs_free_extent(struct bt
>  	int num_to_del = 1;
>  	u32 item_size;
>  	u64 refs;
> +	u8 factor = btrfs_cater_factor(extent_op->cater_index_factor);
> +	u8 index = btrfs_cater_index(extent_op->cater_index_factor);
>  
>  	path = btrfs_alloc_path();
>  	if (!path)
> @@ -5024,6 +5038,10 @@ static int __btrfs_free_extent(struct bt
>  			     (bytenr + num_bytes - 1) >> PAGE_CACHE_SHIFT);
>  		}
>  
> +		if (btrfs_cater_blocks_free(root, bytenr, num_bytes, factor, index)) {
> +			bytenr = bytenr - num_bytes * index;
> +			num_bytes = num_bytes * index;
> +		}
>  		ret = update_block_group(trans, root, bytenr, num_bytes, 0);
>  		BUG_ON(ret);
>  	}
> @@ -5926,6 +5944,7 @@ static int alloc_reserved_file_extent(st
>  	btrfs_set_extent_generation(leaf, extent_item, trans->transid);
>  	btrfs_set_extent_flags(leaf, extent_item,
>  			       flags | BTRFS_EXTENT_FLAG_DATA);
> +	btrfs_set_extent_cater(leaf, extent_item, 0);
>  
>  	iref = (struct btrfs_extent_inline_ref *)(extent_item + 1);
>  	btrfs_set_extent_inline_ref_type(leaf, iref, type);
> @@ -5987,6 +6006,7 @@ static int alloc_reserved_tree_block(str
>  	btrfs_set_extent_generation(leaf, extent_item, trans->transid);
>  	btrfs_set_extent_flags(leaf, extent_item,
>  			       flags | BTRFS_EXTENT_FLAG_TREE_BLOCK);
> +	btrfs_set_extent_cater(leaf, extent_item, 0);
>  	block_info = (struct btrfs_tree_block_info *)(extent_item + 1);
>  
>  	btrfs_set_tree_block_key(leaf, block_info, key);
> @@ -6017,6 +6037,77 @@ static int alloc_reserved_tree_block(str
>  	return ret;
>  }
>  
> +static int alloc_reserved_tree_block_cater(struct btrfs_trans_handle *trans,
> +				     struct btrfs_root *root,
> +				     u64 parent, u64 root_objectid,
> +				     u64 flags, struct btrfs_disk_key *key,
> +				     int level, struct btrfs_key *ins, u8 cater)
> +{
> +	int ret;
> +	struct btrfs_fs_info *fs_info = root->fs_info;
> +	struct btrfs_extent_item *extent_item;
> +	struct btrfs_tree_block_info *block_info;
> +	struct btrfs_extent_inline_ref *iref;
> +	struct btrfs_path *path;
> +	struct extent_buffer *leaf;
> +	u32 size = sizeof(*extent_item) + sizeof(*block_info) + sizeof(*iref);
> +	u8 factor = btrfs_cater_factor(cater);
> +	u8 index = btrfs_cater_index(cater);
> +	u64 bytenr = ins->objectid;
> +	u64 num_bytes = ins->offset;
> +
> +	path = btrfs_alloc_path();
> +	if (!path)
> +		return -ENOMEM;
> +
> +	path->leave_spinning = 1;
> +	ret = btrfs_insert_empty_item(trans, fs_info->extent_root, path,
> +				      ins, size);
> +	BUG_ON(ret);
> +
> +	leaf = path->nodes[0];
> +	extent_item = btrfs_item_ptr(leaf, path->slots[0],
> +				     struct btrfs_extent_item);
> +	btrfs_set_extent_refs(leaf, extent_item, 1);
> +	btrfs_set_extent_generation(leaf, extent_item, trans->transid);
> +	btrfs_set_extent_flags(leaf, extent_item,
> +			       flags | BTRFS_EXTENT_FLAG_TREE_BLOCK);
> +	btrfs_set_extent_cater(leaf, extent_item, cater);
> +	
> +	block_info = (struct btrfs_tree_block_info *)(extent_item + 1);
> +
> +	btrfs_set_tree_block_key(leaf, block_info, key);
> +	btrfs_set_tree_block_level(leaf, block_info, level);
> +
> +	iref = (struct btrfs_extent_inline_ref *)(block_info + 1);
> +	if (parent > 0) {
> +		BUG_ON(!(flags & BTRFS_BLOCK_FLAG_FULL_BACKREF));
> +		btrfs_set_extent_inline_ref_type(leaf, iref,
> +						 BTRFS_SHARED_BLOCK_REF_KEY);
> +		btrfs_set_extent_inline_ref_offset(leaf, iref, parent);
> +	} else {
> +		btrfs_set_extent_inline_ref_type(leaf, iref,
> +						 BTRFS_TREE_BLOCK_REF_KEY);
> +		btrfs_set_extent_inline_ref_offset(leaf, iref, root_objectid);
> +	}
> +
> +	btrfs_mark_buffer_dirty(leaf);
> +	btrfs_free_path(path);
> +
> +	if (btrfs_cater_blocks_free(root, bytenr, num_bytes, factor, index)) {
> +		bytenr = bytenr - num_bytes * index;
> +		num_bytes = num_bytes * index;
> +	}
> +	ret = update_block_group(trans, root, bytenr, num_bytes, 1);
> +	if (ret) {
> +		printk(KERN_ERR "btrfs update block group failed for %llu "
> +		       "%llu\n", (unsigned long long)ins->objectid,
> +		       (unsigned long long)ins->offset);
> +		BUG();
> +	}
> +	return ret;
> +}
> +
>  int btrfs_alloc_reserved_file_extent(struct btrfs_trans_handle *trans,
>  				     struct btrfs_root *root,
>  				     u64 root_objectid, u64 owner,
> @@ -6096,7 +6187,7 @@ int btrfs_alloc_logged_file_extent(struc
>  struct extent_buffer *btrfs_init_new_buffer(struct btrfs_trans_handle *trans,
>  					    struct btrfs_root *root,
>  					    u64 bytenr, u32 blocksize,
> -					    int level)
> +					    int level, u8 cater)
>  {
>  	struct extent_buffer *buf;
>  
> @@ -6104,6 +6195,7 @@ struct extent_buffer *btrfs_init_new_buf
>  	if (!buf)
>  		return ERR_PTR(-ENOMEM);
>  	btrfs_set_header_generation(buf, trans->transid);
> +	btrfs_set_header_cater(buf, cater);
>  	btrfs_set_buffer_lockdep_class(root->root_key.objectid, buf, level);
>  	btrfs_tree_lock(buf);
>  	clean_tree_block(trans, root, buf);
> @@ -6221,7 +6313,201 @@ struct extent_buffer *btrfs_alloc_free_b
>  	}
>  
>  	buf = btrfs_init_new_buffer(trans, root, ins.objectid,
> -				    blocksize, level);
> +				    blocksize, level, 0);
> +	BUG_ON(IS_ERR(buf));
> +
> +	if (root_objectid == BTRFS_TREE_RELOC_OBJECTID) {
> +		if (parent == 0)
> +			parent = ins.objectid;
> +		flags |= BTRFS_BLOCK_FLAG_FULL_BACKREF;
> +	} else
> +		BUG_ON(parent > 0);
> +
> +	if (root_objectid != BTRFS_TREE_LOG_OBJECTID) {
> +		struct btrfs_delayed_extent_op *extent_op;
> +		extent_op = kmalloc(sizeof(*extent_op), GFP_NOFS);
> +		BUG_ON(!extent_op);
> +		if (key)
> +			memcpy(&extent_op->key, key, sizeof(extent_op->key));
> +		else
> +			memset(&extent_op->key, 0, sizeof(extent_op->key));
> +		extent_op->flags_to_set = flags;
> +		extent_op->update_key = 1;
> +		extent_op->update_flags = 1;
> +		extent_op->is_data = 0;
> +		extent_op->cater_index_factor = 0;
> +
> +		ret = btrfs_add_delayed_tree_ref(root->fs_info, trans,
> +					ins.objectid,
> +					ins.offset, parent, root_objectid,
> +					level, BTRFS_ADD_DELAYED_EXTENT,
> +					extent_op, for_cow);
> +		BUG_ON(ret);
> +	}
> +	return buf;
> +}
> +
> +int btrfs_cater_blocks_free(struct btrfs_root *root, u64 start, u64 len,
> +				u8 factor, u8 index)
> +{
> +	u8 i;
> +	int ret;
> +	u64 bytenr, cur;
> +	struct extent_io_tree *tree;
> +	struct extent_buffer *eb;
> +
> +	if (factor < 2 )
> +		return 0;
> +
> +	bytenr = start - len * index;
> +	tree = &((BTRFS_I(root->fs_info->btree_inode))->io_tree);
> +
> +	for (i = 0; i < factor; i++) {
> +		if (i == index)
> +			continue;
> +		cur = bytenr + len * index;
> +		ret = btrfs_lookup_extent(root, cur, len);
> +		BUG_ON(ret);
> +		if (ret == 0)
> +			return 0;
> +
> +		rcu_read_lock();
> +		eb = radix_tree_lookup(&tree->buffer, cur >> PAGE_CACHE_SHIFT);
> +		if (eb)
> +			return 0;
> +		rcu_read_unlock();
> +	}
> +	
> +	return 1;
> +}
> +
> +static noinline struct extent_buffer *__btrfs_grab_cater_block(
> +					struct btrfs_trans_handle *trans,
> +					struct btrfs_root *root,
> +					u64 blockprt, u32 blocksize,
> +					u64 parent, u64 root_objectid,
> +					struct btrfs_disk_key *key, int level,
> +					int for_cow, u8 index, u8 factor)
> +{
> +	struct btrfs_key ins;
> +	struct btrfs_block_rsv *block_rsv;
> +	struct extent_buffer *buf;
> +	u64 flags = 0;
> +	int ret;
> +	u8 cater;
> +
> +	if (factor < 2)
> +		return NULL;
> +
> +	cater = (index << 4 | factor);
> +	ins.objectid = blockprt - blocksize * index;
> +	ins.offset = blocksize;
> +	
> +	/* 
> +	 * Here reserving metadata is needed, and unreserving somewhere after 
> +	 * commit transaction is also needed, it should be fixed in the future
> +	 * - WeiFeng Liu. 
> +	 */
> +	buf = btrfs_init_new_buffer(trans, root, ins.objectid,
> +				    blocksize, level, cater);
> +	BUG_ON(IS_ERR(buf));
> +
> +	if (root_objectid == BTRFS_TREE_RELOC_OBJECTID) {
> +		if (parent == 0)
> +			parent = ins.objectid;
> +		flags |= BTRFS_BLOCK_FLAG_FULL_BACKREF;
> +	} else
> +		BUG_ON(parent > 0);
> +
> +	if (root_objectid != BTRFS_TREE_LOG_OBJECTID) {
> +		struct btrfs_delayed_extent_op *extent_op;
> +		extent_op = kmalloc(sizeof(*extent_op), GFP_NOFS);
> +		BUG_ON(!extent_op);
> +		if (key)
> +			memcpy(&extent_op->key, key, sizeof(extent_op->key));
> +		else
> +			memset(&extent_op->key, 0, sizeof(extent_op->key));
> +		extent_op->flags_to_set = flags;
> +		extent_op->update_key = 1;
> +		extent_op->update_flags = 1;
> +		extent_op->is_data = 0;
> +		extent_op->cater_index_factor = ((index << 4) | factor);
> +
> +		ret = btrfs_add_delayed_tree_ref(root->fs_info, trans,
> +					ins.objectid,
> +					ins.offset, parent, root_objectid,
> +					level, BTRFS_ADD_DELAYED_EXTENT,
> +					extent_op, for_cow);
> +		BUG_ON(ret);
> +	}
> +	return buf;
> +}
> +
> +struct extent_buffer *btrfs_grab_cater_block(struct btrfs_trans_handle *trans,
> +					struct btrfs_root *root, struct extent_buffer *buf,
> +					u64 parent, u64 root_objectid,
> +					struct btrfs_disk_key *key, int level,
> +					u64 hint, u64 empty_size, int for_cow)
> +{
> +	u8 factor, index, i;
> +	int ret;
> +	u64 bytenr, cur;
> +	struct extent_buffer *eb = NULL;
> +	
> +	index = btrfs_cater_index(btrfs_header_cater(buf));
> +	factor = btrfs_cater_factor(btrfs_header_cater(buf));
> +	if (factor < 2)
> +		return NULL;
> +
> +	bytenr = buf->start - buf->len * index;
> +	for (i = 0; i < factor; i++) {
> +		if (i == index)
> +			continue;
> +	cur = bytenr + buf->len * i;
> +	ret = btrfs_lookup_extent(root, cur, buf->len);	
> +	BUG_ON(ret);
> +	if (ret == 0)
> +			continue;
> +	eb = __btrfs_grab_cater_block(trans, root, cur, buf->len,
> +					parent, root_objectid,
> +					key, level, for_cow, i, factor);
> +	if (eb)
> +		break;
> +	}
> +	
> +	return eb;
> +}
> +
> +struct extent_buffer *btrfs_alloc_free_block_cater(struct btrfs_trans_handle *trans,
> +					struct btrfs_root *root, u32 blocksize,
> +					u64 parent, u64 root_objectid,
> +					struct btrfs_disk_key *key, int level,
> +					u64 hint, u64 empty_size, int for_cow)
> +{
> +	struct btrfs_key ins;
> +	struct btrfs_block_rsv *block_rsv;
> +	struct extent_buffer *buf;
> +	u64 flags = 0;
> +	int ret;
> +	u8 factor = root->fs_info->cater_factor;
> +	
> +
> +	if (factor > 1)
> +		blocksize = blocksize * factor;
> +
> +	block_rsv = use_block_rsv(trans, root, blocksize);
> +	if (IS_ERR(block_rsv))
> +		return ERR_CAST(block_rsv);
> +
> +	ret = btrfs_reserve_extent(trans, root, blocksize, blocksize,
> +				   empty_size, hint, (u64)-1, &ins, 0);
> +	if (ret) {
> +		unuse_block_rsv(root->fs_info, block_rsv, blocksize);
> +		return ERR_PTR(ret);
> +	}
> +
> +	buf = btrfs_init_new_buffer(trans, root, ins.objectid,
> +				    blocksize, level, factor);
>  	BUG_ON(IS_ERR(buf));
>  
>  	if (root_objectid == BTRFS_TREE_RELOC_OBJECTID) {
> @@ -6243,6 +6529,7 @@ struct extent_buffer *btrfs_alloc_free_b
>  		extent_op->update_key = 1;
>  		extent_op->update_flags = 1;
>  		extent_op->is_data = 0;
> +		extent_op->cater_index_factor = factor;
>  
>  		ret = btrfs_add_delayed_tree_ref(root->fs_info, trans,
>  					ins.objectid,
> diff -uprN a/fs/btrfs/super.c b/fs/btrfs/super.c
> --- a/fs/btrfs/super.c	2012-05-21 18:42:51.000000000 +0000
> +++ b/fs/btrfs/super.c	2012-05-29 23:08:19.000000000 +0000
> @@ -166,7 +166,7 @@ enum {
>  	Opt_enospc_debug, Opt_subvolrootid, Opt_defrag, Opt_inode_cache,
>  	Opt_no_space_cache, Opt_recovery, Opt_skip_balance,
>  	Opt_check_integrity, Opt_check_integrity_including_extent_data,
> -	Opt_check_integrity_print_mask,
> +	Opt_check_integrity_print_mask, Opt_cater_factor,
>  	Opt_err,
>  };
>  
> @@ -206,6 +206,7 @@ static match_table_t tokens = {
>  	{Opt_check_integrity, "check_int"},
>  	{Opt_check_integrity_including_extent_data, "check_int_data"},
>  	{Opt_check_integrity_print_mask, "check_int_print_mask=%d"},
> +	{Opt_cater_factor, "cater_factor=%d"},
>  	{Opt_err, NULL},
>  };
>  
> @@ -407,6 +408,15 @@ int btrfs_parse_options(struct btrfs_roo
>  		case Opt_skip_balance:
>  			btrfs_set_opt(info->mount_opt, SKIP_BALANCE);
>  			break;
> +		case Opt_cater_factor:
> +			match_int(&args[0], &intarg);
> +			if (intarg > 16 || intarg < 1)
> +				info->cater_factor = 1;
> +			else
> +				info->cater_factor = intarg;
> +			printk(KERN_INFO "btrfs: cater_factor is %d\n",
> +						(unsigned char)info->cater_factor);
> +			break;
>  #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
>  		case Opt_check_integrity_including_extent_data:
>  			printk(KERN_INFO "btrfs: enabling check integrity"
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> .
> 


^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2012-05-29 17:32 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-29 16:24 [RFC PATCH] Decrease meta fragments by using a caterpillar band Method (Ver. 2) WeiFeng Liu
2012-05-29 17:32 ` Goffredo Baroncelli

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.