All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH] Btrfs: take block group fragmentation into account for allocation
@ 2009-03-08 14:42 Yien Zheng
  2009-03-08 16:37 ` Jens Axboe
  0 siblings, 1 reply; 8+ messages in thread
From: Yien Zheng @ 2009-03-08 14:42 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Josef Bacik

I tried applying this patch, but the fragmentation_percent function is
giving me:

WARNING: "__udivdi3"
[/home/partition6/yien/git/linux-git/fs/btrfs/btrfs.ko] undefined!

when I compile, and the module fails to load, even though the build
completes.  I've traced it to the calculation of max_fragments:

max_fragments = (block_group->key.offset -
		btrfs_block_group_used(&block_group->item)) /
		block_group->fragment_size;

It seems that accessing block_group in here is causing a reference to
__udivdi3 somehow.  Any idea what's going on here?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Btrfs: take block group fragmentation into account for allocation
  2009-03-08 14:42 [PATCH] Btrfs: take block group fragmentation into account for allocation Yien Zheng
@ 2009-03-08 16:37 ` Jens Axboe
  2009-03-09  2:03   ` Yien Zheng
  0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2009-03-08 16:37 UTC (permalink / raw)
  To: Yien Zheng; +Cc: linux-btrfs, Josef Bacik

On Sun, Mar 08 2009, Yien Zheng wrote:
> I tried applying this patch, but the fragmentation_percent function is
> giving me:
> 
> WARNING: "__udivdi3"
> [/home/partition6/yien/git/linux-git/fs/btrfs/btrfs.ko] undefined!
> 
> when I compile, and the module fails to load, even though the build
> completes.  I've traced it to the calculation of max_fragments:
> 
> max_fragments = (block_group->key.offset -
> 		btrfs_block_group_used(&block_group->item)) /
> 		block_group->fragment_size;
> 
> It seems that accessing block_group in here is causing a reference to
> __udivdi3 somehow.  Any idea what's going on here?

You can't to 64-bit divides on 32-bit archs. Make that

max_fragments = block_group->key.offset - btrfs_block_group_used(&block_group->item);
do_div(max_fragments, block_group->fragment_size);

and it should work.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Btrfs: take block group fragmentation into account for allocation
  2009-03-08 16:37 ` Jens Axboe
@ 2009-03-09  2:03   ` Yien Zheng
  2009-03-09 13:56     ` Simon Holm Thøgersen
  0 siblings, 1 reply; 8+ messages in thread
From: Yien Zheng @ 2009-03-09  2:03 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-btrfs, Josef Bacik

Thanks Jens!  My research had indicated something about 64-bit
division and using do_div, but you cleared it up for me.

The fragmentation_percent function ultimately does another divide to
get the percentage, so here's what I did to get the function to work:

static int fragmentation_percent(struct btrfs_block_group_cache *block_=
group)
{
        u64 max_fragments;
        u64 fragments_ratio;

        max_fragments =3D block_group->key.offset -
                btrfs_block_group_used(&block_group->item);
        do_div(max_fragments, block_group->fragment_size);
        fragments_ratio =3D block_group->fragments;
        do_div(fragments_ratio, max_fragments);
        return (fragments_ratio * 100);
}

On Sun, Mar 8, 2009 at 10:37 AM, Jens Axboe <jens.axboe@oracle.com> wro=
te:
> On Sun, Mar 08 2009, Yien Zheng wrote:
>> I tried applying this patch, but the fragmentation_percent function =
is
>> giving me:
>>
>> WARNING: "__udivdi3"
>> [/home/partition6/yien/git/linux-git/fs/btrfs/btrfs.ko] undefined!
>>
>> when I compile, and the module fails to load, even though the build
>> completes. =A0I've traced it to the calculation of max_fragments:
>>
>> max_fragments =3D (block_group->key.offset -
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 btrfs_block_group_used(&block_group->ite=
m)) /
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 block_group->fragment_size;
>>
>> It seems that accessing block_group in here is causing a reference t=
o
>> __udivdi3 somehow. =A0Any idea what's going on here?
>
> You can't to 64-bit divides on 32-bit archs. Make that
>
> max_fragments =3D block_group->key.offset - btrfs_block_group_used(&b=
lock_group->item);
> do_div(max_fragments, block_group->fragment_size);
>
> and it should work.
>
> --
> Jens Axboe
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" =
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Btrfs: take block group fragmentation into account for allocation
  2009-03-09  2:03   ` Yien Zheng
@ 2009-03-09 13:56     ` Simon Holm Thøgersen
  2009-03-09 15:21       ` Oliver Mattos
  0 siblings, 1 reply; 8+ messages in thread
From: Simon Holm Thøgersen @ 2009-03-09 13:56 UTC (permalink / raw)
  To: Yien Zheng; +Cc: Jens Axboe, linux-btrfs, Josef Bacik

s=C3=B8n, 08 03 2009 kl. 20:03 -0600, skrev Yien Zheng:
> Thanks Jens!  My research had indicated something about 64-bit
> division and using do_div, but you cleared it up for me.
>=20
> The fragmentation_percent function ultimately does another divide to
> get the percentage, so here's what I did to get the function to work:
>=20
> static int fragmentation_percent(struct btrfs_block_group_cache *bloc=
k_group)
> {
>         u64 max_fragments;
>         u64 fragments_ratio;
>=20
>         max_fragments =3D block_group->key.offset -
>                 btrfs_block_group_used(&block_group->item);
>         do_div(max_fragments, block_group->fragment_size);
>         fragments_ratio =3D block_group->fragments;
>         do_div(fragments_ratio, max_fragments);
>         return (fragments_ratio * 100);
> }
>=20
So the idea of the function is to return an integer in the range
[0,100]? The problem is that it will only return either 0 or 100, but
nothing in between, right?

Wouldn't something along the lines of the following be better?

u64 tmp =3D (block_group->key.offset -
	btrfs_block_group_used(&block_group->item);
u64 tmp2 =3D 100 * block_group->fragment_size * block_group->fragments;
do_div(tmp2, tmp);
return tmp2;


Simon Holm Th=C3=B8gersen

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" =
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Btrfs: take block group fragmentation into account for allocation
  2009-03-09 13:56     ` Simon Holm Thøgersen
@ 2009-03-09 15:21       ` Oliver Mattos
  2009-03-09 15:24         ` Josef Bacik
  0 siblings, 1 reply; 8+ messages in thread
From: Oliver Mattos @ 2009-03-09 15:21 UTC (permalink / raw)
  To: Simon Holm Thøgersen, Yien Zheng
  Cc: Jens Axboe, linux-btrfs, Josef Bacik


> So the idea of the function is to return an integer in the range
> [0,100]?

Why are we using a range of 0 to 100 anyway?  100 seems like an arbitary 
value for kernel space - why not just keep it as a value in the range 
[0,2^32) ?   That eliminates the arbitary constant of 100, and in some cases 
could reduce the effects of rounding and allow finer control at no 
additional expense. 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Btrfs: take block group fragmentation into account for allocation
  2009-03-09 15:21       ` Oliver Mattos
@ 2009-03-09 15:24         ` Josef Bacik
  2009-03-09 17:29           ` jim owens
  0 siblings, 1 reply; 8+ messages in thread
From: Josef Bacik @ 2009-03-09 15:24 UTC (permalink / raw)
  To: Oliver Mattos
  Cc: Simon Holm Thøgersen, Yien Zheng, Jens Axboe, linux-btrfs,
	Josef Bacik

On Mon, Mar 09, 2009 at 03:21:06PM -0000, Oliver Mattos wrote:
>
>> So the idea of the function is to return an integer in the range
>> [0,100]?
>
> Why are we using a range of 0 to 100 anyway?  100 seems like an arbitary  
> value for kernel space - why not just keep it as a value in the range  
> [0,2^32) ?   That eliminates the arbitary constant of 100, and in some 
> cases could reduce the effects of rounding and allow finer control at no  
> additional expense. 
>

Its not arbitrary, its a percentage, so 0-100 percent.

Josef

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Btrfs: take block group fragmentation into account for allocation
  2009-03-09 15:24         ` Josef Bacik
@ 2009-03-09 17:29           ` jim owens
  0 siblings, 0 replies; 8+ messages in thread
From: jim owens @ 2009-03-09 17:29 UTC (permalink / raw)
  To: Josef Bacik
  Cc: Oliver Mattos, Simon Holm Thøgersen, Yien Zheng, Jens Axboe,
	linux-btrfs

Josef Bacik wrote:
> On Mon, Mar 09, 2009 at 03:21:06PM -0000, Oliver Mattos wrote:
>>> So the idea of the function is to return an integer in the range
>>> [0,100]?
>> Why are we using a range of 0 to 100 anyway?  100 seems like an arbitary  
>> value for kernel space - why not just keep it as a value in the range  
>> [0,2^32) ?   That eliminates the arbitary constant of 100, and in some 
>> cases could reduce the effects of rounding and allow finer control at no  
>> additional expense. 
>>
> 
> Its not arbitrary, its a percentage, so 0-100 percent.

True, however the decision to use a percentage scheme is arbitrary.

For triggers like this it is just as good to pick points
like 1/8 (12.5%) or 1/4 (25%) that can be calculated without
doing division.  Forgetting the 32 bit architecture problem,
many architectures are really slow at divide so it is better
to not use them unless it adds real value.  Other than percent
being easy to document, what is the justification for percent.

jim

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] Btrfs: take block group fragmentation into account for allocation
@ 2009-03-06 19:30 Josef Bacik
  0 siblings, 0 replies; 8+ messages in thread
From: Josef Bacik @ 2009-03-06 19:30 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Josef Bacik

This patch improves the allocators packing ability to greatly improve cold-cache
reads.  Instead of handling the empty_size logic within find_free_extent, we
simply pass it to btrfs_find_free_space, where it will check the fragmentation
level of the block group and decide if it wants to look for a empty cluster, or
simply allocate space for what we need.

This is accomplished by taking the total amount of free space fragments and
dividing it by the maximum number of free space fragments there can be and
multiplying it by 100.  This gives us the percentage of free space
fragmentation.  The maximum number of free space fragments is simply the free
space divided by the sectorsize, and then dividing that by 2.  If we are 20% or
more fragmented we will ignore the empty_size and just allocate num_bytes as
close to search_start as we can.  This results in the allocator trying its best
to fill up block groups before moving on to one farther down the disk.

I've also changed how the allocation hints are given.  Instead of relying on the
last allocation for the transaction, we rely on the last offset the particular
root allocated from.  This will help in packing allocations per roots close
together even in the face of fragmentation.  I may take out the alloc hint in
the transaction altogether, but that will be farther down the line.

Signed-off-by: Josef Bacik <jbacik@redhat.com>
---
 fs/btrfs/ctree.h            |    6 +++++-
 fs/btrfs/extent-tree.c      |   25 +++++++++++++++----------
 fs/btrfs/free-space-cache.c |   42 +++++++++++++-----------------------------
 3 files changed, 33 insertions(+), 40 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index bdf826c..64b4ee6 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -651,6 +651,10 @@ struct btrfs_block_group_cache {
 	struct rb_root free_space_bytes;
 	struct rb_root free_space_offset;
 
+	/* free space fragmentation stuff */
+	u64 fragments;
+	u64 fragment_size;
+
 	/* block group cache stuff */
 	struct rb_node cache_node;
 
@@ -2141,7 +2145,7 @@ void btrfs_remove_free_space_cache(struct btrfs_block_group_cache
 				   *block_group);
 struct btrfs_free_space *btrfs_find_free_space(struct btrfs_block_group_cache
 					       *block_group, u64 offset,
-					       u64 bytes);
+					       u64 bytes, u64 empty_size);
 void btrfs_dump_free_space(struct btrfs_block_group_cache *block_group,
 			   u64 bytes);
 u64 btrfs_block_group_free_space(struct btrfs_block_group_cache *block_group);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 38820b1..2a6fe71 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3082,8 +3082,8 @@ static noinline int find_free_extent(struct btrfs_trans_handle *trans,
 {
 	int ret = 0;
 	struct btrfs_root *root = orig_root->fs_info->extent_root;
-	u64 total_needed = num_bytes;
 	u64 *last_ptr = NULL;
+	u64 total_used = 0;
 	struct btrfs_block_group_cache *block_group = NULL;
 	int allowed_chunk_alloc = 0;
 	int fill_root_alloc_info = 0;
@@ -3111,13 +3111,12 @@ static noinline int find_free_extent(struct btrfs_trans_handle *trans,
 	if ((data & BTRFS_BLOCK_GROUP_DATA) && btrfs_test_opt(root, SSD))
 		last_ptr = &root->fs_info->last_data_alloc;
 
-	if (last_ptr && *last_ptr)
+	if (last_ptr && *last_ptr && !hint_byte)
 		hint_byte = *last_ptr;
 
 	search_start = max(search_start, first_logical_byte(root, 0));
 	search_start = max(search_start, hint_byte);
 
-	total_needed += empty_size;
 	block_group = btrfs_lookup_block_group(root->fs_info, search_start);
 	if (!block_group)
 		block_group = btrfs_lookup_first_block_group(root->fs_info,
@@ -3163,10 +3162,12 @@ have_block_group:
 			goto loop;
 
 		free_space = btrfs_find_free_space(block_group, search_start,
-						   total_needed);
+						   num_bytes, empty_size);
 		if (!free_space)
 			goto loop;
 
+		total_used = min_t(u64, num_bytes + empty_size,
+				   free_space->bytes);
 		search_start = stripe_align(root, free_space->offset);
 
 		/* past where we're allowed to search */
@@ -3222,7 +3223,7 @@ have_block_group:
 			orig_root->alloc_bytes -= num_bytes;
 
 			if (!orig_root->alloc_bytes) {
-				orig_root->alloc_bytes = total_needed;
+				orig_root->alloc_bytes = total_used;
 				orig_root->alloc_offset = search_start;
 				used = 1;
 			}
@@ -3232,23 +3233,23 @@ have_block_group:
 			u64 bytes = orig_root->alloc_bytes;
 
 			orig_root->alloc_offset = search_start + num_bytes;
-			orig_root->alloc_bytes = total_needed - num_bytes;
+			orig_root->alloc_bytes = total_used - num_bytes;
 			used = 1;
 			spin_unlock(&orig_root->alloc_lock);
 
 			btrfs_add_free_space_lock(block_group, offset, bytes);
 		} else {
 			orig_root->alloc_offset = search_start + num_bytes;
-			orig_root->alloc_bytes = total_needed - num_bytes;
+			orig_root->alloc_bytes = total_used - num_bytes;
 			used = 1;
 			spin_unlock(&orig_root->alloc_lock);
 		}
 
 		if (used) {
 			btrfs_remove_free_space_lock(block_group, search_start,
-						     total_needed);
+						     total_used);
 			if (last_ptr)
-				*last_ptr = search_start + total_needed;
+				*last_ptr = search_start + total_used;
 		}
 
 		mutex_unlock(&block_group->alloc_mutex);
@@ -3271,7 +3272,6 @@ loop:
 		int try_again = empty_size;
 
 		block_group = NULL;
-		total_needed -= empty_size;
 		empty_size = 0;
 
 		if (allowed_chunk_alloc) {
@@ -3356,6 +3356,7 @@ again:
 			spin_unlock(&root->alloc_lock);
 			return 0;
 		}
+		hint_byte = root->alloc_offset;
 		spin_unlock(&root->alloc_lock);
 	}
 
@@ -6398,6 +6399,8 @@ int btrfs_read_block_groups(struct btrfs_root *root)
 		set_avail_alloc_bits(root->fs_info, cache->flags);
 		if (btrfs_chunk_readonly(root, cache->key.objectid))
 			set_block_group_readonly(cache);
+		cache->fragments = 0;
+		cache->fragment_size = root->sectorsize;
 	}
 	ret = 0;
 error:
@@ -6430,6 +6433,8 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans,
 	mutex_init(&cache->alloc_mutex);
 	mutex_init(&cache->cache_mutex);
 	INIT_LIST_HEAD(&cache->list);
+	cache->fragments = 0;
+	cache->fragment_size = root->sectorsize;
 
 	btrfs_set_block_group_used(&cache->item, bytes_used);
 	btrfs_set_block_group_chunk_objectid(&cache->item, chunk_objectid);
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index d1e5f0e..23cb12c 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -163,6 +163,7 @@ static void unlink_free_space(struct btrfs_block_group_cache *block_group,
 {
 	rb_erase(&info->offset_index, &block_group->free_space_offset);
 	rb_erase(&info->bytes_index, &block_group->free_space_bytes);
+	block_group->fragments -= 1;
 }
 
 static int link_free_space(struct btrfs_block_group_cache *block_group,
@@ -181,6 +182,8 @@ static int link_free_space(struct btrfs_block_group_cache *block_group,
 	if (ret)
 		return ret;
 
+	block_group->fragments += 1;
+
 	return ret;
 }
 
@@ -447,44 +450,25 @@ void btrfs_remove_free_space_cache(struct btrfs_block_group_cache *block_group)
 	mutex_unlock(&block_group->alloc_mutex);
 }
 
-#if 0
-static struct btrfs_free_space *btrfs_find_free_space_offset(struct
-						      btrfs_block_group_cache
-						      *block_group, u64 offset,
-						      u64 bytes)
+static int fragmentation_percent(struct btrfs_block_group_cache *block_group)
 {
-	struct btrfs_free_space *ret;
+	u64 max_fragments;
 
-	mutex_lock(&block_group->alloc_mutex);
-	ret = tree_search_offset(&block_group->free_space_offset, offset,
-				 bytes, 0);
-	mutex_unlock(&block_group->alloc_mutex);
-
-	return ret;
+	max_fragments = (block_group->key.offset -
+		btrfs_block_group_used(&block_group->item)) /
+		block_group->fragment_size;
+	return ((block_group->fragments / max_fragments) * 100);
 }
 
-static struct btrfs_free_space *btrfs_find_free_space_bytes(struct
-						     btrfs_block_group_cache
-						     *block_group, u64 offset,
-						     u64 bytes)
-{
-	struct btrfs_free_space *ret;
-
-	mutex_lock(&block_group->alloc_mutex);
-
-	ret = tree_search_bytes(&block_group->free_space_bytes, offset, bytes);
-	mutex_unlock(&block_group->alloc_mutex);
-
-	return ret;
-}
-#endif
-
 struct btrfs_free_space *btrfs_find_free_space(struct btrfs_block_group_cache
 					       *block_group, u64 offset,
-					       u64 bytes)
+					       u64 bytes, u64 empty_size)
 {
 	struct btrfs_free_space *ret = NULL;
 
+	if (empty_size && fragmentation_percent(block_group) < 20)
+		bytes += empty_size;
+
 	ret = tree_search_offset(&block_group->free_space_offset, offset,
 				 bytes, 0);
 	if (!ret)
-- 
1.5.4.3


^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2009-03-09 17:29 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-03-08 14:42 [PATCH] Btrfs: take block group fragmentation into account for allocation Yien Zheng
2009-03-08 16:37 ` Jens Axboe
2009-03-09  2:03   ` Yien Zheng
2009-03-09 13:56     ` Simon Holm Thøgersen
2009-03-09 15:21       ` Oliver Mattos
2009-03-09 15:24         ` Josef Bacik
2009-03-09 17:29           ` jim owens
  -- strict thread matches above, loose matches on Subject: below --
2009-03-06 19:30 Josef Bacik

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.