* [RFC][PATCH 0/1] ext4: Fix ext4_mb_normalize_request
@ 2014-06-13 13:55 Lukas Czerner
  2014-06-13 13:55 ` [PATCH 1/1] " Lukas Czerner
  2014-06-24 12:36 ` [RFC][PATCH 0/1] " Lukáš Czerner
  0 siblings, 2 replies; 6+ messages in thread
From: Lukas Czerner @ 2014-06-13 13:55 UTC (permalink / raw)
  To: linux-ext4

This is my first attempt to fix the ext4_mb_normalize_request() function in
ext4, which deals with file preallocations.

This is not yet a final version as it needs more testing; however, I'd like
to hear some suggestions.


Currently there are a couple of problems with ext4_mb_normalize_request().

- We're trying to normalize unwritten extent allocations, which is
  entirely unnecessary because the user knows exactly how much space
  they are going to need - there is no need for the file system to do
  preallocations.

- ext4_mb_normalize_request() unnecessarily divides bigger allocation
  requests into smaller ones (8MB). I believe that this is a bug rather
  than a design decision.

- For smaller allocations (or smaller files) we do not even respect
  fe_logical, although we do respect it for bigger files.

- Overall the logic within ext4_mb_normalize_request() is weird and
  no one really understands why it is the way it is.

Fix all of this by:

- Disabling preallocation for unwritten extent allocations. However,
  because the maximum size of an unwritten extent is one block smaller
  than that of a written one, we limit the request to
  EXT_INIT_MAX_LEN / 2 in order to avoid unnecessary fragmentation.

- Getting rid of the "if table" in ext4_mb_normalize_request() and
  replacing it with simply rounding the assumed end of the file up to a
  power of two, while still limiting the allocation size to
  EXT4_BLOCKS_PER_GROUP. This is done in file system block units to
  account for file systems with different block sizes.


It passes xfstests cleanly in the default configuration; I have not tried
any non-default options yet.

I've tried to test how much it changes allocation. The test and some results
can be found at

http://people.redhat.com/lczerner/mballoc/

normalize.sh is the simple script I ran; output.normalize_orig[34]
contains results from vanilla 3.15.0, while output.normalize_patch[56]
contains results with this patch.

From the performance standpoint I do not see any major differences, except
that untar seems to always generate better results (which might be because
of bigger contiguous extents).

Free space fragmentation seems to be about the same; however, with the patch
there seem to be fewer small free space extents and more big ones, which is
expected due to the bigger preallocations (and I think it's a good thing).

The biggest difference, which is obvious from the results, is that the extent
tree is much smaller (sometimes five times smaller) with the patch - except
in the fallocate case, because we now limit the requests to
(EXT_INIT_MAX_LEN / 2) so we cannot merge them. It might be worth
experimenting with something smaller that is a factor of the unwritten
extent size.

But as I said, the extent tree is much smaller, which means that the extents
are bigger overall, which again is a good thing. This becomes very obvious
when we look at the extent tree of the image file (the last steps in the
test).

What do you think?

Thanks!
-Lukas





* [PATCH 1/1] ext4: Fix ext4_mb_normalize_request
  2014-06-13 13:55 [RFC][PATCH 0/1] ext4: Fix ext4_mb_normalize_request Lukas Czerner
@ 2014-06-13 13:55 ` Lukas Czerner
  2014-06-24 12:36 ` [RFC][PATCH 0/1] " Lukáš Czerner
  1 sibling, 0 replies; 6+ messages in thread
From: Lukas Czerner @ 2014-06-13 13:55 UTC (permalink / raw)
  To: linux-ext4; +Cc: Lukas Czerner

Currently there are a couple of problems with ext4_mb_normalize_request().

- We're trying to normalize unwritten extent allocations, which is
  entirely unnecessary because the user knows exactly how much space
  they are going to need - there is no need for the file system to do
  preallocations.

- ext4_mb_normalize_request() unnecessarily divides bigger allocation
  requests into smaller ones (8MB). I believe that this is a bug rather
  than a design decision.

- For smaller allocations (or smaller files) we do not even respect
  fe_logical, although we do respect it for bigger files.

- Overall the logic within ext4_mb_normalize_request() is weird and
  no one really understands why it is the way it is.

Fix all of this by:

- Disabling preallocation for unwritten extent allocations. However,
  because the maximum size of an unwritten extent is one block smaller
  than that of a written one, we limit the request to
  EXT_INIT_MAX_LEN / 2 in order to avoid unnecessary fragmentation.

- Getting rid of the "if table" in ext4_mb_normalize_request() and
  replacing it with simply rounding the assumed end of the file up to a
  power of two, while still limiting the allocation size to
  EXT4_BLOCKS_PER_GROUP. This is done in file system block units to
  account for file systems with different block sizes.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
---
 fs/ext4/extents.c | 29 +++++++++++++++------
 fs/ext4/mballoc.c | 78 +++++++++++++++----------------------------------------
 2 files changed, 42 insertions(+), 65 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 4da228a..0af9aaf 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4412,12 +4412,28 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
 	 * EXT_INIT_MAX_LEN and for an unwritten extent this limit is
 	 * EXT_UNWRITTEN_MAX_LEN.
 	 */
+	ar.flags = 0;
 	if (map->m_len > EXT_INIT_MAX_LEN &&
-	    !(flags & EXT4_GET_BLOCKS_UNWRIT_EXT))
+	    !(flags & EXT4_GET_BLOCKS_UNWRIT_EXT)) {
 		map->m_len = EXT_INIT_MAX_LEN;
-	else if (map->m_len > EXT_UNWRITTEN_MAX_LEN &&
-		 (flags & EXT4_GET_BLOCKS_UNWRIT_EXT))
-		map->m_len = EXT_UNWRITTEN_MAX_LEN;
+		/*
+		 * We're going to allocate as much as we can
+		 * so there is no reason to do preallocation
+		 */
+		ar.flags |= EXT4_MB_HINT_NOPREALLOC;
+	} else if (map->m_len >= EXT_UNWRITTEN_MAX_LEN &&
+		 (flags & EXT4_GET_BLOCKS_UNWRIT_EXT)) {
+		/*
+		 * We want to avoid unnecessary fragmentation. Setting this
+		 * to EXT_UNWRITTEN_MAX_LEN would leave 1 block in the block
+		 * group.
+		 */
+		map->m_len = EXT_INIT_MAX_LEN / 2;
+	}
+
+	/* Disable preallocation for unwritten extents. */
+	if (flags & EXT4_GET_BLOCKS_UNWRIT_EXT)
+		ar.flags |= EXT4_MB_HINT_NOPREALLOC;
 
 	/* Check if we can really insert (m_lblk)::(m_lblk + m_len) extent */
 	newex.ee_len = cpu_to_le16(map->m_len);
@@ -4444,10 +4460,7 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
 	ar.goal -= offset;
 	ar.logical -= offset;
 	if (S_ISREG(inode->i_mode))
-		ar.flags = EXT4_MB_HINT_DATA;
-	else
-		/* disable in-core preallocation for non-regular files */
-		ar.flags = 0;
+		ar.flags |= EXT4_MB_HINT_DATA;
 	if (flags & EXT4_GET_BLOCKS_NO_NORMALIZE)
 		ar.flags |= EXT4_MB_HINT_NOPREALLOC;
 	newblock = ext4_mb_new_blocks(handle, &ar, &err);
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 59e3162..cc55150 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2993,10 +2993,9 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac,
 				struct ext4_allocation_request *ar)
 {
 	struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
-	int bsbits, max;
-	ext4_lblk_t end;
-	loff_t size, start_off;
-	loff_t orig_size __maybe_unused;
+	int bsbits;
+	loff_t end;
+	loff_t size;
 	ext4_lblk_t start;
 	struct ext4_inode_info *ei = EXT4_I(ac->ac_inode);
 	struct ext4_prealloc_space *pa;
@@ -3022,64 +3021,29 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac,
 
 	bsbits = ac->ac_sb->s_blocksize_bits;
 
-	/* first, let's learn actual file size
-	 * given current request is allocated */
-	size = ac->ac_o_ex.fe_logical + EXT4_C2B(sbi, ac->ac_o_ex.fe_len);
-	size = size << bsbits;
-	if (size < i_size_read(ac->ac_inode))
-		size = i_size_read(ac->ac_inode);
-	orig_size = size;
-
-	/* max size of free chunks */
-	max = 2 << bsbits;
-
-#define NRL_CHECK_SIZE(req, size, max, chunk_size)	\
-		(req <= (size) || max <= (chunk_size))
-
-	/* first, try to predict filesize */
-	/* XXX: should this table be tunable? */
-	start_off = 0;
-	if (size <= 16 * 1024) {
-		size = 16 * 1024;
-	} else if (size <= 32 * 1024) {
-		size = 32 * 1024;
-	} else if (size <= 64 * 1024) {
-		size = 64 * 1024;
-	} else if (size <= 128 * 1024) {
-		size = 128 * 1024;
-	} else if (size <= 256 * 1024) {
-		size = 256 * 1024;
-	} else if (size <= 512 * 1024) {
-		size = 512 * 1024;
-	} else if (size <= 1024 * 1024) {
-		size = 1024 * 1024;
-	} else if (NRL_CHECK_SIZE(size, 4 * 1024 * 1024, max, 2 * 1024)) {
-		start_off = ((loff_t)ac->ac_o_ex.fe_logical >>
-						(21 - bsbits)) << 21;
-		size = 2 * 1024 * 1024;
-	} else if (NRL_CHECK_SIZE(size, 8 * 1024 * 1024, max, 4 * 1024)) {
-		start_off = ((loff_t)ac->ac_o_ex.fe_logical >>
-							(22 - bsbits)) << 22;
-		size = 4 * 1024 * 1024;
-	} else if (NRL_CHECK_SIZE(ac->ac_o_ex.fe_len,
-					(8<<20)>>bsbits, max, 8 * 1024)) {
-		start_off = ((loff_t)ac->ac_o_ex.fe_logical >>
-							(23 - bsbits)) << 23;
-		size = 8 * 1024 * 1024;
-	} else {
-		start_off = (loff_t)ac->ac_o_ex.fe_logical << bsbits;
-		size	  = ac->ac_o_ex.fe_len << bsbits;
-	}
-	size = size >> bsbits;
-	start = start_off >> bsbits;
+	end = ac->ac_o_ex.fe_logical + EXT4_C2B(sbi, ac->ac_o_ex.fe_len);
+	if ((end << bsbits) < i_size_read(ac->ac_inode))
+		end = i_size_read(ac->ac_inode) >> bsbits;
+
+	/* Guess the file size -- round up to nearest power of two */
+	end = roundup_pow_of_two(end + 1);
+	start = ac->ac_o_ex.fe_logical;
+
+	/* Get the actual allocation size */
+	size = end - start;
+
+	/* We can not allocate more blocks than can be found in the group */
+	if (size > EXT4_BLOCKS_PER_GROUP(ac->ac_sb))
+		size = EXT4_BLOCKS_PER_GROUP(ac->ac_sb);
 
 	/* don't cover already allocated blocks in selected range */
 	if (ar->pleft && start <= ar->lleft) {
 		size -= ar->lleft + 1 - start;
 		start = ar->lleft + 1;
 	}
-	if (ar->pright && start + size - 1 >= ar->lright)
-		size -= start + size - ar->lright;
+	end = start + size;
+	if (ar->pright && end - 1 >= ar->lright)
+		size -= (end - ar->lright);
 
 	end = start + size;
 
@@ -3173,7 +3137,7 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac,
 	}
 
 	mb_debug(1, "goal: %u(was %u) blocks at %u\n", (unsigned) size,
-		(unsigned) orig_size, (unsigned) start);
+		(unsigned) ac->ac_o_ex.fe_logical, (unsigned) start);
 }
 
 static void ext4_mb_collect_stats(struct ext4_allocation_context *ac)
-- 
1.8.3.1



* Re: [RFC][PATCH 0/1] ext4: Fix ext4_mb_normalize_request
  2014-06-13 13:55 [RFC][PATCH 0/1] ext4: Fix ext4_mb_normalize_request Lukas Czerner
  2014-06-13 13:55 ` [PATCH 1/1] " Lukas Czerner
@ 2014-06-24 12:36 ` Lukáš Czerner
  2014-06-24 16:25   ` Andreas Dilger
  1 sibling, 1 reply; 6+ messages in thread
From: Lukáš Czerner @ 2014-06-24 12:36 UTC (permalink / raw)
  To: linux-ext4

On Fri, 13 Jun 2014, Lukas Czerner wrote:

> Date: Fri, 13 Jun 2014 15:55:35 +0200
> From: Lukas Czerner <lczerner@redhat.com>
> To: linux-ext4@vger.kernel.org
> Subject: [RFC][PATCH 0/1] ext4: Fix ext4_mb_normalize_request
> 
> This is my first attempt to fix the ext4_mb_normalize_request() function
> in ext4, which deals with file preallocations.
> 
> This is not yet a final version as it needs more testing, however I'd like
> to see some suggestions.

Does anyone have any comments on this and the related patch?

Thanks!
-Lukas

> 
> 
> Currently there are couple of problems with ext4_mb_normalize_request().
> 
> - We're trying to normalize unwritten extent allocations, which is
>   entirely unnecessary because the user knows exactly how much space
>   they are going to need - no need for the file system to do preallocations.
> 
> - ext4_mb_normalize_request() unnecessarily divides bigger allocation
>   requests to small ones (8MB). I believe that this is a bug rather than
>   design.
> 
> - For smaller allocations (or smaller files) we do not even respect the
>   fe_logical. Although we do respect it for bigger files.
> 
> - Overall the logic within ext4_mb_normalize_request() is weird and
>   no one really understands why it is the way it is.
> 
> Fix all of this by:
> 
> - Disabling preallocation for unwritten extent allocation. However
>   because the maximum size of the unwritten extent is one block smaller
>   than written, in order to avoid unnecessary fragmentation we limit the
>   request to EXT_INIT_MAX_LEN / 2
> 
> - Get rid of the "if table" in ext4_mb_normalize_request() and replace
>   it with simply aligning the assumed end of the file up to power of
>   two. But we still limit the allocation size to EXT4_BLOCKS_PER_GROUP.
>   Also do this on file system block units to take into account different
>   block sized file systems.
> 
> 
> It passes xfstests cleanly in default configuration, I've not tried any
> non-default options yet.
> 
> I've tried to test how much it changes allocation. The test and some results
> can be found at
> 
> http://people.redhat.com/lczerner/mballoc/
> 
> normalize.sh is the simple script I run and output.normalize_orig[34]
> contains results from vanilla 3.15.0, while output.normalize_patch[56]
> contains results with this patch.
> 
> From the performance stand point I do not see any major differences except
> that untar seems to always generate better results (which might be because
> of bigger continuous extents).
> 
> Free space fragmentation seems to be about the same, however with the patch
> there seems to be less smaller free space extents and more bigger ones which
> is expected due to bigger preallocations (and I think it's a good thing).
> 
> The biggest difference which is obvious from the results is that extent tree
> is much smaller (sometimes five times smaller) with the patch. Except of the
> fallocate case because we now limit the requests to (EXT_INIT_MAX_LEN / 2)
> so we can not merge them - it might be worth experimenting with something
> smaller which is a factor of unwritten extent size.
> 
> But as I said the extent tree is much smaller which means that the extents
> overall are bigger which again is a good thing. This becomes very obvious
> when we look at the extent tree of the image file (the last steps in the
> test).
> 
> What do you think ?
> 
> Thanks!
> -Lukas
> 
> 
> 
> 


* Re: [RFC][PATCH 0/1] ext4: Fix ext4_mb_normalize_request
  2014-06-24 12:36 ` [RFC][PATCH 0/1] " Lukáš Czerner
@ 2014-06-24 16:25   ` Andreas Dilger
  2014-06-25 13:43     ` Lukáš Czerner
  0 siblings, 1 reply; 6+ messages in thread
From: Andreas Dilger @ 2014-06-24 16:25 UTC (permalink / raw)
  To: Lukáš Czerner; +Cc: Ext4 Developers List, Alex Zhuravlev


On Jun 24, 2014, at 6:36 AM, Lukáš Czerner <lczerner@redhat.com> wrote:
> On Fri, 13 Jun 2014, Lukas Czerner wrote:
> 
>> Date: Fri, 13 Jun 2014 15:55:35 +0200
>> From: Lukas Czerner <lczerner@redhat.com>
>> To: linux-ext4@vger.kernel.org
>> Subject: [RFC][PATCH 0/1] ext4: Fix ext4_mb_normalize_request
>> 
>> This is my first attempt to fix the ext4_mb_normalize_request()
>> function in ext4, which deals with file preallocations.
>> 
>> This is not yet a final version as it needs more testing, however
>> I'd like to see some suggestions.
> 
> Does anyone have any comments on this and the related patch?

Comments inline.

>> Currently there are couple of problems with ext4_mb_normalize_request().
>> 
>> - We're trying to normalize unwritten extent allocations, which is
>>  entirely unnecessary because the user knows exactly how much space
>>  they are going to need - no need for the file system to do preallocations.
>> 
>> - ext4_mb_normalize_request() unnecessarily divides bigger allocation
>>  requests to small ones (8MB). I believe that this is a bug rather
>>  than design.

The reason that the large requests were broken into smaller ones is
that it becomes increasingly difficult to find large contiguous
extents as the filesystem becomes more full.  If there were a single
buddy bitmap for the whole filesystem then it would be possible to
scan for e.g. 64MB extents of free blocks, but with the current code
this may require loading up a new block bitmap for each allocation.

It may be that, with the optimizations that have landed since the
mballoc code was originally written (caching the largest free extent
in memory for each group), this group descriptor walk is fast enough
to handle large allocations.  In that case, the limit on the number
of groups to scan for an allocation may need to be increased.

>> - For smaller allocations (or smaller files) we do not even respect the
>>  fe_logical. Although we do respect it for bigger files.

This is done so it is possible to pack many small allocations into a
single large RAID stripe to avoid read-modify-write overhead.

>> - Overall the logic within ext4_mb_normalize_request() is weird and
>>  no one really understands why it is the way it is.
>> 
>> Fix all of this by:
>> 
>> - Disabling preallocation for unwritten extent allocation. However
>>  because the maximum size of the unwritten extent is one block smaller
>>  than written, in order to avoid unnecessary fragmentation we limit the
>>  request to EXT_INIT_MAX_LEN / 2

That should work out well.  Once the extents are converted to initialized
extents they can be merged, and this will also leave some room to split
uninitialized extents if they are not completely overwritten.

>> - Get rid of the "if table" in ext4_mb_normalize_request() and replace
>>  it with simply aligning the assumed end of the file up to power of
>>  two. But we still limit the allocation size to EXT4_BLOCKS_PER_GROUP.
>>  Also do this on file system block units to take into account different
>>  block sized file systems.

We have a patch to make this table tunable, but in the end we never used
anything other than the default power-of-two values, so I don't think it
is a loss to remove this code.  That said, it would actually be better to
align with the s_stripe_width instead of the next power-of-two value.

It is important to note that having more extents does not have a significant
performance impact if they are at least 4-8MB in size, but if the allocator
causes read-modify-write on a RAID array, it can cut performance in half.

>> It passes xfstests cleanly in default configuration, I've not tried any
>> non-default options yet.
>> 
>> I've tried to test how much it changes allocation. The test and some results
>> can be found at
>> 
>> http://people.redhat.com/lczerner/mballoc/
>> 
>> normalize.sh is the simple script I run and output.normalize_orig[34]
>> contains results from vanilla 3.15.0, while output.normalize_patch[56]
>> contains results with this patch.
>> 
>> From the performance stand point I do not see any major differences except
>> that untar seems to always generate better results (which might be because
>> of bigger continuous extents).

Actually, looking at the results, while the extent allocations look more
"regular" with the patched code, the actual performance is significantly
worse for some tests, both in terms of speed and in terms of free space
fragmentation:

orig3:					patch5:
[+] Fallocate test			[+] Fallocate test

real	0m0.216s			real	0m0.290s
user	0m0.000s			user	0m0.001s
sys	0m0.061s			sys	0m0.037s
Device: /dev/loop0			Device: /dev/loop0
Blocksize: 4096 bytes			Blocksize: 4096 bytes
Total blocks: 22282240			Total blocks: 22282240
Free blocks: 3535514 (15.9%)		Free blocks: 3535512 (15.9%)

Min. free extent: 32 KB 		Min. free extent: 32 KB 
Max. free extent: 2064256 KB		Max. free extent: 2064256 KB
Avg. free extent: 235700 KB		Avg. free extent: 228096 KB

orig3:					patch5:
[+] Copy linux source			[+] Copy linux source

real	4m17.888s			real	5m24.326s
user	0m2.265s			user	0m2.486s
sys	2m4.205s			sys	2m34.918s
Device: /dev/loop0			Device: /dev/loop0
Blocksize: 4096 bytes			Blocksize: 4096 bytes
Total blocks: 22282240			Total blocks: 22282240
Free blocks: 17536027 (78.7%)		Free blocks: 17536042 (78.7%)

Min. free extent: 4 KB 			Min. free extent: 4 KB 
Max. free extent: 2064256 KB		Max. free extent: 2064256 KB
Avg. free extent: 267724 KB		Avg. free extent: 209384 KB

orig3:					patch5:
[+] Untar linux source			[+] Untar linux source

real	3m34.945s			real	3m43.807s
user	0m3.459s			user	0m3.687s
sys	1m35.126s			sys	1m42.839s
Device: /dev/loop0			Device: /dev/loop0
Blocksize: 4096 bytes			Blocksize: 4096 bytes
Total blocks: 22282240			Total blocks: 22282240
Free blocks: 8852805 (39.7%)		Free blocks: 8852831 (39.7%)

Min. free extent: 4 KB 			Min. free extent: 4 KB 
Max. free extent: 2064256 KB		Max. free extent: 2064256 KB
Avg. free extent: 102936 KB		Avg. free extent: 72120 KB

The same is true for the "single dd" and "multiple dd" tests.  The only one
that shows somewhat better performance and fragmentation results is fsstress,
but I wouldn't exactly call that representative of normal user workloads.


>> Free space fragmentation seems to be about the same, however with the patch
>> there seems to be less smaller free space extents and more bigger ones which
>> is expected due to bigger preallocations (and I think it's a good thing).

Hmm, this is exactly the opposite of what I see in the output files?

>> The biggest difference which is obvious from the results is that extent tree
>> is much smaller (sometimes five times smaller) with the patch. Except of the
>> fallocate case because we now limit the requests to (EXT_INIT_MAX_LEN / 2)
>> so we can not merge them - it might be worth experimenting with something
>> smaller which is a factor of unwritten extent size.
>> 
>> But as I said the extent tree is much smaller which means that the extents
>> overall are bigger which again is a good thing. This becomes very obvious
>> when we look at the extent tree of the image file (the last steps in the
>> test).
>> 
>> What do you think ?

I definitely agree that there is work to be done to improve this code, but
since the results are quite mixed I think it makes sense to separate out
some of the changes into different patches and test them independently.
That will make it easier to isolate which changes affect performance.


Cheers, Andreas








* Re: [RFC][PATCH 0/1] ext4: Fix ext4_mb_normalize_request
  2014-06-24 16:25   ` Andreas Dilger
@ 2014-06-25 13:43     ` Lukáš Czerner
  2014-06-25 14:20       ` Lukáš Czerner
  0 siblings, 1 reply; 6+ messages in thread
From: Lukáš Czerner @ 2014-06-25 13:43 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Ext4 Developers List, Alex Zhuravlev


On Tue, 24 Jun 2014, Andreas Dilger wrote:

> Date: Tue, 24 Jun 2014 10:25:56 -0600
> From: Andreas Dilger <adilger@dilger.ca>
> To: Lukáš Czerner <lczerner@redhat.com>
> Cc: Ext4 Developers List <linux-ext4@vger.kernel.org>,
>     Alex Zhuravlev <alexey.zhuravlev@intel.com>
> Subject: Re: [RFC][PATCH 0/1] ext4: Fix ext4_mb_normalize_request
> 
> On Jun 24, 2014, at 6:36 AM, Lukáš Czerner <lczerner@redhat.com> wrote:
> > On Fri, 13 Jun 2014, Lukas Czerner wrote:
> > 
> >> Date: Fri, 13 Jun 2014 15:55:35 +0200
> >> From: Lukas Czerner <lczerner@redhat.com>
> >> To: linux-ext4@vger.kernel.org
> >> Subject: [RFC][PATCH 0/1] ext4: Fix ext4_mb_normalize_request
> >> 
> >> This is my first attempt to fix the ext4_mb_normalize_request()
> >> function in ext4, which deals with file preallocations.
> >> 
> >> This is not yet a final version as it needs more testing, however
> >> I'd like to see some suggestions.
> > 
> > Does anyone have any comments on this and the related patch ?
> 
> Comments inline.
> 
> >> Currently there are couple of problems with ext4_mb_normalize_request().
> >> 
> >> - We're trying to normalize unwritten extent allocations, which is
> >>  entirely unnecessary because the user knows exactly how much space
> >>  they are going to need - no need for the file system to do preallocations.
> >> 
> >> - ext4_mb_normalize_request() unnecessarily divides bigger allocation
> >>  requests to small ones (8MB). I believe that this is a bug rather
> >>  than design.
> 
> The reason that the large requests were broken into smaller ones is
> because it becomes increasingly difficult to find large contiguous
> extents as the filesystem becomes more full.  If there was a single
> buddy bitmap for the whole filesystem then it would be possible to
> scan for e.g. 64MB extents of free blocks, but with the current code
> this may require loading up a new block bitmap for each allocation.
> 
> It may be that with the optimizations that have been landed since the
> mballoc code was originally written (to cache the largest free extent
> in memory for each group) that this group descriptor walk may be fast
> enough to handle large allocations.  In that case, the limit on the
> number of groups to scan for an allocation may need to be increased.

I think that it definitely got better; moreover, with features like
flex_bg we pack things more closely together, making it easier for
readahead to actually be useful.

This was mostly an experiment in how we can change it and what the
results would be. However, I am a bit puzzled by your answer. I do
understand that making the allocation unnecessarily large is harder
for the allocator; there is no question about that. We might want to
add an automatic tweak to make it smaller as the file system fills
up, or as the free space becomes more fragmented. However, this
applies to rounding up the allocation request for speculative
preallocation.

What I described was not that case: the request was always capped at
8MB, and I do not think this was a design decision but rather a bug.
See the code...

	} else if (NRL_CHECK_SIZE(ac->ac_o_ex.fe_len,
					(8<<20)>>bsbits, max, 8 * 1024)) {
		start_off = ((loff_t)ac->ac_o_ex.fe_logical >>
							(23 - bsbits)) << 23;
		size = 8 * 1024 * 1024;
	} else {
		start_off = (loff_t)ac->ac_o_ex.fe_logical << bsbits;
		size	  = ac->ac_o_ex.fe_len << bsbits;
	}

the last "else" is there exactly for big allocations; however, as it
stands, the preceding NRL_CHECK_SIZE condition will never be false
when we get to it, simply because:

max = 2 << bsbits;

#define NRL_CHECK_SIZE(req, size, max, chunk_size)	\
		(req <= (size) || max <= (chunk_size))

which means that (max <= 8 * 1024) will always be true no matter what
the block size is (given a maximum block size of 4k).

Or do you think that having several smaller allocations is faster
than one bigger allocation? I do not think so; if nothing else, the
file might get more fragmented, as shown by the tests.


> 
> >> - For smaller allocations (or smaller files) we do not even respect the
> >>  fe_logical. Although we do respect it for bigger files.
> 
> This is done so it is possible to pack many small allocations into a
> single large RAID stripe to avoid read-modify-write overhead.

I do not think that ext4_mb_normalize_request() is supposed to care
about RAID. Also, ac_g_ex.fe_logical does not seem to play any role
in RAID alignment, or am I mistaken?

In the case of a small file it will be set to 0.

> 
> >> - Overall the logic within ext4_mb_normalize_request() is weird and
> >>  no one really understands why it is the way it is.
> >> 
> >> Fix all of this by:
> >> 
> >> - Disabling preallocation for unwritten extent allocation. However
> >>  because the maximum size of the unwritten extent is one block smaller
> >>  than written, in order to avoid unnecessary fragmentation we limit the
> >>  request to EXT_INIT_MAX_LEN / 2
> 
> That should work out well.  Once the extents are converted to initialized
> extents they can be merged, and this will also leave some room to split
> uninitialized extents if they are not completely overwritten.
> 
> >> - Get rid of the "if table" in ext4_mb_normalize_request() and replace
> >>  it with simply aligning the assumed end of the file up to power of
> >>  two. But we still limit the allocation size to EXT4_BLOCKS_PER_GROUP.
> >>  Also do this on file system block units to take into account different
> >>  block sized file systems.
> 
> We have a patch to make this table tunable, but in the end we never used
> anything other than the default power-of-two values, so I don't think it
> is a loss to remove this code.  That said, it would actually be better to
> align with the s_stripe_width instead of the next power-of-two value.

Good point; we can align it to s_stripe_width after rounding up to
the nearest power of two, though for smaller allocations this might
not be so helpful.

> 
> It is important to note that having more extents is not a significant
> performance impact if they are at least 4-8MB in size, but if the allocator
> causes read-modify-write on a RAID array can cut performance in half.

I agree; however, ext4_mb_normalize_request() is mostly about
preallocation and does not make any allocation decisions based on the
RAID geometry. That said, we can align the size to s_stripe_width.

> 
> >> It passes xfstests cleanly in default configuration, I've not tried any
> >> non-default options yet.
> >> 
> >> I've tried to test how much it changes allocation. The test and some results
> >> can be found at
> >> 
> >> http://people.redhat.com/lczerner/mballoc/
> >> 
> >> normalize.sh is the simple script I run and output.normalize_orig[34]
> >> contains results from vanilla 3.15.0, while output.normalize_patch[56]
> >> contains results with this patch.
> >> 
> >> From the performance stand point I do not see any major differences except
> >> that untar seems to always generate better results (which might be because
> >> of bigger continuous extents).
> 
> Actually, looking at the results, while the extent allocations look more
> "regular" with the patched code, the actual performance is significantly
> worse for some tests, both in terms of speed and in terms of free space
> fragmentation:

The test is far from perfect and the results often vary widely, but
it does not look like things are worse with the patch:


		Orig3		Patch5		Orig4		Patch6
-------------------------------------------------------------------------
Fallocate	0m0.276s	0m0.324s	0m0.258s	0m0.197s
Copy		4m58.145s	5m37.090s	4m43.531s	4m5.406s
Tar		6m10.526s	6m5.518s	5m42.999s	5m32.376s
Untar		7m23.806s	7m6.787s	7m27.812s	7m11.225s
Single_dd	0m15.979s	0m16.504s	0m18.130s	0m16.358s
Multiple_dd	3m13.562s	3m15.353s	3m11.031s	3m14.018s
Fsstress	0m6.401s	0m7.994s	0m6.555s	0m6.375s
-------------------------- TEST ON IMAGE FILE ---------------------------
Fallocate	0m0.216s	0m0.290s	0m0.272s	0m0.206s
Copy		4m17.888s	5m24.326s	4m14.834s	4m19.867s
Tar		4m31.356s	4m39.639s	4m33.940s	4m37.231s
Untar		3m34.945s	3m43.807s	4m4.969s	3m28.390s
Single_dd	0m14.875s	0m16.735s	0m19.060s	0m16.500s
Multiple_dd	3m14.358s	3m24.676s	3m21.938s	3m41.215s
Fsstress	0m11.472s	0m9.079s	0m9.705s	0m11.167s

Nice table, I should have done that before ;) You can see that the
results really vary a lot, but there are some patterns to be seen.

Here patched ext4 was faster, although the difference is too small to
be taken seriously. I need to do more runs and proper statistics:
Tar		6m10.526s	6m5.518s	5m42.999s	5m32.376s

Here patched ext4 is slower, but the difference is again
insignificant:
Multiple_dd	3m13.562s	3m15.353s	3m11.031s	3m14.018s

So I'd say that as far as running times are concerned there appears
to be no significant difference, but more testing and better
statistics are needed.

You're right that the Avg. free extent seems to be slightly smaller;
however, looking at the free extents histogram, I think that a
weighted average would show results more favourable for the patched
ext4. The reason is that currently we're packing allocations closely
together, which gives us a few more big free extents, but we also
have many more small ones. Whereas with my patch there seem to be
fewer really small extents and more medium sized extents (4M-128M).

But that needs to be confirmed.


> 
> orig3:					patch5:
> [+] Fallocate test			[+] Fallocate test
> 
> real	0m0.216s			real	0m0.290s
> user	0m0.000s			user	0m0.001s
> sys	0m0.061s			sys	0m0.037s
> Device: /dev/loop0			Device: /dev/loop0
> Blocksize: 4096 bytes			Blocksize: 4096 bytes
> Total blocks: 22282240			Total blocks: 22282240
> Free blocks: 3535514 (15.9%)		Free blocks: 3535512 (15.9%)
> 
> Min. free extent: 32 KB 		Min. free extent: 32 KB 
> Max. free extent: 2064256 KB		Max. free extent: 2064256 KB
> Avg. free extent: 235700 KB		Avg. free extent: 228096 KB
> 
> orig3:					patch5:
> [+] Copy linux source			[+] Copy linux source
> 
> real	4m17.888s			real	5m24.326s
> user	0m2.265s			user	0m2.486s
> sys	2m4.205s			sys	2m34.918s
> Device: /dev/loop0			Device: /dev/loop0
> Blocksize: 4096 bytes			Blocksize: 4096 bytes
> Total blocks: 22282240			Total blocks: 22282240
> Free blocks: 17536027 (78.7%)		Free blocks: 17536042 (78.7%)
> 
> Min. free extent: 4 KB 			Min. free extent: 4 KB 
> Max. free extent: 2064256 KB		Max. free extent: 2064256 KB
> Avg. free extent: 267724 KB		Avg. free extent: 209384 KB
> 
> orig3:					patch5:
> [+] Untar linux source			[+] Untar linux source
> 
> real	3m34.945s			real	3m43.807s
> user	0m3.459s			user	0m3.687s
> sys	1m35.126s			sys	1m42.839s
> Device: /dev/loop0			Device: /dev/loop0
> Blocksize: 4096 bytes			Blocksize: 4096 bytes
> Total blocks: 22282240			Total blocks: 22282240
> Free blocks: 8852805 (39.7%)		Free blocks: 8852831 (39.7%)
> 
> Min. free extent: 4 KB 			Min. free extent: 4 KB 
> Max. free extent: 2064256 KB		Max. free extent: 2064256 KB
> Avg. free extent: 102936 KB		Avg. free extent: 72120 KB
> 
> The same is true for the "single dd" and "multiple dd" tests.  The only one
> that shows somewhat better performance and fragmentation results is fsstress,
> but I wouldn't exactly call that representative of normal user workloads.
> 
> 
> >> Free space fragmentation seems to be about the same, however with the patch
> >> there seem to be fewer small free space extents and more big ones, which
> >> is expected due to bigger preallocations (and I think it's a good thing).
> 
> Hmm, this is exactly the opposite of what I see in the output files?

See my explanation above; I need to do a better job at presenting
the numbers in a more understandable way.

> 
> >> The biggest difference which is obvious from the results is that the extent
> >> tree is much smaller (sometimes five times smaller) with the patch, except
> >> for the fallocate case, because we now limit the requests to
> >> (EXT_INIT_MAX_LEN / 2) so we can not merge them - it might be worth
> >> experimenting with something smaller which is a factor of the unwritten
> >> extent size.
> >> 
> >> But as I said, the extent tree is much smaller, which means that the
> >> extents overall are bigger, which again is a good thing. This becomes
> >> very obvious when we look at the extent tree of the image file (the
> >> last steps in the test).
> >> 
> >> What do you think ?
> 
> I definitely agree that there is work to be done to improve this code, but
> since the results are quite mixed I think it makes sense to separate out
> some of the changes into different patches and test them independently.  That
> will simplify the isolation of which changes are affecting the performance.

I agree, more experimentation is needed.

Thanks for taking your time to look at this!
-Lukas

> 
> 
> Cheers, Andreas
> 
> 
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC][PATCH 0/1] ext4: Fix ext4_mb_normalize_request
  2014-06-25 13:43     ` Lukáš Czerner
@ 2014-06-25 14:20       ` Lukáš Czerner
  0 siblings, 0 replies; 6+ messages in thread
From: Lukáš Czerner @ 2014-06-25 14:20 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Ext4 Developers List, Alex Zhuravlev

[-- Attachment #1: Type: TEXT/PLAIN, Size: 13879 bytes --]

On Wed, 25 Jun 2014, Lukáš Czerner wrote:

> Date: Wed, 25 Jun 2014 15:43:43 +0200 (CEST)
> From: Lukáš Czerner <lczerner@redhat.com>
> To: Andreas Dilger <adilger@dilger.ca>
> Cc: Ext4 Developers List <linux-ext4@vger.kernel.org>,
>     Alex Zhuravlev <alexey.zhuravlev@intel.com>
> Subject: Re: [RFC][PATCH 0/1] ext4: Fix ext4_mb_normalize_request
> 
> On Tue, 24 Jun 2014, Andreas Dilger wrote:
> 
> > Date: Tue, 24 Jun 2014 10:25:56 -0600
> > From: Andreas Dilger <adilger@dilger.ca>
> > To: Lukáš Czerner <lczerner@redhat.com>
> > Cc: Ext4 Developers List <linux-ext4@vger.kernel.org>,
> >     Alex Zhuravlev <alexey.zhuravlev@intel.com>
> > Subject: Re: [RFC][PATCH 0/1] ext4: Fix ext4_mb_normalize_request
> > 
> > On Jun 24, 2014, at 6:36 AM, Lukáš Czerner <lczerner@redhat.com> wrote:
> > > On Fri, 13 Jun 2014, Lukas Czerner wrote:
> > > 
> > >> Date: Fri, 13 Jun 2014 15:55:35 +0200
> > >> From: Lukas Czerner <lczerner@redhat.com>
> > >> To: linux-ext4@vger.kernel.org
> > >> Subject: [RFC][PATCH 0/1] ext4: Fix ext4_mb_normalize_request
> > >> 
> > >> This is my first attempt to fix the ext4_mb_normalize_request()
> > >> function in ext4 which deals with file preallocations.
> > >> 
> > >> This is not yet a final version as it needs more testing, however
> > >> I'd like to see some suggestions.
> > > 
> > > Does anyone have any comments on this and the related patch ?
> > 
> > Comments inline.
> > 
> > >> Currently there are a couple of problems with ext4_mb_normalize_request().
> > >> 
> > >> - We're trying to normalize unwritten extent allocations, which is
> > >>  entirely unnecessary, because the user knows exactly how much space
> > >>  they are going to need - no need for the file system to preallocate.
> > >> 
> > >> - ext4_mb_normalize_request() unnecessarily divides bigger allocation
> > >>  requests into smaller ones (8MB). I believe that this is a bug rather
> > >>  than design.
> > 
> > The reason that the large requests were broken into smaller ones is
> > because it becomes increasingly difficult to find large contiguous
> > extents as the filesystem becomes more full.  If there was a single
> > buddy bitmap for the whole filesystem then it would be possible to
> > scan for e.g. 64MB extents of free blocks, but with the current code
> > this may require loading up a new block bitmap for each allocation.
> > 
> > It may be that with the optimizations that have been landed since the
> > mballoc code was originally written (to cache the largest free extent
> > in memory for each group) that this group descriptor walk may be fast
> > enough to handle large allocations.  In that case, the limit on the
> > number of groups to scan for an allocation may need to be increased.
> 
> I think that it definitely got better; moreover, with features like
> flex_bg we pack things more closely together, making it easier for
> readahead to actually be useful.
> 
> This was mostly an experiment in how we can change it and what the
> results would be. However, I am a bit puzzled by your answer. While
> I do understand that making the allocation unnecessarily large is
> harder for the allocator - there is no question about that - we
> might want to add an automatic tweak to make it smaller as the file
> system fills up, or as the free space becomes more fragmented.
> However, this applies to rounding up the allocation request for
> speculative preallocation.
> 
> What I described was not that case: the allocation was always capped
> at 8MB, and I do not think that was a design decision but rather a
> bug. See the code...
> 
> 	} else if (NRL_CHECK_SIZE(ac->ac_o_ex.fe_len,
> 					(8<<20)>>bsbits, max, 8 * 1024)) {
> 		start_off = ((loff_t)ac->ac_o_ex.fe_logical >>
> 							(23 - bsbits)) << 23;
> 		size = 8 * 1024 * 1024;
> 	} else {
> 		start_off = (loff_t)ac->ac_o_ex.fe_logical << bsbits;
> 		size	  = ac->ac_o_ex.fe_len << bsbits;
> 	}
> 
> the last "else" is there exactly for the big allocations; however,
> as it stands, the condition before it will never be false when we
> get to it, simply because:
> 
> max = 2 << bsbits;
> 
> #define NRL_CHECK_SIZE(req, size, max, chunk_size)	\
> 		(req <= (size) || max <= (chunk_size))
> 
> which means that (max <= 8 * 1024) will always be true no matter
> what the block size is (well, up to the maximum 4k block size).
> 
> Or do you think that having several smaller allocations is faster
> than one bigger allocation? I do not think so; if nothing else, the
> file might get more fragmented, as shown by the tests.
> 
> 
> > 
> > >> - For smaller allocations (or smaller files) we do not even respect the
> > >>  fe_logical, although we do respect it for bigger files.
> > 
> > This is done so it is possible to pack many small allocations into a
> > single large RAID stripe to avoid read-modify-write overhead.
> 
> I do not think that ext4_mb_normalize_request() is supposed to care
> about RAID. Also, ac_g_ex.fe_logical does not seem to play any role
> in RAID alignment, or am I mistaken?
> 
> In the case of a small file it will be set to 0.
> 
> > 
> > >> - Overall the logic within ext4_mb_normalize_request() is weird and
> > >>  no one really understands why it is the way it is.
> > >> 
> > >> Fix all of this by:
> > >> 
> > >> - Disabling preallocation for unwritten extent allocation. However
> > >>  because the maximum size of the unwritten extent is one block smaller
> > >>  than written, in order to avoid unnecessary fragmentation we limit the
> > >>  request to EXT_INIT_MAX_LEN / 2
> > 
> > That should work out well.  Once the extents are converted to initialized
> > extents they can be merged, and this will also leave some room to split
> > uninitialized extents if they are not completely overwritten.
> > 
> > >> - Get rid of the "if table" in ext4_mb_normalize_request() and replace
> > >>  it with simply aligning the assumed end of the file up to power of
> > >>  two. But we still limit the allocation size to EXT4_BLOCKS_PER_GROUP.
> > >>  Also do this on file system block units to take into account different
> > >>  block sized file systems.
> > 
> > We have a patch to make this table tunable, but in the end we never used
> > anything other than the default power-of-two values, so I don't think it
> > is a loss to remove this code.  That said, it would actually be better to
> > align with the s_stripe_width instead of the next power-of-two value.
> 
> Good point, we can align it to s_stripe_width after rounding up to
> the nearest power of two, though for smaller allocations this might
> not be so helpful.
> 
> > 
> > It is important to note that having more extents is not a significant
> > performance impact if they are at least 4-8MB in size, but if the
> > allocator causes read-modify-write on a RAID array it can cut
> > performance in half.
> 
> I agree; however, ext4_mb_normalize_request() is mostly about
> preallocation - it does not make allocation decisions based on
> RAID. That said, I agree that we can align the size to
> s_stripe_width.
> 
> > 
> > >> It passes xfstests cleanly in default configuration, I've not tried any
> > >> non-default options yet.
> > >> 
> > >> I've tried to test how much it changes allocation. The test and some results
> > >> can be found at
> > >> 
> > >> http://people.redhat.com/lczerner/mballoc/
> > >> 
> > >> normalize.sh is the simple script I run; output.normalize_orig[34]
> > >> contain results from the vanilla 3.15.0 kernel, while
> > >> output.normalize_patch[56] contain results with this patch.
> > >> 
> > >> From the performance standpoint I do not see any major differences except
> > >> that untar seems to always generate better results (which might be because
> > >> of bigger contiguous extents).
> > 
> > Actually, looking at the results, while the extent allocations look more
> > "regular" with the patched code, the actual performance is significantly
> > worse for some tests, both in terms of speed and in terms of free space
> > fragmentation:
> 
> The test is far from perfect and the results vary widely most of the
> time, but it does not look like it's worse with the patch:
> 
> 
> 		Orig3		Patch5		Orig4		Patch6
> -------------------------------------------------------------------------
> Fallocate	0m0.276s	0m0.324s	0m0.258s	0m0.197s
> Copy		4m58.145s	5m37.090s	4m43.531s	4m5.406s
> Tar		6m10.526s	6m5.518s	5m42.999s	5m32.376s
> Untar		7m23.806s	7m6.787s	7m27.812s	7m11.225s
> Single_dd	0m15.979s	0m16.504s	0m18.130s	0m16.358s
> Multiple_dd	3m13.562s	3m15.353s	3m11.031s	3m14.018s
> Fsstress	0m6.401s	0m7.994s	0m6.555s	0m6.375s
> -------------------------- TEST ON IMAGE FILE ---------------------------
> Fallocate	0m0.216s	0m0.290s	0m0.272s	0m0.206s
> Copy		4m17.888s	5m24.326s	4m14.834s	4m19.867s
> Tar		4m31.356s	4m39.639s	4m33.940s	4m37.231s
> Untar		3m34.945s	3m43.807s	4m4.969s	3m28.390s
> Single_dd	0m14.875s	0m16.735s	0m19.060s	0m16.500s
> Multiple_dd	3m14.358s	3m24.676s	3m21.938s	3m41.215s
> Fsstress	0m11.472s	0m9.079s	0m9.705s	0m11.167s
> 
> Nice table, I should have done that before ;) So you can see that it
> really varies too much between runs. But there are some patterns to
> be seen.
> 
> Here patched ext4 was faster, although the difference is too small to
> be taken seriously. I need to do more runs and proper statistics.
> Tar		6m10.526s	6m5.518s	5m42.999s	5m32.376s
> 
> Here patched ext4 is slower, but the difference is again
> insignificant.
> Multiple_dd	3m13.562s	3m15.353s	3m11.031s	3m14.018s
> 
> So I'd say that as far as running times are concerned there appears
> to be no significant difference, but more testing and better
> statistics are needed.
> 
> You're right that the Avg. free extent seems to be slightly smaller;
> however, looking at the free extents histogram, I think that a
> weighted average would show results more favourable for the patched
> ext4. The

Sorry I meant to say median.

> reason is that currently we're packing allocations closely together,
> which gives us a few more big free extents, but we also have many
> more small ones. Whereas with my patch there seem to be fewer really
> small extents and more medium sized extents (4M-128M).

Which, along with more contiguous files and fewer extents, seems like
a good thing.

-Lukas

> 
> But that needs to be confirmed.
> 
> 
> > 
> > orig3:					patch5:
> > [+] Fallocate test			[+] Fallocate test
> > 
> > real	0m0.216s			real	0m0.290s
> > user	0m0.000s			user	0m0.001s
> > sys	0m0.061s			sys	0m0.037s
> > Device: /dev/loop0			Device: /dev/loop0
> > Blocksize: 4096 bytes			Blocksize: 4096 bytes
> > Total blocks: 22282240			Total blocks: 22282240
> > Free blocks: 3535514 (15.9%)		Free blocks: 3535512 (15.9%)
> > 
> > Min. free extent: 32 KB 		Min. free extent: 32 KB 
> > Max. free extent: 2064256 KB		Max. free extent: 2064256 KB
> > Avg. free extent: 235700 KB		Avg. free extent: 228096 KB
> > 
> > orig3:					patch5:
> > [+] Copy linux source			[+] Copy linux source
> > 
> > real	4m17.888s			real	5m24.326s
> > user	0m2.265s			user	0m2.486s
> > sys	2m4.205s			sys	2m34.918s
> > Device: /dev/loop0			Device: /dev/loop0
> > Blocksize: 4096 bytes			Blocksize: 4096 bytes
> > Total blocks: 22282240			Total blocks: 22282240
> > Free blocks: 17536027 (78.7%)		Free blocks: 17536042 (78.7%)
> > 
> > Min. free extent: 4 KB 			Min. free extent: 4 KB 
> > Max. free extent: 2064256 KB		Max. free extent: 2064256 KB
> > Avg. free extent: 267724 KB		Avg. free extent: 209384 KB
> > 
> > orig3:					patch5:
> > [+] Untar linux source			[+] Untar linux source
> > 
> > real	3m34.945s			real	3m43.807s
> > user	0m3.459s			user	0m3.687s
> > sys	1m35.126s			sys	1m42.839s
> > Device: /dev/loop0			Device: /dev/loop0
> > Blocksize: 4096 bytes			Blocksize: 4096 bytes
> > Total blocks: 22282240			Total blocks: 22282240
> > Free blocks: 8852805 (39.7%)		Free blocks: 8852831 (39.7%)
> > 
> > Min. free extent: 4 KB 			Min. free extent: 4 KB 
> > Max. free extent: 2064256 KB		Max. free extent: 2064256 KB
> > Avg. free extent: 102936 KB		Avg. free extent: 72120 KB
> > 
> > The same is true for the "single dd" and "multiple dd" tests.  The only one
> > that shows somewhat better performance and fragmentation results is fsstress,
> > but I wouldn't exactly call that representative of normal user workloads.
> > 
> > 
> > >> Free space fragmentation seems to be about the same, however with the patch
> > >> there seem to be fewer small free space extents and more big ones, which
> > >> is expected due to bigger preallocations (and I think it's a good thing).
> > 
> > Hmm, this is exactly the opposite of what I see in the output files?
> 
> See my explanation above; I need to do a better job at presenting
> the numbers in a more understandable way.
> 
> > 
> > >> The biggest difference which is obvious from the results is that the
> > >> extent tree is much smaller (sometimes five times smaller) with the patch,
> > >> except for the fallocate case, because we now limit the requests to
> > >> (EXT_INIT_MAX_LEN / 2) so we can not merge them - it might be worth
> > >> experimenting with something smaller which is a factor of the unwritten
> > >> extent size.
> > >> 
> > >> But as I said, the extent tree is much smaller, which means that the
> > >> extents overall are bigger, which again is a good thing. This becomes
> > >> very obvious when we look at the extent tree of the image file (the
> > >> last steps in the test).
> > >> 
> > >> What do you think ?
> > 
> > I definitely agree that there is work to be done to improve this code, but
> > since the results are quite mixed I think it makes sense to separate out
> > some of the changes into different patches and test them independently.  That
> > will simplify the isolation of which changes are affecting the performance.
> 
> I agree, more experimentation is needed.
> 
> Thanks for taking your time to look at this!
> -Lukas
> 
> > 
> > 
> > Cheers, Andreas
> > 
> > 
> > 
> > 
> > 
> > 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2014-06-25 14:20 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-13 13:55 [RFC][PATCH 0/1] ext4: Fix ext4_mb_normalize_request Lukas Czerner
2014-06-13 13:55 ` [PATCH 1/1] " Lukas Czerner
2014-06-24 12:36 ` [RFC][PATCH 0/1] " Lukáš Czerner
2014-06-24 16:25   ` Andreas Dilger
2014-06-25 13:43     ` Lukáš Czerner
2014-06-25 14:20       ` Lukáš Czerner
