All of lore.kernel.org
 help / color / mirror / Atom feed
* Questions about mballoc's stream allocation
@ 2009-08-07 13:07 Theodore Ts'o
  2009-08-08  7:51 ` Andreas Dilger
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Theodore Ts'o @ 2009-08-07 13:07 UTC (permalink / raw)
  To: Andreas Dilger, Alex Tomas; +Cc: linux-ext4


I've got two questions about mballoc's stream allocation.

First of all, in ext4_mb_regular_allocator(), I'm 99% sure this is a
bug:

	/* if stream allocation is enabled, use global goal */
	size = ac->ac_o_ex.fe_logical + ac->ac_o_ex.fe_len;
	isize = i_size_read(ac->ac_inode) >> bsbits;
	if (size < isize)
		size = isize;

	if (size < sbi->s_mb_stream_request &&
	    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
			(ac->ac_flags & EXT4_MB_HINT_DATA)) {
		/* TBD: may be hot point */
		spin_lock(&sbi->s_md_lock);
		ac->ac_g_ex.fe_group = sbi->s_mb_last_group;
		ac->ac_g_ex.fe_start = sbi->s_mb_last_start;
		spin_unlock(&sbi->s_md_lock);
	}

Shouldn't that be ">=", not "<".  We want to use the values saved in
sbi->s_mb_last_{group,start} only if we are doing a stream allocation,
which would be true only if the file is *larger* than
s_mb_stream_request, no?


The second question I have is with regards to ext4_mb_use_best_found(),
we set sbi->s_mb_last_{group,start} on any data allocation; shouldn't we
only be setting those values only if we were doing a stream allocation
in the first place?

Otherwise, any kind of allocation will end up moving the global goal
block for stream allocations; even if it is a small allocation in the
middle of some block group caused by the flag EXT4_MB_HINT_NO_PREALLOC
being set.

Am I missing anything?

     	   	      	       		     - Ted

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Questions about mballoc's stream allocation
  2009-08-07 13:07 Questions about mballoc's stream allocation Theodore Ts'o
@ 2009-08-08  7:51 ` Andreas Dilger
  2009-08-08 22:52 ` [PATCH,RFC 1/2] ext4: Fix bugs in mballoc's stream allocation mode Theodore Ts'o
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Andreas Dilger @ 2009-08-08  7:51 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Alex Tomas, linux-ext4

On Aug 07, 2009  09:07 -0400, Theodore Ts'o wrote:
> I've got two questions about mballoc's stream allocation.

Sorry, I don't know the answers.  Hopefully Alex can chime in.

> First of all, in ext4_mb_regular_allocator(), I'm 99% sure this is a
> bug:
> 
> 	/* if stream allocation is enabled, use global goal */
> 	size = ac->ac_o_ex.fe_logical + ac->ac_o_ex.fe_len;
> 	isize = i_size_read(ac->ac_inode) >> bsbits;
> 	if (size < isize)
> 		size = isize;
> 
> 	if (size < sbi->s_mb_stream_request &&
> 	    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 			(ac->ac_flags & EXT4_MB_HINT_DATA)) {
> 		/* TBD: may be hot point */
> 		spin_lock(&sbi->s_md_lock);
> 		ac->ac_g_ex.fe_group = sbi->s_mb_last_group;
> 		ac->ac_g_ex.fe_start = sbi->s_mb_last_start;
> 		spin_unlock(&sbi->s_md_lock);
> 	}
> 
> Shouldn't that be ">=", not "<".  We want to use the values saved in
> sbi->s_mb_last_{group,start} only if we are doing a stream allocation,
> which would be true only if the file is *larger* than
> s_mb_stream_request, no?
> 
> 
> The second question I have is with regards to ext4_mb_use_best_found(),
> we set sbi->s_mb_last_{group,start} on any data allocation; shouldn't we
> only be setting those values only if we were doing a stream allocation
> in the first place?
> 
> Otherwise, any kind of allocation will end up moving the global goal
> block for stream allocations; even if it is a small allocation in the
> middle of some block group caused by the flag EXT4_MB_HINT_NO_PREALLOC
> being set.
> 
> Am I missing anything?
> 
>      	   	      	       		     - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH,RFC 1/2] ext4: Fix bugs in mballoc's stream allocation mode
  2009-08-07 13:07 Questions about mballoc's stream allocation Theodore Ts'o
  2009-08-08  7:51 ` Andreas Dilger
@ 2009-08-08 22:52 ` Theodore Ts'o
  2009-08-08 22:52 ` [PATCH,RFC 2/2] ext4: Avoid group preallocation for closed files Theodore Ts'o
  2009-08-11 15:39 ` Questions about mballoc's stream allocation Aneesh Kumar K.V
  3 siblings, 0 replies; 6+ messages in thread
From: Theodore Ts'o @ 2009-08-08 22:52 UTC (permalink / raw)
  To: linux-ext4; +Cc: Andreas Dilger, Alex Tomas, Theodore Ts'o

The logic around sbi->s_mb_last_group and sbi->s_mb_last_start was all
screwed up.  These fields were getting unconditionally allt he time,
set even when stream allocation had not taken place, and if they were
being used when the file was smaller than s_mb_stream_request, which
is when the allocation should _not_ be doing stream allocation.

Fix this by determining whether or not we stream allocation should
take place once, in ext4_mb_group_or_file(), and setting a flag which
gets used in ext4_mb_regular_allocator() and ext4_mb_use_best_found().
This simplifies the code and assures that we are consistently using
(or not using) the stream allocation logic.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
 fs/ext4/ext4.h    |    2 ++
 fs/ext4/mballoc.c |   23 ++++++++++-------------
 2 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 9714db3..1ca6995 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -88,6 +88,8 @@ typedef unsigned int ext4_group_t;
 #define EXT4_MB_HINT_TRY_GOAL		512
 /* blocks already pre-reserved by delayed allocation */
 #define EXT4_MB_DELALLOC_RESERVED      1024
+/* We are doing stream allocation */
+#define EXT4_MB_STREAM_ALLOC	       2048
 
 
 struct ext4_allocation_request {
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 68cde59..f019a50 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1360,7 +1360,7 @@ static void ext4_mb_use_best_found(struct ext4_allocation_context *ac,
 	ac->alloc_semp =  e4b->alloc_semp;
 	e4b->alloc_semp = NULL;
 	/* store last allocated for subsequent stream allocation */
-	if ((ac->ac_flags & EXT4_MB_HINT_DATA)) {
+	if (ac->ac_flags & EXT4_MB_STREAM_ALLOC) {
 		spin_lock(&sbi->s_md_lock);
 		sbi->s_mb_last_group = ac->ac_f_ex.fe_group;
 		sbi->s_mb_last_start = ac->ac_f_ex.fe_start;
@@ -1938,7 +1938,6 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
 	struct ext4_sb_info *sbi;
 	struct super_block *sb;
 	struct ext4_buddy e4b;
-	loff_t size, isize;
 
 	sb = ac->ac_sb;
 	sbi = EXT4_SB(sb);
@@ -1974,20 +1973,16 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
 	}
 
 	bsbits = ac->ac_sb->s_blocksize_bits;
-	/* if stream allocation is enabled, use global goal */
-	size = ac->ac_o_ex.fe_logical + ac->ac_o_ex.fe_len;
-	isize = i_size_read(ac->ac_inode) >> bsbits;
-	if (size < isize)
-		size = isize;
 
-	if (size < sbi->s_mb_stream_request &&
-			(ac->ac_flags & EXT4_MB_HINT_DATA)) {
+	/* if stream allocation is enabled, use global goal */
+	if (ac->ac_flags & EXT4_MB_STREAM_ALLOC) {
 		/* TBD: may be hot point */
 		spin_lock(&sbi->s_md_lock);
 		ac->ac_g_ex.fe_group = sbi->s_mb_last_group;
 		ac->ac_g_ex.fe_start = sbi->s_mb_last_start;
 		spin_unlock(&sbi->s_md_lock);
 	}
+
 	/* Let's just scan groups to find more-less suitable blocks */
 	cr = ac->ac_2order ? 0 : 1;
 	/*
@@ -4155,16 +4150,18 @@ static void ext4_mb_group_or_file(struct ext4_allocation_context *ac)
 	if (!(ac->ac_flags & EXT4_MB_HINT_DATA))
 		return;
 
+	if (unlikely(ac->ac_flags & EXT4_MB_HINT_GOAL_ONLY))
+		return;
+
 	size = ac->ac_o_ex.fe_logical + ac->ac_o_ex.fe_len;
 	isize = i_size_read(ac->ac_inode) >> bsbits;
 	size = max(size, isize);
 
 	/* don't use group allocation for large files */
-	if (size >= sbi->s_mb_stream_request)
-		return;

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH,RFC 2/2] ext4: Avoid group preallocation for closed files
  2009-08-07 13:07 Questions about mballoc's stream allocation Theodore Ts'o
  2009-08-08  7:51 ` Andreas Dilger
  2009-08-08 22:52 ` [PATCH,RFC 1/2] ext4: Fix bugs in mballoc's stream allocation mode Theodore Ts'o
@ 2009-08-08 22:52 ` Theodore Ts'o
  2009-08-11 15:39 ` Questions about mballoc's stream allocation Aneesh Kumar K.V
  3 siblings, 0 replies; 6+ messages in thread
From: Theodore Ts'o @ 2009-08-08 22:52 UTC (permalink / raw)
  To: linux-ext4; +Cc: Andreas Dilger, Alex Tomas, Theodore Ts'o

Currently the group preallocation code tries to find a large (512)
free block from which to do per-cpu group allocation for small files.
The problem with this scheme is that it leaves the filesystem horribly
fragmented.  In the worst case, if the filesystem is unmounted and
remounted (after a system shutdown, for example) we forget the fact
that wee were using a particular (now-partially filled) 512 block
extent.  So the next time we try to allocate space for a small file,
we will find *another* completely free 512 block chunk to allocate
small files.  Given that there are 32,768 blocks in a block group,
after 64 iterations of "mount, write one 4k file in a directory,
unmount", the block group will have 64 files, each separated by 511
blocks, and the block group will no longer have any free 512
completely free chunks of blocks for group preallocation space.

So if we try to allocate blocks for a file that has been closed, such
that we know the final size of the file, and the filesystem is not
busy, avoid using group preallocation.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
 fs/ext4/ext4.h    |   24 +++++++++++++++++++++++-
 fs/ext4/mballoc.c |   10 +++++++++-
 2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 1ca6995..c2d98f8 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -90,6 +90,8 @@ typedef unsigned int ext4_group_t;
 #define EXT4_MB_DELALLOC_RESERVED      1024
 /* We are doing stream allocation */
 #define EXT4_MB_STREAM_ALLOC	       2048
+/* We suppressed group preallocation */
+#define EXT4_MB_SUPPRESS_GROUP_PREALLOC 4096
 
 
 struct ext4_allocation_request {
@@ -952,6 +954,7 @@ struct ext4_sb_info {
 	atomic_t s_mb_lost_chunks;
 	atomic_t s_mb_preallocated;
 	atomic_t s_mb_discarded;
+	atomic_t s_lock_busy;
 
 	/* locality groups */
 	struct ext4_locality_group *s_locality_groups;
@@ -1593,15 +1596,34 @@ struct ext4_group_info {
 #define EXT4_MB_GRP_NEED_INIT(grp)	\
 	(test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))
 
+#define EXT4_MAX_CONTENTION		8
+#define EXT4_CONTENTION_THRESHOLD	2
+
 static inline spinlock_t *ext4_group_lock_ptr(struct super_block *sb,
 					      ext4_group_t group)
 {
 	return bgl_lock_ptr(EXT4_SB(sb)->s_blockgroup_lock, group);
 }
 
+/*
+ * Returns true if the filesystem is busy enough that attempts to
+ * access the block group locks has run into contention.
+ */
+static inline int ext4_fs_is_busy(struct ext4_sb_info *sbi)
+{
+	return (atomic_read(&sbi->s_lock_busy) > EXT4_CONTENTION_THRESHOLD);
+}
+
 static inline void ext4_lock_group(struct super_block *sb, ext4_group_t group)
 {
-	spin_lock(ext4_group_lock_ptr(sb, group));
+	spinlock_t *lock = ext4_group_lock_ptr(sb, group);
+	if (spin_trylock(lock))
+		atomic_add_unless(&EXT4_SB(sb)->s_lock_busy, 1,
+				  EXT4_MAX_CONTENTION);
+	else {
+		atomic_add_unless(&EXT4_SB(sb)->s_lock_busy, -1, 0);
+		spin_lock(lock);
+	}
 }
 
 static inline void ext4_unlock_group(struct super_block *sb,
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index f019a50..482bf40 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4154,9 +4154,17 @@ static void ext4_mb_group_or_file(struct ext4_allocation_context *ac)
 		return;
 
 	size = ac->ac_o_ex.fe_logical + ac->ac_o_ex.fe_len;
-	isize = i_size_read(ac->ac_inode) >> bsbits;
+	isize = (i_size_read(ac->ac_inode) + ac->ac_sb->s_blocksize - 1)
+		>> bsbits;
 	size = max(size, isize);
 
+	if ((size == isize) &&
+	    ext4_fs_is_busy(sbi) &&
+	    (atomic_read(&ac->ac_inode->i_writecount) == 0)) {
+		ac->ac_flags |= EXT4_MB_SUPPRESS_GROUP_PREALLOC;
+		return;
+	}
+
 	/* don't use group allocation for large files */
 	if (size >= sbi->s_mb_stream_request) {
 		ac->ac_flags |= EXT4_MB_STREAM_ALLOC;
-- 
1.6.3.2.1.gb9f7d.dirty


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: Questions about mballoc's stream allocation
  2009-08-07 13:07 Questions about mballoc's stream allocation Theodore Ts'o
                   ` (2 preceding siblings ...)
  2009-08-08 22:52 ` [PATCH,RFC 2/2] ext4: Avoid group preallocation for closed files Theodore Ts'o
@ 2009-08-11 15:39 ` Aneesh Kumar K.V
  2009-08-11 17:37   ` Aneesh Kumar K.V
  3 siblings, 1 reply; 6+ messages in thread
From: Aneesh Kumar K.V @ 2009-08-11 15:39 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Andreas Dilger, Alex Tomas, linux-ext4

On Fri, Aug 07, 2009 at 09:07:53AM -0400, Theodore Ts'o wrote:
> 
> I've got two questions about mballoc's stream allocation.
> 
> First of all, in ext4_mb_regular_allocator(), I'm 99% sure this is a
> bug:
> 
> 	/* if stream allocation is enabled, use global goal */
> 	size = ac->ac_o_ex.fe_logical + ac->ac_o_ex.fe_len;
> 	isize = i_size_read(ac->ac_inode) >> bsbits;
> 	if (size < isize)
> 		size = isize;
> 
> 	if (size < sbi->s_mb_stream_request &&
> 	    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 			(ac->ac_flags & EXT4_MB_HINT_DATA)) {
> 		/* TBD: may be hot point */
> 		spin_lock(&sbi->s_md_lock);
> 		ac->ac_g_ex.fe_group = sbi->s_mb_last_group;
> 		ac->ac_g_ex.fe_start = sbi->s_mb_last_start;
> 		spin_unlock(&sbi->s_md_lock);
> 	}
> 
> Shouldn't that be ">=", not "<".  We want to use the values saved in
> sbi->s_mb_last_{group,start} only if we are doing a stream allocation,
> which would be true only if the file is *larger* than
> s_mb_stream_request, no?
> 
> 
> The second question I have is with regards to ext4_mb_use_best_found(),
> we set sbi->s_mb_last_{group,start} on any data allocation; shouldn't we
> only be setting those values only if we were doing a stream allocation
> in the first place?
> 
> Otherwise, any kind of allocation will end up moving the global goal
> block for stream allocations; even if it is a small allocation in the
> middle of some block group caused by the flag EXT4_MB_HINT_NO_PREALLOC
> being set.
> 
> Am I missing anything?
> 

I guess we should be setting the sbi->s_mb_last_{group,start} only when doing
small file allocation. We want to make sure small file allocation always
use the goal block near to the previous small file allocation request. So
(size <  sbi->s_mb_stream_request) is correct. But we should not be doing

         sbi->s_mb_last_group = ac->ac_f_ex.fe_group;  
always.

For large file allocation we wanted the new blocks to closer to that files previous
allocated block which ext4_ext_find_goal return as the goal value. So for
large files the goal value passed should represent the correct value.

NOTE: I am still behind ext4 list mails due to other commitments. Will start looking
at the mails, but mostly would be slow in replies

-aneesh

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Questions about mballoc's stream allocation
  2009-08-11 15:39 ` Questions about mballoc's stream allocation Aneesh Kumar K.V
@ 2009-08-11 17:37   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 6+ messages in thread
From: Aneesh Kumar K.V @ 2009-08-11 17:37 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Andreas Dilger, Alex Tomas, linux-ext4

On Tue, Aug 11, 2009 at 09:09:05PM +0530, Aneesh Kumar K.V wrote:
> On Fri, Aug 07, 2009 at 09:07:53AM -0400, Theodore Ts'o wrote:
> > 
> > I've got two questions about mballoc's stream allocation.
> > 
> > First of all, in ext4_mb_regular_allocator(), I'm 99% sure this is a
> > bug:
> > 
> > 	/* if stream allocation is enabled, use global goal */
> > 	size = ac->ac_o_ex.fe_logical + ac->ac_o_ex.fe_len;
> > 	isize = i_size_read(ac->ac_inode) >> bsbits;
> > 	if (size < isize)
> > 		size = isize;
> > 
> > 	if (size < sbi->s_mb_stream_request &&
> > 	    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > 			(ac->ac_flags & EXT4_MB_HINT_DATA)) {
> > 		/* TBD: may be hot point */
> > 		spin_lock(&sbi->s_md_lock);
> > 		ac->ac_g_ex.fe_group = sbi->s_mb_last_group;
> > 		ac->ac_g_ex.fe_start = sbi->s_mb_last_start;
> > 		spin_unlock(&sbi->s_md_lock);
> > 	}
> > 
> > Shouldn't that be ">=", not "<".  We want to use the values saved in
> > sbi->s_mb_last_{group,start} only if we are doing a stream allocation,
> > which would be true only if the file is *larger* than
> > s_mb_stream_request, no?
> > 
> > 
> > The second question I have is with regards to ext4_mb_use_best_found(),
> > we set sbi->s_mb_last_{group,start} on any data allocation; shouldn't we
> > only be setting those values only if we were doing a stream allocation
> > in the first place?
> > 
> > Otherwise, any kind of allocation will end up moving the global goal
> > block for stream allocations; even if it is a small allocation in the
> > middle of some block group caused by the flag EXT4_MB_HINT_NO_PREALLOC
> > being set.
> > 
> > Am I missing anything?
> > 
> 
> I guess we should be setting the sbi->s_mb_last_{group,start} only when doing
> small file allocation. We want to make sure small file allocation always
> use the goal block near to the previous small file allocation request. So
> (size <  sbi->s_mb_stream_request) is correct. But we should not be doing
> 
>          sbi->s_mb_last_group = ac->ac_f_ex.fe_group;  
> always.
> 
> For large file allocation we wanted the new blocks to closer to that files previous
> allocated block which ext4_ext_find_goal return as the goal value. So for
> large files the goal value passed should represent the correct value.
> 

Or may be the current code is fine. I guess what the aim was to make sure we spread
small file allocation near to the context. We use the ac_g_ex only when we don't find
blocks in prealloc space. The goal could be that, if we create lot of small files it would be that
the new small files created may be related to the process that did last block allocation.

-aneesh

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2009-08-11 17:37 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-08-07 13:07 Questions about mballoc's stream allocation Theodore Ts'o
2009-08-08  7:51 ` Andreas Dilger
2009-08-08 22:52 ` [PATCH,RFC 1/2] ext4: Fix bugs in mballoc's stream allocation mode Theodore Ts'o
2009-08-08 22:52 ` [PATCH,RFC 2/2] ext4: Avoid group preallocation for closed files Theodore Ts'o
2009-08-11 15:39 ` Questions about mballoc's stream allocation Aneesh Kumar K.V
2009-08-11 17:37   ` Aneesh Kumar K.V

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.