* [PATCH 2/2] ext4: mballoc - limit scanning of uninitialized groups
@ 2020-05-20  9:45 Alex Zhuravlev
  2020-05-21  7:04 ` Ritesh Harjani
  0 siblings, 1 reply; 4+ messages in thread
From: Alex Zhuravlev @ 2020-05-20  9:45 UTC (permalink / raw)
  To: linux-ext4

cr=0 is supposed to be an optimization to save CPU cycles, but if the
buddy data (in memory) is not initialized, the optimization makes no
sense: we have to do synchronous IO, which takes a lot of cycles.
Also, at cr=0 mballoc doesn't store any available chunk, and cr=1
skips groups using a heuristic based on the average fragment size.
It's more useful to skip such groups and switch to cr=2, where groups
will be scanned for available chunks.

The goal group is never skipped, so that allocations do not land in
foreign groups, which could otherwise happen right after mount while
the buddy data is still being populated.
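
As an illustration (not part of the patch), the skip decision can be modelled
by the small standalone snippet below. struct group_model and worth_scanning()
are made-up stand-ins for the kernel structures, EXT4_BG_BLOCK_UNINIT is
redefined only so the snippet compiles on its own, and the real helper also
requires the group-descriptor checksum feature before trusting that flag.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define EXT4_BG_BLOCK_UNINIT 0x0002 /* block bitmap not initialized on disk */

struct group_model {
    bool     buddy_loaded; /* in-memory buddy data already built? */
    uint16_t bg_flags;     /* group descriptor flags (host order here) */
};

/*
 * Model of the check: at cr=0/1 a group whose buddy data is not loaded
 * is skipped, unless the descriptor says BLOCK_UNINIT (the buddy can be
 * built without reading the bitmap from disk) or the group is the
 * allocation's goal group.
 */
static bool worth_scanning(const struct group_model *g, int cr,
                           unsigned int group, unsigned int goal)
{
    if (g->buddy_loaded)
        return true;
    if (cr >= 2)
        return true;                      /* later passes load any group */
    if (g->bg_flags & EXT4_BG_BLOCK_UNINIT)
        return true;                      /* cheap to init, no bitmap IO */
    return group == goal;                 /* never skip the goal group */
}

int main(void)
{
    struct group_model g = { .buddy_loaded = false, .bg_flags = 0 };

    printf("cr=0, foreign group: %s\n",
           worth_scanning(&g, 0, 7, 3) ? "scan" : "skip");
    printf("cr=2, same group:    %s\n",
           worth_scanning(&g, 2, 7, 3) ? "scan" : "skip");
    return 0;
}

So with buddy data not yet loaded, a foreign group is skipped at cr=0/1 but
scanned at cr=2, while the goal group or a BLOCK_UNINIT group is always worth
initializing.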

A 120TB device was simulated using a sparse image and a dm-slow virtual
device, then the image was formatted and filled using debugfs to mark
~85% of the available space as busy. Without the patch, the very first
allocation could not complete in half an hour (according to vmstat it
would take ~10-1 hours); with the patch applied, the allocation took
~20 seconds.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>

 fs/ext4/mballoc.c | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 30d5d97548c4..f719714862b5 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1877,6 +1877,21 @@ int ext4_mb_find_by_goal(struct ext4_allocation_context *ac,
 	return 0;
 }
 
+static inline int ext4_mb_uninit_on_disk(struct super_block *sb,
+				    ext4_group_t group)
+{
+	struct ext4_group_desc *desc;
+
+	if (!ext4_has_group_desc_csum(sb))
+		return 0;
+
+	desc = ext4_get_group_desc(sb, group, NULL);
+	if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))
+		return 1;
+
+	return 0;
+}
+
 /*
  * The routine scans buddy structures (not bitmap!) from given order
  * to max order and tries to find big enough chunk to satisfy the req
@@ -2060,7 +2075,20 @@ static int ext4_mb_good_group(struct ext4_allocation_context *ac,
 
 	/* We only do this if the grp has never been initialized */
 	if (unlikely(EXT4_MB_GRP_NEED_INIT(grp))) {
-		int ret = ext4_mb_init_group(ac->ac_sb, group, GFP_NOFS);
+		int ret;
+
+		/* cr=0/1 is a very optimistic search to find large
+		 * good chunks almost for free. if buddy data is
+		 * not ready, then this optimization makes no sense.
+		 * instead it leads to loading (synchronously) lots
+		 * of groups and very slow allocations.
+		 * but don't skip the goal group to keep blocks in
+		 * the inode's group. */
+
+		if (cr < 2 && !ext4_mb_uninit_on_disk(ac->ac_sb, group) &&
+		    ac->ac_g_ex.fe_group != group)
+			return 0;
+		ret = ext4_mb_init_group(ac->ac_sb, group, GFP_NOFS);
 		if (ret)
 			return ret;
 	}
-- 
2.21.3



* Re: [PATCH 2/2] ext4: mballoc - limit scanning of uninitialized groups
  2020-05-20  9:45 [PATCH 2/2] ext4: mballoc - limit scanning of uninitialized groups Alex Zhuravlev
@ 2020-05-21  7:04 ` Ritesh Harjani
  2020-08-03 20:34   ` Jan Kara
  0 siblings, 1 reply; 4+ messages in thread
From: Ritesh Harjani @ 2020-05-21  7:04 UTC (permalink / raw)
  To: Alex Zhuravlev, linux-ext4



On 5/20/20 3:15 PM, Alex Zhuravlev wrote:
> cr=0 is supposed to be an optimization to save CPU cycles, but if the
> buddy data (in memory) is not initialized, the optimization makes no
> sense: we have to do synchronous IO, which takes a lot of cycles.
> Also, at cr=0 mballoc doesn't store any available chunk, and cr=1
> skips groups using a heuristic based on the average fragment size.
> It's more useful to skip such groups and switch to cr=2, where groups
> will be scanned for available chunks.
> 
> The goal group is never skipped, so that allocations do not land in
> foreign groups, which could otherwise happen right after mount while
> the buddy data is still being populated.
> 
> A 120TB device was simulated using a sparse image and a dm-slow virtual
> device, then the image was formatted and filled using debugfs to mark
> ~85% of the available space as busy. Without the patch, the very first
> allocation could not complete in half an hour (according to vmstat it
> would take ~10-1 hours); with the patch applied, the allocation took
> ~20 seconds.
> 
> Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
> Reviewed-by: Andreas Dilger <adilger@whamcloud.com>

This looks even better to me. Feel free to add:
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>


> 
>   fs/ext4/mballoc.c | 30 +++++++++++++++++++++++++++++-
>   1 file changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 30d5d97548c4..f719714862b5 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -1877,6 +1877,21 @@ int ext4_mb_find_by_goal(struct ext4_allocation_context *ac,
>   	return 0;
>   }
>   
> +static inline int ext4_mb_uninit_on_disk(struct super_block *sb,
> +				    ext4_group_t group)
> +{
> +	struct ext4_group_desc *desc;
> +
> +	if (!ext4_has_group_desc_csum(sb))
> +		return 0;
> +
> +	desc = ext4_get_group_desc(sb, group, NULL);
> +	if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))
> +		return 1;
> +
> +	return 0;
> +}
> +
>   /*
>    * The routine scans buddy structures (not bitmap!) from given order
>    * to max order and tries to find big enough chunk to satisfy the req
> @@ -2060,7 +2075,20 @@ static int ext4_mb_good_group(struct ext4_allocation_context *ac,
>   
>   	/* We only do this if the grp has never been initialized */
>   	if (unlikely(EXT4_MB_GRP_NEED_INIT(grp))) {
> -		int ret = ext4_mb_init_group(ac->ac_sb, group, GFP_NOFS);
> +		int ret;
> +
> +		/* cr=0/1 is a very optimistic search to find large
> +		 * good chunks almost for free. if buddy data is
> +		 * not ready, then this optimization makes no sense.
> +		 * instead it leads to loading (synchronously) lots
> +		 * of groups and very slow allocations.
> +		 * but don't skip the goal group to keep blocks in
> +		 * the inode's group. */
> +
> +		if (cr < 2 && !ext4_mb_uninit_on_disk(ac->ac_sb, group) &&
> +		    ac->ac_g_ex.fe_group != group)
> +			return 0;
> +		ret = ext4_mb_init_group(ac->ac_sb, group, GFP_NOFS);
>   		if (ret)
>   			return ret;
>   	}
> 


* Re: [PATCH 2/2] ext4: mballoc - limit scanning of uninitialized groups
  2020-05-21  7:04 ` Ritesh Harjani
@ 2020-08-03 20:34   ` Jan Kara
  0 siblings, 0 replies; 4+ messages in thread
From: Jan Kara @ 2020-08-03 20:34 UTC (permalink / raw)
  To: Ted Tso; +Cc: Alex Zhuravlev, linux-ext4, Ritesh Harjani

On Thu 21-05-20 12:34:29, Ritesh Harjani wrote:
> 
> 
> On 5/20/20 3:15 PM, Alex Zhuravlev wrote:
> > cr=0 is supposed to be an optimization to save CPU cycles, but if the
> > buddy data (in memory) is not initialized, the optimization makes no
> > sense: we have to do synchronous IO, which takes a lot of cycles.
> > Also, at cr=0 mballoc doesn't store any available chunk, and cr=1
> > skips groups using a heuristic based on the average fragment size.
> > It's more useful to skip such groups and switch to cr=2, where groups
> > will be scanned for available chunks.
> > 
> > The goal group is never skipped, so that allocations do not land in
> > foreign groups, which could otherwise happen right after mount while
> > the buddy data is still being populated.
> > 
> > A 120TB device was simulated using a sparse image and a dm-slow virtual
> > device, then the image was formatted and filled using debugfs to mark
> > ~85% of the available space as busy. Without the patch, the very first
> > allocation could not complete in half an hour (according to vmstat it
> > would take ~10-1 hours); with the patch applied, the allocation took
> > ~20 seconds.
> > 
> > Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
> > Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
> 
> This looks even better to me. Feel free to add:
> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>

Going through some old email... Ted, why wasn't this patch merged?

								Honza

> >   fs/ext4/mballoc.c | 30 +++++++++++++++++++++++++++++-
> >   1 file changed, 29 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> > index 30d5d97548c4..f719714862b5 100644
> > --- a/fs/ext4/mballoc.c
> > +++ b/fs/ext4/mballoc.c
> > @@ -1877,6 +1877,21 @@ int ext4_mb_find_by_goal(struct ext4_allocation_context *ac,
> >   	return 0;
> >   }
> > +static inline int ext4_mb_uninit_on_disk(struct super_block *sb,
> > +				    ext4_group_t group)
> > +{
> > +	struct ext4_group_desc *desc;
> > +
> > +	if (!ext4_has_group_desc_csum(sb))
> > +		return 0;
> > +
> > +	desc = ext4_get_group_desc(sb, group, NULL);
> > +	if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))
> > +		return 1;
> > +
> > +	return 0;
> > +}
> > +
> >   /*
> >    * The routine scans buddy structures (not bitmap!) from given order
> >    * to max order and tries to find big enough chunk to satisfy the req
> > @@ -2060,7 +2075,20 @@ static int ext4_mb_good_group(struct ext4_allocation_context *ac,
> >   	/* We only do this if the grp has never been initialized */
> >   	if (unlikely(EXT4_MB_GRP_NEED_INIT(grp))) {
> > -		int ret = ext4_mb_init_group(ac->ac_sb, group, GFP_NOFS);
> > +		int ret;
> > +
> > +		/* cr=0/1 is a very optimistic search to find large
> > +		 * good chunks almost for free. if buddy data is
> > +		 * not ready, then this optimization makes no sense.
> > +		 * instead it leads to loading (synchronously) lots
> > +		 * of groups and very slow allocations.
> > +		 * but don't skip the goal group to keep blocks in
> > +		 * the inode's group. */
> > +
> > +		if (cr < 2 && !ext4_mb_uninit_on_disk(ac->ac_sb, group) &&
> > +		    ac->ac_g_ex.fe_group != group)
> > +			return 0;
> > +		ret = ext4_mb_init_group(ac->ac_sb, group, GFP_NOFS);
> >   		if (ret)
> >   			return ret;
> >   	}
> > 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* [PATCH 2/2] ext4: mballoc - limit scanning of uninitialized groups
@ 2020-05-15 10:14 Alex Zhuravlev
  0 siblings, 0 replies; 4+ messages in thread
From: Alex Zhuravlev @ 2020-05-15 10:14 UTC (permalink / raw)
  To: linux-ext4

cr=0 is supposed to be an optimization to save CPU cycles, but if the
buddy data (in memory) is not initialized, the optimization makes no
sense: we have to do synchronous IO, which takes a lot of cycles.
Also, at cr=0 mballoc doesn't store any available chunk, and cr=1
skips groups using a heuristic based on the average fragment size.
It's more useful to skip such groups and switch to cr=2, where groups
will be scanned for available chunks.

A 120TB device was simulated using a sparse image and a dm-slow virtual
device, then the image was formatted and filled using debugfs to mark
~85% of the available space as busy. Without the patch, the very first
allocation could not complete in half an hour (according to vmstat it
would take ~10-1 hours); with the patch applied, the allocation took
~20 seconds.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
---
 fs/ext4/mballoc.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 30d5d97548c4..afb8bd9a10e9 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1877,6 +1877,21 @@ int ext4_mb_find_by_goal(struct ext4_allocation_context *ac,
 	return 0;
 }
 
+static inline int ext4_mb_uninit_on_disk(struct super_block *sb,
+				    ext4_group_t group)
+{
+	struct ext4_group_desc *desc;
+
+	if (!ext4_has_group_desc_csum(sb))
+		return 0;
+
+	desc = ext4_get_group_desc(sb, group, NULL);
+	if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))
+		return 1;
+
+	return 0;
+}
+
 /*
  * The routine scans buddy structures (not bitmap!) from given order
  * to max order and tries to find big enough chunk to satisfy the req
@@ -2060,7 +2075,15 @@ static int ext4_mb_good_group(struct ext4_allocation_context *ac,
 
 	/* We only do this if the grp has never been initialized */
 	if (unlikely(EXT4_MB_GRP_NEED_INIT(grp))) {
-		int ret = ext4_mb_init_group(ac->ac_sb, group, GFP_NOFS);
+		int ret;
+
+		/* cr=0/1 is a very optimistic search to find large
+		 * good chunks almost for free. if buddy data is
+		 * not ready, then this optimization makes no sense */
+
+		if (cr < 2 && !ext4_mb_uninit_on_disk(ac->ac_sb, group))
+			return 0;
+		ret = ext4_mb_init_group(ac->ac_sb, group, GFP_NOFS);
 		if (ret)
 			return ret;
 	}
-- 
2.21.3



