* [PATCH] ext4: improve ext4lazyinit scalability V2
@ 2016-08-15 12:23 Dmitry Monakhov
  2016-08-15 15:05 ` Jan Kara
  0 siblings, 1 reply; 6+ messages in thread
From: Dmitry Monakhov @ 2016-08-15 12:23 UTC (permalink / raw)
  To: linux-ext4; +Cc: jack, tytso, Dmitry Monakhov

ext4lazyinit is global thread. This thread performs itable initalization
under li_list_mtx mutex.

It basically does following:
ext4_lazyinit_thread
  ->mutex_lock(&eli->li_list_mtx);
  ->ext4_run_li_request(elr)
    ->ext4_init_inode_table-> Do a lot of IO if the list is large

And when new mount/umount arrive they have to block on ->li_list_mtx
because  lazy_thread holds it during full walk procedure.
ext4_fill_super
 ->ext4_register_li_request
   ->mutex_lock(&ext4_li_info->li_list_mtx);
   ->list_add(&elr->lr_request, &ext4_li_info >li_request_list);
In my case mount takes 40minutes on server with 36 * 4Tb HDD.
Common user may face this in case of very slow dev ( /dev/mmcblkXXX)
Even more. If one of filesystems was frozen lazyinit_thread will simply
blocks on sb_start_write() so other mount/umount will be suck forever.
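
A minimal sketch of the freeze-protection behaviour in question, using the
generic VFS helpers from include/linux/fs.h (the wrapper function and its
-EAGAIN return value are illustrative only, not part of this patch, which
switches the thread to the trylock variant):

#include <linux/fs.h>

/*
 * Illustration only: why the lazyinit thread must not call the blocking
 * sb_start_write() on a possibly frozen filesystem.
 */
static int try_lazyinit_one_sb(struct super_block *sb)
{
	/*
	 * sb_start_write(sb) would sleep here for as long as the
	 * filesystem stays frozen, wedging the single global thread and
	 * everyone queued behind li_list_mtx.  The trylock variant
	 * returns false instead, so a frozen sb is skipped and retried
	 * on a later wakeup.
	 */
	if (!sb_start_write_trylock(sb))
		return -EAGAIN;

	/* ... zero the uninitialized inode tables for this sb ... */

	sb_end_write(sb);
	return 0;
}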

This patch changes logic like follows:
- grap ->s_umount read sem before processing new li_request.
  After that it is safe to drop li_list_mtx because all callers of
  li_remove_request are holding ->s_umount for write.
- li_thread skips frozen SB's

Locking order:
Order is asserted by umout path like follows: s_umount ->li_list_mtx so
the only way to to grab ->s_mount inside li_thread is via down_read_trylock
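
A sketch of the two lock orders behind that rule; the umount side is
paraphrased from fs/ext4/super.c, and the thread side mirrors the diff below:

/*
 * umount path (establishes the order s_umount -> li_list_mtx):
 *
 *	down_write(&sb->s_umount);			(VFS)
 *	  ext4_unregister_li_request(sb)
 *	    mutex_lock(&ext4_li_info->li_list_mtx);
 *
 * The lazyinit thread walks the request list while already holding
 * li_list_mtx, so a blocking down_read(&sb->s_umount) there could
 * deadlock against a concurrent umount.  Only a trylock is safe, and a
 * failed trylock simply postpones the request:
 */
if (down_read_trylock(&elr->lr_super->s_umount)) {
	/* sb is pinned; li_list_mtx may be dropped while processing */
	/* ... ext4_run_li_request(elr), as in the diff below ... */
	up_read(&elr->lr_super->s_umount);
}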

xfstests:ext4/023
#PSBM-49658

Changes from V1
 - spell fixes according to jack@ comments
 - do not use temporal list.


Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext4/super.c | 43 +++++++++++++++++++++++++++++++------------
 1 file changed, 31 insertions(+), 12 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 3822a5a..0e45344 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2635,7 +2635,6 @@ static int ext4_run_li_request(struct ext4_li_request *elr)
 	sb = elr->lr_super;
 	ngroups = EXT4_SB(sb)->s_groups_count;
 
-	sb_start_write(sb);
 	for (group = elr->lr_next_group; group < ngroups; group++) {
 		gdp = ext4_get_group_desc(sb, group, NULL);
 		if (!gdp) {
@@ -2662,8 +2661,6 @@ static int ext4_run_li_request(struct ext4_li_request *elr)
 		elr->lr_next_sched = jiffies + elr->lr_timeout;
 		elr->lr_next_group = group + 1;
 	}
-	sb_end_write(sb);
-
 	return ret;
 }
 
@@ -2728,21 +2725,43 @@ cont_thread:
 			mutex_unlock(&eli->li_list_mtx);
 			goto exit_thread;
 		}
-
 		list_for_each_safe(pos, n, &eli->li_request_list) {
+			int err = 0;
+			int progress = 0;
 			elr = list_entry(pos, struct ext4_li_request,
 					 lr_request);
 
-			if (time_after_eq(jiffies, elr->lr_next_sched)) {
-				if (ext4_run_li_request(elr) != 0) {
-					/* error, remove the lazy_init job */
-					ext4_remove_li_request(elr);
-					continue;
+			if (time_before(jiffies, elr->lr_next_sched)) {
+				if (time_before(elr->lr_next_sched, next_wakeup))
+					next_wakeup = elr->lr_next_sched;
+				continue;
+			}
+			if (down_read_trylock(&elr->lr_super->s_umount)) {
+				if (sb_start_write_trylock(elr->lr_super)) {
+					progress = 1;
+					/*
+					 * We hold sb->s_umount, sb can not
+					 * be removed from the list, it is
+					 * now safe to drop li_list_mtx
+					 */
+					mutex_unlock(&eli->li_list_mtx);
+					err = ext4_run_li_request(elr);
+					sb_end_write(elr->lr_super);
+					mutex_lock(&eli->li_list_mtx);
+					n = pos->next;
 				}
+				up_read((&elr->lr_super->s_umount));
+			}
+			/* error, remove the lazy_init job */
+			if (err) {
+				ext4_remove_li_request(elr);
+				continue;
+			}
+			if (!progress) {
+				elr->lr_next_sched = jiffies +
+					(prandom_u32()
+					 % (EXT4_DEF_LI_MAX_START_DELAY * HZ));
 			}


* Re: [PATCH] ext4: improve ext4lazyinit scalability V2
  2016-08-15 12:23 [PATCH] ext4: improve ext4lazyinit scalability V2 Dmitry Monakhov
@ 2016-08-15 15:05 ` Jan Kara
  2016-09-06  3:39   ` Theodore Ts'o
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Kara @ 2016-08-15 15:05 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-ext4, jack, tytso

Hello,

Thanks for the patch. Couple of spelling fixes below and one functional
comment...

On Mon 15-08-16 16:23:35, Dmitry Monakhov wrote:
> ext4lazyinit is global thread. This thread performs itable initalization
                 ^^^ a global thread

> under li_list_mtx mutex.
> 
> It basically does following:
                   ^ the

> ext4_lazyinit_thread
>   ->mutex_lock(&eli->li_list_mtx);
>   ->ext4_run_li_request(elr)
>     ->ext4_init_inode_table-> Do a lot of IO if the list is large
> 
> And when new mount/umount arrive they have to block on ->li_list_mtx
> because  lazy_thread holds it during full walk procedure.
> ext4_fill_super
>  ->ext4_register_li_request
>    ->mutex_lock(&ext4_li_info->li_list_mtx);
>    ->list_add(&elr->lr_request, &ext4_li_info >li_request_list);
> In my case mount takes 40minutes on server with 36 * 4Tb HDD.
> Common user may face this in case of very slow dev ( /dev/mmcblkXXX)
> Even more. If one of filesystems was frozen lazyinit_thread will simply
> blocks on sb_start_write() so other mount/umount will be suck forever.
  ^^^ block                                                ^^ stuck

> This patch changes logic like follows:
> - grap ->s_umount read sem before processing new li_request.
    ^^^ grab

>   After that it is safe to drop li_list_mtx because all callers of
>   li_remove_request are holding ->s_umount for write.
> - li_thread skips frozen SB's
> 
> Locking order:
> Order is asserted by umout path like follows: s_umount ->li_list_mtx so
                        ^^^ umount

> the only way to to grab ->s_mount inside li_thread is via down_read_trylock
> 
> xfstests:ext4/023
> #PSBM-49658
> 
> Changes from V1
>  - spell fixes according to jack@ comments
>  - do not use temporal list.
> 
> 
> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> ---
>  fs/ext4/super.c | 43 +++++++++++++++++++++++++++++++------------
>  1 file changed, 31 insertions(+), 12 deletions(-)
...
> +			if (!progress) {
> +				elr->lr_next_sched = jiffies +
> +					(prandom_u32()
> +					 % (EXT4_DEF_LI_MAX_START_DELAY * HZ));
>  			}

I think we need to update next_wakeup here based on the updated value of
lr_next_sched, and also in the case where ext4_run_li_request() didn't
complete the request but ended up rescheduling it. Otherwise the patch
looks fine.
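
For concreteness, a sketch of where that unconditional update would sit at
the end of the loop body (it matches the fix-up Dmitry posts later in this
thread):

			if (!progress) {
				elr->lr_next_sched = jiffies +
					(prandom_u32()
					 % (EXT4_DEF_LI_MAX_START_DELAY * HZ));
			}
			/*
			 * Update next_wakeup unconditionally: this covers
			 * both the !progress case and the case where
			 * ext4_run_li_request() advanced lr_next_sched
			 * itself.
			 */
			if (time_before(elr->lr_next_sched, next_wakeup))
				next_wakeup = elr->lr_next_sched;
		}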

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH] ext4: improve ext4lazyinit scalability V2
  2016-08-15 15:05 ` Jan Kara
@ 2016-09-06  3:39   ` Theodore Ts'o
  2016-09-06  8:36     ` Jan Kara
  0 siblings, 1 reply; 6+ messages in thread
From: Theodore Ts'o @ 2016-09-06  3:39 UTC (permalink / raw)
  To: Jan Kara; +Cc: Dmitry Monakhov, linux-ext4

Dmitry, thanks for the patch, and Jan, thanks for the review.

This is what I've added to the ext4 tree, which should reflect all of
the comments which Jan made.

				- Ted

commit 933ef679501f124bc5eb762f6068d099a0579937
Author: Dmitry Monakhov <dmonakhov@openvz.org>
Date:   Mon Sep 5 23:38:36 2016 -0400

    ext4: improve ext4lazyinit scalability
    
    ext4lazyinit is a global thread. This thread performs itable
    initalization under li_list_mtx mutex.
    
    It basically does the following:
    ext4_lazyinit_thread
      ->mutex_lock(&eli->li_list_mtx);
      ->ext4_run_li_request(elr)
        ->ext4_init_inode_table-> Do a lot of IO if the list is large
    
    And when new mount/umount arrive they have to block on ->li_list_mtx
    because  lazy_thread holds it during full walk procedure.
    ext4_fill_super
     ->ext4_register_li_request
       ->mutex_lock(&ext4_li_info->li_list_mtx);
       ->list_add(&elr->lr_request, &ext4_li_info >li_request_list);
    In my case mount takes 40minutes on server with 36 * 4Tb HDD.
    Common user may face this in case of very slow dev ( /dev/mmcblkXXX)
    Even more. If one of filesystems was frozen lazyinit_thread will simply
    block on sb_start_write() so other mount/umount will be stuck forever.
    
    This patch changes logic like follows:
    - grab ->s_umount read sem before processing new li_request.
      After that it is safe to drop li_list_mtx because all callers of
      li_remove_request are holding ->s_umount for write.
    - li_thread skips frozen SB's
    
    Locking order:
    Order is asserted by the umount path as follows: s_umount -> li_list_mtx, so
    the only way to grab ->s_umount inside li_thread is via down_read_trylock
    
    xfstests:ext4/023
    #PSBM-49658
    
    Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 5819b0e..ebeebf5 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2749,7 +2749,6 @@ static int ext4_run_li_request(struct ext4_li_request *elr)
 	sb = elr->lr_super;
 	ngroups = EXT4_SB(sb)->s_groups_count;
 
-	sb_start_write(sb);
 	for (group = elr->lr_next_group; group < ngroups; group++) {
 		gdp = ext4_get_group_desc(sb, group, NULL);
 		if (!gdp) {
@@ -2776,8 +2775,6 @@ static int ext4_run_li_request(struct ext4_li_request *elr)
 		elr->lr_next_sched = jiffies + elr->lr_timeout;
 		elr->lr_next_group = group + 1;
 	}
-	sb_end_write(sb);
-
 	return ret;
 }
 
@@ -2842,21 +2839,46 @@ cont_thread:
 			mutex_unlock(&eli->li_list_mtx);
 			goto exit_thread;
 		}
-
 		list_for_each_safe(pos, n, &eli->li_request_list) {
+			int err = 0;
+			int progress = 0;
 			elr = list_entry(pos, struct ext4_li_request,
 					 lr_request);
 
-			if (time_after_eq(jiffies, elr->lr_next_sched)) {
-				if (ext4_run_li_request(elr) != 0) {
-					/* error, remove the lazy_init job */
-					ext4_remove_li_request(elr);
-					continue;
+			if (time_before(jiffies, elr->lr_next_sched)) {
+				if (time_before(elr->lr_next_sched, next_wakeup))
+					next_wakeup = elr->lr_next_sched;
+				continue;
+			}
+			if (down_read_trylock(&elr->lr_super->s_umount)) {
+				if (sb_start_write_trylock(elr->lr_super)) {
+					progress = 1;
+					/*
+					 * We hold sb->s_umount, sb can not
+					 * be removed from the list, it is
+					 * now safe to drop li_list_mtx
+					 */
+					mutex_unlock(&eli->li_list_mtx);
+					err = ext4_run_li_request(elr);
+					sb_end_write(elr->lr_super);
+					mutex_lock(&eli->li_list_mtx);
+					n = pos->next;
 				}
+				up_read((&elr->lr_super->s_umount));
+			}
+			/* error, remove the lazy_init job */
+			if (err) {
+				ext4_remove_li_request(elr);
+				continue;
+			}
+			if (!progress) {
+				elr->lr_next_sched = jiffies +
+					(prandom_u32()
+					 % (EXT4_DEF_LI_MAX_START_DELAY * HZ));
+				if (time_before(elr->lr_next_sched,
+						next_wakeup))
+					next_wakeup = elr->lr_next_sched;
 			}


* Re: [PATCH] ext4: improve ext4lazyinit scalability V2
  2016-09-06  3:39   ` Theodore Ts'o
@ 2016-09-06  8:36     ` Jan Kara
  2016-09-06  9:49       ` Dmitry Monakhov
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Kara @ 2016-09-06  8:36 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Jan Kara, Dmitry Monakhov, linux-ext4

On Mon 05-09-16 23:39:54, Ted Tso wrote:
> Dmitry, thanks for the patch, and Jan, thanks for the review.
> 
> This is what I've added to the ext4 tree, which should reflect all of
> the comments which Jan made.

Hum, one comment still:

> +			if (down_read_trylock(&elr->lr_super->s_umount)) {
> +				if (sb_start_write_trylock(elr->lr_super)) {
> +					progress = 1;
> +					/*
> +					 * We hold sb->s_umount, sb can not
> +					 * be removed from the list, it is
> +					 * now safe to drop li_list_mtx
> +					 */
> +					mutex_unlock(&eli->li_list_mtx);
> +					err = ext4_run_li_request(elr);
> +					sb_end_write(elr->lr_super);
> +					mutex_lock(&eli->li_list_mtx);
> +					n = pos->next;
>  				}
> +				up_read((&elr->lr_super->s_umount));
> +			}
> +			/* error, remove the lazy_init job */
> +			if (err) {
> +				ext4_remove_li_request(elr);
> +				continue;
> +			}
> +			if (!progress) {
> +				elr->lr_next_sched = jiffies +
> +					(prandom_u32()
> +					 % (EXT4_DEF_LI_MAX_START_DELAY * HZ));
> +				if (time_before(elr->lr_next_sched,
> +						next_wakeup))
> +					next_wakeup = elr->lr_next_sched;
>  			}
> -
> -			if (time_before(elr->lr_next_sched, next_wakeup))
> -				next_wakeup = elr->lr_next_sched;
>  		}

ext4_run_li_request() can also update elr->lr_next_sched. So we need to
update next_wakeup even in progress == 1 case (i.e., I'd leave the update
of next_wakeup as is in the old code...). Otherwise the patch looks good.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH] ext4: improve ext4lazyinit scalability V2
  2016-09-06  8:36     ` Jan Kara
@ 2016-09-06  9:49       ` Dmitry Monakhov
  2016-09-15 15:22         ` Theodore Ts'o
  0 siblings, 1 reply; 6+ messages in thread
From: Dmitry Monakhov @ 2016-09-06  9:49 UTC (permalink / raw)
  To: Jan Kara, Theodore Ts'o; +Cc: Jan Kara, linux-ext4



Jan Kara <jack@suse.cz> writes:

> On Mon 05-09-16 23:39:54, Ted Tso wrote:
>> Dmitry, thanks for the patch, and Jan, thanks for the review.
>> 
>> This is what I've added to the ext4 tree, which should reflect all of
>> the comments which Jan made.
>
> Hum, one comment still:
>
>> +			if (down_read_trylock(&elr->lr_super->s_umount)) {
>> +				if (sb_start_write_trylock(elr->lr_super)) {
>> +					progress = 1;
>> +					/*
>> +					 * We hold sb->s_umount, sb can not
>> +					 * be removed from the list, it is
>> +					 * now safe to drop li_list_mtx
>> +					 */
>> +					mutex_unlock(&eli->li_list_mtx);
>> +					err = ext4_run_li_request(elr);
>> +					sb_end_write(elr->lr_super);
>> +					mutex_lock(&eli->li_list_mtx);
>> +					n = pos->next;
>>  				}
>> +				up_read((&elr->lr_super->s_umount));
>> +			}
>> +			/* error, remove the lazy_init job */
>> +			if (err) {
>> +				ext4_remove_li_request(elr);
>> +				continue;
>> +			}
>> +			if (!progress) {
>> +				elr->lr_next_sched = jiffies +
>> +					(prandom_u32()
>> +					 % (EXT4_DEF_LI_MAX_START_DELAY * HZ));
>> +				if (time_before(elr->lr_next_sched,
>> +						next_wakeup))
>> +					next_wakeup = elr->lr_next_sched;
>>  			}
>> -
>> -			if (time_before(elr->lr_next_sched, next_wakeup))
>> -				next_wakeup = elr->lr_next_sched;
>>  		}
>
> ext4_run_li_request() can also update elr->lr_next_sched. So we need to
> update next_wakeup even in progress == 1 case (i.e., I'd leave the update
> of next_wakeup as is in the old code...). Otherwise the patch looks good.
Yes. That is correct. Simply the last hunk should not be deleted.

Theodore, please fold the attached patch into the original one.

[-- Attachment #2: lazyinit-v2-update-next-sched.patch --]
[-- Type: text/x-patch, Size: 361 bytes --]

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index bf91679..9692bbf 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2871,6 +2871,9 @@ cont_thread:
 						next_wakeup))
 					next_wakeup = elr->lr_next_sched;
 			}
+
+			if (time_before(elr->lr_next_sched, next_wakeup))
+				next_wakeup = elr->lr_next_sched;
 		}
 		mutex_unlock(&eli->li_list_mtx);
 



* Re: [PATCH] ext4: improve ext4lazyinit scalability V2
  2016-09-06  9:49       ` Dmitry Monakhov
@ 2016-09-15 15:22         ` Theodore Ts'o
  0 siblings, 0 replies; 6+ messages in thread
From: Theodore Ts'o @ 2016-09-15 15:22 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: Jan Kara, linux-ext4

On Tue, Sep 06, 2016 at 12:49:09PM +0300, Dmitry Monakhov wrote:
> > ext4_run_li_request() can also update elr->lr_next_sched. So we need to
> > update next_wakeup even in progress == 1 case (i.e., I'd leave the update
> > of next_wakeup as is in the old code...). Otherwise the patch looks good.
> Yes. That is correct. Simply the last hunk should not be deleted.
> 
> Theodore, please fold the attached patch into the original one.

Done, thanks!!

					- Ted

