linux-kernel.vger.kernel.org archive mirror
* [PATCH v3] kernel/module: Reschedule while waiting for modules to finish loading
@ 2019-04-30 22:22 Prarit Bhargava
  2019-05-01  7:49 ` Jessica Yu
  2019-05-01 21:26 ` Prarit Bhargava
  0 siblings, 2 replies; 13+ messages in thread
From: Prarit Bhargava @ 2019-04-30 22:22 UTC (permalink / raw)
  To: linux-kernel; +Cc: Prarit Bhargava, Heiko Carstens, Jessica Yu

On an s390 z14 LPAR with 2 cpus, a stall occurs about 3% of the time while
loading the s390_trng.ko module.

Add a reschedule point to the loop that waits for modules to complete
loading.

v3: cleanup Fixes line.

Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Fixes: f9a75c1d717f ("modules: Only return -EEXIST for modules that have finished loading")
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
---
 kernel/module.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/module.c b/kernel/module.c
index 410eeb7e4f1d..48748cfec991 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3585,6 +3585,7 @@ static int add_unformed_module(struct module *mod)
 					       finished_loading(mod->name));
 			if (err)
 				goto out_unlocked;
+			cond_resched();
 			goto again;
 		}
 		err = -EEXIST;
-- 
2.18.1



* Re: [PATCH v3] kernel/module: Reschedule while waiting for modules to finish loading
  2019-04-30 22:22 [PATCH v3] kernel/module: Reschedule while waiting for modules to finish loading Prarit Bhargava
@ 2019-05-01  7:49 ` Jessica Yu
  2019-05-01 21:26 ` Prarit Bhargava
  1 sibling, 0 replies; 13+ messages in thread
From: Jessica Yu @ 2019-05-01  7:49 UTC (permalink / raw)
  To: Prarit Bhargava; +Cc: linux-kernel, Heiko Carstens

+++ Prarit Bhargava [30/04/19 18:22 -0400]:
>On an s390 z14 LPAR with 2 cpus, a stall occurs about 3% of the time while
>loading the s390_trng.ko module.
>
>Add a reschedule point to the loop that waits for modules to complete
>loading.
>
>v3: cleanup Fixes line.

Thanks, this patch has been re-applied to modules-next.

Jessica

>Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
>Fixes: f9a75c1d717f ("modules: Only return -EEXIST for modules that have finished loading")
>Signed-off-by: Prarit Bhargava <prarit@redhat.com>
>Cc: Jessica Yu <jeyu@kernel.org>
>Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
>---
> kernel/module.c | 1 +
> 1 file changed, 1 insertion(+)
>
>diff --git a/kernel/module.c b/kernel/module.c
>index 410eeb7e4f1d..48748cfec991 100644
>--- a/kernel/module.c
>+++ b/kernel/module.c
>@@ -3585,6 +3585,7 @@ static int add_unformed_module(struct module *mod)
> 					       finished_loading(mod->name));
> 			if (err)
> 				goto out_unlocked;
>+			cond_resched();
> 			goto again;
> 		}
> 		err = -EEXIST;
>-- 
>2.18.1
>


* Re: [PATCH v3] kernel/module: Reschedule while waiting for modules to finish loading
  2019-04-30 22:22 [PATCH v3] kernel/module: Reschedule while waiting for modules to finish loading Prarit Bhargava
  2019-05-01  7:49 ` Jessica Yu
@ 2019-05-01 21:26 ` Prarit Bhargava
  2019-05-02  9:48   ` Jessica Yu
  1 sibling, 1 reply; 13+ messages in thread
From: Prarit Bhargava @ 2019-05-01 21:26 UTC (permalink / raw)
  To: linux-kernel, Jessica Yu; +Cc: Heiko Carstens, David Arcari



On 4/30/19 6:22 PM, Prarit Bhargava wrote:
> On an s390 z14 LPAR with 2 cpus, a stall occurs about 3% of the time while
> loading the s390_trng.ko module.
> 
> Add a reschedule point to the loop that waits for modules to complete
> loading.
> 
> v3: cleanup Fixes line.

Jessica, even with this additional patch there appear to be some other issues
in the module code that are causing significant delays in boot-up on large
systems.

Please revert these fixes from linux-next & modules-next.  I apologize for the
extra work but I think it is for the best until I come up with a more complete &
better tested patch.

FWIW, the logic in the original patch is correct.  It's just that, as Heiko
discovered, there is some poor scheduling, etc., impacting the module
loading code after these changes.

Again, my apologies,

P.

> 
> Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
> Fixes: f9a75c1d717f ("modules: Only return -EEXIST for modules that have finished loading")
> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
> Cc: Jessica Yu <jeyu@kernel.org>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> ---
>  kernel/module.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/module.c b/kernel/module.c
> index 410eeb7e4f1d..48748cfec991 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3585,6 +3585,7 @@ static int add_unformed_module(struct module *mod)
>  					       finished_loading(mod->name));
>  			if (err)
>  				goto out_unlocked;
> +			cond_resched();
>  			goto again;
>  		}
>  		err = -EEXIST;
> 


* Re: [PATCH v3] kernel/module: Reschedule while waiting for modules to finish loading
  2019-05-01 21:26 ` Prarit Bhargava
@ 2019-05-02  9:48   ` Jessica Yu
  2019-05-02 12:41     ` Prarit Bhargava
  0 siblings, 1 reply; 13+ messages in thread
From: Jessica Yu @ 2019-05-02  9:48 UTC (permalink / raw)
  To: Prarit Bhargava; +Cc: linux-kernel, Heiko Carstens, David Arcari

+++ Prarit Bhargava [01/05/19 17:26 -0400]:
>
>
>On 4/30/19 6:22 PM, Prarit Bhargava wrote:
>> On an s390 z14 LPAR with 2 cpus, a stall occurs about 3% of the time while
>> loading the s390_trng.ko module.
>>
>> Add a reschedule point to the loop that waits for modules to complete
>> loading.
>>
>> v3: cleanup Fixes line.
>
>Jessica, even with this additional patch there appear to be some other issues
>in the module code that are causing significant delays in boot-up on large
>systems.

Is this limited to only s390? Or are you seeing this on other arches
as well? And is it limited to specific modules (like s390_trng)?

>Please revert these fixes from linux-next & modules-next.  I apologize for the
>extra work but I think it is for the best until I come up with a more complete &
>better tested patch.

Sure, then I will revert this patch and the other one as well
("modules: Only return -EEXIST for modules that have finished
loading").

>FWIW, the logic in the original patch is correct.  It's just that, as Heiko
>discovered, there is some poor scheduling, etc., impacting the module
>loading code after these changes.

I am really curious to see what these performance regressions look
like :/ Please update us when you find out more.

>Again, my apologies,

No problem! Thanks again.

Jessica



* Re: [PATCH v3] kernel/module: Reschedule while waiting for modules to finish loading
  2019-05-02  9:48   ` Jessica Yu
@ 2019-05-02 12:41     ` Prarit Bhargava
  2019-05-02 17:46       ` Prarit Bhargava
  0 siblings, 1 reply; 13+ messages in thread
From: Prarit Bhargava @ 2019-05-02 12:41 UTC (permalink / raw)
  To: Jessica Yu; +Cc: linux-kernel, Heiko Carstens, David Arcari



On 5/2/19 5:48 AM, Jessica Yu wrote:
> +++ Prarit Bhargava [01/05/19 17:26 -0400]:
>>
>>
>> On 4/30/19 6:22 PM, Prarit Bhargava wrote:
>>> On an s390 z14 LPAR with 2 cpus, a stall occurs about 3% of the time while
>>> loading the s390_trng.ko module.
>>>
>>> Add a reschedule point to the loop that waits for modules to complete
>>> loading.
>>>
>>> v3: cleanup Fixes line.
>>
>> Jessica, even with this additional patch there appear to be some other issues
>> in the module code that are causing significant delays in boot-up on large
>> systems.
> 
> Is this limited to only s390? Or are you seeing this on other arches
> as well? And is it limited to specific modules (like s390_trng)?

Other arches.  We're seeing a hang with the acpi_cpufreq driver on a new
192-CPU x86_64 box.  The system is MUCH faster than any other x86_64 box I've
seen, and that's likely why I'm seeing a problem.

> 
>> FWIW, the logic in the original patch is correct.  It's just that, as Heiko
>> discovered, there is some poor scheduling, etc., impacting the module
>> loading code after these changes.
> 
> I am really curious to see what these performance regressions look
> like :/ Please update us when you find out more.
> 

I sent Heiko a private v4 RFC last night with this patch (sorry for the
cut-and-paste):

diff --git a/kernel/module.c b/kernel/module.c
index 1c429d8d2d74..a4ef8628f26f 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3568,12 +3568,12 @@ static int add_unformed_module(struct module *mod)
	mutex_lock(&module_mutex);
	old = find_module_all(mod->name, strlen(mod->name), true);
	if (old != NULL) {
-		if (old->state == MODULE_STATE_COMING
-		    || old->state == MODULE_STATE_UNFORMED) {
+		if (old->state != MODULE_STATE_LIVE) {
			/* Wait in case it fails to load. */
			mutex_unlock(&module_mutex);
-			err = wait_event_interruptible(module_wq,
-					       finished_loading(mod->name));
+			err = wait_event_interruptible_timeout(module_wq,
+					       finished_loading(mod->name),
+					       HZ / 10000);
			if (err)
				goto out_unlocked;
			goto again;

The original module dependency race issue is fixed simply by changing the
conditional to checking !MODULE_STATE_LIVE.  This, unfortunately, exposed some
other problems within the code.

The module_wq is only run when a module fails to load.  It's possible that
the time between the module's failed init() call and running module_wq
(kernel/module.c:3455) takes a while.  Any thread entering the
add_unformed_module() code while the old module is unloading is put to sleep
waiting for the module_wq to execute.

On the 192 thread box I have noticed that the acpi_cpufreq module attempts
to load 392 times (that is not a typo and I am going to try to figure that
problem out after this one).  This means 191 cpus are put to sleep, and one
cpu is executing the failure path of the acpi_cpufreq module load, which is
in do_init_module() and is now at

fail_free_freeinit:
        kfree(freeinit);
fail:
        /* Try to protect us from buggy refcounters. */
        mod->state = MODULE_STATE_GOING;
        synchronize_rcu();
        module_put(mod);
        blocking_notifier_call_chain(&module_notify_list,
                                     MODULE_STATE_GOING, mod);
        klp_module_going(mod);
        ftrace_release_mod(mod);
        free_module(mod);
        wake_up_all(&module_wq);
        return ret;
}

The 191 threads cannot schedule and the system is effectively stuck.  It *does*
eventually free itself but in some cases it takes minutes to do so.

A simple fix for this is, as I've done above, to add a timeout so that
the threads can be scheduled, which allows other processes to run.  After
thinking about it a bit more, however, I wonder if a better approach is to
change the mod->state to MODULE_FAILED & run the module_wq immediately so
that the threads can return an error.

I'm experimenting with that now.
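
One thing to note with the timeout variant (the sketch below is
illustrative only, not part of the patch above):
wait_event_interruptible_timeout() returns -ERESTARTSYS if interrupted,
0 if the timeout elapsed with the condition still false, and the
remaining jiffies (>= 1) once the condition is true, so a bare
"if (err)" check no longer means "error".  Also, HZ / 10000 is integer
division and evaluates to 0 for any HZ <= 10000, i.e. a zero-jiffy
timeout that effectively polls.  Handling the return values would look
something like:

	long ret;

	ret = wait_event_interruptible_timeout(module_wq,
					       finished_loading(mod->name),
					       HZ / 10000);
	if (ret < 0) {
		/* interrupted by a signal: bail out as before */
		err = ret;
		goto out_unlocked;
	}
	/* ret == 0 (timed out) or ret > 0 (finished): re-check the state */
	goto again;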

P.


* Re: [PATCH v3] kernel/module: Reschedule while waiting for modules to finish loading
  2019-05-02 12:41     ` Prarit Bhargava
@ 2019-05-02 17:46       ` Prarit Bhargava
  2019-05-10 18:40         ` Barret Rhoden
  0 siblings, 1 reply; 13+ messages in thread
From: Prarit Bhargava @ 2019-05-02 17:46 UTC (permalink / raw)
  To: Jessica Yu; +Cc: linux-kernel, Heiko Carstens, David Arcari



On 5/2/19 8:41 AM, Prarit Bhargava wrote:
> 
> 
> On 5/2/19 5:48 AM, Jessica Yu wrote:
>> +++ Prarit Bhargava [01/05/19 17:26 -0400]:
>>>
>>>
>>> On 4/30/19 6:22 PM, Prarit Bhargava wrote:
>>>> On an s390 z14 LPAR with 2 cpus, a stall occurs about 3% of the time while
>>>> loading the s390_trng.ko module.
>>>>
>>>> Add a reschedule point to the loop that waits for modules to complete
>>>> loading.
>>>>
>>>> v3: cleanup Fixes line.
>>>
>>> Jessica, even with this additional patch there appear to be some other issues
>>> in the module code that are causing significant delays in boot-up on large
>>> systems.
>>
>> Is this limited to only s390? Or are you seeing this on other arches
>> as well? And is it limited to specific modules (like s390_trng)?
> 
> Other arches.  We're seeing a hang with the acpi_cpufreq driver on a new
> 192-CPU x86_64 box.  The system is MUCH faster than any other x86_64 box I've
> seen, and that's likely why I'm seeing a problem.
> 
>>
>>> FWIW, the logic in the original patch is correct.  It's just that, as Heiko
>>> discovered, there is some poor scheduling, etc., impacting the module
>>> loading code after these changes.
>>
>> I am really curious to see what these performance regressions look
>> like :/ Please update us when you find out more.
>>
> 
> I sent Heiko a private v4 RFC last night with this patch (sorry for the
> cut-and-paste):
> 
> diff --git a/kernel/module.c b/kernel/module.c
> index 1c429d8d2d74..a4ef8628f26f 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3568,12 +3568,12 @@ static int add_unformed_module(struct module *mod)
> 	mutex_lock(&module_mutex);
> 	old = find_module_all(mod->name, strlen(mod->name), true);
> 	if (old != NULL) {
> -		if (old->state == MODULE_STATE_COMING
> -		    || old->state == MODULE_STATE_UNFORMED) {
> +		if (old->state != MODULE_STATE_LIVE) {
> 			/* Wait in case it fails to load. */
> 			mutex_unlock(&module_mutex);
> -			err = wait_event_interruptible(module_wq,
> -					       finished_loading(mod->name));
> +			err = wait_event_interruptible_timeout(module_wq,
> +					       finished_loading(mod->name),
> +					       HZ / 10000);
> 			if (err)
> 				goto out_unlocked;
> 			goto again;
> 
> The original module dependency race issue is fixed simply by changing the
> conditional to checking !MODULE_STATE_LIVE.  This, unfortunately, exposed some
> other problems within the code.
> 
> The module_wq is only run when a module fails to load.  It's possible that
> the time between the module's failed init() call and running module_wq
> (kernel/module.c:3455) takes a while.  Any thread entering the
> add_unformed_module() code while the old module is unloading is put to sleep
> waiting for the module_wq to execute.
> 
> On the 192 thread box I have noticed that the acpi_cpufreq module attempts
> to load 392 times (that is not a typo and I am going to try to figure that
> problem out after this one).  This means 191 cpus are put to sleep, and one
> cpu is executing the failure path of the acpi_cpufreq module load, which is
> in do_init_module() and is now at
> 
> fail_free_freeinit:
>         kfree(freeinit);
> fail:
>         /* Try to protect us from buggy refcounters. */
>         mod->state = MODULE_STATE_GOING;
>         synchronize_rcu();
>         module_put(mod);
>         blocking_notifier_call_chain(&module_notify_list,
>                                      MODULE_STATE_GOING, mod);
>         klp_module_going(mod);
>         ftrace_release_mod(mod);
>         free_module(mod);
>         wake_up_all(&module_wq);
>         return ret;
> }
> 
> The 191 threads cannot schedule and the system is effectively stuck.  It *does*
> eventually free itself but in some cases it takes minutes to do so.
> 
> A simple fix for this is, as I've done above, to add a timeout so that
> the threads can be scheduled, which allows other processes to run.

After taking a much closer look, the above patch appears to be correct.  I am
not seeing any boot failures associated with it anywhere.  I would like to
hear from Heiko as to whether or not this works for him though.

P.


* Re: [PATCH v3] kernel/module: Reschedule while waiting for modules to finish loading
  2019-05-02 17:46       ` Prarit Bhargava
@ 2019-05-10 18:40         ` Barret Rhoden
  2019-05-10 18:42           ` [PATCH] modules: fix livelock in add_unformed_module() Barret Rhoden
  0 siblings, 1 reply; 13+ messages in thread
From: Barret Rhoden @ 2019-05-10 18:40 UTC (permalink / raw)
  To: Prarit Bhargava, Jessica Yu; +Cc: linux-kernel, Heiko Carstens, David Arcari

Hi -

On 5/2/19 1:46 PM, Prarit Bhargava wrote:
> On 5/2/19 8:41 AM, Prarit Bhargava wrote:
>> On 5/2/19 5:48 AM, Jessica Yu wrote:
>>> +++ Prarit Bhargava [01/05/19 17:26 -0400]:
>>>> On 4/30/19 6:22 PM, Prarit Bhargava wrote:
>>>>> On an s390 z14 LPAR with 2 cpus, a stall occurs about 3% of the time while
>>>>> loading the s390_trng.ko module.
>>>>>
>>>>> Add a reschedule point to the loop that waits for modules to complete
>>>>> loading.
>>>>>
>>>>> v3: cleanup Fixes line.
>>>>
>>>> Jessica, even with this additional patch there appear to be some other issues
>>>> in the module code that are causing significant delays in boot-up on large
>>>> systems.
>>>
>>> Is this limited to only s390? Or are you seeing this on other arches
>>> as well? And is it limited to specific modules (like s390_trng)?
>>
>> Other arches.  We're seeing a hang with the acpi_cpufreq driver on a new
>> 192-CPU x86_64 box.  The system is MUCH faster than any other x86_64 box I've
>> seen, and that's likely why I'm seeing a problem.
>>
>>>
>>>> FWIW, the logic in the original patch is correct.  It's just that, as Heiko
>>>> discovered, there is some poor scheduling, etc., impacting the module
>>>> loading code after these changes.
>>>
>>> I am really curious to see what these performance regressions look
>>> like :/ Please update us when you find out more.
>>>
>>
>> I sent Heiko a private v4 RFC last night with this patch (sorry for the
>> cut-and-paste):
>>
>> diff --git a/kernel/module.c b/kernel/module.c
>> index 1c429d8d2d74..a4ef8628f26f 100644
>> --- a/kernel/module.c
>> +++ b/kernel/module.c
>> @@ -3568,12 +3568,12 @@ static int add_unformed_module(struct module *mod)
>> 	mutex_lock(&module_mutex);
>> 	old = find_module_all(mod->name, strlen(mod->name), true);
>> 	if (old != NULL) {
>> -		if (old->state == MODULE_STATE_COMING
>> -		    || old->state == MODULE_STATE_UNFORMED) {
>> +		if (old->state != MODULE_STATE_LIVE) {
>> 			/* Wait in case it fails to load. */
>> 			mutex_unlock(&module_mutex);
>> -			err = wait_event_interruptible(module_wq,
>> -					       finished_loading(mod->name));
>> +			err = wait_event_interruptible_timeout(module_wq,
>> +					       finished_loading(mod->name),
>> +					       HZ / 10000);
>> 			if (err)
>> 				goto out_unlocked;
>> 			goto again;
>>
>> The original module dependency race issue is fixed simply by changing the
>> conditional to checking !MODULE_STATE_LIVE.  This, unfortunately, exposed some
>> other problems within the code.
>>
>> The module_wq is only run when a module fails to load.  It's possible that
>> the time between the module's failed init() call and running module_wq
>> (kernel/module.c:3455) takes a while.  Any thread entering the
>> add_unformed_module() code while the old module is unloading is put to sleep
>> waiting for the module_wq to execute.
>>
>> On the 192 thread box I have noticed that the acpi_cpufreq module attempts
>> to load 392 times (that is not a typo and I am going to try to figure that
>> problem out after this one).  This means 191 cpus are put to sleep, and one
>> cpu is executing the failure path of the acpi_cpufreq module load, which is
>> in do_init_module() and is now at
>>
>> fail_free_freeinit:
>>          kfree(freeinit);
>> fail:
>>          /* Try to protect us from buggy refcounters. */
>>          mod->state = MODULE_STATE_GOING;
>>          synchronize_rcu();
>>          module_put(mod);
>>          blocking_notifier_call_chain(&module_notify_list,
>>                                       MODULE_STATE_GOING, mod);
>>          klp_module_going(mod);
>>          ftrace_release_mod(mod);
>>          free_module(mod);
>>          wake_up_all(&module_wq);
>>          return ret;
>> }
>>
>> The 191 threads cannot schedule and the system is effectively stuck.  It *does*
>> eventually free itself but in some cases it takes minutes to do so.
>>
>> A simple fix for this is, as I've done above, to add a timeout so that
>> the threads can be scheduled, which allows other processes to run.
> 
> After taking a much closer look, the above patch appears to be correct.  I am
> not seeing any boot failures associated with it anywhere.  I would like to
> hear from Heiko as to whether or not this works for him though.

I think I found the issue here.  The original patch changed the state 
check to "not LIVE", which made it include GOING.  However, the wake 
condition was not changed.  That could lead to a livelock, which I 
experienced.
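
To make the mismatch concrete, here are the two checks side by side
(excerpted from the patches in this thread):

	/* add_unformed_module() decides whether to wait: */
	if (old->state != MODULE_STATE_LIVE) {
		/* Wait in case it fails to load. */
		err = wait_event_interruptible(module_wq,
					       finished_loading(mod->name));
		...
	}

	/* ...but finished_loading() treats GOING as "finished": */
	ret = !mod || mod->state == MODULE_STATE_LIVE
		|| mod->state == MODULE_STATE_GOING;

For a module stuck in GOING, finished_loading() is immediately true, the
wait returns 0 right away, and the caller loops back to 'again' only to
find the state still != LIVE, i.e. a busy loop.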

I have a patch that fixes it, which I'll send out shortly.  With my 
change, I think you won't need any of the scheduler functions 
(cond_resched(), wait timeouts, etc).  Those were probably just papering 
over the issue.

Barret



* [PATCH] modules: fix livelock in add_unformed_module()
  2019-05-10 18:40         ` Barret Rhoden
@ 2019-05-10 18:42           ` Barret Rhoden
  2019-05-13 11:23             ` Prarit Bhargava
  0 siblings, 1 reply; 13+ messages in thread
From: Barret Rhoden @ 2019-05-10 18:42 UTC (permalink / raw)
  To: Jessica Yu, Prarit Bhargava; +Cc: linux-kernel, Heiko Carstens, David Arcari

When add_unformed_module() finds an existing module with the same name,
it waits until the preexisting module has finished loading.  Prior to commit
f9a75c1d717f, this meant if the module was either UNFORMED or COMING,
we'd wait.  That commit changed the check to be !LIVE, so that we'd wait
for UNFORMED, COMING, or GOING.

The problem with that commit was that we wait on finished_loading(), and
that function's state checks were not changed.  If a module was
GOING, finished_loading() was returning true, meaning to recheck the
state and presumably break out of the loop.  This mismatch between the
state checked by add_unformed_module() and the state checked in
finished_loading() could result in a hang.

Specifically, when a module was GOING, wait_event_interruptible() would
immediately return with no error; we'd goto 'again', see the state !=
LIVE, and try again.

This commit changes finished_loading() such that we only consider a
module 'finished' when it doesn't exist or is LIVE, which are the cases
that break from the wait-loop in add_unformed_module().

Fixes: f9a75c1d717f ("modules: Only return -EEXIST for modules that have finished loading")
Signed-off-by: Barret Rhoden <brho@google.com>
---
 kernel/module.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 410eeb7e4f1d..0744eea7bb90 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3407,8 +3407,7 @@ static bool finished_loading(const char *name)
 	sched_annotate_sleep();
 	mutex_lock(&module_mutex);
 	mod = find_module_all(name, strlen(name), true);
-	ret = !mod || mod->state == MODULE_STATE_LIVE
-		|| mod->state == MODULE_STATE_GOING;
+	ret = !mod || mod->state == MODULE_STATE_LIVE;
 	mutex_unlock(&module_mutex);
 
 	return ret;
-- 
2.21.0.1020.gf2820cf01a-goog



* Re: [PATCH] modules: fix livelock in add_unformed_module()
  2019-05-10 18:42           ` [PATCH] modules: fix livelock in add_unformed_module() Barret Rhoden
@ 2019-05-13 11:23             ` Prarit Bhargava
  2019-05-13 14:37               ` Barret Rhoden
  0 siblings, 1 reply; 13+ messages in thread
From: Prarit Bhargava @ 2019-05-13 11:23 UTC (permalink / raw)
  To: Barret Rhoden, Jessica Yu; +Cc: linux-kernel, Heiko Carstens, David Arcari



On 5/10/19 2:42 PM, Barret Rhoden wrote:
> When add_unformed_module() finds an existing module with the same name,
> it waits until the preexisting module has finished loading.  Prior to commit
> f9a75c1d717f, this meant if the module was either UNFORMED or COMING,
> we'd wait.  That commit changed the check to be !LIVE, so that we'd wait
> for UNFORMED, COMING, or GOING.

Hi Barret, thanks for the fix but I dropped that patch for other reasons.
Please see below.

> 
> The problem with that commit was that we wait on finished_loading(), and
> that function's state checks were not changed.  If a module was
> GOING, finished_loading() was returning true, meaning to recheck the
> state and presumably break out of the loop.  This mismatch between the
> state checked by add_unformed_module() and the state checked in
> finished_loading() could result in a hang.
> 
> Specifically, when a module was GOING, wait_event_interruptible() would
> immediately return with no error; we'd goto 'again', see the state !=
> LIVE, and try again.
> 
> This commit changes finished_loading() such that we only consider a
> module 'finished' when it doesn't exist or is LIVE, which are the cases
> that break from the wait-loop in add_unformed_module().
> 
> Fixes: f9a75c1d717f ("modules: Only return -EEXIST for modules that have finished loading")
> Signed-off-by: Barret Rhoden <brho@google.com>
> ---
>  kernel/module.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/kernel/module.c b/kernel/module.c
> index 410eeb7e4f1d..0744eea7bb90 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3407,8 +3407,7 @@ static bool finished_loading(const char *name)
>  	sched_annotate_sleep();
>  	mutex_lock(&module_mutex);
>  	mod = find_module_all(name, strlen(name), true);
> -	ret = !mod || mod->state == MODULE_STATE_LIVE
> -		|| mod->state == MODULE_STATE_GOING;
> +	ret = !mod || mod->state == MODULE_STATE_LIVE;
>  	mutex_unlock(&module_mutex);

With your change above, my x86 system still locks up during boot and adds
minutes to boot time because many CPUs are sleeping, waiting for the
module_wq to be run.

A module is loaded once for each cpu.

CPU 0 loads the module, adds the module to the modules list, and sets the module
state to MODULE_STATE_UNFORMED.

Simultaneously all the other CPUs also load the module and sleep in
add_unformed_module().

If CPU 0 is interrupted/rescheduled for any reason, the other CPUs are stuck
waiting until the module thread on CPU 0 runs again.  I mentioned earlier that
on some systems systemd launches more module load requests than there are
CPUs, so this occurs quite often on systems with many CPUs.  This is an odd
bug that also needs to be resolved, but I suspect it is in systemd & not the
kernel.

My follow-up patch changes wait_event_interruptible() to
wait_event_interruptible_timeout() so the CPUs are no longer sleeping and can
make progress on other tasks; note that the timeout variant has different
return values than wait_event_interruptible().

https://marc.info/?l=linux-kernel&m=155724085927589&w=2

I believe this also takes your concern into account?

Please correct me if I'm wrong and am not understanding your issue.

P.

>  
>  	return ret;
> 


* Re: [PATCH] modules: fix livelock in add_unformed_module()
  2019-05-13 11:23             ` Prarit Bhargava
@ 2019-05-13 14:37               ` Barret Rhoden
  2019-05-22 17:08                 ` Prarit Bhargava
  0 siblings, 1 reply; 13+ messages in thread
From: Barret Rhoden @ 2019-05-13 14:37 UTC (permalink / raw)
  To: Prarit Bhargava, Jessica Yu; +Cc: linux-kernel, Heiko Carstens, David Arcari

Hi -

On 5/13/19 7:23 AM, Prarit Bhargava wrote:
[snip]
> A module is loaded once for each cpu.

Does one CPU succeed in loading the module, and the others fail with EEXIST?

> My follow-up patch changes wait_event_interruptible() to
> wait_event_interruptible_timeout() so the CPUs are no longer sleeping and can
> make progress on other tasks; note that the timeout variant has different
> return values than wait_event_interruptible().
> 
> https://marc.info/?l=linux-kernel&m=155724085927589&w=2
> 
> I believe this also takes your concern into account?

That patch might work for me, but I think it papers over the bug where 
the check on old->state that you make before sleeping (was COMING || 
UNFORMED, now !LIVE) doesn't match the check to wake up in 
finished_loading().

The reason the issue might not show up in practice is that your patch 
basically polls, so the condition checks in finished_loading() are only 
a quicker exit.

If you squash my patch into yours, I think it will cover that case. 
Though if polling is the right answer here, it also raises the question 
of whether or not we even need finished_loading().
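
For illustration, a rough (untested) sketch of the pure-polling
alternative with no finished_loading() at all, assuming module_mutex is
held on entry as in add_unformed_module():

	while ((old = find_module_all(mod->name, strlen(mod->name), true)) &&
	       old->state != MODULE_STATE_LIVE) {
		mutex_unlock(&module_mutex);
		/* Back off briefly, then re-check under the lock. */
		if (schedule_timeout_interruptible(HZ / 100)) {
			err = -ERESTARTSYS;	/* woken early by a signal */
			goto out_unlocked;
		}
		mutex_lock(&module_mutex);
	}
	/* Here old is either NULL (go add the module) or LIVE (-EEXIST). */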

Barret


* Re: [PATCH] modules: fix livelock in add_unformed_module()
  2019-05-13 14:37               ` Barret Rhoden
@ 2019-05-22 17:08                 ` Prarit Bhargava
  2019-05-28 14:30                   ` Prarit Bhargava
  0 siblings, 1 reply; 13+ messages in thread
From: Prarit Bhargava @ 2019-05-22 17:08 UTC (permalink / raw)
  To: Barret Rhoden, Jessica Yu; +Cc: linux-kernel, Heiko Carstens, David Arcari



On 5/13/19 10:37 AM, Barret Rhoden wrote:
> Hi -
> 

Hey Barret, my apologies for not getting back to you earlier.  I got caught up
in something that took me away from this issue.

> On 5/13/19 7:23 AM, Prarit Bhargava wrote:
> [snip]
>> A module is loaded once for each cpu.
> 
> Does one CPU succeed in loading the module, and the others fail with EEXIST?
> 
>> My follow-up patch changes wait_event_interruptible() to
>> wait_event_interruptible_timeout() so the CPUs are no longer sleeping and can
>> make progress on other tasks; note that the timeout variant has different
>> return values than wait_event_interruptible().
>>
>> https://marc.info/?l=linux-kernel&m=155724085927589&w=2
>>
>> I believe this also takes your concern into account?
> 
> That patch might work for me, but I think it papers over the bug where the check
> on old->state that you make before sleeping (was COMING || UNFORMED, now !LIVE)
> doesn't match the check to wake up in finished_loading().
> 
> The reason the issue might not show up in practice is that your patch basically
> polls, so the condition checks in finished_loading() are only a quicker exit.
> 
> If you squash my patch into yours, I think it will cover that case. Though if
> polling is the right answer here, it also raises the question of whether or not
> we even need finished_loading().
> 

The more I look at this, the more I think you're right.  Let me do some
additional testing with your patch + my original patch.

P.


> Barret


* Re: [PATCH] modules: fix livelock in add_unformed_module()
  2019-05-22 17:08                 ` Prarit Bhargava
@ 2019-05-28 14:30                   ` Prarit Bhargava
  2019-05-28 14:47                     ` Jessica Yu
  0 siblings, 1 reply; 13+ messages in thread
From: Prarit Bhargava @ 2019-05-28 14:30 UTC (permalink / raw)
  To: Barret Rhoden, Jessica Yu; +Cc: linux-kernel, Heiko Carstens, David Arcari



On 5/22/19 1:08 PM, Prarit Bhargava wrote:
> 
> 
> On 5/13/19 10:37 AM, Barret Rhoden wrote:
>> Hi -
>>
> 
> Hey Barret, my apologies for not getting back to you earlier.  I got caught up
> in something that took me away from this issue.
> 
>> On 5/13/19 7:23 AM, Prarit Bhargava wrote:
>> [snip]
>>> A module is loaded once for each cpu.
>>
>> Does one CPU succeed in loading the module, and the others fail with EEXIST?
>>
>>> My follow-up patch changes wait_event_interruptible() to
>>> wait_event_interruptible_timeout() so the CPUs are no longer sleeping and can
>>> make progress on other tasks; note that the timeout variant has different
>>> return values than wait_event_interruptible().
>>>
>>> https://marc.info/?l=linux-kernel&m=155724085927589&w=2
>>>
>>> I believe this also takes your concern into account?
>>
>> That patch might work for me, but I think it papers over the bug where the check
>> on old->state that you make before sleeping (was COMING || UNFORMED, now !LIVE)
>> doesn't match the check to wake up in finished_loading().
>>
>> The reason the issue might not show up in practice is that your patch basically
>> polls, so the condition checks in finished_loading() are only a quicker exit.
>>
>> If you squash my patch into yours, I think it will cover that case. Though if
>> polling is the right answer here, it also raises the question of whether or not
>> we even need finished_loading().
>>
> 
> The more I look at this, the more I think you're right.  Let me do some
> additional testing with your patch + my original patch.
> 

I have done testing on arm64, s390x, ppc64le, ppc64, and x86 and have not seen
any issues.

Jessica, how would you like me to proceed?  Would you like an updated patch with
Signed-off's from both Barret & myself?

P.

> P.
> 
> 
>> Barret


* Re: [PATCH] modules: fix livelock in add_unformed_module()
  2019-05-28 14:30                   ` Prarit Bhargava
@ 2019-05-28 14:47                     ` Jessica Yu
  0 siblings, 0 replies; 13+ messages in thread
From: Jessica Yu @ 2019-05-28 14:47 UTC (permalink / raw)
  To: Prarit Bhargava; +Cc: Barret Rhoden, linux-kernel, Heiko Carstens, David Arcari

+++ Prarit Bhargava [28/05/19 10:30 -0400]:
>
>
>On 5/22/19 1:08 PM, Prarit Bhargava wrote:
>>
>>
>> On 5/13/19 10:37 AM, Barret Rhoden wrote:
>>> Hi -
>>>
>>
>> Hey Barret, my apologies for not getting back to you earlier.  I got caught up
>> in something that took me away from this issue.
>>
>>> On 5/13/19 7:23 AM, Prarit Bhargava wrote:
>>> [snip]
>>>> A module is loaded once for each cpu.
>>>
>>> Does one CPU succeed in loading the module, and the others fail with EEXIST?
>>>
>>>> My follow-up patch changes wait_event_interruptible() to
>>>> wait_event_interruptible_timeout() so the CPUs are no longer sleeping and can
>>>> make progress on other tasks; note that the timeout variant has different
>>>> return values than wait_event_interruptible().
>>>>
>>>> https://marc.info/?l=linux-kernel&m=155724085927589&w=2
>>>>
>>>> I believe this also takes your concern into account?
>>>
>>> That patch might work for me, but I think it papers over the bug where the check
>>> on old->state that you make before sleeping (was COMING || UNFORMED, now !LIVE)
>>> doesn't match the check to wake up in finished_loading().
>>>
>>> The reason the issue might not show up in practice is that your patch basically
>>> polls, so the condition checks in finished_loading() are only a quicker exit.
>>>
>>> If you squash my patch into yours, I think it will cover that case. Though if
>>> polling is the right answer here, it also raises the question of whether or not
>>> we even need finished_loading().
>>>
>>
>> The more I look at this, the more I think you're right.  Let me do some
>> additional testing with your patch + my original patch.
>>
>
>I have done testing on arm64, s390x, ppc64le, ppc64, and x86 and have not seen
>any issues.
>
>Jessica, how would you like me to proceed?  Would you like an updated patch with
>Signed-off's from both Barret & myself?

Hi Prarit,

A freshly sent patch with the squashed changes and Signed-off's would
be great. Thank you both for testing and looking into this!

Jessica


end of thread

Thread overview: 13+ messages
2019-04-30 22:22 [PATCH v3] kernel/module: Reschedule while waiting for modules to finish loading Prarit Bhargava
2019-05-01  7:49 ` Jessica Yu
2019-05-01 21:26 ` Prarit Bhargava
2019-05-02  9:48   ` Jessica Yu
2019-05-02 12:41     ` Prarit Bhargava
2019-05-02 17:46       ` Prarit Bhargava
2019-05-10 18:40         ` Barret Rhoden
2019-05-10 18:42           ` [PATCH] modules: fix livelock in add_unformed_module() Barret Rhoden
2019-05-13 11:23             ` Prarit Bhargava
2019-05-13 14:37               ` Barret Rhoden
2019-05-22 17:08                 ` Prarit Bhargava
2019-05-28 14:30                   ` Prarit Bhargava
2019-05-28 14:47                     ` Jessica Yu
