From: Barret Rhoden <brho@google.com>
To: Prarit Bhargava <prarit@redhat.com>, Jessica Yu <jeyu@kernel.org>
Cc: linux-kernel@vger.kernel.org,
Heiko Carstens <heiko.carstens@de.ibm.com>,
David Arcari <darcari@redhat.com>
Subject: Re: [PATCH v3] kernel/module: Reschedule while waiting for modules to finish loading
Date: Fri, 10 May 2019 14:40:30 -0400
Message-ID: <be47ac01-a5ac-7be1-d387-5c841007b45f@google.com>
In-Reply-To: <4849a5af-b2bb-95ad-6b58-2f0c403c9ebb@redhat.com>

Hi -

On 5/2/19 1:46 PM, Prarit Bhargava wrote:
> On 5/2/19 8:41 AM, Prarit Bhargava wrote:
>> On 5/2/19 5:48 AM, Jessica Yu wrote:
>>> +++ Prarit Bhargava [01/05/19 17:26 -0400]:
>>>> On 4/30/19 6:22 PM, Prarit Bhargava wrote:
>>>>> On an s390 z14 LPAR with 2 CPUs, the system stalls about 3% of the time
>>>>> while loading the s390_trng.ko module.
>>>>>
>>>>> Add a reschedule point to the loop that waits for modules to complete
>>>>> loading.
>>>>>
>>>>> v3: cleanup Fixes line.
>>>>
>>>> Jessica, even with this additional patch there appear to be some other issues
>>>> in the module code that are causing significant delays in boot up on large
>>>> systems.
>>>
>>> Is this limited to only s390? Or are you seeing this on other arches
>>> as well? And is it limited to specific modules (like s390_trng)?
>>
>> Other arches. We're seeing a hang on a new 192 CPU x86_64 box & the
>> acpi_cpufreq driver. The system is MUCH faster than any other x86_64 box I've
>> seen and that's likely why I'm seeing a problem.
>>
>>>
>>>> FWIW, the logic in the original patch is correct. It's just that there's, as
>>>> Heiko discovered, some poor scheduling, etc., that is impacting the module
>>>> loading code after these changes.
>>>
>>> I am really curious to see what these performance regressions look
>>> like :/ Please update us when you find out more.
>>>
>>
>> I sent Heiko a private v4 RFC last night with this patch (sorry for the
>> cut-and-paste)
>>
>> diff --git a/kernel/module.c b/kernel/module.c
>> index 1c429d8d2d74..a4ef8628f26f 100644
>> --- a/kernel/module.c
>> +++ b/kernel/module.c
>> @@ -3568,12 +3568,12 @@ static int add_unformed_module(struct module *mod)
>>          mutex_lock(&module_mutex);
>>          old = find_module_all(mod->name, strlen(mod->name), true);
>>          if (old != NULL) {
>> -                if (old->state == MODULE_STATE_COMING
>> -                    || old->state == MODULE_STATE_UNFORMED) {
>> +                if (old->state != MODULE_STATE_LIVE) {
>>                          /* Wait in case it fails to load. */
>>                          mutex_unlock(&module_mutex);
>> -                        err = wait_event_interruptible(module_wq,
>> -                                               finished_loading(mod->name));
>> +                        err = wait_event_interruptible_timeout(module_wq,
>> +                                               finished_loading(mod->name),
>> +                                               HZ / 10000);
>>                          if (err)
>>                                  goto out_unlocked;
>>                          goto again;
>>
>> The original module dependency race issue is fixed simply by changing the
>> conditional to checking !MODULE_STATE_LIVE. This, unfortunately, exposed some
>> other problems within the code.
>>
>> The module_wq is only run when a module fails to load. It's possible that
>> the time between the module's failed init() call and running module_wq
>> (kernel/module.c:3455) takes a while. Any thread entering the
>> add_unformed_module() code while the old module is unloading is put to sleep
>> waiting for the module_wq to execute.
>>
>> On the 192 thread box I have noticed that the acpi_cpufreq module attempts
>> to load 392 times (that is not a typo and I am going to try to figure that
>> problem out after this one). This means 191 cpus are put to sleep, and one
>> cpu is executing the acpi_cpufreq module unload which is executing
>> do_init_module() and is now at
>>
>> fail_free_freeinit:
>>         kfree(freeinit);
>> fail:
>>         /* Try to protect us from buggy refcounters. */
>>         mod->state = MODULE_STATE_GOING;
>>         synchronize_rcu();
>>         module_put(mod);
>>         blocking_notifier_call_chain(&module_notify_list,
>>                                      MODULE_STATE_GOING, mod);
>>         klp_module_going(mod);
>>         ftrace_release_mod(mod);
>>         free_module(mod);
>>         wake_up_all(&module_wq);
>>         return ret;
>> }
>>
>> The 191 threads cannot schedule and the system is effectively stuck. It *does*
>> eventually free itself but in some cases it takes minutes to do so.
>>
>> A simple fix for this is, as I've done above, to add a timeout so that
>> the threads can be scheduled, which allows other processes to run.
>
> After taking a much closer look the above patch appears to be correct. I am not
> seeing any boot failures associated with it anywhere. I would like to hear from
> Heiko as to whether or not this works for him though.

I think I found the issue here. The original patch changed the state
check to "not LIVE", which made it include GOING. However, the wake
condition, finished_loading(), was not changed to match. That could lead
to a livelock, which I experienced.
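
The mismatch can be modeled in a few lines of user-space C. This is only a
sketch, not kernel code: it assumes the unchanged wake condition still counts
GOING as "finished", and every name in it (mod_state, must_wait, finished_old,
finished_matched, spins_while) is hypothetical and chosen for illustration.

```c
#include <stdbool.h>

/* Hypothetical model of the module states discussed in this thread. */
enum mod_state { MOD_ABSENT, MOD_UNFORMED, MOD_COMING, MOD_LIVE, MOD_GOING };

/* Caller-side check after the quoted diff: wait on anything not LIVE. */
static bool must_wait(enum mod_state s)
{
	return s != MOD_ABSENT && s != MOD_LIVE;
}

/* Assumed unchanged wake condition: GOING still counts as finished. */
static bool finished_old(enum mod_state s)
{
	return s == MOD_ABSENT || s == MOD_LIVE || s == MOD_GOING;
}

/* Wake condition narrowed to agree with must_wait() (one possible fix). */
static bool finished_matched(enum mod_state s)
{
	return s == MOD_ABSENT || s == MOD_LIVE;
}

/*
 * Count how many times a waiter would retry without ever sleeping while
 * the old module stays in state s, giving up after 'limit' retries.
 * A retry happens when the caller decides to wait but the wake condition
 * is already true, so the wait returns immediately.
 */
static int spins_while(enum mod_state s, bool (*finished)(enum mod_state),
		       int limit)
{
	int spins = 0;

	while (must_wait(s) && finished(s) && spins < limit)
		spins++;
	return spins;
}
```

With the old module held in GOING, spins_while(MOD_GOING, finished_old, 1000)
runs to the limit of 1000, while spins_while(MOD_GOING, finished_matched, 1000)
returns 0 because the waiter would actually block until woken.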

I have a patch that fixes it, which I'll send out shortly. With my
change, I think you won't need any of the scheduler functions
(cond_resched(), wait timeouts, etc.). Those were probably just papering
over the issue.

Barret

Thread overview: 13+ messages
2019-04-30 22:22 [PATCH v3] kernel/module: Reschedule while waiting for modules to finish loading Prarit Bhargava
2019-05-01 7:49 ` Jessica Yu
2019-05-01 21:26 ` Prarit Bhargava
2019-05-02 9:48 ` Jessica Yu
2019-05-02 12:41 ` Prarit Bhargava
2019-05-02 17:46 ` Prarit Bhargava
2019-05-10 18:40 ` Barret Rhoden [this message]
2019-05-10 18:42 ` [PATCH] modules: fix livelock in add_unformed_module() Barret Rhoden
2019-05-13 11:23 ` Prarit Bhargava
2019-05-13 14:37 ` Barret Rhoden
2019-05-22 17:08 ` Prarit Bhargava
2019-05-28 14:30 ` Prarit Bhargava
2019-05-28 14:47 ` Jessica Yu