* thousands of kworker processes with 4.7.x and 4.8-rc*
@ 2016-09-19 7:08 Tomasz Chmielewski
2016-09-23 13:23 ` Tomasz Chmielewski
0 siblings, 1 reply; 9+ messages in thread
From: Tomasz Chmielewski @ 2016-09-19 7:08 UTC (permalink / raw)
To: LKML
On several servers running 4.7.x and 4.8-rc6/7 kernels I'm seeing
thousands of kworker processes.
# ps auxf|grep -c kworker
2104
Load average goes into hundreds on a pretty much idle server (biggest
CPU and RAM consumers are probably SSHD with one user logged in and
rsyslog writing ~1 line per minute):
# uptime
06:58:56 up 26 min, 1 user, load average: 146.11, 215.46, 105.70
# uptime
06:59:48 up 26 min, 1 user, load average: 305.20, 240.84, 120.25
Sometimes seeing lots of them in "D" state:
root 19474 0.0 0.0 0 0 ? D 06:54 0:00 \_ [kworker/0:208]
root 19475 0.0 0.0 0 0 ? D 06:54 0:00 \_ [kworker/0:209]
root 19477 0.0 0.0 0 0 ? D 06:54 0:00 \_ [kworker/0:211]
root 19480 0.0 0.0 0 0 ? D 06:54 0:00 \_ [kworker/0:214]
root 19483 0.0 0.0 0 0 ? D 06:54 0:00 \_ [kworker/0:217]
root 19485 0.0 0.0 0 0 ? D 06:54 0:00 \_ [kworker/0:219]
root 19486 0.0 0.0 0 0 ? D 06:54 0:00 \_ [kworker/0:220]
root 19487 0.0 0.0 0 0 ? D 06:54 0:00 \_ [kworker/0:221]
root 19492 0.0 0.0 0 0 ? D 06:54 0:00 \_ [kworker/0:226]
root 19533 0.0 0.0 0 0 ? D 06:54 0:00 \_ [kworker/4:257]
Is it a known issue?
The server has 8 CPUs and 32 GB RAM.
Tomasz Chmielewski
https://lxadm.com
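A note on the count in this report: `ps auxf | grep -c kworker` may also count the grep process itself, and it lumps all states together. A minimal sketch that tallies kworker threads by state letter, assuming a procps-style `ps` that accepts `-e -o stat=,comm=`; the function name `kworker_states` is made up for this illustration and is not from the thread:

```shell
#!/bin/sh
# Tally kworker threads by process state (first letter of the STAT column).
# Reads "STAT COMM" pairs on stdin, so it works on live ps output or a saved log.
kworker_states() {
    awk '$2 ~ /^kworker/ { n[substr($1, 1, 1)]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Live usage (assumes procps ps):
#   ps -e -o stat=,comm= | kworker_states
```

If most of the tally lands under `D` rather than `S`, that would match the listing above and would account for the load average, since Linux counts uninterruptible tasks toward load.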
* Re: thousands of kworker processes with 4.7.x and 4.8-rc*
2016-09-19 7:08 thousands of kworker processes with 4.7.x and 4.8-rc* Tomasz Chmielewski
@ 2016-09-23 13:23 ` Tomasz Chmielewski
2016-09-23 14:10 ` Mike Galbraith
0 siblings, 1 reply; 9+ messages in thread
From: Tomasz Chmielewski @ 2016-09-23 13:23 UTC (permalink / raw)
To: LKML
On 2016-09-19 16:08, Tomasz Chmielewski wrote:
> On several servers running 4.7.x and 4.8-rc6/7 kernels I'm seeing
> thousands of kworker processes.
> # ps auxf|grep -c kworker
> 2104
> Load average goes into hundreds on a pretty much idle server (biggest
> CPU and RAM consumers are probably SSHD with one user logged in and
> rsyslog writing ~1 line per minute):
> # uptime
> 06:58:56 up 26 min, 1 user, load average: 146.11, 215.46, 105.70
> # uptime
> 06:59:48 up 26 min, 1 user, load average: 305.20, 240.84, 120.25
> Sometimes seeing lots of them in "D" state:
> root 19474 0.0 0.0 0 0 ? D 06:54 0:00 \_ [kworker/0:208]
> root 19475 0.0 0.0 0 0 ? D 06:54 0:00 \_ [kworker/0:209]
I did some experiments to see when the problem first appeared. Thousands
of kworker processes start to show up in 4.7.0-rc5.
kernel version | kworker count after boot
-------------------------------------------
4.6.3            37
4.6.4            47
4.6.5            46
4.6.6            49
4.6.7            49
4.7.0-rc1        46
4.7.0-rc2        49
4.7.0-rc3        45
4.7.0-rc4        47
4.7.0-rc5        1592
4.7.0-rc6        1714
4.7.0-rc7        1955
4.7.0            2088
4.7.1            1222
4.7.2            1699
4.7.3            1446
4.7.4            1781
4.8-rc1          (not tested)
4.8-rc2          2012
4.8-rc3          1696
4.8-rc4          1210
4.8-rc5          1890
4.8-rc6          1657
4.8-rc7          1647
Tomasz Chmielewski
https://lxadm.com
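The jump in the table above can be picked out mechanically. A small sketch, assuming the measurements are saved as "version count" lines; the helper name `first_jump` and the default cut-off of 500 are illustrative choices, not values from the thread:

```shell
#!/bin/sh
# Print the last version at or below the limit and the first version above it.
# Reads "version count" lines on stdin; lines whose second field is not a
# number (headers, "not tested" rows) are skipped.
first_jump() {
    awk -v limit="${1:-500}" '
        $2 ~ /^[0-9]+$/ {
            if ($2 + 0 > limit) { print prev, $1; found = 1; exit }
            prev = $1
        }
        END { if (!found) exit 1 }'
}

# Usage, with the table saved to a hypothetical file counts.txt:
#   first_jump 500 < counts.txt
```

This only shows where the counts change in the measurements; it says nothing about whether the kernel version or something else in the test setup is responsible.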
* Re: thousands of kworker processes with 4.7.x and 4.8-rc*
2016-09-23 13:23 ` Tomasz Chmielewski
@ 2016-09-23 14:10 ` Mike Galbraith
2016-09-23 16:15 ` Tomasz Chmielewski
0 siblings, 1 reply; 9+ messages in thread
From: Mike Galbraith @ 2016-09-23 14:10 UTC (permalink / raw)
To: Tomasz Chmielewski, LKML
On Fri, 2016-09-23 at 22:23 +0900, Tomasz Chmielewski wrote:
> On 2016-09-19 16:08, Tomasz Chmielewski wrote:
> > On several servers running 4.7.x and 4.8-rc6/7 kernels I'm seeing
> > thousands of kworker processes.
> > # ps auxf|grep -c kworker
> > 2104
> > Load average goes into hundreds on a pretty much idle server (biggest
> > CPU and RAM consumers are probably SSHD with one user logged in and
> > rsyslog writing ~1 line per minute):
> > # uptime
> > 06:58:56 up 26 min, 1 user, load average: 146.11, 215.46, 105.70
> > # uptime
> > 06:59:48 up 26 min, 1 user, load average: 305.20, 240.84, 120.25
> > Sometimes seeing lots of them in "D" state:
> > root 19474 0.0 0.0 0 0 ? D 06:54 0:00 \_ [kworker/0:208]
> > root 19475 0.0 0.0 0 0 ? D 06:54 0:00 \_ [kworker/0:209]
>
>
> I did some experiments to see when the problem first appeared. Thousands
> of kworker processes start to show up in 4.7.0-rc5.
>
> kernel version | kworker count after boot
> -------------------------------------------
> 4.6.3            37
> 4.6.4            47
> 4.6.5            46
> 4.6.6            49
> 4.6.7            49
> 4.7.0-rc1        46
> 4.7.0-rc2        49
> 4.7.0-rc3        45
> 4.7.0-rc4        47
> 4.7.0-rc5        1592
Best bet would be to use 'git bisect' to locate the exact commit that
caused this, and post the bisection result along with your config.
AFAIK, nobody else is seeing this. Is the kernel virgin source?
-Mike
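For reference, a bisection over the window suggested by the table might be driven roughly as follows. This is a sketch under assumptions: a clone of the mainline tree carrying the tags `v4.7-rc4` and `v4.7-rc5`, a hypothetical helper named `verdict`, and an arbitrary cut-off of 500 kworkers sitting between the ~50 and ~1500 counts reported above.

```shell
#!/bin/sh
# Map a kworker count to a bisect verdict ("good" or "bad").
# $1 is the count, $2 an optional threshold (assumed default: 500).
verdict() {
    if [ "$1" -gt "${2:-500}" ]; then echo bad; else echo good; fi
}

# Manual bisect loop (every step needs a kernel build and a reboot):
#   git bisect start
#   git bisect bad v4.7-rc5
#   git bisect good v4.7-rc4
#   ...build, install and boot the candidate, then on the booted system:
#   git bisect "$(verdict "$(ps -e -o comm= | grep -c '^kworker')")"
#   ...repeat until git names the first bad commit, then:
#   git bisect reset
```

The same .config has to be used at every step; otherwise a config difference can masquerade as a bad commit.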
* Re: thousands of kworker processes with 4.7.x and 4.8-rc*
2016-09-23 14:10 ` Mike Galbraith
@ 2016-09-23 16:15 ` Tomasz Chmielewski
[not found] ` <a08b61cee6c43de99ac3c6a8cacfdc04@admin.virtall.com>
0 siblings, 1 reply; 9+ messages in thread
From: Tomasz Chmielewski @ 2016-09-23 16:15 UTC (permalink / raw)
To: Mike Galbraith; +Cc: LKML
On 2016-09-23 23:10, Mike Galbraith wrote:
>> I did some experiments to see when the problem first appeared. Thousands
>> of kworker processes start to show up in 4.7.0-rc5.
>>
>> kernel version | kworker count after boot
>> -------------------------------------------
>> 4.6.3            37
>> 4.6.4            47
>> 4.6.5            46
>> 4.6.6            49
>> 4.6.7            49
>> 4.7.0-rc1        46
>> 4.7.0-rc2        49
>> 4.7.0-rc3        45
>> 4.7.0-rc4        47
>> 4.7.0-rc5        1592
>
> Best bet would be to use 'git bisect' to locate the exact commit that
> caused this, and post the bisection result along with your config.
>
> AFAIK, nobody else is seeing this. Is the kernel virgin source?
Yes, it's a kernel.org kernel.
I found some similar reports, though without much more info:
https://github.com/zfsonlinux/zfs/issues/5036 - kernel 4.7.2, initially
attributed to ZFS on Linux, but then reproduced without ZFS
https://github.com/systemd/systemd/issues/4069 - kernel 4.7.2
I'll try to bisect.
Tomasz Chmielewski
https://lxadm.com
* Re: thousands of kworker processes with 4.7.x and 4.8-rc*
[not found] ` <a08b61cee6c43de99ac3c6a8cacfdc04@admin.virtall.com>
@ 2016-09-25 12:40 ` Tomasz Chmielewski
2016-09-25 17:21 ` Mike Galbraith
2016-09-25 19:07 ` Nikolay Borisov
0 siblings, 2 replies; 9+ messages in thread
From: Tomasz Chmielewski @ 2016-09-25 12:40 UTC (permalink / raw)
To: Mike Galbraith; +Cc: LKML
On 2016-09-25 18:29, Tomasz Chmielewski wrote:
>> I'll try to bisect.
>
> OK, not a kernel regression, but some config change caused it.
> However, I'm not able to locate which change exactly.
>
> I'm attaching two configs which I've tried with 4.7.3 - one results in
> thousands of kworkers, and the other doesn't. Also included a diff
> between them.
>
> Any obvious changes I should try?
The problem is the allocator.
-CONFIG_SLUB=y
+CONFIG_SLAB=y
With SLUB, I'm getting a handful of kworker processes, as expected.
With SLAB, I'm getting thousands of kworker processes.
Not sure if that's expected behaviour or not.
Tomasz Chmielewski
https://lxadm.com
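To check which allocator a given kernel was built with, the config can be inspected directly. A minimal sketch; the function name `slab_allocator` is made up here, and the config locations in the usage comments are common conventions that may not exist on every system:

```shell
#!/bin/sh
# Report which slab allocator a kernel config selects.
# Reads config text on stdin and prints SLAB, SLUB, SLOB, or "unknown".
slab_allocator() {
    awk -F'[_=]' '/^CONFIG_SL[AOU]B=y$/ { print $2; found = 1; exit }
                  END { if (!found) print "unknown" }'
}

# Usage (paths are assumptions about the local setup):
#   slab_allocator < "/boot/config-$(uname -r)"
#   zcat /proc/config.gz | slab_allocator   # only with CONFIG_IKCONFIG_PROC
```

Running it against both configs from the comparison above should print SLUB for the one with few kworkers and SLAB for the other.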
* Re: thousands of kworker processes with 4.7.x and 4.8-rc*
2016-09-25 12:40 ` Tomasz Chmielewski
@ 2016-09-25 17:21 ` Mike Galbraith
2016-09-25 18:22 ` Mike Galbraith
2016-09-25 19:07 ` Nikolay Borisov
1 sibling, 1 reply; 9+ messages in thread
From: Mike Galbraith @ 2016-09-25 17:21 UTC (permalink / raw)
To: Tomasz Chmielewski; +Cc: LKML
On Sun, 2016-09-25 at 21:40 +0900, Tomasz Chmielewski wrote:
> The problem is the allocator.
>
> -CONFIG_SLUB=y
> +CONFIG_SLAB=y
>
>
> With SLUB, I'm getting a handful of kworker processes, as expected.
>
> With SLAB, I'm getting thousands of kworker processes.
>
>
> Not sure if that's expected behaviour or not.
I seriously doubt 1500+ kworkers piling up is expected.
4.7.0-rc4 47
4.7.0-rc5 1592
Presuming you didn't switch to SLAB and/or change userspace all around
while testing, that would still indicate a kernel regression lurking
between rc4 and rc5.
-Mike
* Re: thousands of kworker processes with 4.7.x and 4.8-rc*
2016-09-25 17:21 ` Mike Galbraith
@ 2016-09-25 18:22 ` Mike Galbraith
0 siblings, 0 replies; 9+ messages in thread
From: Mike Galbraith @ 2016-09-25 18:22 UTC (permalink / raw)
To: Tomasz Chmielewski; +Cc: LKML
On Sun, 2016-09-25 at 19:21 +0200, Mike Galbraith wrote:
> On Sun, 2016-09-25 at 21:40 +0900, Tomasz Chmielewski wrote:
>
> > The problem is the allocator.
> >
> > -CONFIG_SLUB=y
> > +CONFIG_SLAB=y
> >
> >
> > With SLUB, I'm getting a handful of kworker processes, as expected.
> >
> > With SLAB, I'm getting thousands of kworker processes.
> >
> >
> > Not sure if that's expected behaviour or not.
>
> I seriously doubt 1500+ kworkers piling up is expected.
>
> 4.7.0-rc4 47
> 4.7.0-rc5 1592
>
> Presuming you didn't switch to SLAB and/or change userspace all around
> while testing, that would still indicate a kernel regression lurking
> between rc4 and rc5.
Never mind, seems the config change was how you met the thing.
FWIW, I turned on SLAB with a distro config here, and see no such
behavior. If I could reproduce, I'd take the busted config back to
older kernels.
-Mike
* Re: thousands of kworker processes with 4.7.x and 4.8-rc*
2016-09-25 12:40 ` Tomasz Chmielewski
2016-09-25 17:21 ` Mike Galbraith
@ 2016-09-25 19:07 ` Nikolay Borisov
2016-09-27 3:49 ` Tomasz Chmielewski
1 sibling, 1 reply; 9+ messages in thread
From: Nikolay Borisov @ 2016-09-25 19:07 UTC (permalink / raw)
To: Tomasz Chmielewski, Mike Galbraith; +Cc: LKML
On 25.09.2016 15:40, Tomasz Chmielewski wrote:
> On 2016-09-25 18:29, Tomasz Chmielewski wrote:
>
>>> I'll try to bisect.
>>
>> OK, not a kernel regression, but some config change caused it.
>> However, I'm not able to locate which change exactly.
>>
>> I'm attaching two configs which I've tried with 4.7.3 - one results in
>> thousands of kworkers, and the other doesn't. Also included a diff
>> between them.
>>
>> Any obvious changes I should try?
>
> The problem is the allocator.
>
> -CONFIG_SLUB=y
> +CONFIG_SLAB=y
>
>
> With SLUB, I'm getting a handful of kworker processes, as expected.
>
> With SLAB, I'm getting thousands of kworker processes.
>
>
> Not sure if that's expected behaviour or not.
Why don't you sample the stacks of some of those kworker processes to
see if they are all executing a particular piece of work? That might
help you narrow down where they originate from. Cat multiple
/proc/$kworker-pid/stack files and see if a pattern emerges.
Regards,
Nikolay
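The stack sampling suggested above can be scripted. A rough sketch, assuming `/proc/<pid>/stack` is available (it needs root and a kernel built with stack trace support); the function name `kworker_top_frames` is made up for this illustration:

```shell
#!/bin/sh
# Group kworker kernel stacks by their topmost frame, most common first.
# $1 is the proc root (default /proc), which makes the function easy to
# try against a copied or mocked directory tree.
kworker_top_frames() {
    root=${1:-/proc}
    for dir in "$root"/[0-9]*; do
        [ -r "$dir/comm" ] || continue
        case $(cat "$dir/comm") in
            kworker*) head -n 1 "$dir/stack" 2>/dev/null ;;
        esac
    done | awk '{ sub(/\+.*/, "", $NF); print $NF }' | sort | uniq -c | sort -rn
}

# Usage, as root:
#   kworker_top_frames
```

If one function dominates the output, that is the pattern worth reporting; for a full picture, the complete stack of one or two representative PIDs is still needed.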
>
>
> Tomasz Chmielewski
> https://lxadm.com
>
>
* Re: thousands of kworker processes with 4.7.x and 4.8-rc*
2016-09-25 19:07 ` Nikolay Borisov
@ 2016-09-27 3:49 ` Tomasz Chmielewski
0 siblings, 0 replies; 9+ messages in thread
From: Tomasz Chmielewski @ 2016-09-27 3:49 UTC (permalink / raw)
To: Nikolay Borisov; +Cc: Mike Galbraith, LKML
On 2016-09-26 04:07, Nikolay Borisov wrote:
>> Not sure if that's expected behaviour or not.
>
>
> Why don't you sample the stacks of some of those kworker processes to
> see if they are all executing a particular piece of work? That might
> help you narrow down where they originate from. Cat multiple
> /proc/$kworker-pid/stack files and see if a pattern emerges.
FYI, it was reproduced and bisected here (scroll to the bottom):
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1626564
Tomasz Chmielewski
https://lxadm.com