linux-kernel.vger.kernel.org archive mirror
* thousands of kworker processes with 4.7.x and 4.8-rc*
@ 2016-09-19  7:08 Tomasz Chmielewski
  2016-09-23 13:23 ` Tomasz Chmielewski
  0 siblings, 1 reply; 9+ messages in thread
From: Tomasz Chmielewski @ 2016-09-19  7:08 UTC (permalink / raw)
  To: LKML

On several servers running 4.7.x and 4.8-rc6/7 kernels I'm seeing 
thousands of kworker processes.

# ps auxf|grep -c kworker
2104


Load average goes into hundreds on a pretty much idle server (biggest 
CPU and RAM consumers are probably SSHD with one user logged in and 
rsyslog writing ~1 line per minute):

# uptime
  06:58:56 up 26 min,  1 user,  load average: 146.11, 215.46, 105.70


# uptime
  06:59:48 up 26 min,  1 user,  load average: 305.20, 240.84, 120.25


Sometimes seeing lots of them in "D" state:

root     19474  0.0  0.0      0     0 ?        D    06:54   0:00  \_ [kworker/0:208]
root     19475  0.0  0.0      0     0 ?        D    06:54   0:00  \_ [kworker/0:209]
root     19477  0.0  0.0      0     0 ?        D    06:54   0:00  \_ [kworker/0:211]
root     19480  0.0  0.0      0     0 ?        D    06:54   0:00  \_ [kworker/0:214]
root     19483  0.0  0.0      0     0 ?        D    06:54   0:00  \_ [kworker/0:217]
root     19485  0.0  0.0      0     0 ?        D    06:54   0:00  \_ [kworker/0:219]
root     19486  0.0  0.0      0     0 ?        D    06:54   0:00  \_ [kworker/0:220]
root     19487  0.0  0.0      0     0 ?        D    06:54   0:00  \_ [kworker/0:221]
root     19492  0.0  0.0      0     0 ?        D    06:54   0:00  \_ [kworker/0:226]
root     19533  0.0  0.0      0     0 ?        D    06:54   0:00  \_ [kworker/4:257]
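
To see how the kworker threads split across states, rather than just the total, something like the following could be used. This is only a sketch: the function name kworker_states is made up for illustration, and it assumes a procps-style ps that accepts "-o stat= -o comm=".

```shell
#!/bin/sh
# Summarise kworker threads by scheduler state.
# Reads lines of "STATE COMMAND" on stdin and prints one
# "STATE COUNT" line per state, sorted by state letter.
# kworker_states is a hypothetical helper name, not an existing tool.
kworker_states() {
    awk 'index($2, "kworker/") == 1 { n[substr($1, 1, 1)]++ }
         END { for (s in n) printf "%s %d\n", s, n[s] }' | sort
}

# Typical use on a live system (assumes procps ps):
#   ps -e -o stat= -o comm= | kworker_states
```

Only the first character of the state column is used, so flags such as "<" or "s" appended by ps do not split the counts.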


Is it a known issue?

The server has 8 CPUs and 32 GB RAM.


Tomasz Chmielewski
https://lxadm.com

* Re: thousands of kworker processes with 4.7.x and 4.8-rc*
  2016-09-19  7:08 thousands of kworker processes with 4.7.x and 4.8-rc* Tomasz Chmielewski
@ 2016-09-23 13:23 ` Tomasz Chmielewski
  2016-09-23 14:10   ` Mike Galbraith
  0 siblings, 1 reply; 9+ messages in thread
From: Tomasz Chmielewski @ 2016-09-23 13:23 UTC (permalink / raw)
  To: LKML

On 2016-09-19 16:08, Tomasz Chmielewski wrote:
> On several servers running 4.7.x and 4.8-rc6/7 kernels I'm seeing
> thousands of kworker processes.
> # ps auxf|grep -c kworker
> 2104
> Load average goes into hundreds on a pretty much idle server (biggest
> CPU and RAM consumers are probably SSHD with one user logged in and
> rsyslog writing ~1 line per minute):
> # uptime
>  06:58:56 up 26 min,  1 user,  load average: 146.11, 215.46, 105.70
> # uptime
>  06:59:48 up 26 min,  1 user,  load average: 305.20, 240.84, 120.25
> Sometimes seeing lots of them in "D" state:
> root     19474  0.0  0.0      0     0 ?        D    06:54   0:00  \_ [kworker/0:208]
> root     19475  0.0  0.0      0     0 ?        D    06:54   0:00  \_ [kworker/0:209]


I did some experiments to see when the problem first appeared. Thousands 
of kworker processes start to show up in 4.7.0-rc5.

kernel version | kworker count after boot
-------------------------------------------
4.6.3          | 37
4.6.4          | 47
4.6.5          | 46
4.6.6          | 49
4.6.7          | 49
4.7.0-rc1      | 46
4.7.0-rc2      | 49
4.7.0-rc3      | 45
4.7.0-rc4      | 47
4.7.0-rc5      | 1592
4.7.0-rc6      | 1714
4.7.0-rc7      | 1955
4.7.0          | 2088
4.7.1          | 1222
4.7.2          | 1699
4.7.3          | 1446
4.7.4          | 1781
4.8-rc1        | (not tested)
4.8-rc2        | 2012
4.8-rc3        | 1696
4.8-rc4        | 1210
4.8-rc5        | 1890
4.8-rc6        | 1657
4.8-rc7        | 1647


Tomasz Chmielewski
https://lxadm.com

* Re: thousands of kworker processes with 4.7.x and 4.8-rc*
  2016-09-23 13:23 ` Tomasz Chmielewski
@ 2016-09-23 14:10   ` Mike Galbraith
  2016-09-23 16:15     ` Tomasz Chmielewski
  0 siblings, 1 reply; 9+ messages in thread
From: Mike Galbraith @ 2016-09-23 14:10 UTC (permalink / raw)
  To: Tomasz Chmielewski, LKML

On Fri, 2016-09-23 at 22:23 +0900, Tomasz Chmielewski wrote:
> On 2016-09-19 16:08, Tomasz Chmielewski wrote:
> > On several servers running 4.7.x and 4.8-rc6/7 kernels I'm seeing
> > thousands of kworker processes.
> > # ps auxf|grep -c kworker
> > 2104
> > Load average goes into hundreds on a pretty much idle server (biggest
> > CPU and RAM consumers are probably SSHD with one user logged in and
> > rsyslog writing ~1 line per minute):
> > # uptime
> >  06:58:56 up 26 min,  1 user,  load average: 146.11, 215.46, 105.70
> > # uptime
> >  06:59:48 up 26 min,  1 user,  load average: 305.20, 240.84, 120.25
> > Sometimes seeing lots of them in "D" state:
> > root     19474  0.0  0.0      0     0 ?        D    06:54   0:00  \_ [kworker/0:208]
> > root     19475  0.0  0.0      0     0 ?        D    06:54   0:00  \_ [kworker/0:209]
> 
> 
> I did some experiments to see when the problem first appeared. Thousands 
> of kworker processes start to show up in 4.7.0-rc5.
> 
> kernel version | kworker count after boot
> -------------------------------------------
> 4.6.3          | 37
> 4.6.4          | 47
> 4.6.5          | 46
> 4.6.6          | 49
> 4.6.7          | 49
> 4.7.0-rc1      | 46
> 4.7.0-rc2      | 49
> 4.7.0-rc3      | 45
> 4.7.0-rc4      | 47
> 4.7.0-rc5      | 1592

Best bet would be to use 'git bisect' to locate the exact commit that
caused this, and post the bisection result along with your config.

AFAIK, nobody else is seeing this. Is the kernel virgin source?

	-Mike
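
For reference, a bisection between the last good and first bad release candidates from the table could look roughly like the outline below. It is a sketch under stated assumptions, not a tested procedure: bisect_verdict is a hypothetical helper, the threshold of 200 is picked arbitrarily to sit between the ~50 and ~1500 counts reported above, and v4.7-rc4 / v4.7-rc5 are the kernel.org tag names for the versions shown as 4.7.0-rc4 / 4.7.0-rc5.

```shell
#!/bin/sh
# Hypothetical helper: turn a kworker count into a `git bisect` verdict.
# $1 = observed kworker count after boot
# $2 = threshold (assumed default 200, between the ~50 "good" and
#      ~1500 "bad" counts reported in this thread)
bisect_verdict() {
    count=$1
    threshold=${2:-200}
    if [ "$count" -lt "$threshold" ]; then
        echo good
    else
        echo bad
    fi
}

# Outline of the session, run inside a kernel git tree:
#   git bisect start
#   git bisect bad  v4.7-rc5
#   git bisect good v4.7-rc4
#   ... build, install and boot the commit git checked out, then:
#   git bisect "$(bisect_verdict "$(ps -e -o comm= | grep -c '^kworker/')")"
#   ... repeat until git names the first bad commit, then:
#   git bisect log
```

The same .config has to be carried through every step, otherwise a config difference can look like a code regression.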

* Re: thousands of kworker processes with 4.7.x and 4.8-rc*
  2016-09-23 14:10   ` Mike Galbraith
@ 2016-09-23 16:15     ` Tomasz Chmielewski
       [not found]       ` <a08b61cee6c43de99ac3c6a8cacfdc04@admin.virtall.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Tomasz Chmielewski @ 2016-09-23 16:15 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: LKML

On 2016-09-23 23:10, Mike Galbraith wrote:

>> I did some experiments to see when the problem first appeared.
>> Thousands of kworker processes start to show up in 4.7.0-rc5.
>> 
>> kernel version | kworker count after boot
>> -------------------------------------------
>> 4.6.3          | 37
>> 4.6.4          | 47
>> 4.6.5          | 46
>> 4.6.6          | 49
>> 4.6.7          | 49
>> 4.7.0-rc1      | 46
>> 4.7.0-rc2      | 49
>> 4.7.0-rc3      | 45
>> 4.7.0-rc4      | 47
>> 4.7.0-rc5      | 1592
> 
> Best bet would be to use 'git bisect' to locate the exact commit that
> caused this, and post the bisection result along with your config.
> 
> AFAIK, nobody else is seeing this. Is the kernel virgin source?

Yes, it's a kernel.org kernel.

I found some similar reports, though without much more info:

https://github.com/zfsonlinux/zfs/issues/5036 - kernel 4.7.2, initially 
attributed to ZFS on Linux, but then reproduced without ZFS

https://github.com/systemd/systemd/issues/4069 - kernel 4.7.2


I'll try to bisect.



Tomasz Chmielewski
https://lxadm.com

* Re: thousands of kworker processes with 4.7.x and 4.8-rc*
       [not found]       ` <a08b61cee6c43de99ac3c6a8cacfdc04@admin.virtall.com>
@ 2016-09-25 12:40         ` Tomasz Chmielewski
  2016-09-25 17:21           ` Mike Galbraith
  2016-09-25 19:07           ` Nikolay Borisov
  0 siblings, 2 replies; 9+ messages in thread
From: Tomasz Chmielewski @ 2016-09-25 12:40 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: LKML

On 2016-09-25 18:29, Tomasz Chmielewski wrote:

>> I'll try to bisect.
> 
> OK, not a kernel regression, but some config change caused it.
> However, I'm not able to locate which change exactly.
> 
> I'm attaching two configs which I've tried with 4.7.3 - one results in
> thousands of kworkers, and the other doesn't. Also included a diff
> between them.
> 
> Any obvious changes I should try?

The problem is the allocator.

-CONFIG_SLUB=y
+CONFIG_SLAB=y


With SLUB, I'm getting a handful of kworker processes, as expected.

With SLAB, I'm getting thousands of kworker processes.


Not sure if that's expected behaviour or not.
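
To check which allocator a given kernel was built with, a config file can be inspected along these lines. This is a minimal sketch: slab_allocator is a made-up function name, and the example paths in the comments depend on the distribution and on CONFIG_IKCONFIG_PROC being enabled, so treat them as assumptions.

```shell
#!/bin/sh
# Print which slab allocator (SLAB, SLUB or SLOB) a kernel config selects.
# $1 = path to a kernel .config file.
# slab_allocator is a hypothetical helper name used for illustration.
slab_allocator() {
    grep -E '^CONFIG_(SLAB|SLUB|SLOB)=y$' "$1" | sed 's/^CONFIG_//; s/=y$//'
}

# Possible uses (paths are assumptions and vary between systems):
#   slab_allocator "/boot/config-$(uname -r)"
#   zcat /proc/config.gz > /tmp/running.config && slab_allocator /tmp/running.config
```

The pattern is anchored on "=y" directly after the option name, so related options such as CONFIG_SLABINFO are not picked up by mistake.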


Tomasz Chmielewski
https://lxadm.com

* Re: thousands of kworker processes with 4.7.x and 4.8-rc*
  2016-09-25 12:40         ` Tomasz Chmielewski
@ 2016-09-25 17:21           ` Mike Galbraith
  2016-09-25 18:22             ` Mike Galbraith
  2016-09-25 19:07           ` Nikolay Borisov
  1 sibling, 1 reply; 9+ messages in thread
From: Mike Galbraith @ 2016-09-25 17:21 UTC (permalink / raw)
  To: Tomasz Chmielewski; +Cc: LKML

On Sun, 2016-09-25 at 21:40 +0900, Tomasz Chmielewski wrote:

> The problem is the allocator.
> 
> -CONFIG_SLUB=y
> +CONFIG_SLAB=y
> 
> 
> With SLUB, I'm getting a handful of kworker processes, as expected.
> 
> With SLAB, I'm getting thousands of kworker processes.
> 
> 
> Not sure if that's expected behaviour or not.

I seriously doubt 1500+ kworkers piling up is expected.

4.7.0-rc4       47
4.7.0-rc5       1592

Presuming you didn't switch to SLAB and/or change userspace all around
while testing, that would still indicate a kernel regression lurking
between rc4 and rc5.

	-Mike

* Re: thousands of kworker processes with 4.7.x and 4.8-rc*
  2016-09-25 17:21           ` Mike Galbraith
@ 2016-09-25 18:22             ` Mike Galbraith
  0 siblings, 0 replies; 9+ messages in thread
From: Mike Galbraith @ 2016-09-25 18:22 UTC (permalink / raw)
  To: Tomasz Chmielewski; +Cc: LKML

On Sun, 2016-09-25 at 19:21 +0200, Mike Galbraith wrote:
> On Sun, 2016-09-25 at 21:40 +0900, Tomasz Chmielewski wrote:
> 
> > The problem is the allocator.
> > 
> > -CONFIG_SLUB=y
> > +CONFIG_SLAB=y
> > 
> > 
> > With SLUB, I'm getting a handful of kworker processes, as expected.
> > 
> > With SLAB, I'm getting thousands of kworker processes.
> > 
> > 
> > Not sure if that's expected behaviour or not.
> 
> I seriously doubt 1500+ kworkers piling up is expected.
> 
> 4.7.0-rc4       47
> 4.7.0-rc5       1592
> 
> Presuming you didn't switch to SLAB and/or change userspace all around
> while testing, that would still indicate a kernel regression lurking
> between rc4 and rc5.

Never mind, it seems the config change was how you met the thing.

FWIW, I turned on SLAB with a distro config here, and see no such
behavior.  If I could reproduce, I'd take the busted config back to
older kernels.

	-Mike

* Re: thousands of kworker processes with 4.7.x and 4.8-rc*
  2016-09-25 12:40         ` Tomasz Chmielewski
  2016-09-25 17:21           ` Mike Galbraith
@ 2016-09-25 19:07           ` Nikolay Borisov
  2016-09-27  3:49             ` Tomasz Chmielewski
  1 sibling, 1 reply; 9+ messages in thread
From: Nikolay Borisov @ 2016-09-25 19:07 UTC (permalink / raw)
  To: Tomasz Chmielewski, Mike Galbraith; +Cc: LKML



On 25.09.2016 15:40, Tomasz Chmielewski wrote:
> On 2016-09-25 18:29, Tomasz Chmielewski wrote:
> 
>>> I'll try to bisect.
>>
>> OK, not a kernel regression, but some config change caused it.
>> However, I'm not able to locate which change exactly.
>>
>> I'm attaching two configs which I've tried with 4.7.3 - one results in
>> thousands of kworkers, and the other doesn't. Also included a diff
>> between them.
>>
>> Any obvious changes I should try?
> 
> The problem is the allocator.
> 
> -CONFIG_SLUB=y
> +CONFIG_SLAB=y
> 
> 
> With SLUB, I'm getting a handful of kworker processes, as expected.
> 
> With SLAB, I'm getting thousands of kworker processes.
> 
> 
> Not sure if that's expected behaviour or not.


Why don't you sample the stacks of some of those kworker processes to
see if they are all executing a particular piece of work? That might
help you narrow down where they originate from. Cat multiple
/proc/$kworker-pid/stack files and see if a pattern emerges.
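
One possible way to do that in bulk is sketched below. It assumes root access and a kernel that exposes /proc/<pid>/stack; kworker_stacks is a hypothetical function name, and the optional proc-root argument exists only so the function can be pointed at a test directory instead of the real /proc.

```shell
#!/bin/sh
# Group kworker kernel stacks and count how often each distinct one occurs.
# $1 = proc root (default /proc); a parameter only so the function can be
#      run against a fake tree. kworker_stacks is a hypothetical helper.
kworker_stacks() {
    root=${1:-/proc}
    for dir in "$root"/[0-9]*; do
        # Skip entries we cannot read (not root, or the task already exited).
        [ -r "$dir/comm" ] && [ -r "$dir/stack" ] || continue
        case $(cat "$dir/comm" 2>/dev/null) in
            kworker/*) ;;
            *) continue ;;
        esac
        # Join the stack into one line so identical traces compare equal.
        tr '\n' ' ' < "$dir/stack"
        echo
    done | sort | uniq -c | sort -rn
}

# Typical use, as root:
#   kworker_stacks | head
```

The most frequent stack is printed first, so if most workers are stuck in the same work item it should show up at the top of the output.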

Regards,
Nikolay

> 
> 
> Tomasz Chmielewski
> https://lxadm.com
> 
> 

* Re: thousands of kworker processes with 4.7.x and 4.8-rc*
  2016-09-25 19:07           ` Nikolay Borisov
@ 2016-09-27  3:49             ` Tomasz Chmielewski
  0 siblings, 0 replies; 9+ messages in thread
From: Tomasz Chmielewski @ 2016-09-27  3:49 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: Mike Galbraith, LKML

On 2016-09-26 04:07, Nikolay Borisov wrote:

>> Not sure if that's expected behaviour or not.
> 
> 
> Why don't you sample the stacks of some of those kworker processes to
> see if they are all executing a particular piece of work? That might
> help you narrow down where they originate from. Cat multiple
> /proc/$kworker-pid/stack files and see if a pattern emerges.

FYI, it was reproduced and bisected here (scroll to the bottom):

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1626564


Tomasz Chmielewski
https://lxadm.com
