All of lore.kernel.org
 help / color / mirror / Atom feed
* [REGRESSION] [BISECTED] kswapd high CPU usage
@ 2016-01-21 14:28 Nalorokk
  2016-01-21 14:37   ` Nalorokk
  2016-01-21 16:16   ` Kirill A. Shutemov
  0 siblings, 2 replies; 17+ messages in thread
From: Nalorokk @ 2016-01-21 14:28 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Stefan Strogin, Andrew Morton, Sasha Levin, Mel Gorman, linux-mm,
	linux-kernel, oleksandr

[-- Attachment #1: Type: text/plain, Size: 1768 bytes --]

It appears that kernels newer than 4.1 have kswapd-related bug resulting in
high CPU usage. CPU 100% usage could last for several minutes or several
days, with CPU being busy entirely with serving kswapd. It happens usually
after server being mostly idle, sometimes after days, sometimes after weeks
of uptime. But the issue appears much sooner if the machine is loaded with
something like building a kernel.

Here are the graphs of CPU load: first
<http://i.piccy.info/i9/9ee6c0620c9481a974908484b2a52a0f/1453384595/44012/994698/cpu_month.png>,
second
<http://i.piccy.info/i9/7c97c2f39620bb9d7ea93096312dbbb6/1453384649/41222/994698/cpu_year.png>.
Perf top output is here <http://pastebin.com/aRzTjb2x>as well.

To find the cause of this problem I've started with the fact that the issue
appeared after 4.1 kernel update. Then I performed longterm test of 3.18,
and discovered that 3.18 is unaffected by this bug. Then I did some tests
of 4.0 to confirm that this version behaves well too.

Then I performed git bisect from tag v4.0 to v4.1-rc1 and found exact
commits that seem to be reason of high CPU usage.

The first really "bad" commit is 79553da293d38d63097278de13e28a3b371f43c1.
2 previous commits cause weird behavior as well resulting in kswapd
consuming more CPU than unaffected kernels, but not that much as the commit
pointed above. I believe those commits are related to the same mm tree
merge.

I tried to add transparent_hugepage=never to kernel boot parameters, but it
did not change anything. Changing allocator to SLAB from SLUB alters
behavior and makes CPU load lower, but don't solve a problem at all.

Here <https://bugzilla.kernel.org/show_bug.cgi?id=110501>is kernel bugzilla
bugreport as well.

Ideas? ​

[-- Attachment #2: Type: text/html, Size: 1956 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [REGRESSION] [BISECTED] kswapd high CPU usage
@ 2020-07-15 10:04 Alexey Vlasov
  2020-08-10 13:47 ` Alexey Vlasov
  0 siblings, 1 reply; 17+ messages in thread
From: Alexey Vlasov @ 2020-07-15 10:04 UTC (permalink / raw)
  To: linux-kernel; +Cc: kirill

Hi,

After upgrading from 3.14 to 4.14.173, I ran into exactly the same problem
that the starter topic described. Namely, sometimes kswapd starts to consume 100% 
of the CPU, and the system freezes for several minutes.

Below is an example of such an event (orange - system cpu, red - total cpu):
https://www.dropbox.com/s/5wr5su3p0fubq0a/kswapd_100.png?dl=0

Here is the top:

top - 23:44:16 up 9 days,  2:06, 14 users,  load average: 14.03, 12.32, 13.07
Tasks: 7108 total,  16 running, 6921 sleeping,   0 stopped,   9 zombie
%Cpu(s): 28.1 us, 18.1 sy,  0.0 ni, 51.7 id,  1.2 wa,  0.0 hi,  0.9 si,  0.0 st
KiB Mem : 19803248+total,   596160 free, 11094233+used, 86493992 buff/cache
KiB Swap: 62914556 total, 62302912 free,   611644 used. 71269504 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  134 root      20   0       0      0      0 R  86.2  0.0 383:21.35 kswapd0
  135 root      20   0       0      0      0 R  84.9  0.0 344:00.17 kswapd1

this is a begin of the collapse, some minutes later the system has thousands of D
processes and does not answer:

top - 23:57:33 up 9 days,  2:19, 14 users,  load average: 1223.43, 1083.85, 662.
Tasks: 8356 total, 344 running, 7821 sleeping,   0 stopped,  44 zombie
%Cpu(s): 28.1 us, 18.2 sy,  0.0 ni, 51.6 id,  1.2 wa,  0.0 hi,  0.9 si,  0.0 st
KiB Mem : 19803248+total,   800516 free, 11587540+used, 81356560 buff/cache
KiB Swap: 62914556 total, 62130072 free,   784484 used. 62231208 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
10704 w_defau+  20   0  393476 117160  15160 D 100.0  0.1   0:00.16 httpd
16056 w_sti46+  20   0  599048  21528   9504 S 100.0  0.0   0:00.00 httpd
12649 w_divan+  20   0   41764   8064   3904 D 100.0  0.0   0:06.62 menu1.pl
13739 w_defau+  20   0  248696  24168  14132 S 100.0  0.0   0:00.01 httpd
 5172 mysql     20   0 6993508 2.310g   9660 D  38.9  1.2   3866:26 mysqld_aux3
 4683 mysql     20   0 9974.1m 4.366g   8268 D  38.7  2.3   2553:14 mysqld
 4791 mysql     20   0 10.359g 4.180g   9784 D  28.5  2.2   1659:40 mysqld_aux1
 5078 mysql     20   0  9.871g 3.774g   9888 D  25.4  2.0   2445:08 mysqld_aux2
    9 root      20   0       0      0      0 I   3.4  0.0  13:56.16 rcu_sched
  135 root      20   0       0      0      0 D   2.8  0.0 344:29.12 kswapd1
  134 root      20   0       0      0      0 D   2.6  0.0 383:49.86 kswapd0

Nevertheless there is not any I/O activity before after and during this collapse.

I tried to use your patch about "late_initcall(set_recommended_min_free_kbytes)",
unfortunately it did not help.

In my experience this could be solved by adding RAM but unfortunately this server
no longer has free slots. 188 GB RAM is the maximum for it.

Also I cannot go back to 3.14 kernel, since one of the partitions contains xfs with
the superblock of the new version v5, which is not supported by 3.14 kernel.

If you need more information, for example, vmstat, /proc/meminfo, I can send.

Is there any solution to this problem?

> On Fri, Jan 22, 2016 at 12:28:10AM +1000, Nalorokk wrote:
>> It appears that kernels newer than 4.1 have kswapd-related bug resulting in
>> high CPU usage. CPU 100% usage could last for several minutes or several
>> days, with CPU being busy entirely with serving kswapd. It happens usually
>> after server being mostly idle, sometimes after days, sometimes after weeks
>> of uptime. But the issue appears much sooner if the machine is loaded with
>> something like building a kernel.
>>
>> Here are the graphs of CPU load: first
>> <http://i.piccy.info/i9/9ee6c0620c9481a974908484b2a52a0f/1453384595/44012/994698/cpu_month.png>,
>> second
>> <http://i.piccy.info/i9/7c97c2f39620bb9d7ea93096312dbbb6/1453384649/41222/994698/cpu_year.png>.
>> Perf top output is here <http://pastebin.com/aRzTjb2x>as well.
>>
>> To find the cause of this problem I've started with the fact that the issue
>> appeared after 4.1 kernel update. Then I performed longterm test of 3.18,
>> and discovered that 3.18 is unaffected by this bug. Then I did some tests
>> of 4.0 to confirm that this version behaves well too.
>>
>> Then I performed git bisect from tag v4.0 to v4.1-rc1 and found exact
>> commits that seem to be reason of high CPU usage.
>>
>> The first really "bad" commit is 79553da293d38d63097278de13e28a3b371f43c1.
>> 2 previous commits cause weird behavior as well resulting in kswapd
>> consuming more CPU than unaffected kernels, but not that much as the commit
>> pointed above. I believe those commits are related to the same mm tree
>> merge.
>>
>> I tried to add transparent_hugepage=never to kernel boot parameters, but it
>> did not change anything. Changing allocator to SLAB from SLUB alters
>> behavior and makes CPU load lower, but don't solve a problem at all.
>>
>> Here <https://bugzilla.kernel.org/show_bug.cgi?id=110501>is kernel bugzilla
>> bugreport as well.
>>
>> Ideas? â
>
> Could you try to insert "late_initcall(set_recommended_min_free_kbytes);"
> back and check if makes any difference.
>
>-- 
>Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2020-08-10 14:19 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-21 14:28 [REGRESSION] [BISECTED] kswapd high CPU usage Nalorokk
2016-01-21 14:37 ` Fwd: " Nalorokk
2016-01-21 14:37   ` Nalorokk
2016-01-21 16:16 ` Kirill A. Shutemov
2016-01-21 16:16   ` Kirill A. Shutemov
2016-01-23 15:57   ` Hugh Greenberg
2016-01-25 10:38     ` Kirill A. Shutemov
2016-01-25 16:37       ` Hugh Greenberg
2016-01-25 16:46       ` Hugh Greenberg
2016-02-02 13:59         ` Kirill A. Shutemov
2016-02-02 14:41           ` Sergey Senozhatsky
2016-02-02 16:24           ` Minchan Kim
2016-03-14 16:00           ` Hugh Greenberg
2016-03-14 16:07           ` Hugh Greenberg
2016-03-15 17:16           ` Hugh Greenberg
2020-07-15 10:04 Alexey Vlasov
2020-08-10 13:47 ` Alexey Vlasov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.