[REGRESSION] [BISECTED] kswapd high CPU usage

From: Nalorokk @ 2016-01-21 14:28 UTC
  To: Kirill A. Shutemov
  Cc: Stefan Strogin, Andrew Morton, Sasha Levin, Mel Gorman, linux-mm,
	linux-kernel, oleksandr


It appears that kernels 4.1 and newer have a kswapd-related bug that results
in high CPU usage. The CPU can stay at 100% for several minutes or several
days, entirely busy serving kswapd. It usually happens after the server has
been mostly idle, sometimes after days and sometimes after weeks of uptime.
The issue appears much sooner if the machine is loaded with something like
building a kernel.
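
For anyone who wants to put a number on the kswapd load, here is a minimal sketch that computes the CPU usage of one task from two readings of /proc/<pid>/stat. It was not used for the graphs in this report; the function names are made up for illustration, and the field positions (utime and stime as fields 14 and 15) are taken from proc(5).

```python
import os
import time


def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may contain spaces,
    # so split only what follows the last ')'.
    fields = stat_line[stat_line.rindex(")") + 2:].split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before, ticks_after, interval_s, clk_tck):
    """CPU usage over the interval, as a percentage of one CPU."""
    return 100.0 * (ticks_after - ticks_before) / clk_tck / interval_s


def sample_pid(pid, interval_s=5.0):
    """Hypothetical helper: measure one task's CPU usage over interval_s."""
    clk_tck = os.sysconf("SC_CLK_TCK")
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        before = cpu_ticks(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = cpu_ticks(f.read())
    return cpu_percent(before, after, interval_s, clk_tck)
```

Calling sample_pid() with the PID of kswapd0 (for example from `pgrep kswapd0`) should return a value close to 100 while the problem is occurring and close to 0 otherwise.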

Here are the graphs of CPU load: first
<http://i.piccy.info/i9/9ee6c0620c9481a974908484b2a52a0f/1453384595/44012/994698/cpu_month.png>,
second
<http://i.piccy.info/i9/7c97c2f39620bb9d7ea93096312dbbb6/1453384649/41222/994698/cpu_year.png>.
The perf top output is available here <http://pastebin.com/aRzTjb2x> as well.

To find the cause of this problem, I started from the fact that the issue
appeared after the update to the 4.1 kernel. I ran a long-term test of 3.18
and found that 3.18 is unaffected by this bug. I then ran some tests of 4.0
to confirm that this version behaves well too.

Then I ran git bisect between tags v4.0 and v4.1-rc1 and found the exact
commits that seem to be the cause of the high CPU usage.

The first really "bad" commit is 79553da293d38d63097278de13e28a3b371f43c1.
The two preceding commits cause weird behavior as well, with kswapd
consuming more CPU than on unaffected kernels, but not as much as with the
commit named above. I believe those commits belong to the same mm tree
merge.

I tried adding transparent_hugepage=never to the kernel boot parameters, but
it did not change anything. Changing the allocator from SLUB to SLAB alters
the behavior and lowers the CPU load, but does not solve the problem.
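
To double-check that the boot parameter actually took effect, the active THP mode can be read back from sysfs, where the selected value is shown in square brackets. The following is only a sketch of such a check, not something taken from my test runs; the helper names are hypothetical and the path assumes the usual /sys/kernel/mm/transparent_hugepage/enabled file.

```python
def active_thp_mode(sysfs_text):
    """Return the bracketed (active) mode, e.g. 'never' from 'always madvise [never]'."""
    for word in sysfs_text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    raise ValueError("no bracketed mode found in: %r" % sysfs_text)


def read_thp_mode(path="/sys/kernel/mm/transparent_hugepage/enabled"):
    """Hypothetical helper: read the active THP mode on a running system."""
    with open(path) as f:
        return active_thp_mode(f.read())
```

With transparent_hugepage=never on the command line, read_thp_mode() would be expected to return "never".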

There is also a kernel bugzilla report here
<https://bugzilla.kernel.org/show_bug.cgi?id=110501>.

Ideas? 
