* [REGRESSION] [BISECTED] kswapd high CPU usage
@ 2016-01-21 14:28 Nalorokk
  2016-01-21 14:37   ` Nalorokk
  2016-01-21 16:16   ` Kirill A. Shutemov
  0 siblings, 2 replies; 17+ messages in thread
From: Nalorokk @ 2016-01-21 14:28 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Stefan Strogin, Andrew Morton, Sasha Levin, Mel Gorman, linux-mm,
	linux-kernel, oleksandr

It appears that kernels 4.1 and newer have a kswapd-related bug resulting in
high CPU usage. The 100% CPU usage can last for several minutes or several
days, with the CPU busy entirely with serving kswapd. It usually happens
after the server has been mostly idle, sometimes after days, sometimes after
weeks of uptime. But the issue appears much sooner if the machine is loaded
with something like building a kernel.

Here are the graphs of CPU load: first
<http://i.piccy.info/i9/9ee6c0620c9481a974908484b2a52a0f/1453384595/44012/994698/cpu_month.png>,
second
<http://i.piccy.info/i9/7c97c2f39620bb9d7ea93096312dbbb6/1453384649/41222/994698/cpu_year.png>.
Perf top output is here <http://pastebin.com/aRzTjb2x> as well.

To find the cause of this problem I started from the fact that the issue
appeared after the update to the 4.1 kernel. I then ran a long-term test of
3.18 and found that 3.18 is unaffected by this bug. Then I did some tests
of 4.0 to confirm that this version behaves well too.

Then I performed a git bisect from tag v4.0 to v4.1-rc1 and found the exact
commits that seem to be the cause of the high CPU usage.

The first really "bad" commit is 79553da293d38d63097278de13e28a3b371f43c1.
The 2 preceding commits cause odd behavior as well, with kswapd consuming
more CPU than on unaffected kernels, but not as much as with the commit
named above. I believe those commits belong to the same mm tree merge.

I tried adding transparent_hugepage=never to the kernel boot parameters, but
it did not change anything. Changing the allocator from SLUB to SLAB alters
the behavior and lowers the CPU load, but does not solve the problem.

Here <https://bugzilla.kernel.org/show_bug.cgi?id=110501> is the kernel
bugzilla bug report as well.

Ideas?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Fwd: [REGRESSION] [BISECTED] kswapd high CPU usage
@ 2016-01-21 14:37   ` Nalorokk
  0 siblings, 0 replies; 17+ messages in thread
From: Nalorokk @ 2016-01-21 14:37 UTC (permalink / raw)
  To: Kirill A. Shutemov, Stefan Strogin, Andrew Morton, Sasha Levin,
	Mel Gorman, linux-mm, linux-kernel, oleksandr

It appears that kernels 4.1 and newer have a kswapd-related bug
resulting in high CPU usage. The 100% CPU usage can last for several
minutes or several days, with the CPU busy entirely with serving
kswapd. It usually happens after the server has been mostly idle,
sometimes after days, sometimes after weeks of uptime. But the issue
appears much sooner if the machine is loaded with something like
building a kernel.

Here are the graphs of CPU load: first [1], second [2]. Perf top
output is here [3] as well.

To find the cause of this problem I started from the fact that the
issue appeared after the update to the 4.1 kernel. I then ran a
long-term test of 3.18 and found that 3.18 is unaffected by this bug.
Then I did some tests of 4.0 to confirm that this version behaves well
too.

Then I performed a git bisect from tag v4.0 to v4.1-rc1 and found the
exact commits that seem to be the cause of the high CPU usage.

The first really "bad" commit is
79553da293d38d63097278de13e28a3b371f43c1. The 2 preceding commits cause
odd behavior as well, with kswapd consuming more CPU than on
unaffected kernels, but not as much as with the commit named above. I
believe those commits belong to the same mm tree merge.

I tried adding transparent_hugepage=never to the kernel boot
parameters, but it did not change anything. Changing the allocator
from SLUB to SLAB alters the behavior and lowers the CPU load, but
does not solve the problem.

Here [4] is the kernel bugzilla bug report as well.

Ideas?

[1] http://i.piccy.info/i9/9ee6c0620c9481a974908484b2a52a0f/1453384595/44012/994698/cpu_month.png
[2] http://i.piccy.info/i9/7c97c2f39620bb9d7ea93096312dbbb6/1453384649/41222/994698/cpu_year.png
[3] http://pastebin.com/aRzTjb2x
[4] https://bugzilla.kernel.org/show_bug.cgi?id=110501

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [REGRESSION] [BISECTED] kswapd high CPU usage
  2016-01-21 14:28 [REGRESSION] [BISECTED] kswapd high CPU usage Nalorokk
@ 2016-01-21 16:16   ` Kirill A. Shutemov
  2016-01-21 16:16   ` Kirill A. Shutemov
  1 sibling, 0 replies; 17+ messages in thread
From: Kirill A. Shutemov @ 2016-01-21 16:16 UTC (permalink / raw)
  To: Nalorokk
  Cc: Kirill A. Shutemov, Stefan Strogin, Andrew Morton, Sasha Levin,
	Mel Gorman, linux-mm, linux-kernel, oleksandr

On Fri, Jan 22, 2016 at 12:28:10AM +1000, Nalorokk wrote:
> It appears that kernels newer than 4.1 have kswapd-related bug resulting in
> high CPU usage. CPU 100% usage could last for several minutes or several
> days, with CPU being busy entirely with serving kswapd. It happens usually
> after server being mostly idle, sometimes after days, sometimes after weeks
> of uptime. But the issue appears much sooner if the machine is loaded with
> something like building a kernel.
> 
> Here are the graphs of CPU load: first
> <http://i.piccy.info/i9/9ee6c0620c9481a974908484b2a52a0f/1453384595/44012/994698/cpu_month.png>,
> second
> <http://i.piccy.info/i9/7c97c2f39620bb9d7ea93096312dbbb6/1453384649/41222/994698/cpu_year.png>.
> Perf top output is here <http://pastebin.com/aRzTjb2x>as well.
> 
> To find the cause of this problem I've started with the fact that the issue
> appeared after 4.1 kernel update. Then I performed longterm test of 3.18,
> and discovered that 3.18 is unaffected by this bug. Then I did some tests
> of 4.0 to confirm that this version behaves well too.
> 
> Then I performed git bisect from tag v4.0 to v4.1-rc1 and found exact
> commits that seem to be reason of high CPU usage.
> 
> The first really "bad" commit is 79553da293d38d63097278de13e28a3b371f43c1.
> 2 previous commits cause weird behavior as well resulting in kswapd
> consuming more CPU than unaffected kernels, but not that much as the commit
> pointed above. I believe those commits are related to the same mm tree
> merge.
> 
> I tried to add transparent_hugepage=never to kernel boot parameters, but it
> did not change anything. Changing allocator to SLAB from SLUB alters
> behavior and makes CPU load lower, but don't solve a problem at all.
> 
> Here <https://bugzilla.kernel.org/show_bug.cgi?id=110501>is kernel bugzilla
> bugreport as well.
> 
> Ideas?

Could you try to insert "late_initcall(set_recommended_min_free_kbytes);"
back and check if it makes any difference?

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [REGRESSION] [BISECTED] kswapd high CPU usage
  2016-01-21 16:16   ` Kirill A. Shutemov
  (?)
@ 2016-01-23 15:57   ` Hugh Greenberg
  2016-01-25 10:38     ` Kirill A. Shutemov
  -1 siblings, 1 reply; 17+ messages in thread
From: Hugh Greenberg @ 2016-01-23 15:57 UTC (permalink / raw)
  To: linux-mm

Kirill A. Shutemov <kirill <at> shutemov.name> writes:
> 
> Could you try to insert "late_initcall(set_recommended_min_free_kbytes);"
> back and check if makes any difference.
> 

We tested adding late_initcall(set_recommended_min_free_kbytes); 
back in 4.1.14 and it made a huge difference. We aren't sure if the
issue is 100% fixed, but it could be. We will keep testing it.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [REGRESSION] [BISECTED] kswapd high CPU usage
  2016-01-23 15:57   ` Hugh Greenberg
@ 2016-01-25 10:38     ` Kirill A. Shutemov
  2016-01-25 16:37       ` Hugh Greenberg
  2016-01-25 16:46       ` Hugh Greenberg
  0 siblings, 2 replies; 17+ messages in thread
From: Kirill A. Shutemov @ 2016-01-25 10:38 UTC (permalink / raw)
  To: Hugh Greenberg; +Cc: linux-mm

On Sat, Jan 23, 2016 at 03:57:21PM +0000, Hugh Greenberg wrote:
> Kirill A. Shutemov <kirill <at> shutemov.name> writes:
> > 
> > Could you try to insert "late_initcall(set_recommended_min_free_kbytes);"
> > back and check if makes any difference.
> > 
> 
> We tested adding late_initcall(set_recommended_min_free_kbytes); 
> back in 4.1.14 and it made a huge difference. We aren't sure if the
> issue is 100% fixed, but it could be. We will keep testing it.

It would be nice to have the values of min_free_kbytes before and after
set_recommended_min_free_kbytes() in your configuration.
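
One way to capture the requested value is to read it from procfs on each
kernel build. This is a minimal sketch, assuming a Linux system with /proc
mounted; the helper name read_min_free_kbytes and its optional path
argument are illustrative and do not come from this thread.

```shell
#!/bin/sh
# Print the current vm.min_free_kbytes value (in kB).
# The optional first argument overrides the procfs path; it is only there
# so the helper can be tried against an ordinary file.
read_min_free_kbytes() {
    path="${1:-/proc/sys/vm/min_free_kbytes}"
    # The file holds a single integer followed by a newline.
    cat "$path"
}

# Only query procfs when the file is actually readable.
if [ -r /proc/sys/vm/min_free_kbytes ]; then
    echo "vm.min_free_kbytes = $(read_min_free_kbytes) kB"
fi
```

Running it once on the unpatched kernel and once on the kernel with the
late_initcall restored gives the two numbers to compare.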

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [REGRESSION] [BISECTED] kswapd high CPU usage
  2016-01-25 10:38     ` Kirill A. Shutemov
@ 2016-01-25 16:37       ` Hugh Greenberg
  2016-01-25 16:46       ` Hugh Greenberg
  1 sibling, 0 replies; 17+ messages in thread
From: Hugh Greenberg @ 2016-01-25 16:37 UTC (permalink / raw)
  To: linux-mm

Kirill A. Shutemov <kirill <at> shutemov.name> writes:

> 
> On Sat, Jan 23, 2016 at 03:57:21PM +0000, Hugh Greenberg wrote:
> > Kirill A. Shutemov <kirill <at> shutemov.name> writes:
> > > 
> > > Could you try to insert "late_initcall(set_recommended_min_free_kbytes);"
> > > back and check if makes any difference.
> > > 
> > 
> > We tested adding late_initcall(set_recommended_min_free_kbytes); 
> > back in 4.1.14 and it made a huge difference. We aren't sure if the
> > issue is 100% fixed, but it could be. We will keep testing it.
> 
> It would be nice to have values of min_free_kbytes before and after
> set_recommended_min_free_kbytes() in your configuration.
> 

We'll try to get this for you. 

The reports are coming from 2GB Haswell Chromebooks. I have a 4GB Haswell
Chromebook and I cannot reproduce the issue on it, even if I boot the kernel
with 2GB or less.

After more testing, we were still able to reproduce the issue. It seems to 
have taken longer to show up this time.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [REGRESSION] [BISECTED] kswapd high CPU usage
  2016-01-25 10:38     ` Kirill A. Shutemov
  2016-01-25 16:37       ` Hugh Greenberg
@ 2016-01-25 16:46       ` Hugh Greenberg
  2016-02-02 13:59         ` Kirill A. Shutemov
  1 sibling, 1 reply; 17+ messages in thread
From: Hugh Greenberg @ 2016-01-25 16:46 UTC (permalink / raw)
  To: linux-mm

Kirill A. Shutemov <kirill <at> shutemov.name> writes:

> 
> On Sat, Jan 23, 2016 at 03:57:21PM +0000, Hugh Greenberg wrote:
> > Kirill A. Shutemov <kirill <at> shutemov.name> writes:
> > > 
> > > Could you try to insert "late_initcall(set_recommended_min_free_kbytes);"
> > > back and check if makes any difference.
> > > 
> > 
> > We tested adding late_initcall(set_recommended_min_free_kbytes); 
> > back in 4.1.14 and it made a huge difference. We aren't sure if the
> > issue is 100% fixed, but it could be. We will keep testing it.
> 
> It would be nice to have values of min_free_kbytes before and after
> set_recommended_min_free_kbytes() in your configuration.
> 

Before adding set_recommended_min_free_kbytes: 5391
After: 67584




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [REGRESSION] [BISECTED] kswapd high CPU usage
  2016-01-25 16:46       ` Hugh Greenberg
@ 2016-02-02 13:59         ` Kirill A. Shutemov
  2016-02-02 14:41           ` Sergey Senozhatsky
                             ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Kirill A. Shutemov @ 2016-02-02 13:59 UTC (permalink / raw)
  To: Hugh Greenberg, Andrea Arcangeli, Mel Gorman, Vlastimil Babka,
	Rik van Riel, Minchan Kim, Nitin Gupta, Sergey Senozhatsky
  Cc: linux-mm, Andrew Morton

On Mon, Jan 25, 2016 at 04:46:58PM +0000, Hugh Greenberg wrote:
> Kirill A. Shutemov <kirill <at> shutemov.name> writes:
> 
> > 
> > On Sat, Jan 23, 2016 at 03:57:21PM +0000, Hugh Greenberg wrote:
> > > Kirill A. Shutemov <kirill <at> shutemov.name> writes:
> > > > 
> > > > > Could you try to insert "late_initcall(set_recommended_min_free_kbytes);"
> > > > back and check if makes any difference.
> > > > 
> > > 
> > > We tested adding late_initcall(set_recommended_min_free_kbytes); 
> > > back in 4.1.14 and it made a huge difference. We aren't sure if the
> > > issue is 100% fixed, but it could be. We will keep testing it.
> > 
> > It would be nice to have values of min_free_kbytes before and after
> > set_recommended_min_free_kbytes() in your configuration.
> > 
> 
> Before adding set_recommended_min_free_kbytes: 5391
> After: 67584

[ add more people to the thread ]

The 'before' value looks low to me for a machine with 2G of RAM.

In the bugzilla[1], you've mentioned zram. I wonder if we need to
increase min_free_kbytes when zram is in use, as we do for THP.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=110501
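
For context on the 'before' number: to my understanding, kernels of that
era derive the boot-time default as the integer square root of
(lowmem_kbytes * 16), clamped to the range 128..65536 kB (newer kernels use
a larger upper bound). The sketch below only reproduces that arithmetic
under this assumption; the function name is made up for illustration and
awk is used for the square root.

```shell
#!/bin/sh
# Approximate the boot-time default of vm.min_free_kbytes for a given
# amount of low memory in kB.
# Assumption: default = int_sqrt(lowmem_kbytes * 16), clamped to
# [128, 65536] as in 4.x-era kernels.
default_min_free_kbytes() {
    awk -v k="$1" 'BEGIN {
        v = int(sqrt(k * 16))
        if (v < 128) v = 128
        if (v > 65536) v = 65536
        print v
    }'
}

# 2 GiB of low memory expressed in kB.
default_min_free_kbytes 2097152
```

For 2 GiB this prints 5792, which is in the same range as the reported
5391; the 67584 figure is what set_recommended_min_free_kbytes() produced
on that machine.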

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [REGRESSION] [BISECTED] kswapd high CPU usage
  2016-02-02 13:59         ` Kirill A. Shutemov
@ 2016-02-02 14:41           ` Sergey Senozhatsky
  2016-02-02 16:24           ` Minchan Kim
                             ` (3 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Sergey Senozhatsky @ 2016-02-02 14:41 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Hugh Greenberg, Andrea Arcangeli, Mel Gorman, Vlastimil Babka,
	Rik van Riel, Minchan Kim, Nitin Gupta, Sergey Senozhatsky,
	linux-mm, Andrew Morton, Seth Jennings

On (02/02/16 15:59), Kirill A. Shutemov wrote:
> > > On Sat, Jan 23, 2016 at 03:57:21PM +0000, Hugh Greenberg wrote:
> > > > Kirill A. Shutemov <kirill <at> shutemov.name> writes:
> > > > > 
> > > > > > Could you try to insert "late_initcall(set_recommended_min_free_kbytes);"
> > > > > back and check if makes any difference.
> > > > > 
> > > > 
> > > > We tested adding late_initcall(set_recommended_min_free_kbytes); 
> > > > back in 4.1.14 and it made a huge difference. We aren't sure if the
> > > > issue is 100% fixed, but it could be. We will keep testing it.
> > > 
> > > It would be nice to have values of min_free_kbytes before and after
> > > set_recommended_min_free_kbytes() in your configuration.
> > > 
> > 
> > Before adding set_recommended_min_free_kbytes: 5391
> > After: 67584
> 
> [ add more people to the thread ]
> 
> The 'before' value look low to me for machine with 2G of RAM.
> 
> In the bugzilla[1], you've mentioned zram. I wounder if we need to
> increase min_free_kbytes when zram is in use as we do for THP.
> 
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=110501

[add Seth Jennings]


additional info: http://marc.info/?l=linux-mm&m=145373987224270
> The reports are coming from 2GB Haswell chromebooks. I have a 4GB haswell
> chromebook and I cannot reproduce the issue, even if I boot the kernel with
> 2GB or less.
>
> After more testing, we were still able to reproduce the issue. It seems to
> have taken longer to show up this time.


A small note:
I assume that IF min_free_kbytes must be raised, then it should probably be
done in zsmalloc (zbud?) init, not in zram. zswap can use both zsmalloc and
zbud.

	-ss


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [REGRESSION] [BISECTED] kswapd high CPU usage
  2016-02-02 13:59         ` Kirill A. Shutemov
  2016-02-02 14:41           ` Sergey Senozhatsky
@ 2016-02-02 16:24           ` Minchan Kim
  2016-03-14 16:00           ` Hugh Greenberg
                             ` (2 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Minchan Kim @ 2016-02-02 16:24 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Hugh Greenberg, Andrea Arcangeli, Mel Gorman, Vlastimil Babka,
	Rik van Riel, Nitin Gupta, Sergey Senozhatsky, linux-mm,
	Andrew Morton

On Tue, Feb 02, 2016 at 03:59:50PM +0200, Kirill A. Shutemov wrote:
> On Mon, Jan 25, 2016 at 04:46:58PM +0000, Hugh Greenberg wrote:
> > Kirill A. Shutemov <kirill <at> shutemov.name> writes:
> > 
> > > 
> > > On Sat, Jan 23, 2016 at 03:57:21PM +0000, Hugh Greenberg wrote:
> > > > Kirill A. Shutemov <kirill <at> shutemov.name> writes:
> > > > > 
> > > > > > Could you try to insert "late_initcall(set_recommended_min_free_kbytes);"
> > > > > back and check if makes any difference.
> > > > > 
> > > > 
> > > > We tested adding late_initcall(set_recommended_min_free_kbytes); 
> > > > back in 4.1.14 and it made a huge difference. We aren't sure if the
> > > > issue is 100% fixed, but it could be. We will keep testing it.
> > > 
> > > It would be nice to have values of min_free_kbytes before and after
> > > set_recommended_min_free_kbytes() in your configuration.
> > > 
> > 
> > Before adding set_recommended_min_free_kbytes: 5391
> > After: 67584
> 
> [ add more people to the thread ]
> 
> The 'before' value look low to me for machine with 2G of RAM.
> 
> In the bugzilla[1], you've mentioned zram. I wounder if we need to
> increase min_free_kbytes when zram is in use as we do for THP.
> 
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=110501

Normally, it's recommended to increase min_free_kbytes when zram is
used for swap, because zram has to allocate pages dynamically in the
reclaim path to store compressed pages.

However, when I read the perf profile in bugzilla, I can't find
anything zram-related. And if there were a lack of free memory for
zram page allocation because of the min_free_kbytes change, the user
would see the error message below.

pr_err("Error allocating memory for compressed page: %u, size=%zu\n"


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [REGRESSION] [BISECTED] kswapd high CPU usage
  2016-02-02 13:59         ` Kirill A. Shutemov
  2016-02-02 14:41           ` Sergey Senozhatsky
  2016-02-02 16:24           ` Minchan Kim
@ 2016-03-14 16:00           ` Hugh Greenberg
  2016-03-14 16:07           ` Hugh Greenberg
  2016-03-15 17:16           ` Hugh Greenberg
  4 siblings, 0 replies; 17+ messages in thread
From: Hugh Greenberg @ 2016-03-14 16:00 UTC (permalink / raw)
  To: linux-mm

Kirill A. Shutemov <kirill <at> shutemov.name> writes:


> In the bugzilla[1], you've mentioned zram. I wounder if we need to
> increase min_free_kbytes when zram is in use as we do for THP.
> 
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=110501
> 

We've tried increasing min_free_kbytes. It may help for a time,
but then the issue returns.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [REGRESSION] [BISECTED] kswapd high CPU usage
  2016-02-02 13:59         ` Kirill A. Shutemov
                             ` (2 preceding siblings ...)
  2016-03-14 16:00           ` Hugh Greenberg
@ 2016-03-14 16:07           ` Hugh Greenberg
  2016-03-15 17:16           ` Hugh Greenberg
  4 siblings, 0 replies; 17+ messages in thread
From: Hugh Greenberg @ 2016-03-14 16:07 UTC (permalink / raw)
  To: linux-mm

A lot of users are still reporting the issue. 
The bugzilla report has some new information.
https://bugzilla.kernel.org/show_bug.cgi?id=110501
 





^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [REGRESSION] [BISECTED] kswapd high CPU usage
  2016-02-02 13:59         ` Kirill A. Shutemov
                             ` (3 preceding siblings ...)
  2016-03-14 16:07           ` Hugh Greenberg
@ 2016-03-15 17:16           ` Hugh Greenberg
  4 siblings, 0 replies; 17+ messages in thread
From: Hugh Greenberg @ 2016-03-15 17:16 UTC (permalink / raw)
  To: linux-mm

There is also another bug report for the same issue 
that has some good information: 
https://bugzilla.kernel.org/show_bug.cgi?id=65201


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [REGRESSION] [BISECTED] kswapd high CPU usage
  2020-07-15 10:04 Alexey Vlasov
@ 2020-08-10 13:47 ` Alexey Vlasov
  0 siblings, 0 replies; 17+ messages in thread
From: Alexey Vlasov @ 2020-08-10 13:47 UTC (permalink / raw)
  To: linux-kernel; +Cc: kirill

I have found a workaround that prevents these hangs.
First, disable THP:

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

Next, increase vm.min_free_kbytes; in my case 16 GB is enough:

vm.min_free_kbytes = 16777216
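
The 16777216 figure is 16 GiB expressed in kB, since vm.min_free_kbytes is
given in kilobytes. A small sketch of that conversion, and of reading the
active THP mode back from the bracketed sysfs format (for example
"always madvise [never]"), follows; the helper names gib_to_kb and
active_thp_mode are illustrative and not part of the workaround itself.

```shell
#!/bin/sh
# Convert GiB to kB, the unit vm.min_free_kbytes expects.
gib_to_kb() {
    echo $(( $1 * 1024 * 1024 ))
}

# Extract the active choice from a sysfs line such as
# "always madvise [never]": the value inside the square brackets.
active_thp_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

gib_to_kb 16
active_thp_mode "always madvise [never]"
```

On a live system the second helper can be fed the contents of
/sys/kernel/mm/transparent_hugepage/enabled to confirm that the echo
commands above took effect.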

On Wed, Jul 15, 2020 at 01:04:38PM +0300, Alexey Vlasov wrote:
> Hi,
> 
> After upgrading from 3.14 to 4.14.173, I ran into exactly the same problem
> that the starter topic described. Namely, sometimes kswapd starts to consume 100% 
> of the CPU, and the system freezes for several minutes.
> 
> Below is an example of such an event (orange - system cpu, red - total cpu):
> https://www.dropbox.com/s/5wr5su3p0fubq0a/kswapd_100.png?dl=0
> 
> Here is the top:
> 
> top - 23:44:16 up 9 days,  2:06, 14 users,  load average: 14.03, 12.32, 13.07
> Tasks: 7108 total,  16 running, 6921 sleeping,   0 stopped,   9 zombie
> %Cpu(s): 28.1 us, 18.1 sy,  0.0 ni, 51.7 id,  1.2 wa,  0.0 hi,  0.9 si,  0.0 st
> KiB Mem : 19803248+total,   596160 free, 11094233+used, 86493992 buff/cache
> KiB Swap: 62914556 total, 62302912 free,   611644 used. 71269504 avail Mem
> 
>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
>   134 root      20   0       0      0      0 R  86.2  0.0 383:21.35 kswapd0
>   135 root      20   0       0      0      0 R  84.9  0.0 344:00.17 kswapd1
> 
> This is the beginning of the collapse; a few minutes later the system has
> thousands of processes in D state and does not respond:
> 
> top - 23:57:33 up 9 days,  2:19, 14 users,  load average: 1223.43, 1083.85, 662.
> Tasks: 8356 total, 344 running, 7821 sleeping,   0 stopped,  44 zombie
> %Cpu(s): 28.1 us, 18.2 sy,  0.0 ni, 51.6 id,  1.2 wa,  0.0 hi,  0.9 si,  0.0 st
> KiB Mem : 19803248+total,   800516 free, 11587540+used, 81356560 buff/cache
> KiB Swap: 62914556 total, 62130072 free,   784484 used. 62231208 avail Mem
> 
>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
> 10704 w_defau+  20   0  393476 117160  15160 D 100.0  0.1   0:00.16 httpd
> 16056 w_sti46+  20   0  599048  21528   9504 S 100.0  0.0   0:00.00 httpd
> 12649 w_divan+  20   0   41764   8064   3904 D 100.0  0.0   0:06.62 menu1.pl
> 13739 w_defau+  20   0  248696  24168  14132 S 100.0  0.0   0:00.01 httpd
>  5172 mysql     20   0 6993508 2.310g   9660 D  38.9  1.2   3866:26 mysqld_aux3
>  4683 mysql     20   0 9974.1m 4.366g   8268 D  38.7  2.3   2553:14 mysqld
>  4791 mysql     20   0 10.359g 4.180g   9784 D  28.5  2.2   1659:40 mysqld_aux1
>  5078 mysql     20   0  9.871g 3.774g   9888 D  25.4  2.0   2445:08 mysqld_aux2
>     9 root      20   0       0      0      0 I   3.4  0.0  13:56.16 rcu_sched
>   135 root      20   0       0      0      0 D   2.8  0.0 344:29.12 kswapd1
>   134 root      20   0       0      0      0 D   2.6  0.0 383:49.86 kswapd0
> 
> Nevertheless, there is no I/O activity before, during, or after this collapse.
> 
> I tried your suggestion of re-adding "late_initcall(set_recommended_min_free_kbytes)",
> but unfortunately it did not help.
> 
> In my experience this could be solved by adding RAM but unfortunately this server
> no longer has free slots. 188 GB RAM is the maximum for it.
> 
> Also I cannot go back to 3.14 kernel, since one of the partitions contains xfs with
> the superblock of the new version v5, which is not supported by 3.14 kernel.
> 
> If you need more information, for example, vmstat, /proc/meminfo, I can send.
> 
> Is there any solution to this problem?
> 
> > On Fri, Jan 22, 2016 at 12:28:10AM +1000, Nalorokk wrote:
> >> It appears that kernels newer than 4.1 have kswapd-related bug resulting in
> >> high CPU usage. CPU 100% usage could last for several minutes or several
> >> days, with CPU being busy entirely with serving kswapd. It happens usually
> >> after server being mostly idle, sometimes after days, sometimes after weeks
> >> of uptime. But the issue appears much sooner if the machine is loaded with
> >> something like building a kernel.
> >>
> >> Here are the graphs of CPU load: first
> >> <http://i.piccy.info/i9/9ee6c0620c9481a974908484b2a52a0f/1453384595/44012/994698/cpu_month.png>,
> >> second
> >> <http://i.piccy.info/i9/7c97c2f39620bb9d7ea93096312dbbb6/1453384649/41222/994698/cpu_year.png>.
> >> Perf top output is here <http://pastebin.com/aRzTjb2x>as well.
> >>
> >> To find the cause of this problem I've started with the fact that the issue
> >> appeared after 4.1 kernel update. Then I performed longterm test of 3.18,
> >> and discovered that 3.18 is unaffected by this bug. Then I did some tests
> >> of 4.0 to confirm that this version behaves well too.
> >>
> >> Then I performed git bisect from tag v4.0 to v4.1-rc1 and found exact
> >> commits that seem to be reason of high CPU usage.
> >>
> >> The first really "bad" commit is 79553da293d38d63097278de13e28a3b371f43c1.
> >> 2 previous commits cause weird behavior as well resulting in kswapd
> >> consuming more CPU than unaffected kernels, but not that much as the commit
> >> pointed above. I believe those commits are related to the same mm tree
> >> merge.
> >>
> >> I tried to add transparent_hugepage=never to kernel boot parameters, but it
> >> did not change anything. Changing allocator to SLAB from SLUB alters
> >> behavior and makes CPU load lower, but don't solve a problem at all.
> >>
> >> Here <https://bugzilla.kernel.org/show_bug.cgi?id=110501>is kernel bugzilla
> >> bugreport as well.
> >>
> >> Ideas?
> >
> > Could you try to insert "late_initcall(set_recommended_min_free_kbytes);"
> > back and check if makes any difference.
> >
> >-- 
> >Kirill A. Shutemov

* Re: [REGRESSION] [BISECTED] kswapd high CPU usage
@ 2020-07-15 10:04 Alexey Vlasov
  2020-08-10 13:47 ` Alexey Vlasov
  0 siblings, 1 reply; 17+ messages in thread
From: Alexey Vlasov @ 2020-07-15 10:04 UTC (permalink / raw)
  To: linux-kernel; +Cc: kirill

Hi,

After upgrading from 3.14 to 4.14.173, I ran into exactly the same problem
that the thread starter described: sometimes kswapd starts to consume 100%
of the CPU, and the system freezes for several minutes.

Below is an example of such an event (orange - system cpu, red - total cpu):
https://www.dropbox.com/s/5wr5su3p0fubq0a/kswapd_100.png?dl=0

Here is the top:

top - 23:44:16 up 9 days,  2:06, 14 users,  load average: 14.03, 12.32, 13.07
Tasks: 7108 total,  16 running, 6921 sleeping,   0 stopped,   9 zombie
%Cpu(s): 28.1 us, 18.1 sy,  0.0 ni, 51.7 id,  1.2 wa,  0.0 hi,  0.9 si,  0.0 st
KiB Mem : 19803248+total,   596160 free, 11094233+used, 86493992 buff/cache
KiB Swap: 62914556 total, 62302912 free,   611644 used. 71269504 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  134 root      20   0       0      0      0 R  86.2  0.0 383:21.35 kswapd0
  135 root      20   0       0      0      0 R  84.9  0.0 344:00.17 kswapd1

This is the beginning of the collapse; a few minutes later the system has
thousands of processes in D state and stops responding:

top - 23:57:33 up 9 days,  2:19, 14 users,  load average: 1223.43, 1083.85, 662.
Tasks: 8356 total, 344 running, 7821 sleeping,   0 stopped,  44 zombie
%Cpu(s): 28.1 us, 18.2 sy,  0.0 ni, 51.6 id,  1.2 wa,  0.0 hi,  0.9 si,  0.0 st
KiB Mem : 19803248+total,   800516 free, 11587540+used, 81356560 buff/cache
KiB Swap: 62914556 total, 62130072 free,   784484 used. 62231208 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
10704 w_defau+  20   0  393476 117160  15160 D 100.0  0.1   0:00.16 httpd
16056 w_sti46+  20   0  599048  21528   9504 S 100.0  0.0   0:00.00 httpd
12649 w_divan+  20   0   41764   8064   3904 D 100.0  0.0   0:06.62 menu1.pl
13739 w_defau+  20   0  248696  24168  14132 S 100.0  0.0   0:00.01 httpd
 5172 mysql     20   0 6993508 2.310g   9660 D  38.9  1.2   3866:26 mysqld_aux3
 4683 mysql     20   0 9974.1m 4.366g   8268 D  38.7  2.3   2553:14 mysqld
 4791 mysql     20   0 10.359g 4.180g   9784 D  28.5  2.2   1659:40 mysqld_aux1
 5078 mysql     20   0  9.871g 3.774g   9888 D  25.4  2.0   2445:08 mysqld_aux2
    9 root      20   0       0      0      0 I   3.4  0.0  13:56.16 rcu_sched
  135 root      20   0       0      0      0 D   2.8  0.0 344:29.12 kswapd1
  134 root      20   0       0      0      0 D   2.6  0.0 383:49.86 kswapd0

Nevertheless, there is no I/O activity before, during, or after this collapse.

I tried to use your patch about "late_initcall(set_recommended_min_free_kbytes)",
unfortunately it did not help.

In my experience this can be solved by adding RAM, but unfortunately this server
has no free slots left; 188 GB of RAM is its maximum.

Also, I cannot go back to the 3.14 kernel, since one of the partitions contains an
XFS filesystem with the newer v5 superblock, which the 3.14 kernel does not support.

If you need more information, for example vmstat or /proc/meminfo output, I can send it.

Is there any solution to this problem?
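
A minimal sketch of how the reclaim counters in /proc/vmstat could be sampled
around such an event, assuming a Python interpreter is available on the box.
The helper names (parse_vmstat, total, reclaim_delta, sample) are illustrative
and not part of any existing tool. Counter names are matched both in their
single form (pgscan_kswapd) and in the per-zone form used by some kernel
versions (pgscan_kswapd_normal, allocstall_movable, ...).

```python
# Zone suffixes that some kernel versions append to reclaim counters.
ZONES = ("dma", "dma32", "normal", "high", "movable")

# Counters of interest: pages scanned/reclaimed by kswapd and by direct
# reclaim, plus the number of direct-reclaim stalls.
COUNTERS = ("pgscan_kswapd", "pgsteal_kswapd",
            "pgscan_direct", "pgsteal_direct", "allocstall")


def parse_vmstat(text):
    """Parse 'name value' lines (the /proc/vmstat layout) into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def total(counters, name):
    """Sum a counter over its plain and per-zone spellings.

    Only exact names are matched, so unrelated counters that merely share
    a prefix (e.g. pgscan_direct_throttle) are not counted.
    """
    names = {name} | {f"{name}_{zone}" for zone in ZONES}
    return sum(v for k, v in counters.items() if k in names)


def reclaim_delta(before, after, names=COUNTERS):
    """Return how much each counter grew between two samples."""
    return {n: total(after, n) - total(before, n) for n in names}


def sample(path="/proc/vmstat"):
    """Read and parse the current counters (Linux only)."""
    with open(path) as f:
        return parse_vmstat(f.read())
```

Taking one sample(), waiting a few seconds, taking another and passing both
to reclaim_delta() gives per-interval numbers. A large pgscan_kswapd delta
with a much smaller pgsteal_kswapd delta would suggest kswapd is scanning
pages without managing to reclaim them, but that is only a hint, not a
diagnosis.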

> On Fri, Jan 22, 2016 at 12:28:10AM +1000, Nalorokk wrote:
>> It appears that kernels newer than 4.1 have kswapd-related bug resulting in
>> high CPU usage. CPU 100% usage could last for several minutes or several
>> days, with CPU being busy entirely with serving kswapd. It happens usually
>> after server being mostly idle, sometimes after days, sometimes after weeks
>> of uptime. But the issue appears much sooner if the machine is loaded with
>> something like building a kernel.
>>
>> Here are the graphs of CPU load: first
>> <http://i.piccy.info/i9/9ee6c0620c9481a974908484b2a52a0f/1453384595/44012/994698/cpu_month.png>,
>> second
>> <http://i.piccy.info/i9/7c97c2f39620bb9d7ea93096312dbbb6/1453384649/41222/994698/cpu_year.png>.
>> Perf top output is here <http://pastebin.com/aRzTjb2x>as well.
>>
>> To find the cause of this problem I've started with the fact that the issue
>> appeared after 4.1 kernel update. Then I performed longterm test of 3.18,
>> and discovered that 3.18 is unaffected by this bug. Then I did some tests
>> of 4.0 to confirm that this version behaves well too.
>>
>> Then I performed git bisect from tag v4.0 to v4.1-rc1 and found exact
>> commits that seem to be reason of high CPU usage.
>>
>> The first really "bad" commit is 79553da293d38d63097278de13e28a3b371f43c1.
>> 2 previous commits cause weird behavior as well resulting in kswapd
>> consuming more CPU than unaffected kernels, but not that much as the commit
>> pointed above. I believe those commits are related to the same mm tree
>> merge.
>>
>> I tried to add transparent_hugepage=never to kernel boot parameters, but it
>> did not change anything. Changing allocator to SLAB from SLUB alters
>> behavior and makes CPU load lower, but don't solve a problem at all.
>>
>> Here <https://bugzilla.kernel.org/show_bug.cgi?id=110501>is kernel bugzilla
>> bugreport as well.
>>
>> Ideas?
>
> Could you try to insert "late_initcall(set_recommended_min_free_kbytes);"
> back and check if makes any difference.
>
>-- 
>Kirill A. Shutemov

Thread overview: 17+ messages
2016-01-21 14:28 [REGRESSION] [BISECTED] kswapd high CPU usage Nalorokk
2016-01-21 14:37 ` Fwd: " Nalorokk
2016-01-21 14:37   ` Nalorokk
2016-01-21 16:16 ` Kirill A. Shutemov
2016-01-21 16:16   ` Kirill A. Shutemov
2016-01-23 15:57   ` Hugh Greenberg
2016-01-25 10:38     ` Kirill A. Shutemov
2016-01-25 16:37       ` Hugh Greenberg
2016-01-25 16:46       ` Hugh Greenberg
2016-02-02 13:59         ` Kirill A. Shutemov
2016-02-02 14:41           ` Sergey Senozhatsky
2016-02-02 16:24           ` Minchan Kim
2016-03-14 16:00           ` Hugh Greenberg
2016-03-14 16:07           ` Hugh Greenberg
2016-03-15 17:16           ` Hugh Greenberg
2020-07-15 10:04 Alexey Vlasov
2020-08-10 13:47 ` Alexey Vlasov
