* Re: [Bug 200627] New: Stutters and high kernel CPU usage from list_lru_count_one when cache fills memory
[not found] <bug-200627-27@https.bugzilla.kernel.org/>
@ 2018-07-22 23:40 ` Andrew Morton
2018-07-22 23:44 ` Kevin Liu
From: Andrew Morton @ 2018-07-22 23:40 UTC (permalink / raw)
To: kevin; +Cc: bugzilla-daemon, linux-mm
(switched to email. Please respond via emailed reply-to-all, not via the
bugzilla web interface).
On Sun, 22 Jul 2018 23:33:57 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=200627
>
> Bug ID: 200627
> Summary: Stutters and high kernel CPU usage from
> list_lru_count_one when cache fills memory
Thanks. Please do note the above request.
> Product: Memory Management
> Version: 2.5
> Kernel Version: 4.18-rc4, 4.16
> Hardware: x86-64
> OS: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: Other
> Assignee: akpm@linux-foundation.org
> Reporter: kevin@potatofrom.space
> Regression: No
>
> I've recently noticed stuttering and general sluggishness, in Xorg, Firefox,
> and other graphical applications, when the memory becomes completely filled
> with cache. In `htop`, the stuttering manifests as all CPU cores at 100% usage,
> mostly in kernel mode.
How recently? Were earlier kernels better behaved?
> Doing a `perf top` shows that `list_lru_count_one` causes a lot of overhead:
>
> ```
>   Overhead  Shared Object   Symbol
>     18.38%  [kernel]        [k] list_lru_count_one
>      4.90%  [kernel]        [k] nmi
>      3.27%  [kernel]        [k] read_hpet
>      2.66%  [kernel]        [k] super_cache_count
>      1.84%  [kernel]        [k] shrink_slab.part.52
>      1.63%  [kernel]        [k] shmem_unused_huge_count
>      1.19%  restic          [.] 0x00000000002e696c
>      0.98%  restic          [.] 0x00000000002e6a2f
>      0.81%  restic          [.] 0x00000000002e699b
>      0.80%  restic          [.] 0x00000000002e69b6
>      0.79%  restic          [.] 0x00000000002e697d
>      0.74%  .perf-wrapped   [.] rb_next
>      0.62%  [kernel]        [k] _aesni_dec4
>      0.57%  restic          [.] 0x00000000002e6a18
>      0.56%  [kernel]        [k] aesni_xts_crypt8
>      0.51%  restic          [.] 0x000000000005676a
>      0.50%  restic          [.] 0x00000000002e69de
>      0.50%  restic          [.] 0x00000000002e69f1
>      0.43%  restic          [.] 0x00000000002e6a10
>      0.43%  restic          [.] 0x00000000002e69c9
>      0.41%  .perf-wrapped   [.] hpp__sort_overhead
>      0.41%  restic          [.] 0x00000000002e6996
>      0.40%  [kernel]        [k] update_blocked_averages
>      0.38%  restic          [.] 0x00000000002e6a05
>      0.38%  [kernel]        [k] __indirect_thunk_start
>      0.37%  [kernel]        [k] copy_user_enhanced_fast_string
>      0.35%  rclone          [.] crypto/md5.block
> ```
>
> I've seen it hit up to 25% overhead, while normally (when the cache hasn't
> filled up) it only has ~4% overhead. I believe that this is the cause of the
> stutter.
>
> I've kludged together a workaround: running `echo 3 >
> /proc/sys/vm/drop_caches` every minute keeps the cache from filling up and the
> system responsive. But I was wondering whether this is an actual issue in the
> kernel.
>
> More details on my workload:
>
> - Running Docker containers connected via NFS to disk; this computer serves ~20
> NFSv4.2 shares, though most of them have fairly light IO.
> - Running a restic backup with rclone, which requires significant CPU usage and
> does a lot of disk-waiting on hard drives. (It doesn't impact responsiveness
> when the cache isn't full, though.)
>
> System:
>
> - Linux 4.18-rc4, NixOS unstable
> - Intel i7-4820k
> - 20 GB RAM
> - AMD RX 580
>
> Let me know if there are any more details I can provide or any tests I can run.
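As a note on the kludge described above: the per-minute drop_caches run can be expressed as a root crontab entry. This is a sketch of the reporter's workaround, not a recommendation -- writing 3 to drop_caches discards all clean page cache, dentries, and inodes, which then have to be re-read from disk:

```shell
# Root crontab entry (config fragment, sketch): every minute, flush
# dirty data with sync, then drop the page cache, dentries, and inodes.
# Blunt instrument: all clean cached data is discarded and rebuilt.
* * * * * /bin/sync && echo 3 > /proc/sys/vm/drop_caches
```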
* Re: [Bug 200627] New: Stutters and high kernel CPU usage from list_lru_count_one when cache fills memory
2018-07-22 23:40 ` [Bug 200627] New: Stutters and high kernel CPU usage from list_lru_count_one when cache fills memory Andrew Morton
@ 2018-07-22 23:44 ` Kevin Liu
2018-07-23 0:02 ` Kevin Liu
From: Kevin Liu @ 2018-07-22 23:44 UTC (permalink / raw)
To: Andrew Morton; +Cc: bugzilla-daemon, linux-mm
> How recently? Were earlier kernels better behaved?
I've seen this issue both on Linux 4.16.15 (admittedly using the -ck
patchset) and on vanilla Linux 4.18-rc4 (which is what I'm currently using).
I'm fairly certain that it did not occur on Linux 4.14.50, which I used
previously, but I will boot back into it to double-check and let you know.
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
* Re: [Bug 200627] New: Stutters and high kernel CPU usage from list_lru_count_one when cache fills memory
2018-07-22 23:44 ` Kevin Liu
@ 2018-07-23 0:02 ` Kevin Liu
2018-07-23 1:52 ` Kevin Liu
From: Kevin Liu @ 2018-07-23 0:02 UTC (permalink / raw)
To: Andrew Morton; +Cc: bugzilla-daemon, linux-mm
Sorry, I'm not sure whether the previous message registered on Bugzilla,
possibly because of the PGP signature, so I'm including it below.
On 07/22/2018 07:44 PM, Kevin Liu wrote:
>> How recently? Were earlier kernels better behaved?
> I've seen this issue both on Linux 4.16.15 (admittedly using the -ck
> patchset) and on vanilla Linux 4.18-rc4 (which is what I'm currently using).
>
> I'm fairly certain that it did not occur on Linux 4.14.50, which I used
> previously, but I will boot back into it to double-check and let you know.
>
And yes: having booted back into Linux 4.14.54, there appears to be no
issue -- list_lru_count_one reaches at most 6% overhead:
  Overhead  Shared Object  Symbol
     5.91%  [kernel]       [k] list_lru_count_one
     5.13%  [kernel]       [k] nmi
     4.08%  [kernel]       [k] read_hpet
     1.26%  zma            [.] Zone::CheckAlarms
     1.16%  [kernel]       [k] _raw_spin_lock
     1.07%  restic         [.] 0x00000000002e696c
     1.06%  .perf-wrapped  [.] hpp__sort_overhead
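For comparisons like the one being made here, the overhead column for a single symbol can be pulled out of a saved plain-text `perf top` capture with a short awk pipeline. A sketch -- `extract_overhead` is a name invented for this example, and the sample lines are taken from the profile above:

```shell
# Pull the overhead column for a given kernel symbol out of saved
# "perf top" output (plain-text capture; the last field is the symbol).
extract_overhead() {
    awk -v sym="$1" '$NF == sym { print $1 }'
}

# Sample lines from the 4.14.54 profile above:
printf '%s\n' \
  '   5.91%  [kernel]       [k] list_lru_count_one' \
  '   5.13%  [kernel]       [k] nmi' |
  extract_overhead list_lru_count_one
# prints 5.91%
```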
* Re: [Bug 200627] New: Stutters and high kernel CPU usage from list_lru_count_one when cache fills memory
2018-07-23 0:02 ` Kevin Liu
@ 2018-07-23 1:52 ` Kevin Liu
From: Kevin Liu @ 2018-07-23 1:52 UTC (permalink / raw)
To: Andrew Morton; +Cc: bugzilla-daemon, linux-mm
Correction: after using 4.14.53 for a while, I do actually see
list_lru_count_one at the top, but only at ~10% overhead. Responsiveness
is slightly degraded, but still better than it was on 4.18-rc4.
On 07/22/2018 08:02 PM, Kevin Liu wrote:
> Sorry, not sure if the previous message registered on bugzilla due to
> the pgp signature? Including it below.
>
> On 07/22/2018 07:44 PM, Kevin Liu wrote:
>>> How recently? Were earlier kernels better behaved?
>> I've seen this issue both on Linux 4.16.15 (admittedly using the -ck
>> patchset) and on vanilla Linux 4.18-rc4 (which is what I'm currently using).
>>
>> I'm fairly certain that it did not occur on Linux 4.14.50, which I used
>> previously, but I will boot back into it to double-check and let you know.
>>
>
> And yes, booted back into Linux 4.14.54, there appears to be no issue --
> list_lru_count_one reaches 6% overhead at most:
>
>   Overhead  Shared Object  Symbol
>      5.91%  [kernel]       [k] list_lru_count_one
>      5.13%  [kernel]       [k] nmi
>      4.08%  [kernel]       [k] read_hpet
>      1.26%  zma            [.] Zone::CheckAlarms
>      1.16%  [kernel]       [k] _raw_spin_lock
>      1.07%  restic         [.] 0x00000000002e696c
>      1.06%  .perf-wrapped  [.] hpp__sort_overhead