From: Daniel Dao <dqminh@cloudflare.com>
To: shakeelb@google.com
Cc: kernel-team <kernel-team@cloudflare.com>,
linux-mm@kvack.org, hannes@cmpxchg.org, guro@fb.com,
feng.tang@intel.com, mhocko@kernel.org, hdanton@sina.com,
mkoutny@suse.com, akpm@linux-foundation.org,
torvalds@linux-foundation.org
Subject: Regression in workingset_refault latency on 5.15
Date: Wed, 23 Feb 2022 13:51:18 +0000
Message-ID: <CA+wXwBSyO87ZX5PVwdHm-=dBjZYECGmfnydUicUyrQqndgX2MQ@mail.gmail.com>
Hi all,
We are observing some regressions in workingset_refault on our newly upgraded
5.15.19 nodes with zram as swap. This manifests in several ways:
1) Regression of workingset_refault duration observed in flamegraph
We regularly collect flamegraphs for running services on the node. Since
upgrading to 5.15.19, we see that workingset_refault occupies a more significant
part of the service flamegraph (13%), with the following call trace:
workingset_refault+0x128
add_to_page_cache_lru+0x9f
page_cache_ra_unbounded+0x154
force_page_cache_ra+0xe2
filemap_get_pages+0xe9
filemap_read+0xa4
xfs_file_buffered_read+0x98
xfs_file_read_iter+0x6a
new_sync_read+0x118
vfs_read+0xf2
__x64_sys_pread64+0x89
do_syscall_64+0x3b
entry_SYSCALL_64_after_hwframe+0x44
2) Regression of userspace performance-sensitive code
We have some performance-sensitive code running in userspace whose runtime is
measured with CLOCK_THREAD_CPUTIME_ID. It looks roughly like this:
now = clock_gettime(CLOCK_THREAD_CPUTIME_ID)
func()
elapsed = clock_gettime(CLOCK_THREAD_CPUTIME_ID) - now
Since the 5.15 upgrade, we have observed long `elapsed` values in the range of
4-10ms much more frequently than before. This went away after we disabled swap
for the service using the `memory.swap.max=0` memcg configuration.
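For illustration, the measurement pattern above can be sketched in C as follows.
This is a minimal sketch, not the actual service code: the helper names
thread_cpu_ns(), measure_cpu_ns() and busy_work() are hypothetical and do not
appear in the original report. Note that CLOCK_THREAD_CPUTIME_ID accounts CPU
time consumed by the thread in both user and kernel mode, so time spent inside a
syscall such as pread64() on behalf of the thread is included in `elapsed`.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: CPU time consumed by the calling thread, in
 * nanoseconds, or -1 if clock_gettime() fails. */
static int64_t thread_cpu_ns(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts) != 0)
        return -1;
    return (int64_t)ts.tv_sec * 1000000000LL + (int64_t)ts.tv_nsec;
}

/* Hypothetical helper: run func() and return the thread CPU time it
 * consumed, in nanoseconds, or -1 if either clock read failed. */
static int64_t measure_cpu_ns(void (*func)(void))
{
    int64_t start = thread_cpu_ns();

    func();

    int64_t end = thread_cpu_ns();
    if (start < 0 || end < 0)
        return -1;
    return end - start;
}

/* Hypothetical stand-in for func(): burns a little CPU in user mode. */
static void busy_work(void)
{
    volatile unsigned long sum = 0;

    for (unsigned long i = 0; i < 1000000UL; i++)
        sum += i;
}
```

Calling measure_cpu_ns(busy_work) returns the elapsed thread CPU time in
nanoseconds; a value in the 4,000,000-10,000,000 range would correspond to the
4-10ms outliers described above.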
The new thing in 5.15 workingset_refault seems to be the introduction of
mem_cgroup_flush_stats() by commit 1f828223b7991a228bc2aef837b78737946d44b2
(memcg: flush lruvec stats in the refault).
Given that mem_cgroup_flush_stats can take quite a long time for us on the
standard systemd cgroupv2 hierarchy (root / system.slice / workload.service):
sudo /usr/share/bcc/tools/funcslower -m 10 -t mem_cgroup_flush_stats
Tracing function calls slower than 10 ms... Ctrl+C to quit.
TIME       COMM        PID     LAT(ms)  RVAL  FUNC
0.000000   <redacted>  804776    11.50   200  mem_cgroup_flush_stats
0.343383   <redacted>  647496    10.58   200  mem_cgroup_flush_stats
0.604309   <redacted>  804776    10.50   200  mem_cgroup_flush_stats
1.230416   <redacted>  803293    10.01   200  mem_cgroup_flush_stats
1.248442   <redacted>  646400    11.02   200  mem_cgroup_flush_stats
could it be possible that workingset_refault can, in some unfortunate cases,
take much longer than before, such that it increases the time observed by
CLOCK_THREAD_CPUTIME_ID from userspace, or the overall duration of
workingset_refault observed by perf?
Thread overview: 38+ messages
2022-02-23 13:51 Daniel Dao [this message]
2022-02-23 15:57 ` Regression in workingset_refault latency on 5.15 Shakeel Butt
2022-02-23 16:00 ` Shakeel Butt
2022-02-23 17:07 ` Daniel Dao
2022-02-23 17:36 ` Shakeel Butt
2022-02-23 19:28 ` Ivan Babrou
2022-02-23 20:28 ` Shakeel Butt
2022-02-23 21:16 ` Ivan Babrou
2022-02-24 14:46 ` Daniel Dao
2022-02-24 16:58 ` Shakeel Butt
2022-02-24 17:34 ` Daniel Dao
2022-02-24 18:00 ` Shakeel Butt
2022-02-24 18:52 ` Shakeel Butt
2022-02-25 10:23 ` Daniel Dao
2022-02-25 17:08 ` Ivan Babrou
2022-02-25 17:22 ` Shakeel Butt
2022-02-25 18:03 ` Michal Koutný
2022-02-25 18:08 ` Ivan Babrou
2022-02-28 23:09 ` Shakeel Butt
2022-02-28 23:34 ` Ivan Babrou
2022-02-28 23:43 ` Shakeel Butt
2022-03-02 0:48 ` Ivan Babrou
2022-03-02 2:50 ` Shakeel Butt
2022-03-02 3:40 ` Ivan Babrou
2022-03-02 22:33 ` Ivan Babrou
2022-03-03 2:32 ` Shakeel Butt
2022-03-03 2:35 ` Shakeel Butt
2022-03-04 0:21 ` Ivan Babrou
2022-03-04 1:05 ` Shakeel Butt
2022-03-04 1:12 ` Ivan Babrou
2022-03-02 11:49 ` Frank Hofmann
2022-03-02 15:52 ` Shakeel Butt
2022-03-02 10:08 ` Michal Koutný
2022-03-02 15:53 ` Shakeel Butt
2022-03-02 17:28 ` Ivan Babrou
2022-02-24 9:22 ` Thorsten Leemhuis
2022-04-11 10:17 ` Regression in workingset_refault latency on 5.15 #forregzbot Thorsten Leemhuis
2022-05-16 12:51 ` Thorsten Leemhuis