All of lore.kernel.org
 help / color / mirror / Atom feed
* Regression in workingset_refault latency on 5.15
@ 2022-02-23 13:51 Daniel Dao
  2022-02-23 15:57 ` Shakeel Butt
  2022-02-24  9:22 ` Thorsten Leemhuis
  0 siblings, 2 replies; 38+ messages in thread
From: Daniel Dao @ 2022-02-23 13:51 UTC (permalink / raw)
  To: shakeelb
  Cc: kernel-team, linux-mm, hannes, guro, feng.tang, mhocko, hdanton,
	mkoutny, akpm, torvalds

Hi all,

We are observing some regressions in workingset_refault on our newly upgraded
5.15.19 nodes with zram as swap. This manifests in several ways:

1) Regression of workingset_refault duration observed in flamegraph

We regularly collect flamegraphs for running services on the node. Since upgrade
to 5.15.19, we see that workingset_refault occupied a more significant part of
the service flamegraph (13%) with the following call trace

  workingset_refault+0x128
  add_to_page_cache_lru+0x9f
  page_cache_ra_unbounded+0x154
  force_page_cache_ra+0xe2
  filemap_get_pages+0xe9
  filemap_read+0xa4
  xfs_file_buffered_read+0x98
  xfs_file_read_iter+0x6a
  new_sync_read+0x118
  vfs_read+0xf2
  __x64_sys_pread64+0x89
  do_syscall_64+0x3b
  entry_SYSCALL_64_after_hwframe+0x44

2) Regression of userspace performance sensitive code

We have some performance sensentive code running in userspace that have their
runtime measured by CLOCK_THREAD_CPUTIME_ID. They look roughly as:

  now = clock_gettime(CLOCK_THREAD_CPUTIME_ID)
  func()
  elapsed = clock_gettime(CLOCK_THREAD_CPUTIME_ID) - now

Since 5.15 upgrade, we observed long `elapsed` in the range of 4-10ms much more
frequently than before.  This went away after we disabled swap for the service
using `memory.swap.max=0` memcg configuration.

The new thing in 5.15 workingset_refault seems to be introduction of
mem_cgroup_flush_stats()
by commit 1f828223b7991a228bc2aef837b78737946d44b2 (memcg: flush
lruvec stats in the
refault).

Given that mem_cgroup_flush_stats can take quite a long time for us on the
standard systemd cgroupv2 hierrachy ( root / system.slice / workload.service )

  sudo /usr/share/bcc/tools/funcslower -m 10 -t mem_cgroup_flush_stats
  Tracing function calls slower than 10 ms... Ctrl+C to quit.
  TIME       COMM           PID    LAT(ms)             RVAL FUNC
  0.000000   <redacted>     804776   11.50              200
mem_cgroup_flush_stats
  0.343383   <redacted>     647496   10.58              200
mem_cgroup_flush_stats
  0.604309   <redacted>     804776   10.50              200
mem_cgroup_flush_stats
  1.230416   <redacted>     803293   10.01              200
mem_cgroup_flush_stats
  1.248442   <redacted>     646400   11.02              200
mem_cgroup_flush_stats

could it be possible that workingset_refault in some unfortunate case can take
much longer than before such that it increases the time observed by
CLOCK_THREAD_CPUTIME_ID from userspace, or overall duration of
workingset_refault
observed by perf ?


^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2022-05-16 12:51 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-23 13:51 Regression in workingset_refault latency on 5.15 Daniel Dao
2022-02-23 15:57 ` Shakeel Butt
2022-02-23 16:00   ` Shakeel Butt
2022-02-23 17:07     ` Daniel Dao
2022-02-23 17:36       ` Shakeel Butt
2022-02-23 19:28         ` Ivan Babrou
2022-02-23 20:28           ` Shakeel Butt
2022-02-23 21:16             ` Ivan Babrou
2022-02-24 14:46               ` Daniel Dao
2022-02-24 16:58                 ` Shakeel Butt
2022-02-24 17:34                   ` Daniel Dao
2022-02-24 18:00                     ` Shakeel Butt
2022-02-24 18:52                       ` Shakeel Butt
2022-02-25 10:23                         ` Daniel Dao
2022-02-25 17:08                           ` Ivan Babrou
2022-02-25 17:22                             ` Shakeel Butt
2022-02-25 18:03                             ` Michal Koutný
2022-02-25 18:08                               ` Ivan Babrou
2022-02-28 23:09                                 ` Shakeel Butt
2022-02-28 23:34                                   ` Ivan Babrou
2022-02-28 23:43                                     ` Shakeel Butt
2022-03-02  0:48                                     ` Ivan Babrou
2022-03-02  2:50                                       ` Shakeel Butt
2022-03-02  3:40                                         ` Ivan Babrou
2022-03-02 22:33                                           ` Ivan Babrou
2022-03-03  2:32                                             ` Shakeel Butt
2022-03-03  2:35                                             ` Shakeel Butt
2022-03-04  0:21                                               ` Ivan Babrou
2022-03-04  1:05                                                 ` Shakeel Butt
2022-03-04  1:12                                                   ` Ivan Babrou
2022-03-02 11:49                                         ` Frank Hofmann
2022-03-02 15:52                                           ` Shakeel Butt
2022-03-02 10:08                                       ` Michal Koutný
2022-03-02 15:53                                         ` Shakeel Butt
2022-03-02 17:28                                           ` Ivan Babrou
2022-02-24  9:22 ` Thorsten Leemhuis
2022-04-11 10:17   ` Regression in workingset_refault latency on 5.15 #forregzbot Thorsten Leemhuis
2022-05-16 12:51     ` Thorsten Leemhuis

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.