All of lore.kernel.org
 help / color / mirror / Atom feed
From: Daniel Dao <dqminh@cloudflare.com>
To: shakeelb@google.com
Cc: kernel-team <kernel-team@cloudflare.com>,
	linux-mm@kvack.org, hannes@cmpxchg.org,  guro@fb.com,
	feng.tang@intel.com, mhocko@kernel.org, hdanton@sina.com,
	 mkoutny@suse.com, akpm@linux-foundation.org,
	torvalds@linux-foundation.org
Subject: Regression in workingset_refault latency on 5.15
Date: Wed, 23 Feb 2022 13:51:18 +0000	[thread overview]
Message-ID: <CA+wXwBSyO87ZX5PVwdHm-=dBjZYECGmfnydUicUyrQqndgX2MQ@mail.gmail.com> (raw)

Hi all,

We are observing some regressions in workingset_refault on our newly upgraded
5.15.19 nodes with zram as swap. This manifests in several ways:

1) Regression of workingset_refault duration observed in flamegraph

We regularly collect flamegraphs for running services on the node. Since upgrade
to 5.15.19, we see that workingset_refault occupied a more significant part of
the service flamegraph (13%) with the following call trace

  workingset_refault+0x128
  add_to_page_cache_lru+0x9f
  page_cache_ra_unbounded+0x154
  force_page_cache_ra+0xe2
  filemap_get_pages+0xe9
  filemap_read+0xa4
  xfs_file_buffered_read+0x98
  xfs_file_read_iter+0x6a
  new_sync_read+0x118
  vfs_read+0xf2
  __x64_sys_pread64+0x89
  do_syscall_64+0x3b
  entry_SYSCALL_64_after_hwframe+0x44

2) Regression of userspace performance sensitive code

We have some performance sensentive code running in userspace that have their
runtime measured by CLOCK_THREAD_CPUTIME_ID. They look roughly as:

  now = clock_gettime(CLOCK_THREAD_CPUTIME_ID)
  func()
  elapsed = clock_gettime(CLOCK_THREAD_CPUTIME_ID) - now

Since 5.15 upgrade, we observed long `elapsed` in the range of 4-10ms much more
frequently than before.  This went away after we disabled swap for the service
using `memory.swap.max=0` memcg configuration.

The new thing in 5.15 workingset_refault seems to be introduction of
mem_cgroup_flush_stats()
by commit 1f828223b7991a228bc2aef837b78737946d44b2 (memcg: flush
lruvec stats in the
refault).

Given that mem_cgroup_flush_stats can take quite a long time for us on the
standard systemd cgroupv2 hierrachy ( root / system.slice / workload.service )

  sudo /usr/share/bcc/tools/funcslower -m 10 -t mem_cgroup_flush_stats
  Tracing function calls slower than 10 ms... Ctrl+C to quit.
  TIME       COMM           PID    LAT(ms)             RVAL FUNC
  0.000000   <redacted>     804776   11.50              200
mem_cgroup_flush_stats
  0.343383   <redacted>     647496   10.58              200
mem_cgroup_flush_stats
  0.604309   <redacted>     804776   10.50              200
mem_cgroup_flush_stats
  1.230416   <redacted>     803293   10.01              200
mem_cgroup_flush_stats
  1.248442   <redacted>     646400   11.02              200
mem_cgroup_flush_stats

could it be possible that workingset_refault in some unfortunate case can take
much longer than before such that it increases the time observed by
CLOCK_THREAD_CPUTIME_ID from userspace, or overall duration of
workingset_refault
observed by perf ?


             reply	other threads:[~2022-02-23 13:51 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-23 13:51 Daniel Dao [this message]
2022-02-23 15:57 ` Regression in workingset_refault latency on 5.15 Shakeel Butt
2022-02-23 16:00   ` Shakeel Butt
2022-02-23 17:07     ` Daniel Dao
2022-02-23 17:36       ` Shakeel Butt
2022-02-23 19:28         ` Ivan Babrou
2022-02-23 20:28           ` Shakeel Butt
2022-02-23 21:16             ` Ivan Babrou
2022-02-24 14:46               ` Daniel Dao
2022-02-24 16:58                 ` Shakeel Butt
2022-02-24 17:34                   ` Daniel Dao
2022-02-24 18:00                     ` Shakeel Butt
2022-02-24 18:52                       ` Shakeel Butt
2022-02-25 10:23                         ` Daniel Dao
2022-02-25 17:08                           ` Ivan Babrou
2022-02-25 17:22                             ` Shakeel Butt
2022-02-25 18:03                             ` Michal Koutný
2022-02-25 18:08                               ` Ivan Babrou
2022-02-28 23:09                                 ` Shakeel Butt
2022-02-28 23:34                                   ` Ivan Babrou
2022-02-28 23:43                                     ` Shakeel Butt
2022-03-02  0:48                                     ` Ivan Babrou
2022-03-02  2:50                                       ` Shakeel Butt
2022-03-02  3:40                                         ` Ivan Babrou
2022-03-02 22:33                                           ` Ivan Babrou
2022-03-03  2:32                                             ` Shakeel Butt
2022-03-03  2:35                                             ` Shakeel Butt
2022-03-04  0:21                                               ` Ivan Babrou
2022-03-04  1:05                                                 ` Shakeel Butt
2022-03-04  1:12                                                   ` Ivan Babrou
2022-03-02 11:49                                         ` Frank Hofmann
2022-03-02 15:52                                           ` Shakeel Butt
2022-03-02 10:08                                       ` Michal Koutný
2022-03-02 15:53                                         ` Shakeel Butt
2022-03-02 17:28                                           ` Ivan Babrou
2022-02-24  9:22 ` Thorsten Leemhuis
2022-04-11 10:17   ` Regression in workingset_refault latency on 5.15 #forregzbot Thorsten Leemhuis
2022-05-16 12:51     ` Thorsten Leemhuis

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CA+wXwBSyO87ZX5PVwdHm-=dBjZYECGmfnydUicUyrQqndgX2MQ@mail.gmail.com' \
    --to=dqminh@cloudflare.com \
    --cc=akpm@linux-foundation.org \
    --cc=feng.tang@intel.com \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=hdanton@sina.com \
    --cc=kernel-team@cloudflare.com \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mkoutny@suse.com \
    --cc=shakeelb@google.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.