From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <4bbafeff-5640-40f9-f0e6-43c9d4fb83a3@leemhuis.info>
Date: Mon, 16 May 2022 14:51:32 +0200
From: Thorsten Leemhuis <regressions@leemhuis.info>
To: regressions@lists.linux.dev
Subject: Re: Regression in workingset_refault latency on 5.15 #forregzbot
References: <8137bfa9-dc0a-8aaa-f280-6d611ad8d490@leemhuis.info>
In-Reply-To: <8137bfa9-dc0a-8aaa-f280-6d611ad8d490@leemhuis.info>

On 11.04.22 12:17, Thorsten Leemhuis wrote:
> TWIMC: this mail is primarily sent for documentation purposes and for
> regzbot, my Linux kernel regression tracking bot. These mails usually
> contain '#forregzbot' in the subject, to make them easy to spot and filter.
>
> #regzbot backburner: patch in mm, but it needs to be tested properly and
> the issue is not urgent

#regzbot fixed-by: 9b3016154c913b2e7

> On 24.02.22 10:22, Thorsten Leemhuis wrote:
>> [TLDR: I'm adding the regression report below to regzbot, the Linux
>> kernel regression tracking bot; all text you find below is compiled
>> from a few template paragraphs you might already have encountered in
>> similar mails.]
>>
>> Hi, this is your Linux kernel regression tracker speaking.
>>
>> Thanks for the report. CCing the regression mailing list, as it should
>> be in the loop for all regressions, as explained here:
>> https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html
>>
>> To be sure this issue doesn't fall through the cracks unnoticed, I'm
>> adding it to regzbot, my Linux kernel regression tracking bot:
>>
>> #regzbot ^introduced v3.0..v5.15.19
>> #regzbot ignore-activity
>>
>> Reminder for developers: when fixing the issue, please add a 'Link:'
>> tag pointing to the report (the mail quoted below) using
>> lore.kernel.org/r/, as explained in
>> 'Documentation/process/submitting-patches.rst' and
>> 'Documentation/process/5.Posting.rst'. This allows the bot to connect
>> the report with any patches posted or committed to fix the issue,
>> which in turn allows the bot to show the current status of regressions
>> and to resolve the issue automatically once the fix hits the right tree.
>>
>> I'm sending this to everyone that got the initial report, to make them
>> aware of the tracking. I also hope that messages like this motivate
>> people to directly get at least the regression mailing list and ideally
>> even regzbot involved when dealing with regressions, as messages like
>> this wouldn't be needed then.
>>
>> Don't worry, I'll send further messages wrt this regression just to
>> the lists (with a tag in the subject so people can filter them away)
>> if they are relevant just for regzbot. With a bit of luck no such
>> messages will be needed anyway.
>>
>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>>
>> P.S.: As the Linux kernel's regression tracker I'm getting a lot of
>> reports on my table. I can only look briefly into most of them and lack
>> knowledge about most of the areas they concern. I thus unfortunately
>> will sometimes get things wrong or miss something important. I hope
>> that's not the case here; if you think it is, don't hesitate to tell me
>> in a public reply, it's in everyone's interest to set the public record
>> straight.
>>
>> On 23.02.22 14:51, Daniel Dao wrote:
>>> Hi all,
>>>
>>> We are observing some regressions in workingset_refault on our newly
>>> upgraded 5.15.19 nodes with zram as swap. This manifests in several
>>> ways:
>>>
>>> 1) Regression of workingset_refault duration observed in flamegraphs
>>>
>>> We regularly collect flamegraphs for running services on the node.
>>> Since the upgrade to 5.15.19, we see that workingset_refault occupies
>>> a more significant part of the service flamegraph (13%), with the
>>> following call trace:
>>>
>>> workingset_refault+0x128
>>> add_to_page_cache_lru+0x9f
>>> page_cache_ra_unbounded+0x154
>>> force_page_cache_ra+0xe2
>>> filemap_get_pages+0xe9
>>> filemap_read+0xa4
>>> xfs_file_buffered_read+0x98
>>> xfs_file_read_iter+0x6a
>>> new_sync_read+0x118
>>> vfs_read+0xf2
>>> __x64_sys_pread64+0x89
>>> do_syscall_64+0x3b
>>> entry_SYSCALL_64_after_hwframe+0x44
>>>
>>> 2) Regression of performance-sensitive userspace code
>>>
>>> We have some performance-sensitive code running in userspace whose
>>> runtime is measured via CLOCK_THREAD_CPUTIME_ID. It looks roughly
>>> like:
>>>
>>> now = clock_gettime(CLOCK_THREAD_CPUTIME_ID)
>>> func()
>>> elapsed = clock_gettime(CLOCK_THREAD_CPUTIME_ID) - now
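>>>
>>> [Editor's note: a minimal runnable C sketch of the measurement
>>> pattern above, added for illustration only; thread_cpu_ms() and
>>> work() are hypothetical stand-ins, not part of the original report.]
>>>
>>> #include <stdio.h>
>>> #include <time.h>
>>>
>>> /* CPU time consumed so far by the calling thread, in milliseconds */
>>> static double thread_cpu_ms(void)
>>> {
>>>         struct timespec ts;
>>>
>>>         clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
>>>         return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
>>> }
>>>
>>> /* stand-in for the performance-sensitive func() */
>>> static void work(void)
>>> {
>>>         volatile unsigned long x = 0;
>>>
>>>         for (unsigned long i = 0; i < 1000000; i++)
>>>                 x += i;
>>> }
>>>
>>> int main(void)
>>> {
>>>         /* CLOCK_THREAD_CPUTIME_ID counts only time this thread
>>>          * spends on-CPU, so a long reading points at work done
>>>          * while executing (e.g. a refault handled during a page
>>>          * fault), not at time spent blocked on I/O. */
>>>         double now = thread_cpu_ms();
>>>         work();
>>>         double elapsed = thread_cpu_ms() - now;
>>>
>>>         printf("elapsed: %.3f ms\n", elapsed);
>>>         return 0;
>>> }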
>>>
>>> Since the 5.15 upgrade, we have observed long `elapsed` values in the
>>> range of 4-10 ms much more frequently than before. This went away
>>> after we disabled swap for the service using the `memory.swap.max=0`
>>> memcg configuration [an example of this setting appears in the
>>> editor's note at the end of this mail].
>>>
>>> The new thing in the 5.15 workingset_refault seems to be the
>>> introduction of mem_cgroup_flush_stats() by commit
>>> 1f828223b7991a228bc2aef837b78737946d44b2 ("memcg: flush lruvec stats
>>> in the refault").
>>>
>>> Given that mem_cgroup_flush_stats() can take quite a long time for us
>>> on the standard systemd cgroupv2 hierarchy
>>> (root / system.slice / workload.service):
>>>
>>> sudo /usr/share/bcc/tools/funcslower -m 10 -t mem_cgroup_flush_stats
>>> Tracing function calls slower than 10 ms... Ctrl+C to quit.
>>> TIME       COMM   PID      LAT(ms)  RVAL  FUNC
>>> 0.000000          804776   11.50    200   mem_cgroup_flush_stats
>>> 0.343383          647496   10.58    200   mem_cgroup_flush_stats
>>> 0.604309          804776   10.50    200   mem_cgroup_flush_stats
>>> 1.230416          803293   10.01    200   mem_cgroup_flush_stats
>>> 1.248442          646400   11.02    200   mem_cgroup_flush_stats
>>>
>>> could it be that workingset_refault can, in some unfortunate cases,
>>> take much longer than before, such that it increases the time observed
>>> via CLOCK_THREAD_CPUTIME_ID from userspace, or the overall duration of
>>> workingset_refault observed by perf?
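
[Editor's note: for reference, the per-service swap opt-out mentioned in
the quoted report can be set either through the cgroup v2 interface file
directly or via systemd; the path below assumes the standard
root / system.slice / workload.service layout named above.

  # write the cgroup v2 knob for the service's cgroup directly
  echo 0 > /sys/fs/cgroup/system.slice/workload.service/memory.swap.max

  # or, equivalently, have systemd apply (and persist) the limit
  systemctl set-property workload.service MemorySwapMax=0

Both forms correspond to the workaround described in the report.]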