From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from wp530.webpack.hosteurope.de (wp530.webpack.hosteurope.de [80.237.130.52])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id B2ACE1363
	for ; Mon, 11 Apr 2022 10:17:55 +0000 (UTC)
Received: from ip4d144895.dynamic.kabel-deutschland.de ([77.20.72.149] helo=[192.168.66.200]);
	authenticated by wp530.webpack.hosteurope.de running ExIM with esmtpsa
	(TLS1.3:ECDHE_RSA_AES_128_GCM_SHA256:128)
	id 1ndr7J-0007JY-II; Mon, 11 Apr 2022 12:17:53 +0200
Message-ID: <8137bfa9-dc0a-8aaa-f280-6d611ad8d490@leemhuis.info>
Date: Mon, 11 Apr 2022 12:17:53 +0200
Precedence: bulk
X-Mailing-List: regressions@lists.linux.dev
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.7.0
Subject: Re: Regression in workingset_refault latency on 5.15 #forregzbot
Content-Language: en-US
From: Thorsten Leemhuis
To: "regressions@lists.linux.dev"
References:
In-Reply-To:
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-bounce-key: webpack.hosteurope.de;regressions@leemhuis.info;1649672275;4a68e665;
X-HE-SMSGID: 1ndr7J-0007JY-II

TWIMC: this mail is primarily sent for documentation purposes and for
regzbot, my Linux kernel regression tracking bot. These mails usually
contain '#forregzbot' in the subject, to make them easy to spot and filter.

#regzbot backburner: patch in mm, but it needs to be tested properly and the issue is not urgent

On 24.02.22 10:22, Thorsten Leemhuis wrote:
> [TLDR: I'm adding the regression report below to regzbot, the Linux
> kernel regression tracking bot; all text you find below is compiled
> from a few template paragraphs you might have encountered already
> from similar mails.]
>
> Hi, this is your Linux kernel regression tracker speaking.
>
> Thanks for the report. CCing the regression mailing list, as it should
> be in the loop for all regressions, as explained here:
> https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html
>
> To be sure this issue doesn't fall through the cracks unnoticed, I'm
> adding it to regzbot, my Linux kernel regression tracking bot:
>
> #regzbot ^introduced v3.0..v5.15.19
> #regzbot ignore-activity
>
> Reminder for developers: when fixing the issue, please add a 'Link:'
> tag pointing to the report (the mail quoted below) using
> lore.kernel.org/r/, as explained in
> 'Documentation/process/submitting-patches.rst' and
> 'Documentation/process/5.Posting.rst'. This allows the bot to connect
> the report with any patches posted or committed to fix the issue; this
> in turn allows the bot to show the current status of regressions and
> automatically resolve the issue when the fix hits the right tree.
>
> I'm sending this to everyone who got the initial report, to make them
> aware of the tracking. I also hope that messages like this motivate
> people to directly get at least the regression mailing list and ideally
> even regzbot involved when dealing with regressions, as messages like
> this wouldn't be needed then.
>
> Don't worry, I'll send further messages wrt this regression just to
> the lists (with a tag in the subject so people can filter them away),
> if they are relevant just for regzbot. With a bit of luck no such
> messages will be needed anyway.
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>
> P.S.: As the Linux kernel's regression tracker I'm getting a lot of
> reports on my table. I can only look briefly into most of them and lack
> knowledge about most of the areas they concern. I thus unfortunately
> will sometimes get things wrong or miss something important. I hope
> that's not the case here; if you think it is, don't hesitate to tell me
> in a public reply, it's in everyone's interest to set the public record
> straight.
>
> On 23.02.22 14:51, Daniel Dao wrote:
>> Hi all,
>>
>> We are observing some regressions in workingset_refault on our newly
>> upgraded 5.15.19 nodes with zram as swap. This manifests in several ways:
>>
>> 1) Regression of workingset_refault duration observed in flamegraphs
>>
>> We regularly collect flamegraphs for running services on the node. Since
>> the upgrade to 5.15.19, we see that workingset_refault occupies a more
>> significant part of the service flamegraph (13%) with the following call
>> trace:
>>
>>   workingset_refault+0x128
>>   add_to_page_cache_lru+0x9f
>>   page_cache_ra_unbounded+0x154
>>   force_page_cache_ra+0xe2
>>   filemap_get_pages+0xe9
>>   filemap_read+0xa4
>>   xfs_file_buffered_read+0x98
>>   xfs_file_read_iter+0x6a
>>   new_sync_read+0x118
>>   vfs_read+0xf2
>>   __x64_sys_pread64+0x89
>>   do_syscall_64+0x3b
>>   entry_SYSCALL_64_after_hwframe+0x44
>>
>> 2) Regression of userspace performance-sensitive code
>>
>> We have some performance-sensitive code running in userspace whose
>> runtime is measured with CLOCK_THREAD_CPUTIME_ID. It looks roughly like:
>>
>>   now = clock_gettime(CLOCK_THREAD_CPUTIME_ID)
>>   func()
>>   elapsed = clock_gettime(CLOCK_THREAD_CPUTIME_ID) - now
>>
>> Since the 5.15 upgrade, we have observed long `elapsed` values in the
>> range of 4-10ms much more frequently than before. This went away after
>> we disabled swap for the service using the `memory.swap.max=0` memcg
>> configuration.
>>
>> The new thing in the 5.15 workingset_refault seems to be the
>> introduction of mem_cgroup_flush_stats() by commit
>> 1f828223b7991a228bc2aef837b78737946d44b2 ("memcg: flush lruvec stats in
>> the refault").
>>
>> Given that mem_cgroup_flush_stats() can take quite a long time for us on
>> the standard systemd cgroupv2 hierarchy (root / system.slice /
>> workload.service):
>>
>>   sudo /usr/share/bcc/tools/funcslower -m 10 -t mem_cgroup_flush_stats
>>   Tracing function calls slower than 10 ms... Ctrl+C to quit.
>>   TIME      COMM  PID     LAT(ms)  RVAL  FUNC
>>   0.000000        804776  11.50    200   mem_cgroup_flush_stats
>>   0.343383        647496  10.58    200   mem_cgroup_flush_stats
>>   0.604309        804776  10.50    200   mem_cgroup_flush_stats
>>   1.230416        803293  10.01    200   mem_cgroup_flush_stats
>>   1.248442        646400  11.02    200   mem_cgroup_flush_stats
>>
>> could it be possible that workingset_refault in some unfortunate cases
>> can take much longer than before, such that it increases the time
>> observed by CLOCK_THREAD_CPUTIME_ID from userspace, or the overall
>> duration of workingset_refault observed by perf?
>>
>
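
For illustration, below is a minimal, self-contained sketch of the measurement
pattern Daniel describes above; it is not code from the original report.
do_work() and elapsed_ms() are hypothetical placeholders. The relevant point is
that CLOCK_THREAD_CPUTIME_ID accounts for the calling thread's on-CPU time in
both user and kernel mode, so CPU time spent inside the kernel during a
page-cache refault (e.g. in workingset_refault()) while func() runs is charged
to the thread and shows up in 'elapsed'.

    #include <stdio.h>
    #include <time.h>

    /* Placeholder for the performance-sensitive function ("func" above). */
    static void do_work(void)
    {
    }

    /* Difference between two timespecs, in milliseconds. */
    static double elapsed_ms(const struct timespec *a, const struct timespec *b)
    {
        return (b->tv_sec - a->tv_sec) * 1e3 +
               (b->tv_nsec - a->tv_nsec) / 1e6;
    }

    int main(void)
    {
        struct timespec start, end;

        /* CPU time consumed by this thread (user + kernel), not wall time. */
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &start);
        do_work();
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &end);

        printf("elapsed: %.3f ms\n", elapsed_ms(&start, &end));
        return 0;
    }

Measuring the same interval with CLOCK_MONOTONIC as well can help tell on-CPU
cost (such as the stats flush) apart from time the thread spends blocked.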