From: Shakeel Butt <shakeelb@google.com>
Date: Wed, 23 Feb 2022 07:57:35 -0800
Subject: Re: Regression in workingset_refault latency on 5.15
To: Daniel Dao, Ivan Babrou
Cc: kernel-team, Linux MM, Johannes Weiner, Roman Gushchin, Feng Tang,
 Michal Hocko, Hillf Danton, Michal Koutný, Andrew Morton, Linus Torvalds
Content-Type: text/plain; charset="UTF-8"
+Ivan Babrou

Hi Daniel,

On Wed, Feb 23, 2022 at 5:51 AM Daniel Dao wrote:
>
> Hi all,
>
> We are observing some regressions in workingset_refault on our newly upgraded
> 5.15.19 nodes with zram as swap. This manifests in several ways:
>
> 1) Regression of workingset_refault duration observed in flamegraphs
>
> We regularly collect flamegraphs for running services on the node. Since the
> upgrade to 5.15.19, we see that workingset_refault occupies a more significant
> part of the service flamegraph (13%), with the following call trace:
>
> workingset_refault+0x128
> add_to_page_cache_lru+0x9f
> page_cache_ra_unbounded+0x154
> force_page_cache_ra+0xe2
> filemap_get_pages+0xe9
> filemap_read+0xa4
> xfs_file_buffered_read+0x98
> xfs_file_read_iter+0x6a
> new_sync_read+0x118
> vfs_read+0xf2
> __x64_sys_pread64+0x89
> do_syscall_64+0x3b
> entry_SYSCALL_64_after_hwframe+0x44
>
> 2) Regression of performance-sensitive userspace code
>
> We have some performance-sensitive code running in userspace whose runtime
> is measured with CLOCK_THREAD_CPUTIME_ID. It looks roughly like:
>
> now = clock_gettime(CLOCK_THREAD_CPUTIME_ID)
> func()
> elapsed = clock_gettime(CLOCK_THREAD_CPUTIME_ID) - now
>
> Since the 5.15 upgrade, we have observed long `elapsed` values in the range
> of 4-10ms much more frequently than before. This went away after we disabled
> swap for the service using the `memory.swap.max=0` memcg configuration.
>
> The new thing in the 5.15 workingset_refault path seems to be the
> introduction of mem_cgroup_flush_stats() by commit
> 1f828223b7991a228bc2aef837b78737946d44b2 ("memcg: flush lruvec stats in
> the refault").
>
> Given that mem_cgroup_flush_stats() can take quite a long time for us on the
> standard systemd cgroupv2 hierarchy ( root / system.slice / workload.service ):
>
> sudo /usr/share/bcc/tools/funcslower -m 10 -t mem_cgroup_flush_stats
> Tracing function calls slower than 10 ms... Ctrl+C to quit.
> TIME      COMM  PID     LAT(ms)  RVAL  FUNC
> 0.000000        804776    11.50   200  mem_cgroup_flush_stats
> 0.343383        647496    10.58   200  mem_cgroup_flush_stats
> 0.604309        804776    10.50   200  mem_cgroup_flush_stats
> 1.230416        803293    10.01   200  mem_cgroup_flush_stats
> 1.248442        646400    11.02   200  mem_cgroup_flush_stats
>
> Could it be that workingset_refault in some unfortunate case takes much
> longer than before, such that it increases the time observed via
> CLOCK_THREAD_CPUTIME_ID from userspace, or the overall duration of
> workingset_refault observed by perf?

Ivan reported this issue last month, and since then we have backported three
patches to 5.15-stable to fix it. Those were merged into 5.15.18. However,
you are reporting that the issue is still present in 5.15.19. Can you share
a bit more detail on your hardware configuration (number of CPUs) and, if
possible, the flamegraph?

thanks,
Shakeel