From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1E587C433EF for ; Wed, 23 Feb 2022 16:00:59 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 8D0B18D0022; Wed, 23 Feb 2022 11:00:58 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 87CCD8D0011; Wed, 23 Feb 2022 11:00:58 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 71E648D0022; Wed, 23 Feb 2022 11:00:58 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0200.hostedemail.com [216.40.44.200]) by kanga.kvack.org (Postfix) with ESMTP id 4C94E8D0011 for ; Wed, 23 Feb 2022 11:00:58 -0500 (EST) Received: from smtpin22.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id C0723181CC404 for ; Wed, 23 Feb 2022 16:00:57 +0000 (UTC) X-FDA: 79174508154.22.CF446EA Received: from mail-pl1-f182.google.com (mail-pl1-f182.google.com [209.85.214.182]) by imf27.hostedemail.com (Postfix) with ESMTP id 098FC4001A for ; Wed, 23 Feb 2022 16:00:56 +0000 (UTC) Received: by mail-pl1-f182.google.com with SMTP id s1so1760172plg.12 for ; Wed, 23 Feb 2022 08:00:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=+5f2a/Sh/Yko/BZGwe1Kjeo50gPddm7kGy2awXCVltw=; b=g6nibnydriQVuO15tBPCQTt7Mh0zpDMa4NhYDwnYg3eQRpK1z5DfV1M0A0oDD4gmqO zUVDXF1Dit/ZifctAILUbrsPK05FJVs6mqav2xuLRfIrd31vMKmT+0OJqxR3lGL9aUqt D2s/0m2/LxLFGmrwfhDolPGKTSBw1HVaAp212H2Dt4unpaM3yjRY6iVNb3PNYxjuTnfs rCLP/MTOsw0Mmk1+d2+0NIxtmCnjZB25Wbl6ATZfLmCHKYT3VDtXyDSSF+Ypci3KPPNP GGOG1gZy87WQN3ukvyyQHJZXk8dacy1Hj11TGp02i0/Tu2Zr7KyoaSFdbQtoerbY03Y/ w1JQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=+5f2a/Sh/Yko/BZGwe1Kjeo50gPddm7kGy2awXCVltw=; b=YifgN100BSLpbiq0HeNWl4T4lFKznB8VEv1rE/3VshK3unR5hyAxq7/2avEQv3MEl3 RGtrh24+f38qvcKjkcoXUwUVQ15uIiVDq3B/7iRylJqxpwd7tO73dxKVVXEc0CvHGuPK t9T8NExpzbwvyBXbq+jkabd5GqwxOSOzzdobPpzsv6Po3vZbGjqpnCk5SRCBqvJV/Yqj 6AZEFqawdJhgSBBwA1BSVA9kF20cNAnav5Aw6/NrJcfVbyTKv6o4b6A0ehnwHhwH6Q8D HZ0jru5a7D7MNj4ZjcWspvCvaO2aEj//73NHM8KFzpYwnqga2mJO6DR9hSxJ0DR64Ont Xj5A== X-Gm-Message-State: AOAM531cle2W4prNldBHTRrp3NUFiBcWDIJ4BrtNGcLCBhMNK7GZIk1r lUVEt1bWl0T1agRnWKiA+hFPdJHVNYZLh44NnahyQA== X-Google-Smtp-Source: ABdhPJyQmFur371kFJVFu127l6JVlLgDIJgOpZ05xM27oCcXjXsxIdShcKSbkj0rO+aTEbtZcLkpR8Hgj2CYBRUeWA4= X-Received: by 2002:a17:903:2cb:b0:14f:4fb6:2fb0 with SMTP id s11-20020a17090302cb00b0014f4fb62fb0mr238618plk.172.1645632055268; Wed, 23 Feb 2022 08:00:55 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Shakeel Butt Date: Wed, 23 Feb 2022 08:00:44 -0800 Message-ID: Subject: Re: Regression in workingset_refault latency on 5.15 To: Daniel Dao , Ivan Babrou Cc: kernel-team , Linux MM , Johannes Weiner , Roman Gushchin , Feng Tang , Michal Hocko , Hillf Danton , =?UTF-8?Q?Michal_Koutn=C3=BD?= , Andrew Morton , Linus Torvalds Content-Type: text/plain; charset="UTF-8" X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 098FC4001A X-Stat-Signature: w8pms41d9ckqai9uj1srqgnju48trmws Authentication-Results: imf27.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=g6nibnyd; spf=pass (imf27.hostedemail.com: domain of shakeelb@google.com designates 209.85.214.182 as permitted sender) smtp.mailfrom=shakeelb@google.com; dmarc=pass (policy=reject) header.from=google.com X-Rspam-User: X-HE-Tag: 1645632056-153790 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Feb 23, 2022 at 7:57 AM Shakeel Butt wrote: > > +Ivan Babrou > > Hi Daniel, > > On Wed, Feb 23, 2022 at 5:51 AM Daniel Dao wrote: > > > > Hi all, > > > > We are observing some regressions in workingset_refault on our newly upgraded > > 5.15.19 nodes with zram as swap. This manifests in several ways: > > > > 1) Regression of workingset_refault duration observed in flamegraph > > > > We regularly collect flamegraphs for running services on the node. Since upgrade > > to 5.15.19, we see that workingset_refault occupied a more significant part of > > the service flamegraph (13%) with the following call trace > > > > workingset_refault+0x128 > > add_to_page_cache_lru+0x9f > > page_cache_ra_unbounded+0x154 > > force_page_cache_ra+0xe2 > > filemap_get_pages+0xe9 > > filemap_read+0xa4 > > xfs_file_buffered_read+0x98 > > xfs_file_read_iter+0x6a > > new_sync_read+0x118 > > vfs_read+0xf2 > > __x64_sys_pread64+0x89 > > do_syscall_64+0x3b > > entry_SYSCALL_64_after_hwframe+0x44 > > > > 2) Regression of userspace performance sensitive code > > > > We have some performance sensentive code running in userspace that have their > > runtime measured by CLOCK_THREAD_CPUTIME_ID. They look roughly as: > > > > now = clock_gettime(CLOCK_THREAD_CPUTIME_ID) > > func() > > elapsed = clock_gettime(CLOCK_THREAD_CPUTIME_ID) - now > > > > Since 5.15 upgrade, we observed long `elapsed` in the range of 4-10ms much more > > frequently than before. This went away after we disabled swap for the service > > using `memory.swap.max=0` memcg configuration. > > > > The new thing in 5.15 workingset_refault seems to be introduction of > > mem_cgroup_flush_stats() > > by commit 1f828223b7991a228bc2aef837b78737946d44b2 (memcg: flush > > lruvec stats in the > > refault). > > > > Given that mem_cgroup_flush_stats can take quite a long time for us on the > > standard systemd cgroupv2 hierrachy ( root / system.slice / workload.service ) > > > > sudo /usr/share/bcc/tools/funcslower -m 10 -t mem_cgroup_flush_stats > > Tracing function calls slower than 10 ms... Ctrl+C to quit. > > TIME COMM PID LAT(ms) RVAL FUNC > > 0.000000 804776 11.50 200 > > mem_cgroup_flush_stats > > 0.343383 647496 10.58 200 > > mem_cgroup_flush_stats > > 0.604309 804776 10.50 200 > > mem_cgroup_flush_stats > > 1.230416 803293 10.01 200 > > mem_cgroup_flush_stats > > 1.248442 646400 11.02 200 > > mem_cgroup_flush_stats > > > > could it be possible that workingset_refault in some unfortunate case can take > > much longer than before such that it increases the time observed by > > CLOCK_THREAD_CPUTIME_ID from userspace, or overall duration of > > workingset_refault > > observed by perf ? > > Ivan reported this issue last month and since then we have backported > three patches to 5.15-stable to fix the issue. These got merged into > 5.15.18. However you are reporting the issue is still present 5.15.19. > > Can you share a bit more detail on your hardware configuration (num of > cpus) and if possible the flamegraph? > Also if you can reproduce the issue, can you try the patch at https://lore.kernel.org/all/20210929235936.2859271-1-shakeelb@google.com/ ?