From: Daniel Dao <dqminh@cloudflare.com>
Date: Thu, 24 Feb 2022 17:34:10 +0000
Subject: Re: Regression in workingset_refault latency on 5.15
To: Shakeel Butt
Cc: Ivan Babrou, kernel-team, Linux MM <linux-mm@kvack.org>, Johannes Weiner,
  Roman Gushchin, Feng Tang, Michal Hocko, Hillf Danton, Michal Koutný,
  Andrew Morton, Linus Torvalds

On Thu, Feb 24, 2022 at 4:58 PM Shakeel Butt wrote:
>
> On Thu, Feb 24, 2022 at 02:46:27PM +0000, Daniel Dao wrote:
> >
> [...]
> >
> > 3) Summary of stack traces when mem_cgroup_flush_stats is over 5ms
> >
> Can you please check if flush_memcg_stats_dwork() appears in any stack
> traces at all?

Here is the result of probes on flush_memcg_stats_dwork:

$ sudo /usr/share/bcc/tools/funccount -d 30 flush_memcg_stats_dwork
Tracing 1 functions for "b'flush_memcg_stats_dwork'"... Hit Ctrl-C to end.

FUNC                                    COUNT
b'flush_memcg_stats_dwork'                 14

$ sudo /usr/share/bcc/tools/funclatency -d 30 flush_memcg_stats_dwork
Tracing 1 functions for "flush_memcg_stats_dwork"... Hit Ctrl-C to end.

     nsecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 0        |                                        |
      1024 -> 2047       : 0        |                                        |
      2048 -> 4095       : 0        |                                        |
      4096 -> 8191       : 8        |****************************************|
      8192 -> 16383      : 0        |                                        |
     16384 -> 32767      : 0        |                                        |
     32768 -> 65535      : 0        |                                        |
     65536 -> 131071     : 0        |                                        |
    131072 -> 262143     : 0        |                                        |
    262144 -> 524287     : 0        |                                        |
    524288 -> 1048575    : 0        |                                        |
   1048576 -> 2097151    : 1        |*****                                   |
   2097152 -> 4194303    : 4        |********************                    |
   4194304 -> 8388607    : 2        |**********                              |

avg = 1725693 nsecs, total: 25885397 nsecs, count: 15

So we triggered the async flush as expected, around every 2 seconds, but
these flushes mostly run faster than the inline calls from
workingset_refault(). I think on busy servers with varied workloads that
touch swap/page cache, most of the cost very likely comes from the inline
mem_cgroup_flush_stats() in workingset_refault() rather than from the
async flush.

> Thanks for testing. At the moment I am suspecting the async worker is
> not getting the CPU. Can you share your CONFIG_HZ setting? Also can you
> try the following patch and see if that helps, otherwise keep halving the
> delay (i.e. 2HZ -> HZ -> HZ/2 -> ...) and find at what value the issue
> you are seeing gets resolved?

We have CONFIG_HZ=1000. We can try to increase the frequency of the async
flush, but that seems like a poor band-aid. Is it possible to remove
mem_cgroup_flush_stats() from workingset_refault(), or at least scope it
down to a targeted cgroup, so we don't need to flush from the root with a
potentially large set of cgroups to walk?
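
To make that last ask concrete, here is a rough, untested sketch of what a
scoped flush could look like, modelled on 5.15's mm/memcontrol.c. The name
mem_cgroup_flush_stats_scoped() is made up for illustration and is not an
existing kernel API:

/*
 * Untested sketch: flush only @memcg's subtree instead of the whole
 * hierarchy from root_mem_cgroup, so the rstat walk scales with the
 * subtree size rather than the total number of cgroups on the system.
 */
void mem_cgroup_flush_stats_scoped(struct mem_cgroup *memcg)
{
	if (!spin_trylock(&stats_flush_lock))
		return;

	/* Same rstat helper the root-level flush uses today, just
	 * rooted at @memcg instead of root_mem_cgroup.
	 */
	cgroup_rstat_flush_irqsafe(memcg->css.cgroup);
	spin_unlock(&stats_flush_lock);
}

workingset_refault() could then pass the refaulting memcg it already holds
instead of triggering a flush of every cgroup from the root.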