From: Daniel Dao <dqminh@cloudflare.com>
Date: Wed, 23 Feb 2022 17:07:08 +0000
Subject: Re: Regression in workingset_refault latency on 5.15
To: Shakeel Butt
Cc: Ivan Babrou, kernel-team, Linux MM, Johannes Weiner, Roman Gushchin, Feng Tang, Michal Hocko, Hillf Danton, Michal Koutný, Andrew Morton, Linus Torvalds

On Wed, Feb 23, 2022 at 4:00 PM Shakeel Butt wrote:
>
> Can you share a bit more detail on your hardware configuration (num of
> cpus) and if
> possible the flamegraph?

We have a mix of 96 and 128 cpus. I'm not yet sure if it's possible to
share the flamegraphs. We may have to come back to that later if
necessary.

>
> Also if you can reproduce the issue, can you try the patch at
> https://lore.kernel.org/all/20210929235936.2859271-1-shakeelb@google.com/
> ?

We can give it a try. I also wrote a bpftrace script to capture the kernel
stack whenever mem_cgroup_flush_stats takes longer than 10 ms:

  kprobe:mem_cgroup_flush_stats
  {
      // record entry timestamp and kernel stack for this thread
      @start[tid] = nsecs;
      @stack[tid] = kstack;
  }

  kretprobe:mem_cgroup_flush_stats
  /@start[tid]/
  {
      // report calls that took 10000 us (10 ms) or more
      $usecs = (nsecs - @start[tid]) / 1000;
      if ($usecs >= 10000) {
          printf("mem_cgroup_flush_stats: %d us\n", $usecs);
          printf("stack: %s\n", @stack[tid]);
      }
      delete(@start[tid]);
      delete(@stack[tid]);
  }

  END
  {
      clear(@start);
      clear(@stack);
  }

Running it on a production node yields output like:

  mem_cgroup_flush_stats: 10697 us
  stack:
      mem_cgroup_flush_stats+1
      workingset_refault+296
      add_to_page_cache_lru+159
      page_cache_ra_unbounded+340
      force_page_cache_ra+226
      filemap_get_pages+233
      filemap_read+164
      xfs_file_buffered_read+152
      xfs_file_read_iter+106
      new_sync_read+277
      vfs_read+242
      __x64_sys_pread64+137
      do_syscall_64+56
      entry_SYSCALL_64_after_hwframe+68

I think adding many milliseconds to workingset_refault is too high a cost.
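For longer runs it may be easier to post-process the script's output than to eyeball it. A small helper along these lines (hypothetical, written only against the exact printf format the script above emits) could summarize the slow flushes:

```python
import re

def summarize_slow_flushes(text):
    """Collect latencies (in microseconds) from lines like
    'mem_cgroup_flush_stats: 10697 us' in the bpftrace output."""
    lat = [int(m) for m in re.findall(r"mem_cgroup_flush_stats: (\d+) us", text)]
    if not lat:
        return None
    return {"count": len(lat), "max_us": max(lat),
            "avg_us": sum(lat) // len(lat)}

# Example with a single captured event:
sample = "mem_cgroup_flush_stats: 10697 us\nstack: ...\n"
print(summarize_slow_flushes(sample))  # {'count': 1, 'max_us': 10697, 'avg_us': 10697}
```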