Date: Tue, 1 Mar 2022 18:50:22 -0800
Message-Id: <20220302025022.nnmpwxmkqed2icck@google.com>
References: <20220224185236.qgzm3jpoz2orjfcw@google.com>
 <20220225180345.GD12037@blackbody.suse.cz>
 <20220228230949.xrmy6j2glxsoffko@google.com>
Subject: Re: Regression in workingset_refault latency on 5.15
From: Shakeel Butt
To: Ivan Babrou
Cc: Michal Koutný, Daniel Dao, kernel-team, Linux MM, Johannes Weiner,
 Roman Gushchin, Feng Tang, Michal Hocko, Hillf Danton, Andrew Morton,
 Linus Torvalds

On Tue, Mar 01, 2022 at 04:48:00PM -0800, Ivan Babrou wrote:
[...]
> Looks like you were right that targeted flush is not going to be as good.

Thanks a lot. Can you please try the following patch (independent of the
other patches) as well?
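The patch does two things: it moves struct memcg_vmstats behind a pointer
(so struct mem_cgroup itself shrinks), and it stops sizing the per-memcg
event arrays by the full NR_VM_EVENT_ITEMS range, keeping slots only for
the events memcg actually reports and remapping them through
memcg_events_index[]. The remapping trick on its own looks like this
(a standalone userspace sketch for illustration, not part of the patch;
every name below is invented):

/*
 * Only tracked events get a slot in the small counters array; every
 * other event id maps to -1 and is silently ignored.
 */
#include <stdio.h>

enum vm_event { PGPGIN, PGPGOUT, PGFAULT, PGMAJFAULT, NR_VM_EVENT_ITEMS };

static const unsigned int tracked_events[] = { PGFAULT, PGMAJFAULT };
#define NR_TRACKED (sizeof(tracked_events) / sizeof(tracked_events[0]))

static int event_index[NR_VM_EVENT_ITEMS];	/* 0 means "not tracked" */
static unsigned long counters[NR_TRACKED];

static void init_index(void)
{
	unsigned int i;

	/* Store index + 1 so the zero-initialized entries mean "absent". */
	for (i = 0; i < NR_TRACKED; i++)
		event_index[tracked_events[i]] = i + 1;
}

static void count_event(enum vm_event ev)
{
	int index = event_index[ev] - 1;

	if (index >= 0)		/* untracked events are dropped */
		counters[index]++;
}

int main(void)
{
	init_index();
	count_event(PGFAULT);
	count_event(PGPGIN);	/* not tracked, no slot, ignored */
	printf("PGFAULT was counted %lu time(s)\n",
	       counters[event_index[PGFAULT] - 1]);
	return 0;
}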
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d9b8df5ef212..499f75e066f3 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -75,29 +75,8 @@ enum mem_cgroup_events_target {
 	MEM_CGROUP_NTARGETS,
 };
 
-struct memcg_vmstats_percpu {
-	/* Local (CPU and cgroup) page state & events */
-	long state[MEMCG_NR_STAT];
-	unsigned long events[NR_VM_EVENT_ITEMS];
-
-	/* Delta calculation for lockless upward propagation */
-	long state_prev[MEMCG_NR_STAT];
-	unsigned long events_prev[NR_VM_EVENT_ITEMS];
-
-	/* Cgroup1: threshold notifications & softlimit tree updates */
-	unsigned long nr_page_events;
-	unsigned long targets[MEM_CGROUP_NTARGETS];
-};
-
-struct memcg_vmstats {
-	/* Aggregated (CPU and subtree) page state & events */
-	long state[MEMCG_NR_STAT];
-	unsigned long events[NR_VM_EVENT_ITEMS];
-
-	/* Pending child counts during tree propagation */
-	long state_pending[MEMCG_NR_STAT];
-	unsigned long events_pending[NR_VM_EVENT_ITEMS];
-};
+struct memcg_vmstats_percpu;
+struct memcg_vmstats;
 
 struct mem_cgroup_reclaim_iter {
 	struct mem_cgroup *position;
@@ -304,7 +283,7 @@ struct mem_cgroup {
 	MEMCG_PADDING(_pad1_);
 
 	/* memory.stat */
-	struct memcg_vmstats	vmstats;
+	struct memcg_vmstats	*vmstats;
 
 	/* memory.events */
 	atomic_long_t		memory_events[MEMCG_NR_MEMORY_EVENTS];
@@ -964,11 +943,7 @@ static inline void mod_memcg_state(struct mem_cgroup *memcg,
 	local_irq_restore(flags);
 }
 
-static inline unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx)
-{
-	return READ_ONCE(memcg->vmstats.state[idx]);
-}
-
+unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx);
 static inline unsigned long lruvec_page_state(struct lruvec *lruvec,
 					      enum node_stat_item idx)
 {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 32ba963ebf2e..c65f155c2048 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -688,6 +688,71 @@ static void flush_memcg_stats_dwork(struct work_struct *w)
 	queue_delayed_work(system_unbound_wq, &stats_flush_dwork, 2UL*HZ);
 }
 
+static const unsigned int memcg_vm_events[] = {
+	PGPGIN,
+	PGPGOUT,
+	PGFAULT,
+	PGMAJFAULT,
+	PGREFILL,
+	PGSCAN_KSWAPD,
+	PGSCAN_DIRECT,
+	PGSTEAL_KSWAPD,
+	PGSTEAL_DIRECT,
+	PGACTIVATE,
+	PGDEACTIVATE,
+	PGLAZYFREE,
+	PGLAZYFREED,
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	THP_FAULT_ALLOC,
+	THP_COLLAPSE_ALLOC,
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+};
+#define NR_MEMCG_EVENTS ARRAY_SIZE(memcg_vm_events)
+
+static int memcg_events_index[NR_VM_EVENT_ITEMS] __read_mostly;
+
+static void init_memcg_events(void)
+{
+	int i;
+
+	for (i = 0; i < NR_MEMCG_EVENTS; ++i)
+		memcg_events_index[memcg_vm_events[i]] = i + 1;
+}
+
+static int get_memcg_events_index(enum vm_event_item idx)
+{
+	return memcg_events_index[idx] - 1;
+}
+
+struct memcg_vmstats_percpu {
+	/* Local (CPU and cgroup) page state & events */
+	long state[MEMCG_NR_STAT];
+	unsigned long events[NR_MEMCG_EVENTS];
+
+	/* Delta calculation for lockless upward propagation */
+	long state_prev[MEMCG_NR_STAT];
+	unsigned long events_prev[NR_MEMCG_EVENTS];
+
+	/* Cgroup1: threshold notifications & softlimit tree updates */
+	unsigned long nr_page_events;
+	unsigned long targets[MEM_CGROUP_NTARGETS];
+};
+
+struct memcg_vmstats {
+	/* Aggregated (CPU and subtree) page state & events */
+	long state[MEMCG_NR_STAT];
+	unsigned long events[NR_MEMCG_EVENTS];
+
+	/* Pending child counts during tree propagation */
+	long state_pending[MEMCG_NR_STAT];
+	unsigned long events_pending[NR_MEMCG_EVENTS];
+};
+
+unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx)
+{
+	return READ_ONCE(memcg->vmstats->state[idx]);
+}
+
 /**
  * __mod_memcg_state - update cgroup memory statistics
  * @memcg: the memory cgroup
@@ -831,25 +896,33 @@ static inline void mod_objcg_mlstate(struct obj_cgroup *objcg,
 void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
 			  unsigned long count)
 {
-	if (mem_cgroup_disabled())
+	int index = get_memcg_events_index(idx);
+
+	if (mem_cgroup_disabled() || index < 0)
 		return;
 
-	__this_cpu_add(memcg->vmstats_percpu->events[idx], count);
+	__this_cpu_add(memcg->vmstats_percpu->events[index], count);
 	memcg_rstat_updated(memcg, count);
 }
 
 static unsigned long memcg_events(struct mem_cgroup *memcg, int event)
 {
-	return READ_ONCE(memcg->vmstats.events[event]);
+	int index = get_memcg_events_index(event);
+
+	if (index < 0)
+		return 0;
+	return READ_ONCE(memcg->vmstats->events[index]);
 }
 
 static unsigned long memcg_events_local(struct mem_cgroup *memcg, int event)
 {
 	long x = 0;
 	int cpu;
+	int index = get_memcg_events_index(event);
 
-	for_each_possible_cpu(cpu)
-		x += per_cpu(memcg->vmstats_percpu->events[event], cpu);
+	if (index >= 0)
+		for_each_possible_cpu(cpu)
+			x += per_cpu(memcg->vmstats_percpu->events[index], cpu);
 	return x;
 }
 
@@ -5147,6 +5220,7 @@ static void __mem_cgroup_free(struct mem_cgroup *memcg)
 
 	for_each_node(node)
 		free_mem_cgroup_per_node_info(memcg, node);
+	kfree(memcg->vmstats);
 	free_percpu(memcg->vmstats_percpu);
 	kfree(memcg);
 }
@@ -5180,6 +5254,10 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
 		goto fail;
 	}
 
+	memcg->vmstats = kzalloc(sizeof(struct memcg_vmstats), GFP_KERNEL);
+	if (!memcg->vmstats)
+		goto fail;
+
 	memcg->vmstats_percpu = alloc_percpu_gfp(struct memcg_vmstats_percpu,
 						 GFP_KERNEL_ACCOUNT);
 	if (!memcg->vmstats_percpu)
@@ -5248,6 +5326,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 		page_counter_init(&memcg->kmem, &parent->kmem);
 		page_counter_init(&memcg->tcpmem, &parent->tcpmem);
 	} else {
+		init_memcg_events();
 		page_counter_init(&memcg->memory, NULL);
 		page_counter_init(&memcg->swap, NULL);
 		page_counter_init(&memcg->kmem, NULL);
@@ -5400,9 +5479,9 @@ static void mem_cgroup_css_rstat_flush(struct cgroup_subsys_state *css, int cpu)
 		 * below us. We're in a per-cpu loop here and this is
 		 * a global counter, so the first cycle will get them.
 		 */
-		delta = memcg->vmstats.state_pending[i];
+		delta = memcg->vmstats->state_pending[i];
 		if (delta)
-			memcg->vmstats.state_pending[i] = 0;
+			memcg->vmstats->state_pending[i] = 0;
 
 		/* Add CPU changes on this level since the last flush */
 		v = READ_ONCE(statc->state[i]);
@@ -5415,15 +5494,15 @@ static void mem_cgroup_css_rstat_flush(struct cgroup_subsys_state *css, int cpu)
 			continue;
 
 		/* Aggregate counts on this level and propagate upwards */
-		memcg->vmstats.state[i] += delta;
+		memcg->vmstats->state[i] += delta;
 		if (parent)
-			parent->vmstats.state_pending[i] += delta;
+			parent->vmstats->state_pending[i] += delta;
 	}
 
-	for (i = 0; i < NR_VM_EVENT_ITEMS; i++) {
-		delta = memcg->vmstats.events_pending[i];
+	for (i = 0; i < NR_MEMCG_EVENTS; i++) {
+		delta = memcg->vmstats->events_pending[i];
 		if (delta)
-			memcg->vmstats.events_pending[i] = 0;
+			memcg->vmstats->events_pending[i] = 0;
 
 		v = READ_ONCE(statc->events[i]);
 		if (v != statc->events_prev[i]) {
@@ -5434,9 +5513,9 @@ static void mem_cgroup_css_rstat_flush(struct cgroup_subsys_state *css, int cpu)
 		if (!delta)
 			continue;
 
-		memcg->vmstats.events[i] += delta;
+		memcg->vmstats->events[i] += delta;
 		if (parent)
-			parent->vmstats.events_pending[i] += delta;
+			parent->vmstats->events_pending[i] += delta;
	}
 
 	for_each_node_state(nid, N_MEMORY) {
-- 
2.35.1.574.g5d30c73bfb-goog
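P.S. for anyone skimming the mem_cgroup_css_rstat_flush() hunks above:
they keep the existing rstat flush pattern, just with the smaller events
array and the extra pointer dereference. The pattern itself, boiled down
to a userspace sketch (illustration only, every name below is invented):

/*
 * Each level consumes the deltas its children parked in "pending",
 * adds its own per-cpu delta since the previous flush, and parks the
 * sum at its parent. Flushing is child-first, per cpu.
 */
#include <stdio.h>

#define NR_CPUS 2

struct node {
	struct node *parent;
	long total;		/* aggregated value, what readers see */
	long pending;		/* deltas parked here by children */
	long percpu[NR_CPUS];	/* live counters, updated locklessly */
	long prev[NR_CPUS];	/* per-cpu snapshot from the last flush */
};

static void flush_one(struct node *n, int cpu)
{
	long delta = n->pending;

	n->pending = 0;
	delta += n->percpu[cpu] - n->prev[cpu];
	n->prev[cpu] = n->percpu[cpu];

	if (!delta)
		return;
	n->total += delta;
	if (n->parent)
		n->parent->pending += delta;
}

int main(void)
{
	struct node root = { 0 }, child = { .parent = &root };

	child.percpu[0] = 3;	/* e.g. 3 events counted on cpu 0 */
	child.percpu[1] = 2;	/* and 2 more on cpu 1 */

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		flush_one(&child, cpu);	/* children first... */
		flush_one(&root, cpu);	/* ...then their parent */
	}
	printf("child=%ld root=%ld\n", child.total, root.total); /* 5 5 */
	return 0;
}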