From: Yu Zhao <yuzhao@google.com> To: Michal Hocko <mhocko@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org>, Linus Torvalds <torvalds@linux-foundation.org>, Andi Kleen <ak@linux.intel.com>, Catalin Marinas <catalin.marinas@arm.com>, Dave Hansen <dave.hansen@linux.intel.com>, Hillf Danton <hdanton@sina.com>, Jens Axboe <axboe@kernel.dk>, Jesse Barnes <jsbarnes@google.com>, Johannes Weiner <hannes@cmpxchg.org>, Jonathan Corbet <corbet@lwn.net>, Matthew Wilcox <willy@infradead.org>, Mel Gorman <mgorman@suse.de>, Michael Larabel <Michael@michaellarabel.com>, Rik van Riel <riel@surriel.com>, Vlastimil Babka <vbabka@suse.cz>, Will Deacon <will@kernel.org>, Ying Huang <ying.huang@intel.com>, linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, page-reclaim@google.com, x86@kernel.org, Konstantin Kharlamov <Hi-Angel@yandex.ru> Subject: Re: [PATCH v6 6/9] mm: multigenerational lru: aging Date: Sun, 9 Jan 2022 21:47:57 -0700 [thread overview] Message-ID: <Ydu6fXg2FmrseQOn@google.com> (raw) In-Reply-To: <YdhR4vWdWksBALtM@dhcp22.suse.cz> On Fri, Jan 07, 2022 at 03:44:50PM +0100, Michal Hocko wrote: > On Tue 04-01-22 13:22:25, Yu Zhao wrote: > [...] > > +static void walk_mm(struct lruvec *lruvec, struct mm_struct *mm, struct lru_gen_mm_walk *walk) > > +{ > > + static const struct mm_walk_ops mm_walk_ops = { > > + .test_walk = should_skip_vma, > > + .p4d_entry = walk_pud_range, > > + }; > > + > > + int err; > > +#ifdef CONFIG_MEMCG > > + struct mem_cgroup *memcg = lruvec_memcg(lruvec); > > +#endif > > + > > + walk->next_addr = FIRST_USER_ADDRESS; > > + > > + do { > > + unsigned long start = walk->next_addr; > > + unsigned long end = mm->highest_vm_end; > > + > > + err = -EBUSY; > > + > > + rcu_read_lock(); > > +#ifdef CONFIG_MEMCG > > + if (memcg && atomic_read(&memcg->moving_account)) > > + goto contended; > > +#endif > > + if (!mmap_read_trylock(mm)) > > + goto contended; > > Have you evaluated the behavior under mmap_sem contention? I mean what > would be an effect of some mms being excluded from the walk? This path > is called from direct reclaim and we do allocate with exclusive mmap_sem > IIRC and the trylock can fail in a presence of pending writer if I am > not mistaken so even the read lock holder (e.g. an allocation from the #PF) > can bypass the walk. You are right. Here it must be a trylock; otherwise it can deadlock. I think there might be a misunderstanding: the aging doesn't exclusively rely on page table walks to gather the accessed bit. It prefers page table walks but it can also fallback to the rmap-based function, i.e., lru_gen_look_around(), which only gathers the accessed bit from at most 64 PTEs and therefore is less efficient. But it still retains about 80% of the performance gains. > Or is this considered statistically insignificant thus a theoretical > problem? Yes. People who work on the maple tree and SPF at Google expressed the same concern during the design review meeting (all stakeholders on the mailing list were also invited). So we had a counter to monitor the contention in previous versions, i.e., MM_LOCK_CONTENTION in v4 here: https://lore.kernel.org/lkml/20210818063107.2696454-8-yuzhao@google.com/ And we also combined this patchset with the SPF patchset to see if the latter makes any difference. Our conclusion was the contention is statistically insignificant to the performance under memory pressure. This can be explained by how often we create a new generation. (We only walk page tables when we create a new generation. And it's similar to the low inactive condition for the active/inactive lru.) Usually we only do so every few seconds. We'd run into problems with other parts of the kernel, e.g., lru lock contention, i/o congestion, etc. if we create more than a few generation every second.
WARNING: multiple messages have this Message-ID (diff)
From: Yu Zhao <yuzhao@google.com> To: Michal Hocko <mhocko@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org>, Linus Torvalds <torvalds@linux-foundation.org>, Andi Kleen <ak@linux.intel.com>, Catalin Marinas <catalin.marinas@arm.com>, Dave Hansen <dave.hansen@linux.intel.com>, Hillf Danton <hdanton@sina.com>, Jens Axboe <axboe@kernel.dk>, Jesse Barnes <jsbarnes@google.com>, Johannes Weiner <hannes@cmpxchg.org>, Jonathan Corbet <corbet@lwn.net>, Matthew Wilcox <willy@infradead.org>, Mel Gorman <mgorman@suse.de>, Michael Larabel <Michael@michaellarabel.com>, Rik van Riel <riel@surriel.com>, Vlastimil Babka <vbabka@suse.cz>, Will Deacon <will@kernel.org>, Ying Huang <ying.huang@intel.com>, linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, page-reclaim@google.com, x86@kernel.org, Konstantin Kharlamov <Hi-Angel@yandex.ru> Subject: Re: [PATCH v6 6/9] mm: multigenerational lru: aging Date: Sun, 9 Jan 2022 21:47:57 -0700 [thread overview] Message-ID: <Ydu6fXg2FmrseQOn@google.com> (raw) In-Reply-To: <YdhR4vWdWksBALtM@dhcp22.suse.cz> On Fri, Jan 07, 2022 at 03:44:50PM +0100, Michal Hocko wrote: > On Tue 04-01-22 13:22:25, Yu Zhao wrote: > [...] > > +static void walk_mm(struct lruvec *lruvec, struct mm_struct *mm, struct lru_gen_mm_walk *walk) > > +{ > > + static const struct mm_walk_ops mm_walk_ops = { > > + .test_walk = should_skip_vma, > > + .p4d_entry = walk_pud_range, > > + }; > > + > > + int err; > > +#ifdef CONFIG_MEMCG > > + struct mem_cgroup *memcg = lruvec_memcg(lruvec); > > +#endif > > + > > + walk->next_addr = FIRST_USER_ADDRESS; > > + > > + do { > > + unsigned long start = walk->next_addr; > > + unsigned long end = mm->highest_vm_end; > > + > > + err = -EBUSY; > > + > > + rcu_read_lock(); > > +#ifdef CONFIG_MEMCG > > + if (memcg && atomic_read(&memcg->moving_account)) > > + goto contended; > > +#endif > > + if (!mmap_read_trylock(mm)) > > + goto contended; > > Have you evaluated the behavior under mmap_sem contention? I mean what > would be an effect of some mms being excluded from the walk? This path > is called from direct reclaim and we do allocate with exclusive mmap_sem > IIRC and the trylock can fail in a presence of pending writer if I am > not mistaken so even the read lock holder (e.g. an allocation from the #PF) > can bypass the walk. You are right. Here it must be a trylock; otherwise it can deadlock. I think there might be a misunderstanding: the aging doesn't exclusively rely on page table walks to gather the accessed bit. It prefers page table walks but it can also fallback to the rmap-based function, i.e., lru_gen_look_around(), which only gathers the accessed bit from at most 64 PTEs and therefore is less efficient. But it still retains about 80% of the performance gains. > Or is this considered statistically insignificant thus a theoretical > problem? Yes. People who work on the maple tree and SPF at Google expressed the same concern during the design review meeting (all stakeholders on the mailing list were also invited). So we had a counter to monitor the contention in previous versions, i.e., MM_LOCK_CONTENTION in v4 here: https://lore.kernel.org/lkml/20210818063107.2696454-8-yuzhao@google.com/ And we also combined this patchset with the SPF patchset to see if the latter makes any difference. Our conclusion was the contention is statistically insignificant to the performance under memory pressure. This can be explained by how often we create a new generation. (We only walk page tables when we create a new generation. And it's similar to the low inactive condition for the active/inactive lru.) Usually we only do so every few seconds. We'd run into problems with other parts of the kernel, e.g., lru lock contention, i/o congestion, etc. if we create more than a few generation every second. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2022-01-10 4:48 UTC|newest] Thread overview: 223+ messages / expand[flat|nested] mbox.gz Atom feed top 2022-01-04 20:22 [PATCH v6 0/9] Multigenerational LRU Framework Yu Zhao 2022-01-04 20:22 ` Yu Zhao 2022-01-04 20:22 ` [PATCH v6 1/9] mm: x86, arm64: add arch_has_hw_pte_young() Yu Zhao 2022-01-04 20:22 ` Yu Zhao 2022-01-05 10:45 ` Will Deacon 2022-01-05 10:45 ` Will Deacon 2022-01-05 20:47 ` Yu Zhao 2022-01-05 20:47 ` Yu Zhao 2022-01-06 10:30 ` Will Deacon 2022-01-06 10:30 ` Will Deacon 2022-01-07 7:25 ` Yu Zhao 2022-01-07 7:25 ` Yu Zhao 2022-01-11 14:19 ` Will Deacon 2022-01-11 14:19 ` Will Deacon 2022-01-11 22:27 ` Yu Zhao 2022-01-11 22:27 ` Yu Zhao 2022-01-04 20:22 ` [PATCH v6 2/9] mm: x86: add CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG Yu Zhao 2022-01-04 20:22 ` Yu Zhao 2022-01-04 21:24 ` Linus Torvalds 2022-01-04 21:24 ` Linus Torvalds 2022-01-04 20:22 ` [PATCH v6 3/9] mm/vmscan.c: refactor shrink_node() Yu Zhao 2022-01-04 20:22 ` Yu Zhao 2022-01-04 20:22 ` [PATCH v6 4/9] mm: multigenerational lru: groundwork Yu Zhao 2022-01-04 20:22 ` Yu Zhao 2022-01-04 21:34 ` Linus Torvalds 2022-01-04 21:34 ` Linus Torvalds 2022-01-11 8:16 ` Aneesh Kumar K.V 2022-01-11 8:16 ` Aneesh Kumar K.V 2022-01-12 2:16 ` Yu Zhao 2022-01-12 2:16 ` Yu Zhao 2022-01-04 20:22 ` [PATCH v6 5/9] mm: multigenerational lru: mm_struct list Yu Zhao 2022-01-04 20:22 ` Yu Zhao 2022-01-07 9:06 ` Michal Hocko 2022-01-07 9:06 ` Michal Hocko 2022-01-08 0:19 ` Yu Zhao 2022-01-08 0:19 ` Yu Zhao 2022-01-10 15:21 ` Michal Hocko 2022-01-10 15:21 ` Michal Hocko 2022-01-12 8:08 ` Yu Zhao 2022-01-12 8:08 ` Yu Zhao 2022-01-04 20:22 ` [PATCH v6 6/9] mm: multigenerational lru: aging Yu Zhao 2022-01-04 20:22 ` Yu Zhao 2022-01-06 16:06 ` Michal Hocko 2022-01-06 16:06 ` Michal Hocko 2022-01-06 21:27 ` Yu Zhao 2022-01-06 21:27 ` Yu Zhao 2022-01-07 8:43 ` Michal Hocko 2022-01-07 8:43 ` Michal Hocko 2022-01-07 21:12 ` Yu Zhao 2022-01-07 21:12 ` Yu Zhao 2022-01-06 16:12 ` Michal Hocko 2022-01-06 16:12 ` Michal Hocko 2022-01-06 21:41 ` Yu Zhao 2022-01-06 21:41 ` Yu Zhao 2022-01-07 8:55 ` Michal Hocko 2022-01-07 8:55 ` Michal Hocko 2022-01-07 9:00 ` Michal Hocko 2022-01-07 9:00 ` Michal Hocko 2022-01-10 3:58 ` Yu Zhao 2022-01-10 3:58 ` Yu Zhao 2022-01-10 14:37 ` Michal Hocko 2022-01-10 14:37 ` Michal Hocko 2022-01-13 9:43 ` Yu Zhao 2022-01-13 9:43 ` Yu Zhao 2022-01-13 12:02 ` Michal Hocko 2022-01-13 12:02 ` Michal Hocko 2022-01-19 6:31 ` Yu Zhao 2022-01-19 6:31 ` Yu Zhao 2022-01-19 9:44 ` Michal Hocko 2022-01-19 9:44 ` Michal Hocko 2022-01-10 15:01 ` Michal Hocko 2022-01-10 15:01 ` Michal Hocko 2022-01-10 16:01 ` Vlastimil Babka 2022-01-10 16:01 ` Vlastimil Babka 2022-01-10 16:25 ` Michal Hocko 2022-01-10 16:25 ` Michal Hocko 2022-01-11 23:16 ` Yu Zhao 2022-01-11 23:16 ` Yu Zhao 2022-01-12 10:28 ` Michal Hocko 2022-01-12 10:28 ` Michal Hocko 2022-01-13 9:25 ` Yu Zhao 2022-01-13 9:25 ` Yu Zhao 2022-01-07 13:11 ` Michal Hocko 2022-01-07 13:11 ` Michal Hocko 2022-01-07 23:36 ` Yu Zhao 2022-01-07 23:36 ` Yu Zhao 2022-01-10 15:35 ` Michal Hocko 2022-01-10 15:35 ` Michal Hocko 2022-01-11 1:18 ` Yu Zhao 2022-01-11 1:18 ` Yu Zhao 2022-01-11 9:00 ` Michal Hocko 2022-01-11 9:00 ` Michal Hocko [not found] ` <1641900108.61dd684cb0e59@mail.inbox.lv> 2022-01-11 12:15 ` Michal Hocko 2022-01-11 12:15 ` Michal Hocko 2022-01-13 17:00 ` Alexey Avramov 2022-01-13 17:00 ` Alexey Avramov 2022-01-11 14:22 ` Alexey Avramov 2022-01-11 14:22 ` Alexey Avramov 2022-01-07 14:44 ` Michal Hocko 2022-01-07 14:44 ` Michal Hocko 2022-01-10 4:47 ` Yu Zhao [this message] 2022-01-10 4:47 ` Yu Zhao 2022-01-10 10:54 ` Michal Hocko 2022-01-10 10:54 ` Michal Hocko 2022-01-19 7:04 ` Yu Zhao 2022-01-19 7:04 ` Yu Zhao 2022-01-19 9:42 ` Michal Hocko 2022-01-19 9:42 ` Michal Hocko 2022-01-23 21:28 ` Yu Zhao 2022-01-23 21:28 ` Yu Zhao 2022-01-24 14:01 ` Michal Hocko 2022-01-24 14:01 ` Michal Hocko 2022-01-10 16:57 ` Michal Hocko 2022-01-10 16:57 ` Michal Hocko 2022-01-12 1:01 ` Yu Zhao 2022-01-12 1:01 ` Yu Zhao 2022-01-12 10:17 ` Michal Hocko 2022-01-12 10:17 ` Michal Hocko 2022-01-12 23:43 ` Yu Zhao 2022-01-12 23:43 ` Yu Zhao 2022-01-13 11:57 ` Michal Hocko 2022-01-13 11:57 ` Michal Hocko 2022-01-23 21:40 ` Yu Zhao 2022-01-23 21:40 ` Yu Zhao 2022-01-04 20:22 ` [PATCH v6 7/9] mm: multigenerational lru: eviction Yu Zhao 2022-01-04 20:22 ` Yu Zhao 2022-01-11 10:37 ` Aneesh Kumar K.V 2022-01-11 10:37 ` Aneesh Kumar K.V 2022-01-12 8:05 ` Yu Zhao 2022-01-12 8:05 ` Yu Zhao 2022-01-04 20:22 ` [PATCH v6 8/9] mm: multigenerational lru: user interface Yu Zhao 2022-01-04 20:22 ` Yu Zhao 2022-01-10 10:27 ` Mike Rapoport 2022-01-10 10:27 ` Mike Rapoport 2022-01-12 8:35 ` Yu Zhao 2022-01-12 8:35 ` Yu Zhao 2022-01-12 10:31 ` Michal Hocko 2022-01-12 10:31 ` Michal Hocko 2022-01-12 15:45 ` Mike Rapoport 2022-01-12 15:45 ` Mike Rapoport 2022-01-13 9:47 ` Yu Zhao 2022-01-13 9:47 ` Yu Zhao 2022-01-13 10:31 ` Aneesh Kumar K.V 2022-01-13 10:31 ` Aneesh Kumar K.V 2022-01-13 23:02 ` Yu Zhao 2022-01-13 23:02 ` Yu Zhao 2022-01-14 5:20 ` Aneesh Kumar K.V 2022-01-14 5:20 ` Aneesh Kumar K.V 2022-01-14 6:50 ` Yu Zhao 2022-01-14 6:50 ` Yu Zhao 2022-01-04 20:22 ` [PATCH v6 9/9] mm: multigenerational lru: Kconfig Yu Zhao 2022-01-04 20:22 ` Yu Zhao 2022-01-04 21:39 ` Linus Torvalds 2022-01-04 21:39 ` Linus Torvalds 2022-01-04 20:22 ` [PATCH v6 0/9] Multigenerational LRU Framework Yu Zhao 2022-01-04 20:30 ` Yu Zhao 2022-01-04 20:30 ` Yu Zhao 2022-01-04 21:43 ` Linus Torvalds 2022-01-04 21:43 ` Linus Torvalds 2022-01-05 21:12 ` Yu Zhao 2022-01-05 21:12 ` Yu Zhao 2022-01-07 9:38 ` Michal Hocko 2022-01-07 9:38 ` Michal Hocko 2022-01-07 18:45 ` Yu Zhao 2022-01-07 18:45 ` Yu Zhao 2022-01-10 15:39 ` Michal Hocko 2022-01-10 15:39 ` Michal Hocko 2022-01-10 22:04 ` Yu Zhao 2022-01-10 22:04 ` Yu Zhao 2022-01-10 22:46 ` Jesse Barnes 2022-01-10 22:46 ` Jesse Barnes 2022-01-11 1:41 ` Linus Torvalds 2022-01-11 1:41 ` Linus Torvalds 2022-01-11 10:40 ` Michal Hocko 2022-01-11 10:40 ` Michal Hocko 2022-01-11 8:41 ` Yu Zhao 2022-01-11 8:41 ` Yu Zhao 2022-01-11 8:53 ` Holger Hoffstätte 2022-01-11 8:53 ` Holger Hoffstätte 2022-01-11 9:26 ` Jan Alexander Steffens (heftig) 2022-01-11 16:04 ` Shuang Zhai 2022-01-11 16:04 ` Shuang Zhai 2022-01-12 1:46 ` Suleiman Souhlal 2022-01-12 1:46 ` Suleiman Souhlal 2022-01-12 6:07 ` Sofia Trinh 2022-01-12 6:07 ` Sofia Trinh 2022-01-12 16:17 ` Daniel Byrne 2022-01-18 9:21 ` Yu Zhao 2022-01-18 9:21 ` Yu Zhao 2022-01-18 9:36 ` Donald Carr 2022-01-18 9:36 ` Donald Carr 2022-01-19 20:19 ` Steven Barrett 2022-01-19 20:19 ` Steven Barrett 2022-01-19 22:25 ` Brian Geffon 2022-01-19 22:25 ` Brian Geffon 2022-01-05 2:44 ` Shuang Zhai 2022-01-05 2:44 ` Shuang Zhai 2022-01-05 8:55 ` SeongJae Park 2022-01-05 8:55 ` SeongJae Park 2022-01-05 10:53 ` Yu Zhao 2022-01-05 10:53 ` Yu Zhao 2022-01-05 11:12 ` Borislav Petkov 2022-01-05 11:12 ` Borislav Petkov 2022-01-05 11:25 ` SeongJae Park 2022-01-05 11:25 ` SeongJae Park 2022-01-05 21:06 ` Yu Zhao 2022-01-05 21:06 ` Yu Zhao 2022-01-10 14:49 ` Alexey Avramov 2022-01-10 14:49 ` Alexey Avramov 2022-01-11 10:24 ` Alexey Avramov 2022-01-11 10:24 ` Alexey Avramov 2022-01-12 20:56 ` Oleksandr Natalenko 2022-01-12 20:56 ` Oleksandr Natalenko 2022-01-13 8:59 ` Yu Zhao 2022-01-13 8:59 ` Yu Zhao 2022-01-23 5:43 ` Barry Song 2022-01-23 5:43 ` Barry Song 2022-01-25 6:48 ` Yu Zhao 2022-01-25 6:48 ` Yu Zhao 2022-01-28 8:54 ` Barry Song 2022-01-28 8:54 ` Barry Song 2022-02-08 9:16 ` Yu Zhao 2022-02-08 9:16 ` Yu Zhao
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=Ydu6fXg2FmrseQOn@google.com \ --to=yuzhao@google.com \ --cc=Hi-Angel@yandex.ru \ --cc=Michael@michaellarabel.com \ --cc=ak@linux.intel.com \ --cc=akpm@linux-foundation.org \ --cc=axboe@kernel.dk \ --cc=catalin.marinas@arm.com \ --cc=corbet@lwn.net \ --cc=dave.hansen@linux.intel.com \ --cc=hannes@cmpxchg.org \ --cc=hdanton@sina.com \ --cc=jsbarnes@google.com \ --cc=linux-arm-kernel@lists.infradead.org \ --cc=linux-doc@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-mm@kvack.org \ --cc=mgorman@suse.de \ --cc=mhocko@suse.com \ --cc=page-reclaim@google.com \ --cc=riel@surriel.com \ --cc=torvalds@linux-foundation.org \ --cc=vbabka@suse.cz \ --cc=will@kernel.org \ --cc=willy@infradead.org \ --cc=x86@kernel.org \ --cc=ying.huang@intel.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.