From: Yu Zhao <yuzhao@google.com>
To: Nadav Amit <nadav.amit@gmail.com>
Cc: "Andrew Morton" <akpm@linux-foundation.org>,
	"Andi Kleen" <ak@linux.intel.com>,
	"Aneesh Kumar" <aneesh.kumar@linux.ibm.com>,
	"Catalin Marinas" <catalin.marinas@arm.com>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"Hillf Danton" <hdanton@sina.com>,
	"Jens Axboe" <axboe@kernel.dk>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Mel Gorman" <mgorman@suse.de>,
	"Michael Larabel" <Michael@michaellarabel.com>,
	"Michal Hocko" <mhocko@kernel.org>,
	"Mike Rapoport" <rppt@kernel.org>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Tejun Heo" <tj@kernel.org>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Will Deacon" <will@kernel.org>,
	"Linux ARM" <linux-arm-kernel@lists.infradead.org>,
	"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"Linux MM" <linux-mm@kvack.org>,
	"X86 ML" <x86@kernel.org>,
	"Kernel Page Reclaim v2" <page-reclaim@google.com>,
	"Barry Song" <baohua@kernel.org>,
	"Brian Geffon" <bgeffon@google.com>,
	"Jan Alexander Steffens" <heftig@archlinux.org>,
	"Oleksandr Natalenko" <oleksandr@natalenko.name>,
	"Steven Barrett" <steven@liquorix.net>,
	"Suleiman Souhlal" <suleiman@google.com>,
	"Daniel Byrne" <djbyrne@mtu.edu>,
	"Donald Carr" <d@chaos-reins.com>,
	"Holger Hoffstätte" <holger@applied-asynchrony.com>,
	"Konstantin Kharlamov" <Hi-Angel@yandex.ru>,
	"Shuang Zhai" <szhai2@cs.rochester.edu>,
	"Sofia Trinh" <sofia.trinh@edi.works>,
	"Vaibhav Jain" <vaibhav@linux.ibm.com>
Subject: Re: [PATCH v14 07/14] mm: multi-gen LRU: exploit locality in rmap
Date: Thu, 1 Sep 2022 19:17:26 -0600
Message-ID: <CAOUHufZ6LGyBoPBkniB63-77r5=1POWpEWmUTESFtJo2bwbi-w@mail.gmail.com>
In-Reply-To: <0F7CF2A7-F671-4196-B8FD-F35E9556391B@gmail.com>

On Thu, Sep 1, 2022 at 3:18 AM Nadav Amit <nadav.amit@gmail.com> wrote:
>
> > On Aug 15, 2022, at 12:13 AM, Yu Zhao <yuzhao@google.com> wrote:
> >
> > Searching the rmap for PTEs mapping each page on an LRU list (to test
> > and clear the accessed bit) can be expensive because pages from
> > different VMAs (PA space) are not cache friendly to the rmap (VA
> > space). For workloads mostly using mapped pages, searching the rmap
> > can incur the highest CPU cost in the reclaim path.
>
> Impressive work. Sorry if my feedback is not timely.
>
> Just one minor point for thought, that can be left for a later cleanup.
>
> > +	for (i = 0, addr = start; addr != end; i++, addr += PAGE_SIZE) {
> > +		unsigned long pfn;
> > +
> > +		pfn = get_pte_pfn(pte[i], pvmw->vma, addr);
> > +		if (pfn == -1)
> > +			continue;
> > +
> > +		if (!pte_young(pte[i]))
> > +			continue;
> > +
> > +		folio = get_pfn_folio(pfn, memcg, pgdat);
> > +		if (!folio)
> > +			continue;
> > +
> > +		if (!ptep_test_and_clear_young(pvmw->vma, addr, pte + i))
> > +			continue;
> > +
>
> You have already checked that the PTE is old (not young) so this check
> seems redundant.

You are right, for x86, which belongs to category 1: hardware and OS
share the same paging data structure.

> I do not see a way in which the access-bit can be cleared
> since you hold the ptl.

There is also category 2: the OS paging data structure is a shadow of
what hardware actually uses, e.g., POWER9 radix.

To make both categories work, the general rule is that the OS paging
data structure must be more strict, i.e., it can have A/D bits set
while the hardware paging data structure may not. The opposite is not
allowed, even for the A bit, because the A bit can also be used to
determine whether a TLB flush is required. The Linux kernel doesn't do
this but there are other OSes that do.

For prefaulted PTEs, we generally mark them young unless
arch_wants_old_prefaulted_pte() returns true (currently only ARMv8.2+
do). On POWER9, we'd see those PTEs pass the first check but fail the
second.

> IOW, there is no need for the "if" and "continue".
>
> Makes me also wonder whether having a separate ptep_clear_young() can
> slightly help, since anyhow the access-bit is more of an estimation,
> and having a separate ptep_clear_young() can enable optimizations.
>
> On x86, for instance, if the PTE is dirty, we may be able to clear the
> access-bit without an atomic operation, which should be faster.

Agreed.
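
The two categories discussed in the reply can be sketched in user space. This is a toy model, not kernel code: struct sim_pte, SIM_PTE_YOUNG, sim_pte_young(), sim_test_and_clear_young() and sim_scan() are hypothetical names invented for illustration, and the assumption that the test-and-clear helper reports the hardware copy in category 2 is a simplification of what the reply describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accessed-bit mask; not the PTE layout of any real architecture. */
#define SIM_PTE_YOUNG 0x1u

/* Toy PTE slot (assumption): the OS view plus the copy hardware actually walks. */
struct sim_pte {
	uint32_t os;   /* OS paging data structure */
	uint32_t hw;   /* hardware paging data structure (category 2 only) */
	bool shadowed; /* false: category 1 (shared); true: category 2 (shadow) */
};

/* First check in the loop: reads only the OS view, like pte_young(). */
static bool sim_pte_young(const struct sim_pte *p)
{
	return p->os & SIM_PTE_YOUNG;
}

/*
 * Second check: clears the accessed bit and reports whether hardware had
 * set it. In category 1 the OS view is the hardware view; in category 2
 * (assumed here) the answer comes from the hardware copy.
 */
static bool sim_test_and_clear_young(struct sim_pte *p)
{
	uint32_t seen = p->shadowed ? p->hw : p->os;

	p->os &= ~SIM_PTE_YOUNG;
	if (p->shadowed)
		p->hw &= ~SIM_PTE_YOUNG;
	return seen & SIM_PTE_YOUNG;
}

/*
 * Mirrors the shape of the quoted loop: returns how many PTEs were found
 * young, and stores in *skipped how many passed the first check but
 * failed the second.
 */
static size_t sim_scan(struct sim_pte *ptes, size_t n, size_t *skipped)
{
	size_t i, young = 0;

	assert(ptes && skipped);
	*skipped = 0;
	for (i = 0; i < n; i++) {
		if (!sim_pte_young(&ptes[i]))
			continue;
		if (!sim_test_and_clear_young(&ptes[i])) {
			(*skipped)++;
			continue;
		}
		young++;
	}
	return young;
}
```

In this model, a category 2 entry whose OS view is young but whose hardware copy is not (the prefaulted case) ends up counted in *skipped rather than as young, which is why the second check is not redundant there. For category 1 entries the two checks always agree, matching the observation about x86.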