From: Yu Zhao <yuzhao@google.com>
To: Nadav Amit <nadav.amit@gmail.com>
Cc: "Andrew Morton" <akpm@linux-foundation.org>,
	"Andi Kleen" <ak@linux.intel.com>,
	"Aneesh Kumar" <aneesh.kumar@linux.ibm.com>,
	"Catalin Marinas" <catalin.marinas@arm.com>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"Hillf Danton" <hdanton@sina.com>, "Jens Axboe" <axboe@kernel.dk>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Mel Gorman" <mgorman@suse.de>,
	"Michael Larabel" <Michael@michaellarabel.com>,
	"Michal Hocko" <mhocko@kernel.org>,
	"Mike Rapoport" <rppt@kernel.org>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Tejun Heo" <tj@kernel.org>, "Vlastimil Babka" <vbabka@suse.cz>,
	"Will Deacon" <will@kernel.org>,
	"Linux ARM" <linux-arm-kernel@lists.infradead.org>,
	"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"Linux MM" <linux-mm@kvack.org>, "X86 ML" <x86@kernel.org>,
	"Kernel Page Reclaim v2" <page-reclaim@google.com>,
	"Barry Song" <baohua@kernel.org>,
	"Brian Geffon" <bgeffon@google.com>,
	"Jan Alexander Steffens" <heftig@archlinux.org>,
	"Oleksandr Natalenko" <oleksandr@natalenko.name>,
	"Steven Barrett" <steven@liquorix.net>,
	"Suleiman Souhlal" <suleiman@google.com>,
	"Daniel Byrne" <djbyrne@mtu.edu>,
	"Donald Carr" <d@chaos-reins.com>,
	"Holger Hoffstätte" <holger@applied-asynchrony.com>,
	"Konstantin Kharlamov" <Hi-Angel@yandex.ru>,
	"Shuang Zhai" <szhai2@cs.rochester.edu>,
	"Sofia Trinh" <sofia.trinh@edi.works>,
	"Vaibhav Jain" <vaibhav@linux.ibm.com>
Subject: Re: [PATCH v14 07/14] mm: multi-gen LRU: exploit locality in rmap
Date: Thu, 1 Sep 2022 19:28:31 -0600
Message-ID: <CAOUHufa+WpwP5NENgQ5jqgsVwqvK8vaayyJ4hT5071y=+ZYF6A@mail.gmail.com>
In-Reply-To: <CAOUHufZ6LGyBoPBkniB63-77r5=1POWpEWmUTESFtJo2bwbi-w@mail.gmail.com>

On Thu, Sep 1, 2022 at 7:17 PM Yu Zhao <yuzhao@google.com> wrote:
>
> On Thu, Sep 1, 2022 at 3:18 AM Nadav Amit <nadav.amit@gmail.com> wrote:
> >
> >
> >
> > > On Aug 15, 2022, at 12:13 AM, Yu Zhao <yuzhao@google.com> wrote:
> > >
> > > Searching the rmap for PTEs mapping each page on an LRU list (to test
> > > and clear the accessed bit) can be expensive because pages from
> > > different VMAs (PA space) are not cache friendly to the rmap (VA
> > > space). For workloads mostly using mapped pages, searching the rmap
> > > can incur the highest CPU cost in the reclaim path.
> >
> > Impressive work.

Thanks.

> > Sorry if my feedback is not timely.
> >
> > Just one minor point for thought, that can be left for a later cleanup.
> >
> > >
> > > +     for (i = 0, addr = start; addr != end; i++, addr += PAGE_SIZE) {
> > > +             unsigned long pfn;
> > > +
> > > +             pfn = get_pte_pfn(pte[i], pvmw->vma, addr);
> > > +             if (pfn == -1)
> > > +                     continue;
> > > +
> > > +             if (!pte_young(pte[i]))
> > > +                     continue;
> > > +
> > > +             folio = get_pfn_folio(pfn, memcg, pgdat);
> > > +             if (!folio)
> > > +                     continue;
> > > +
> > > +             if (!ptep_test_and_clear_young(pvmw->vma, addr, pte + i))
> > > +                     continue;
> > > +
> >
> > You have already checked that the PTE is old (not young) so this check
> > seems redundant.
>
> You are right, for x86, which belongs to category 1: hardware and
> OS share the same paging data structure.
>
> > I do not see a way in which the access-bit can be cleared
> > since you hold the ptl.
>
> There is also category 2: the OS paging data structure is a shadow of what
> hardware actually uses, e.g., POWER9 radix.
>
> To make both categories work, the general rule is that the OS paging
> data structure must be more strict, i.e., it can have A/D bits set
> while the hardware paging data structure may not. The opposite is not
> allowed, even for the A bit, because the A bit can also be used to
> determine whether a TLB flush is required. The Linux kernel doesn't do
> this but there are other OSes that do.
>
> For prefaulted PTEs, we generally mark them young unless
> arch_wants_old_prefaulted_pte() returns true (currently only ARMv8.2+
> do). On POWER9, we'd see those PTEs pass the first check but fail the
> second.

That is because the first check (non-atomic) is allowed to fetch from
the OS paging data structure (which is more strict), while the second
check (atomic) must fetch from the hardware paging data structure
(which does not have the A bit set, because those PTEs were prefaulted
and never actually accessed).
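
To illustrate the point above, here is a toy userspace model of a
category 2 PTE. It is only a sketch: struct shadow_pte and the model_*
helpers are hypothetical names made up for this example, not kernel
APIs, and the assumption that the atomic helper also resyncs the OS
copy is mine. It shows why a prefaulted PTE passes the first check and
fails the second.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of a "category 2" PTE: the OS keeps a software
 * copy that shadows the entry the hardware actually walks.
 */
struct shadow_pte {
	bool os_young;	/* A bit in the OS paging data structure */
	bool hw_young;	/* A bit in the hardware paging data structure */
};

/* First check: non-atomic, so it may read the (stricter) OS copy. */
static bool model_pte_young(const struct shadow_pte *pte)
{
	return pte->os_young;
}

/*
 * Second check: stands in for the atomic test-and-clear, which must
 * consult the hardware copy. Assumption for this model only: clearing
 * also brings the OS copy back in sync.
 */
static bool model_test_and_clear_young(struct shadow_pte *pte)
{
	bool was_young = pte->hw_young;

	pte->hw_young = false;
	pte->os_young = false;
	return was_young;
}

/* Same shape as the loop in the patch: count PTEs passing both checks. */
static size_t model_count_young(struct shadow_pte *pte, size_t n)
{
	size_t i, young = 0;

	for (i = 0; i < n; i++) {
		if (!model_pte_young(&pte[i]))
			continue;

		/* Not redundant: a prefaulted PTE gets this far and stops here. */
		if (!model_test_and_clear_young(&pte[i]))
			continue;

		young++;
	}

	return young;
}
```

With one accessed PTE, one prefaulted-but-untouched PTE (os_young set,
hw_young clear) and one old PTE, only the first is counted. Dropping
the second check would count the prefaulted one as well.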

> > IOW, there is no need for the "if" and "continue".
> >
> > Makes me also wonder whether having a separate ptep_clear_young() can
> > slightly help, since anyhow the access-bit is more of an estimation,
> > and having a separate ptep_clear_young() can enable optimizations.
> >
> > On x86, for instance, if the PTE is dirty, we may be able to clear the
> > access-bit without an atomic operation, which should be faster.
>
> Agreed.
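
For illustration only (this is not part of the patch), the x86 idea
above could look roughly like the userspace sketch below.
model_ptep_clear_young() and the MODEL_* macros are hypothetical names.
The sketch assumes the caller holds the ptl, so there are no other
software writers, and that hardware only ever sets the A and D bits, so
a PTE with both already set will not be modified underneath us.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 PTE bit positions for the accessed and dirty bits. */
#define MODEL_PTE_ACCESSED	(UINT64_C(1) << 5)
#define MODEL_PTE_DIRTY		(UINT64_C(1) << 6)

/*
 * Hypothetical helper: clear the accessed bit and report whether it
 * was set. Uses a plain store when the PTE is already dirty and falls
 * back to an atomic read-modify-write otherwise.
 */
static bool model_ptep_clear_young(_Atomic uint64_t *ptep)
{
	uint64_t val = atomic_load_explicit(ptep, memory_order_relaxed);

	if (!(val & MODEL_PTE_ACCESSED))
		return false;

	if (val & MODEL_PTE_DIRTY) {
		/*
		 * A and D are both set, so under the stated assumption
		 * there is no concurrent hardware update left to lose.
		 */
		atomic_store_explicit(ptep, val & ~MODEL_PTE_ACCESSED,
				      memory_order_relaxed);
	} else {
		/* Hardware may still set D concurrently; stay atomic. */
		atomic_fetch_and_explicit(ptep, ~MODEL_PTE_ACCESSED,
					  memory_order_relaxed);
	}

	return true;
}
```

Whether the plain-store path is actually faster, and whether it is safe
on every x86 implementation, would need to be confirmed before anything
like this is proposed for the kernel.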
