From: Yu Zhao <yuzhao@google.com>
To: Barry Song <21cnbao@gmail.com>
Cc: "Linus Torvalds" <torvalds@linux-foundation.org>,
	"Will Deacon" <will@kernel.org>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	Linux-MM <linux-mm@kvack.org>, "Andi Kleen" <ak@linux.intel.com>,
	"Aneesh Kumar" <aneesh.kumar@linux.ibm.com>,
	"Catalin Marinas" <catalin.marinas@arm.com>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"Hillf Danton" <hdanton@sina.com>, "Jens Axboe" <axboe@kernel.dk>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Mel Gorman" <mgorman@suse.de>,
	"Michael Larabel" <Michael@michaellarabel.com>,
	"Michal Hocko" <mhocko@kernel.org>,
	"Mike Rapoport" <rppt@kernel.org>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Tejun Heo" <tj@kernel.org>, "Vlastimil Babka" <vbabka@suse.cz>,
	LAK <linux-arm-kernel@lists.infradead.org>,
	"Linux Doc Mailing List" <linux-doc@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>, x86 <x86@kernel.org>,
	"Kernel Page Reclaim v2" <page-reclaim@google.com>,
	"Brian Geffon" <bgeffon@google.com>,
	"Jan Alexander Steffens" <heftig@archlinux.org>,
	"Oleksandr Natalenko" <oleksandr@natalenko.name>,
	"Steven Barrett" <steven@liquorix.net>,
	"Suleiman Souhlal" <suleiman@google.com>,
	"Daniel Byrne" <djbyrne@mtu.edu>,
	"Donald Carr" <d@chaos-reins.com>,
	"Holger Hoffstätte" <holger@applied-asynchrony.com>,
	"Konstantin Kharlamov" <Hi-Angel@yandex.ru>,
	"Shuang Zhai" <szhai2@cs.rochester.edu>,
	"Sofia Trinh" <sofia.trinh@edi.works>,
	"Vaibhav Jain" <vaibhav@linux.ibm.com>,
	huzhanyuan@oppo.com
Subject: Re: [PATCH v11 07/14] mm: multi-gen LRU: exploit locality in rmap
Date: Thu, 16 Jun 2022 21:03:31 -0600
Message-ID: <CAOUHufYq81_1HAnTU84md5xr8a8msjxK3tDWmmRfLSUnY-+u+g@mail.gmail.com>
In-Reply-To: <CAGsJ_4ypWMoxUJPjYiFdwQpLOXj8STDN8dSDEQbCpuNonBBkcA@mail.gmail.com>

On Thu, Jun 16, 2022 at 8:01 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Fri, Jun 17, 2022 at 1:43 PM Yu Zhao <yuzhao@google.com> wrote:
> >
> > On Thu, Jun 16, 2022 at 5:29 PM Yu Zhao <yuzhao@google.com> wrote:
> > >
> > > On Thu, Jun 16, 2022 at 4:33 PM Barry Song <21cnbao@gmail.com> wrote:
> > > >
> > > > On Fri, Jun 17, 2022 at 9:56 AM Yu Zhao <yuzhao@google.com> wrote:
> > > > >
> > > > > On Wed, Jun 8, 2022 at 4:46 PM Barry Song <21cnbao@gmail.com> wrote:
> > > > > >
> > > > > > On Thu, Jun 9, 2022 at 3:52 AM Linus Torvalds
> > > > > > <torvalds@linux-foundation.org> wrote:
> > > > > > >
> > > > > > > On Tue, Jun 7, 2022 at 5:43 PM Barry Song <21cnbao@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Given that we used to flush when clearing pte young in the LRU, we are
> > > > > > > > > now moving to a nop for the flush in almost all cases, unless the address
> > > > > > > > > becomes young exactly after look_around and before
> > > > > > > > > ptep_clear_flush_young_notify. It means we are actually dropping the
> > > > > > > > > flush. So the question is: were we overcautious? Do we actually not need
> > > > > > > > > the flush at all, even without mglru?
> > > > > > >
> > > > > > > We stopped flushing the TLB on A bit clears on x86 back in 2014.
> > > > > > >
> > > > > > > See commit b13b1d2d8692 ("x86/mm: In the PTE swapout page reclaim case
> > > > > > > clear the accessed bit instead of flushing the TLB").
> > > > > >
> > > > > > > This is true for x86, RISC-V, powerpc and s390, but it is not true for
> > > > > > > most platforms.
> > > > > >
> > > > > > There was an attempt to do the same thing in arm64:
> > > > > > https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1793830.html
> > > > > > > but arm64 still sent a nosync tlbi and depended on a deferred dsb:
> > > > > > https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1794484.html
> > > > >
> > > > > Barry, you've already answered your own question.
> > > > >
> > > > > > Without commit 07509e10dcc7 ("arm64: pgtable: Fix pte_accessible()"):
> > > > >    #define pte_accessible(mm, pte)        \
> > > > >   -       (mm_tlb_flush_pending(mm) ? pte_present(pte) : pte_valid_young(pte))
> > > > >   +       (mm_tlb_flush_pending(mm) ? pte_present(pte) : pte_valid(pte))
> > > > >
> > > > > > You missed all TLB flushes for PTEs that have gone through
> > > > > > ptep_test_and_clear_young() on the reclaim path. But most of the time,
> > > > > > you got away with it, with only occasional app crashes:
> > > > > https://lore.kernel.org/r/CAGsJ_4w6JjuG4rn2P=d974wBOUtXUUnaZKnx+-G6a8_mSROa+Q@mail.gmail.com/
> > > > >
> > > > > Why?
> > > >
> > > > > Yes. On the arm64 platform, ptep_test_and_clear_young() without a flush
> > > > > can cause random apps to crash.
> > > > > ptep_test_and_clear_young() + flush doesn't cause this kind of crash, though.
> > > > > But after applying commit 07509e10dcc7 ("arm64: pgtable: Fix
> > > > > pte_accessible()") on arm64, ptep_test_and_clear_young() without a flush
> > > > > won't cause apps to crash.
> > > >
> > > > ptep_test_and_clear_young(), with flush, without commit 07509e10dcc7:   OK
> > > > ptep_test_and_clear_young(), without flush, with commit 07509e10dcc7:   OK
> > > > ptep_test_and_clear_young(), without flush, without commit 07509e10dcc7:   CRASH
> > >
> > > I agree -- my question was rhetorical :)
> > >
> > > I was trying to imply this logic:
> > > 1. We cleared the A-bit in PTEs with ptep_test_and_clear_young()
> > > 2. We missed TLB flush for those PTEs on the reclaim path, i.e., case
> > > 3 (case 1 & 2 guarantee flushes)
> > > 3. We saw crashes, but only occasionally
> > >
> > > Assuming TLB cached those PTEs, we would have seen the crashes more
> > > often, which contradicts our observation. So the conclusion is TLB
> > > didn't cache them most of the time, meaning flushing TLB just for the
> > > sake of the A-bit isn't necessary.
> > >
> > > > > Do you think it is safe to totally remove the flush code even for
> > > > > the original LRU?
> > >
> > > Affirmative, based on not only my words, but 3rd parties':
> > > 1. Your (indirect) observation
> > > 2. Alexander's benchmark:
> > > https://lore.kernel.org/r/BYAPR12MB271295B398729E07F31082A7CFAA0@BYAPR12MB2712.namprd12.prod.outlook.com/
> > > 3. The fundamental hardware limitation in terms of the TLB scalability
> > > (Fig. 1): https://www.usenix.org/legacy/events/osdi02/tech/full_papers/navarro/navarro.pdf
> >
> > 4. Intel's commit b13b1d2d8692 ("x86/mm: In the PTE swapout page
> > reclaim case clear the accessed bit instead of flushing the TLB")
>
> Hi Yu,
> I am going to send an RFC based on the above discussion.
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 5bcb334cd6f2..7ce6f0b6c330 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -830,7 +830,7 @@ static bool folio_referenced_one(struct folio *folio,
>                 }
>
>                 if (pvmw.pte) {
> -                       if (ptep_clear_flush_young_notify(vma, address,
> +                       if (ptep_clear_young_notify(vma, address,
>                                                 pvmw.pte)) {
>                                 /*
>                                  * Don't treat a reference through

Thanks!

This might make a difference on my 64-core Altra -- I'll test after
you post the RFC.

