All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yu Zhao <yuzhao@google.com>
To: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andi Kleen <ak@linux.intel.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Hillf Danton <hdanton@sina.com>, Jens Axboe <axboe@kernel.dk>,
	Jesse Barnes <jsbarnes@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Jonathan Corbet <corbet@lwn.net>,
	Matthew Wilcox <willy@infradead.org>,
	Mel Gorman <mgorman@suse.de>,
	Michael Larabel <Michael@michaellarabel.com>,
	Rik van Riel <riel@surriel.com>, Vlastimil Babka <vbabka@suse.cz>,
	Will Deacon <will@kernel.org>, Ying Huang <ying.huang@intel.com>,
	linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	page-reclaim@google.com, x86@kernel.org,
	Konstantin Kharlamov <Hi-Angel@yandex.ru>
Subject: Re: [PATCH v6 6/9] mm: multigenerational lru: aging
Date: Thu, 13 Jan 2022 02:43:38 -0700	[thread overview]
Message-ID: <Yd/0Sgxy+jLm5cqd@google.com> (raw)
In-Reply-To: <YdxEqFPLDf+wI0xX@dhcp22.suse.cz>

On Mon, Jan 10, 2022 at 03:37:28PM +0100, Michal Hocko wrote:
> On Sun 09-01-22 20:58:02, Yu Zhao wrote:
> > On Fri, Jan 07, 2022 at 10:00:31AM +0100, Michal Hocko wrote:
> > > On Fri 07-01-22 09:55:09, Michal Hocko wrote:
> > > [...]
> > > > > In this case, lru_gen_mm_walk is small (160 bytes); it's per direct
> > > > > reclaimer; and direct reclaimers rarely come here, i.e., only when
> > > > > kswapd can't keep up in terms of the aging, which is similar to the
> > > > > condition where the inactive list is empty for the active/inactive
> > > > > lru.
> > > > 
> > > > Well, this is not a strong argument to be honest. Kswapd being stuck
> > > > and the majority of the reclaim being done in the direct reclaim
> > > > context is a situation I have seen many many times.
> > > 
> > > Also do not forget that memcg reclaim is effectivelly only direct
> > > reclaim. Not that the memcg reclaim indicates a global memory shortage
> > > but it can add up and race with the global reclaim as well.
> > 
> > I don't dispute any of the above, and I probably don't like this code
> > more than you do.
> > 
> > But let's not forget the purposes of PF_MEMALLOC, besides preventing
> > recursive reclaims, include letting reclaim dip into reserves so that
> > it can make more free memory. So I think it's acceptable if the
> > following conditions are met:
> > 1. The allocation size is small.
> > 2. The number of allocations is bounded.
> > 3. Its failure doesn't stall reclaim.
> > And it'd be nice if
> > 4. The allocation happens rarely, e.g., slow path only.
> 
> I would add 
>   0. The allocation should be done only if absolutely _necessary_.
> 
> Please keep in mind that whatever you allocate from that context will be
> consuming a very precious memory reserves which are shared with other
> components of the system. Even worse these can go all the way to
> depleting memory completely where other things can fall apart.

I agree but I also see a distinction:
   1,2,3 are objective;
   0,4 are subjective.

For some users, page reclaim itself could be not absolutely necessary
because they are okay with OOM kills. But for others, the situation
could be reversed.

> > The code in question meets all of them.
> > 
> > 1. This allocation is 160 bytes.
> > 2. It's bounded by the number of page table walkers which, in the
> >    worst, is same as the number of mm_struct's.
> > 3. Most importantly, its failure doesn't stall the aging. The aging
> >    will fallback to the rmap-based function lru_gen_look_around().
> >    But this function only gathers the accessed bit from at most 64
> >    PTEs, meaning it's less efficient (retains ~80% performance gains).
> > 4. This allocation is rare, i.e., only when the aging is required,
> >    which is similar to the low inactive case for the active/inactive
> >    lru.
> 
> I think this fallback behavior deserves much more detailed explanation
> in changelogs.

Will do.

> > The bottom line is I can try various optimizations, e.g., preallocate
> > a few buffers for a limited number of page walkers and if this number
> > has been reached, fallback to the rmap-based function. But I have yet
> > to see evidence that calls for additional complexity.
> 
> I would disagree here. This is not an optimization. You should be
> avoiding allocations from the memory reclaim because any allocation just
> add a runtime behavior complexity and potential corner cases.

Would __GFP_NOMEMALLOC address your concern? It prevents allocations
from accessing the reserves even under PF_MEMALLOC.

WARNING: multiple messages have this Message-ID (diff)
From: Yu Zhao <yuzhao@google.com>
To: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andi Kleen <ak@linux.intel.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Hillf Danton <hdanton@sina.com>, Jens Axboe <axboe@kernel.dk>,
	Jesse Barnes <jsbarnes@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Jonathan Corbet <corbet@lwn.net>,
	Matthew Wilcox <willy@infradead.org>,
	Mel Gorman <mgorman@suse.de>,
	Michael Larabel <Michael@michaellarabel.com>,
	Rik van Riel <riel@surriel.com>, Vlastimil Babka <vbabka@suse.cz>,
	Will Deacon <will@kernel.org>, Ying Huang <ying.huang@intel.com>,
	linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	page-reclaim@google.com, x86@kernel.org,
	Konstantin Kharlamov <Hi-Angel@yandex.ru>
Subject: Re: [PATCH v6 6/9] mm: multigenerational lru: aging
Date: Thu, 13 Jan 2022 02:43:38 -0700	[thread overview]
Message-ID: <Yd/0Sgxy+jLm5cqd@google.com> (raw)
In-Reply-To: <YdxEqFPLDf+wI0xX@dhcp22.suse.cz>

On Mon, Jan 10, 2022 at 03:37:28PM +0100, Michal Hocko wrote:
> On Sun 09-01-22 20:58:02, Yu Zhao wrote:
> > On Fri, Jan 07, 2022 at 10:00:31AM +0100, Michal Hocko wrote:
> > > On Fri 07-01-22 09:55:09, Michal Hocko wrote:
> > > [...]
> > > > > In this case, lru_gen_mm_walk is small (160 bytes); it's per direct
> > > > > reclaimer; and direct reclaimers rarely come here, i.e., only when
> > > > > kswapd can't keep up in terms of the aging, which is similar to the
> > > > > condition where the inactive list is empty for the active/inactive
> > > > > lru.
> > > > 
> > > > Well, this is not a strong argument to be honest. Kswapd being stuck
> > > > and the majority of the reclaim being done in the direct reclaim
> > > > context is a situation I have seen many many times.
> > > 
> > > Also do not forget that memcg reclaim is effectivelly only direct
> > > reclaim. Not that the memcg reclaim indicates a global memory shortage
> > > but it can add up and race with the global reclaim as well.
> > 
> > I don't dispute any of the above, and I probably don't like this code
> > more than you do.
> > 
> > But let's not forget the purposes of PF_MEMALLOC, besides preventing
> > recursive reclaims, include letting reclaim dip into reserves so that
> > it can make more free memory. So I think it's acceptable if the
> > following conditions are met:
> > 1. The allocation size is small.
> > 2. The number of allocations is bounded.
> > 3. Its failure doesn't stall reclaim.
> > And it'd be nice if
> > 4. The allocation happens rarely, e.g., slow path only.
> 
> I would add 
>   0. The allocation should be done only if absolutely _necessary_.
> 
> Please keep in mind that whatever you allocate from that context will be
> consuming a very precious memory reserves which are shared with other
> components of the system. Even worse these can go all the way to
> depleting memory completely where other things can fall apart.

I agree but I also see a distinction:
   1,2,3 are objective;
   0,4 are subjective.

For some users, page reclaim itself could be not absolutely necessary
because they are okay with OOM kills. But for others, the situation
could be reversed.

> > The code in question meets all of them.
> > 
> > 1. This allocation is 160 bytes.
> > 2. It's bounded by the number of page table walkers which, in the
> >    worst, is same as the number of mm_struct's.
> > 3. Most importantly, its failure doesn't stall the aging. The aging
> >    will fallback to the rmap-based function lru_gen_look_around().
> >    But this function only gathers the accessed bit from at most 64
> >    PTEs, meaning it's less efficient (retains ~80% performance gains).
> > 4. This allocation is rare, i.e., only when the aging is required,
> >    which is similar to the low inactive case for the active/inactive
> >    lru.
> 
> I think this fallback behavior deserves much more detailed explanation
> in changelogs.

Will do.

> > The bottom line is I can try various optimizations, e.g., preallocate
> > a few buffers for a limited number of page walkers and if this number
> > has been reached, fallback to the rmap-based function. But I have yet
> > to see evidence that calls for additional complexity.
> 
> I would disagree here. This is not an optimization. You should be
> avoiding allocations from the memory reclaim because any allocation just
> add a runtime behavior complexity and potential corner cases.

Would __GFP_NOMEMALLOC address your concern? It prevents allocations
from accessing the reserves even under PF_MEMALLOC.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2022-01-13  9:43 UTC|newest]

Thread overview: 223+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-04 20:22 [PATCH v6 0/9] Multigenerational LRU Framework Yu Zhao
2022-01-04 20:22 ` Yu Zhao
2022-01-04 20:22 ` [PATCH v6 1/9] mm: x86, arm64: add arch_has_hw_pte_young() Yu Zhao
2022-01-04 20:22   ` Yu Zhao
2022-01-05 10:45   ` Will Deacon
2022-01-05 10:45     ` Will Deacon
2022-01-05 20:47     ` Yu Zhao
2022-01-05 20:47       ` Yu Zhao
2022-01-06 10:30       ` Will Deacon
2022-01-06 10:30         ` Will Deacon
2022-01-07  7:25         ` Yu Zhao
2022-01-07  7:25           ` Yu Zhao
2022-01-11 14:19           ` Will Deacon
2022-01-11 14:19             ` Will Deacon
2022-01-11 22:27             ` Yu Zhao
2022-01-11 22:27               ` Yu Zhao
2022-01-04 20:22 ` [PATCH v6 2/9] mm: x86: add CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG Yu Zhao
2022-01-04 20:22   ` Yu Zhao
2022-01-04 21:24   ` Linus Torvalds
2022-01-04 21:24     ` Linus Torvalds
2022-01-04 20:22 ` [PATCH v6 3/9] mm/vmscan.c: refactor shrink_node() Yu Zhao
2022-01-04 20:22   ` Yu Zhao
2022-01-04 20:22 ` [PATCH v6 4/9] mm: multigenerational lru: groundwork Yu Zhao
2022-01-04 20:22   ` Yu Zhao
2022-01-04 21:34   ` Linus Torvalds
2022-01-04 21:34     ` Linus Torvalds
2022-01-11  8:16   ` Aneesh Kumar K.V
2022-01-11  8:16     ` Aneesh Kumar K.V
2022-01-12  2:16     ` Yu Zhao
2022-01-12  2:16       ` Yu Zhao
2022-01-04 20:22 ` [PATCH v6 5/9] mm: multigenerational lru: mm_struct list Yu Zhao
2022-01-04 20:22   ` Yu Zhao
2022-01-07  9:06   ` Michal Hocko
2022-01-07  9:06     ` Michal Hocko
2022-01-08  0:19     ` Yu Zhao
2022-01-08  0:19       ` Yu Zhao
2022-01-10 15:21       ` Michal Hocko
2022-01-10 15:21         ` Michal Hocko
2022-01-12  8:08         ` Yu Zhao
2022-01-12  8:08           ` Yu Zhao
2022-01-04 20:22 ` [PATCH v6 6/9] mm: multigenerational lru: aging Yu Zhao
2022-01-04 20:22   ` Yu Zhao
2022-01-06 16:06   ` Michal Hocko
2022-01-06 16:06     ` Michal Hocko
2022-01-06 21:27     ` Yu Zhao
2022-01-06 21:27       ` Yu Zhao
2022-01-07  8:43       ` Michal Hocko
2022-01-07  8:43         ` Michal Hocko
2022-01-07 21:12         ` Yu Zhao
2022-01-07 21:12           ` Yu Zhao
2022-01-06 16:12   ` Michal Hocko
2022-01-06 16:12     ` Michal Hocko
2022-01-06 21:41     ` Yu Zhao
2022-01-06 21:41       ` Yu Zhao
2022-01-07  8:55       ` Michal Hocko
2022-01-07  8:55         ` Michal Hocko
2022-01-07  9:00         ` Michal Hocko
2022-01-07  9:00           ` Michal Hocko
2022-01-10  3:58           ` Yu Zhao
2022-01-10  3:58             ` Yu Zhao
2022-01-10 14:37             ` Michal Hocko
2022-01-10 14:37               ` Michal Hocko
2022-01-13  9:43               ` Yu Zhao [this message]
2022-01-13  9:43                 ` Yu Zhao
2022-01-13 12:02                 ` Michal Hocko
2022-01-13 12:02                   ` Michal Hocko
2022-01-19  6:31                   ` Yu Zhao
2022-01-19  6:31                     ` Yu Zhao
2022-01-19  9:44                     ` Michal Hocko
2022-01-19  9:44                       ` Michal Hocko
2022-01-10 15:01     ` Michal Hocko
2022-01-10 15:01       ` Michal Hocko
2022-01-10 16:01       ` Vlastimil Babka
2022-01-10 16:01         ` Vlastimil Babka
2022-01-10 16:25         ` Michal Hocko
2022-01-10 16:25           ` Michal Hocko
2022-01-11 23:16       ` Yu Zhao
2022-01-11 23:16         ` Yu Zhao
2022-01-12 10:28         ` Michal Hocko
2022-01-12 10:28           ` Michal Hocko
2022-01-13  9:25           ` Yu Zhao
2022-01-13  9:25             ` Yu Zhao
2022-01-07 13:11   ` Michal Hocko
2022-01-07 13:11     ` Michal Hocko
2022-01-07 23:36     ` Yu Zhao
2022-01-07 23:36       ` Yu Zhao
2022-01-10 15:35       ` Michal Hocko
2022-01-10 15:35         ` Michal Hocko
2022-01-11  1:18         ` Yu Zhao
2022-01-11  1:18           ` Yu Zhao
2022-01-11  9:00           ` Michal Hocko
2022-01-11  9:00             ` Michal Hocko
     [not found]         ` <1641900108.61dd684cb0e59@mail.inbox.lv>
2022-01-11 12:15           ` Michal Hocko
2022-01-11 12:15             ` Michal Hocko
2022-01-13 17:00             ` Alexey Avramov
2022-01-13 17:00               ` Alexey Avramov
2022-01-11 14:22         ` Alexey Avramov
2022-01-11 14:22           ` Alexey Avramov
2022-01-07 14:44   ` Michal Hocko
2022-01-07 14:44     ` Michal Hocko
2022-01-10  4:47     ` Yu Zhao
2022-01-10  4:47       ` Yu Zhao
2022-01-10 10:54       ` Michal Hocko
2022-01-10 10:54         ` Michal Hocko
2022-01-19  7:04         ` Yu Zhao
2022-01-19  7:04           ` Yu Zhao
2022-01-19  9:42           ` Michal Hocko
2022-01-19  9:42             ` Michal Hocko
2022-01-23 21:28             ` Yu Zhao
2022-01-23 21:28               ` Yu Zhao
2022-01-24 14:01               ` Michal Hocko
2022-01-24 14:01                 ` Michal Hocko
2022-01-10 16:57   ` Michal Hocko
2022-01-10 16:57     ` Michal Hocko
2022-01-12  1:01     ` Yu Zhao
2022-01-12  1:01       ` Yu Zhao
2022-01-12 10:17       ` Michal Hocko
2022-01-12 10:17         ` Michal Hocko
2022-01-12 23:43         ` Yu Zhao
2022-01-12 23:43           ` Yu Zhao
2022-01-13 11:57           ` Michal Hocko
2022-01-13 11:57             ` Michal Hocko
2022-01-23 21:40             ` Yu Zhao
2022-01-23 21:40               ` Yu Zhao
2022-01-04 20:22 ` [PATCH v6 7/9] mm: multigenerational lru: eviction Yu Zhao
2022-01-04 20:22   ` Yu Zhao
2022-01-11 10:37   ` Aneesh Kumar K.V
2022-01-11 10:37     ` Aneesh Kumar K.V
2022-01-12  8:05     ` Yu Zhao
2022-01-12  8:05       ` Yu Zhao
2022-01-04 20:22 ` [PATCH v6 8/9] mm: multigenerational lru: user interface Yu Zhao
2022-01-04 20:22   ` Yu Zhao
2022-01-10 10:27   ` Mike Rapoport
2022-01-10 10:27     ` Mike Rapoport
2022-01-12  8:35     ` Yu Zhao
2022-01-12  8:35       ` Yu Zhao
2022-01-12 10:31       ` Michal Hocko
2022-01-12 10:31         ` Michal Hocko
2022-01-12 15:45       ` Mike Rapoport
2022-01-12 15:45         ` Mike Rapoport
2022-01-13  9:47         ` Yu Zhao
2022-01-13  9:47           ` Yu Zhao
2022-01-13 10:31   ` Aneesh Kumar K.V
2022-01-13 10:31     ` Aneesh Kumar K.V
2022-01-13 23:02     ` Yu Zhao
2022-01-13 23:02       ` Yu Zhao
2022-01-14  5:20       ` Aneesh Kumar K.V
2022-01-14  5:20         ` Aneesh Kumar K.V
2022-01-14  6:50         ` Yu Zhao
2022-01-14  6:50           ` Yu Zhao
2022-01-04 20:22 ` [PATCH v6 9/9] mm: multigenerational lru: Kconfig Yu Zhao
2022-01-04 20:22   ` Yu Zhao
2022-01-04 21:39   ` Linus Torvalds
2022-01-04 21:39     ` Linus Torvalds
2022-01-04 20:22 ` [PATCH v6 0/9] Multigenerational LRU Framework Yu Zhao
2022-01-04 20:30 ` Yu Zhao
2022-01-04 20:30   ` Yu Zhao
2022-01-04 21:43   ` Linus Torvalds
2022-01-04 21:43     ` Linus Torvalds
2022-01-05 21:12     ` Yu Zhao
2022-01-05 21:12       ` Yu Zhao
2022-01-07  9:38   ` Michal Hocko
2022-01-07  9:38     ` Michal Hocko
2022-01-07 18:45     ` Yu Zhao
2022-01-07 18:45       ` Yu Zhao
2022-01-10 15:39       ` Michal Hocko
2022-01-10 15:39         ` Michal Hocko
2022-01-10 22:04         ` Yu Zhao
2022-01-10 22:04           ` Yu Zhao
2022-01-10 22:46           ` Jesse Barnes
2022-01-10 22:46             ` Jesse Barnes
2022-01-11  1:41             ` Linus Torvalds
2022-01-11  1:41               ` Linus Torvalds
2022-01-11 10:40             ` Michal Hocko
2022-01-11 10:40               ` Michal Hocko
2022-01-11  8:41   ` Yu Zhao
2022-01-11  8:41     ` Yu Zhao
2022-01-11  8:53     ` Holger Hoffstätte
2022-01-11  8:53       ` Holger Hoffstätte
2022-01-11  9:26     ` Jan Alexander Steffens (heftig)
2022-01-11 16:04     ` Shuang Zhai
2022-01-11 16:04       ` Shuang Zhai
2022-01-12  1:46     ` Suleiman Souhlal
2022-01-12  1:46       ` Suleiman Souhlal
2022-01-12  6:07     ` Sofia Trinh
2022-01-12  6:07       ` Sofia Trinh
2022-01-12 16:17       ` Daniel Byrne
2022-01-18  9:21     ` Yu Zhao
2022-01-18  9:21       ` Yu Zhao
2022-01-18  9:36     ` Donald Carr
2022-01-18  9:36       ` Donald Carr
2022-01-19 20:19     ` Steven Barrett
2022-01-19 20:19       ` Steven Barrett
2022-01-19 22:25     ` Brian Geffon
2022-01-19 22:25       ` Brian Geffon
2022-01-05  2:44 ` Shuang Zhai
2022-01-05  2:44   ` Shuang Zhai
2022-01-05  8:55 ` SeongJae Park
2022-01-05  8:55   ` SeongJae Park
2022-01-05 10:53   ` Yu Zhao
2022-01-05 10:53     ` Yu Zhao
2022-01-05 11:12     ` Borislav Petkov
2022-01-05 11:12       ` Borislav Petkov
2022-01-05 11:25     ` SeongJae Park
2022-01-05 11:25       ` SeongJae Park
2022-01-05 21:06       ` Yu Zhao
2022-01-05 21:06         ` Yu Zhao
2022-01-10 14:49 ` Alexey Avramov
2022-01-10 14:49   ` Alexey Avramov
2022-01-11 10:24 ` Alexey Avramov
2022-01-11 10:24   ` Alexey Avramov
2022-01-12 20:56 ` Oleksandr Natalenko
2022-01-12 20:56   ` Oleksandr Natalenko
2022-01-13  8:59   ` Yu Zhao
2022-01-13  8:59     ` Yu Zhao
2022-01-23  5:43 ` Barry Song
2022-01-23  5:43   ` Barry Song
2022-01-25  6:48   ` Yu Zhao
2022-01-25  6:48     ` Yu Zhao
2022-01-28  8:54     ` Barry Song
2022-01-28  8:54       ` Barry Song
2022-02-08  9:16       ` Yu Zhao
2022-02-08  9:16         ` Yu Zhao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yd/0Sgxy+jLm5cqd@google.com \
    --to=yuzhao@google.com \
    --cc=Hi-Angel@yandex.ru \
    --cc=Michael@michaellarabel.com \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=catalin.marinas@arm.com \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=hannes@cmpxchg.org \
    --cc=hdanton@sina.com \
    --cc=jsbarnes@google.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@suse.com \
    --cc=page-reclaim@google.com \
    --cc=riel@surriel.com \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.