From: "Huang, Ying" <ying.huang@intel.com>
To: Yu Zhao <yuzhao@google.com>
Cc: Rik van Riel <riel@surriel.com>,
	 linux-mm@kvack.org,  Alex Shi <alex.shi@linux.alibaba.com>,
	 Andrew Morton <akpm@linux-foundation.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	 Hillf Danton <hdanton@sina.com>,
	 Johannes Weiner <hannes@cmpxchg.org>,
	 Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	 Matthew Wilcox <willy@infradead.org>,
	 Mel Gorman <mgorman@suse.de>,  Michal Hocko <mhocko@suse.com>,
	 Roman Gushchin <guro@fb.com>,  Vlastimil Babka <vbabka@suse.cz>,
	 Wei Yang <richard.weiyang@linux.alibaba.com>,
	 Yang Shi <shy828301@gmail.com>,
	linux-kernel@vger.kernel.org,  page-reclaim@google.com
Subject: Re: [PATCH v1 09/14] mm: multigenerational lru: mm_struct list
Date: Mon, 22 Mar 2021 11:13:19 +0800
Message-ID: <87czvryj74.fsf@yhuang6-desk1.ccr.corp.intel.com>
In-Reply-To: <YFHeFslZ85/h3o/q@google.com> (Yu Zhao's message of "Wed, 17 Mar 2021 04:46:46 -0600")

Yu Zhao <yuzhao@google.com> writes:

> On Wed, Mar 17, 2021 at 11:37:38AM +0800, Huang, Ying wrote:
>> Yu Zhao <yuzhao@google.com> writes:
>> 
>> > On Tue, Mar 16, 2021 at 02:44:31PM +0800, Huang, Ying wrote:
>> > The scanning overhead is only one of the two major problems of the
>> > current page reclaim. The other problem is the granularity of the
>> > active/inactive (sizes). We stopped using them in making job
>> > scheduling decision a long time ago. I know another large internet
>> > company adopted a similar approach to ours, and I'm wondering how
>> > everybody else is coping with the discrepancy from those counters.
>> 
>> From intuition, the scanning overhead of the full page table scanning
>> appears higher than that of the rmap scanning for a small portion of
>> system memory.  But from your words, you think the reality is the
>> reverse?  If others are concerned about the overhead too, I think you
>> will eventually need to prove, with more data and theory, that the
>> overhead of the page table scanning isn't too high, or is even lower.
>
> There is a misunderstanding here. I never said anything about full
> page table scanning. And this is not how it's done in this series
> either. I guess the misunderstanding has something to do with the cold
> memory tracking you are thinking about?

If my understanding is correct, your patch 10/14 contains the following
code path:

age_active_anon
  age_lru_gens
    try_walk_mm_list
      walk_mm_list
        walk_mm

So, in kswapd(), the page tables of many processes may be scanned
fully.  If the number of active processes is high, the overhead may be
high too.
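
To make the concern concrete, here is a back-of-envelope sketch, not a
measurement.  It assumes, as the concern above supposes, that every
mapped PTE of every walked mm is visited, and it treats the per-PTE and
per-rmap-walk costs as unknown parameters in arbitrary units.  All
function and parameter names are hypothetical.

```python
def full_walk_cost(ptes_per_mm, cost_per_pte):
    """Cost of visiting every mapped PTE of every mm on the list."""
    return sum(ptes_per_mm) * cost_per_pte


def rmap_cost(pages_scanned, mappings_per_page, cost_per_rmap_walk):
    """Cost of one rmap walk per mapping of each page scanned on the LRU."""
    return pages_scanned * mappings_per_page * cost_per_rmap_walk


def break_even_pages(ptes_per_mm, cost_per_pte,
                     mappings_per_page, cost_per_rmap_walk):
    """Number of rmap-scanned pages at which both approaches cost the same.

    Below this number the rmap is cheaper under this model; above it,
    the full page table walk is cheaper.
    """
    walk = full_walk_cost(ptes_per_mm, cost_per_pte)
    return walk / (mappings_per_page * cost_per_rmap_walk)
```

The point of the model is only that the walk cost grows with the number
and size of the address spaces, while the rmap cost grows with the
number of pages scanned, so the real per-unit costs have to be measured.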

> This series uses page tables to discover page accesses when a system
> has run out of inactive pages. Under such a situation, the system is
> very likely to have a lot of page accesses, and using the rmap is
> likely to cost a lot more because of its poor memory locality compared
> with page tables.

This is the theory.  Can you verify this with more data, including the
CPU cycles or time spent scanning page tables?
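
One rough way to get the time spent would be to sample the CPU time of
the kswapd thread before and after a run.  The sketch below parses a
line in the /proc/[pid]/stat format, where utime and stime are fields 14
and 15, in clock ticks.  The function names are hypothetical, and
attributing all kswapd CPU time to scanning is an assumption; it also
covers reclaim work itself.

```python
def parse_cpu_ticks(stat_line):
    """Return (utime, stime) in clock ticks from a /proc/[pid]/stat line."""
    # Field 2 (comm) is parenthesised and may contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field N is rest[N - 3].
    return int(rest[11]), int(rest[12])


def cpu_seconds(stat_line, ticks_per_second):
    """Total CPU time (user + system) in seconds."""
    utime, stime = parse_cpu_ticks(stat_line)
    return (utime + stime) / ticks_per_second


def cpu_seconds_delta(before_line, after_line, ticks_per_second):
    """CPU seconds consumed between two samples of the same thread."""
    return (cpu_seconds(after_line, ticks_per_second)
            - cpu_seconds(before_line, ticks_per_second))
```

On a live system the lines would come from /proc/<pid of kswapd0>/stat
and ticks_per_second from os.sysconf("SC_CLK_TCK").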

> But, page tables can be sparse too, in terms of hot memory tracking.
> Dave has asked me to test the worst case scenario, which I'll do.
> And I'd be happy to share more data. Any specific workload you are
> interested in?

We can start with some simple workloads that are easier to reason about.
For example,

1. Run the workload with hot and cold pages.  When the free memory
drops below the low watermark, kswapd will be woken up to scan and
reclaim some cold pages.  How long will it take to do that?  It's
expected that almost all pages need to be scanned, so page table
scanning is expected to have less overhead.  We can measure how well it
does.

2. Run the workload with hot and cold pages.  If the whole working set
cannot fit in DRAM, the cold pages will be reclaimed and swapped in
regularly (for example, at tens of MB/s).  It's expected that fewer
pages may be scanned with rmap, but the speed of page table scanning is
faster.

3. Run the workload with hot and cold pages on an overcommitted system,
that is, some cold pages will be placed in swap.  But the cold pages are
cold enough that there's almost no thrashing.  Then the hot working set
of the workload changes, that is, some hot pages become cold while some
cold pages become hot, so page reclaim and swap-in will be triggered.

For each case, we can use different parameters.  And we can measure
things like the number of pages scanned, the time taken to scan them,
the number of pages reclaimed and swapped in, etc.
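
A minimal sketch of how such numbers could be collected: take one
snapshot of /proc/vmstat before the run and one after, then diff the
counters of interest.  The counter names below (pgscan_kswapd,
pgsteal_kswapd, pswpin, pswpout) exist in mainline /proc/vmstat; whether
page table walks in this series are accounted in pgscan_kswapd is an
assumption that would need checking.  The function names are
hypothetical.

```python
COUNTERS = ("pgscan_kswapd", "pgsteal_kswapd", "pswpin", "pswpout")


def parse_vmstat(text):
    """Parse '/proc/vmstat'-style text ('name value' per line) into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters


def reclaim_delta(before_text, after_text, names=COUNTERS):
    """Per-counter difference between two snapshots (missing counters = 0)."""
    before = parse_vmstat(before_text)
    after = parse_vmstat(after_text)
    return {name: after.get(name, 0) - before.get(name, 0) for name in names}


def pages_per_second(delta, name, elapsed_seconds):
    """Average rate for one counter over the measured interval."""
    return delta[name] / elapsed_seconds
```

The snapshots would be the contents of /proc/vmstat read at the start
and end of each test case, with the elapsed wall-clock time recorded
alongside them.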

Best Regards,
Huang, Ying
