linux-mm.kvack.org archive mirror
From: Yu Zhao <yuzhao@google.com>
To: Barry Song <21cnbao@gmail.com>
Cc: "Konstantin Kharlamov" <Hi-Angel@yandex.ru>,
	"Michael Larabel" <Michael@michaellarabel.com>,
	"Andi Kleen" <ak@linux.intel.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>,
	"Jens Axboe" <axboe@kernel.dk>,
	"Brian Geffon" <bgeffon@google.com>,
	"Catalin Marinas" <catalin.marinas@arm.com>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Donald Carr" <d@chaos-reins.com>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"Daniel Byrne" <djbyrne@mtu.edu>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Hillf Danton" <hdanton@sina.com>,
	"Jan Alexander Steffens" <heftig@archlinux.org>,
	"Holger Hoffstätte" <holger@applied-asynchrony.com>,
	"Jesse Barnes" <jsbarnes@google.com>,
	"Linux ARM" <linux-arm-kernel@lists.infradead.org>,
	"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>, "Mel Gorman" <mgorman@suse.de>,
	"Michal Hocko" <mhocko@kernel.org>,
	"Oleksandr Natalenko" <oleksandr@natalenko.name>,
	"Kernel Page Reclaim v2" <page-reclaim@google.com>,
	"Rik van Riel" <riel@surriel.com>,
	"Mike Rapoport" <rppt@kernel.org>,
	"Sofia Trinh" <sofia.trinh@edi.works>,
	"Steven Barrett" <steven@liquorix.net>,
	"Suleiman Souhlal" <suleiman@google.com>,
	"Shuang Zhai" <szhai2@cs.rochester.edu>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Will Deacon" <will@kernel.org>,
	"Matthew Wilcox" <willy@infradead.org>,
	"the arch/x86 maintainers" <x86@kernel.org>,
	"Huang Ying" <ying.huang@intel.com>
Subject: Re: [PATCH v7 04/12] mm: multigenerational LRU: groundwork
Date: Tue, 15 Mar 2022 20:46:56 -0600
Message-ID: <CAOUHufau34de-FmdBxNHpWWUUuN4DxT1fci9aX8Uc+RAfVXwXw@mail.gmail.com>
In-Reply-To: <CAGsJ_4zZc0oFSmBKAN77vm7VstQH=ieaQ0cfyvcMi3OQRrEpSg@mail.gmail.com>

On Tue, Mar 15, 2022 at 4:29 AM Barry Song <21cnbao@gmail.com> wrote:

<snipped>

> > I guess the main cause of the regression for the previous sequence
> > with 16 entries is that ebizzy has a newly allocated copy in
> > search_mem(), which is mapped and used only once in each loop,
> > and the temp copy can push out those hot chunks.
> >
> > Anyway, I understand it is a trade-off between warmly embracing new
> > pages and holding old pages tightly. Real use cases from phones,
> > servers and desktops will be better judges of this.

Thanks for all the details. I looked into them today and found no
regressions when running with your original program.

After I explain why, I hope you'll be convinced that programs like
this one are not a good way to measure things :)

Problems:
1) Given the 2.5GB configuration and a sequence of cold/hot chunks, I
assume your program tries to simulate a handful of apps running on a
phone.  A short repeating sequence is closer to sequential access than
to real user behavior, as I suggested last time. You could check out
how something similar is done here [1].
2) Under the same assumption (phone), C programs are very different
from Android apps in terms of runtime memory behavior, e.g., garbage
collection in the Android runtime (ART) [2].
3) Assuming you are interested in the runtime memory behavior of C/C++
programs, your program is still not very representative. All the C/C++
programs I'm familiar with either link against TCMalloc or jemalloc, or
implement their own allocators. GNU libc malloc, IMO, has a small market
share nowadays.
4) TCMalloc and jemalloc are not only optimized for multithreading but
are also THP aware. THP is very important when benchmarking page
reclaim, e.g., two similarly warm THPs can comprise 511+1 or 1+511
warm+cold 4K pages (a 2MB THP holds 512 4K pages). The LRU algorithm
that chooses more of the former for eviction is at a disadvantage.
Unless it's recommended by the applications you are trying to
benchmark, THP should be disabled. (Android generally doesn't use THP.)
5) Swap devices are also important. Zram should NOT be used unless you
know your benchmark doesn't generate incompressible data. The LRU
algorithm that chooses more incompressible pages is at a disadvantage.
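Points 1, 4 and 5 can be folded into the benchmark itself. Below is a
minimal C sketch under stated assumptions; it is not your program and
not what [1] does. The helper names alloc_chunk() and next_chunk(), the
xorshift64 filler and the 80% hot share are hypothetical choices for
illustration. Each chunk is an anonymous mmap() opted out of THP with
madvise(MADV_NOHUGEPAGE), then filled with pseudo-random bytes so that
zram cannot shrink it much; chunks are then picked by a seeded random
draw skewed toward a hot set instead of a short repeating sequence.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

/* xorshift64: small deterministic PRNG, not cryptographic. */
static uint64_t xorshift64(uint64_t *state)
{
	uint64_t x = *state;
	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	return *state = x;
}

/*
 * Hypothetical helper: map an anonymous chunk, opt it out of THP and
 * fill it with pseudo-random (effectively incompressible) words.
 * Returns NULL if mmap() fails.
 */
static void *alloc_chunk(size_t size, uint64_t *rng)
{
	void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;
	/* EINVAL usually means the kernel was built without THP. */
	if (madvise(p, size, MADV_NOHUGEPAGE) != 0 && errno != EINVAL)
		perror("madvise(MADV_NOHUGEPAGE)");
	uint64_t *w = p;
	for (size_t i = 0; i < size / sizeof(*w); i++)
		w[i] = xorshift64(rng);
	return p;
}

/*
 * Hypothetical helper: pick the next chunk index so that the first
 * nr_hot chunks get about hot_pct percent of the accesses, drawn at
 * random rather than replayed from a short fixed sequence.
 */
static size_t next_chunk(size_t nr_chunks, size_t nr_hot,
			 unsigned hot_pct, uint64_t *rng)
{
	assert(nr_hot > 0 && nr_hot < nr_chunks && hot_pct <= 100);
	if (xorshift64(rng) % 100 < hot_pct)
		return xorshift64(rng) % nr_hot;
	return nr_hot + xorshift64(rng) % (nr_chunks - nr_hot);
}
```

Opting out per mapping leaves the system-wide THP setting alone;
setting /sys/kernel/mm/transparent_hugepage/enabled to never does the
same for the whole system.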

Here are my results on the same Snapdragon 7c + 2.5GB RAM + 1.5GB
ramdisk swap, with your original program compiled against both glibc
malloc and TCMalloc, as 32-bit and 64-bit binaries:

# cat /sys/kernel/mm/lru_gen/enabled
0x0003
# cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
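The bracketed word in that file is the active THP mode. If a benchmark
harness should refuse to run unless THP is off, a minimal C sketch of
the check could look like this; thp_active_mode() and read_thp_mode()
are hypothetical helpers, and the parser assumes the single-line
"always madvise [never]" format shown above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy the bracketed word of a line such as "always madvise [never]"
 * into out. Returns 0 on success, -1 if there is no bracketed word or
 * it does not fit in out.
 */
static int thp_active_mode(const char *line, char *out, size_t outlen)
{
	assert(outlen > 0);
	const char *open = strchr(line, '[');
	const char *close = open ? strchr(open, ']') : NULL;
	if (!close)
		return -1;
	size_t len = (size_t)(close - open - 1);
	if (len >= outlen)
		return -1;
	memcpy(out, open + 1, len);
	out[len] = '\0';
	return 0;
}

/* Read the active THP mode from sysfs; returns -1 if unavailable. */
static int read_thp_mode(char *out, size_t outlen)
{
	char line[128];
	FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");
	if (!f)
		return -1;
	char *ok = fgets(line, sizeof(line), f);
	fclose(f);
	return ok ? thp_active_mode(line, out, outlen) : -1;
}
```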

# modprobe brd rd_nr=1 rd_size=1572864
# dd if=/dev/zero of=/dev/ram0 bs=1M
# mkswap /dev/ram0
# swapoff -a
# swapon /dev/ram0
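For reference, the brd rd_size parameter is given in KiB, so the
ramdisk above matches the 1.5GB of swap mentioned earlier:

```latex
1572864\ \text{KiB} \times 1024\ \tfrac{\text{B}}{\text{KiB}}
  = 1610612736\ \text{B}
  = \frac{1610612736}{2^{30}}\ \text{GiB}
  = 1.5\ \text{GiB}
```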

# ldd test_absl_32
        linux-vdso.so.1 (0xf6e7f000)
        libabsl_malloc.so.2103.0.1 => /usr/lib/libabsl_malloc.so.2103.0.1 (0xf6e23000)
        libpthread.so.0 => /lib/libpthread.so.0 (0xf6dff000)
        libc.so.6 => /lib/libc.so.6 (0xf6d07000)
        /lib/ld-linux-armhf.so.3 (0x09df0000)
        libabsl_base.so.2103.0.1 => /usr/lib/libabsl_base.so.2103.0.1 (0xf6ce5000)
        libabsl_raw_logging.so.2103.0.1 => /usr/lib/libabsl_raw_logging.so.2103.0.1 (0xf6cc4000)
        libabsl_spinlock_wait.so.2103.0.1 => /usr/lib/libabsl_spinlock_wait.so.2103.0.1 (0xf6ca3000)
        libc++.so.1 => /usr/lib/libc++.so.1 (0xf6c04000)
        libc++abi.so.1 => /usr/lib/libc++abi.so.1 (0xf6bcd000)
# file test_absl_64
test_absl_64: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), statically linked
# ldd test_gnu_32
        linux-vdso.so.1 (0xeabef000)
        libpthread.so.0 => /lib/libpthread.so.0 (0xeab92000)
        libc.so.6 => /lib/libc.so.6 (0xeaa9a000)
        /lib/ld-linux-armhf.so.3 (0x05690000)
# file test_gnu_64
test_gnu_64: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), statically linked

### baseline 5.17-rc8

# perf record ./test_gnu_64 -t 4 -s $((200*1024*1024)) -S 6000000
10 records/s
real 59.00 s
user 39.83 s
sys  174.18 s

    18.51%  [.] memcpy
    15.98%  [k] __pi_clear_page
     5.59%  [k] rmqueue_pcplist
     5.19%  [k] do_raw_spin_lock
     5.09%  [k] memmove
     4.60%  [k] _raw_spin_unlock_irq
     3.62%  [k] _raw_spin_unlock_irqrestore
     3.61%  [k] free_unref_page_list
     3.29%  [k] zap_pte_range
     2.53%  [k] local_daif_restore
     2.50%  [k] down_read_trylock
     1.41%  [k] handle_mm_fault
     1.32%  [k] do_anonymous_page
     1.31%  [k] up_read
     1.03%  [k] free_swap_cache

### MGLRU v9

# perf record ./test_gnu_64 -t 4 -s $((200*1024*1024)) -S 6000000
11 records/s
real 57.00 s
user 39.39 s

    19.36%  [.] memcpy
    16.50%  [k] __pi_clear_page
     6.21%  [k] memmove
     5.57%  [k] rmqueue_pcplist
     5.07%  [k] do_raw_spin_lock
     4.96%  [k] _raw_spin_unlock_irqrestore
     4.25%  [k] free_unref_page_list
     3.80%  [k] zap_pte_range
     3.69%  [k] _raw_spin_unlock_irq
     2.71%  [k] local_daif_restore
     2.10%  [k] down_read_trylock
     1.50%  [k] handle_mm_fault
     1.29%  [k] do_anonymous_page
     1.17%  [k] free_swap_cache
     1.08%  [k] up_read
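To put the two headline numbers side by side, a relative change is
enough. The helper pct_change() below is hypothetical; note that the
throughput is printed in whole records/s, so the resulting percentage
is coarse.

```c
#include <assert.h>

/* Relative change from old_val to new_val, in percent (+ = increase). */
static double pct_change(double old_val, double new_val)
{
	assert(old_val != 0.0);
	return (new_val - old_val) / old_val * 100.0;
}
```

With the numbers above, pct_change(10, 11) gives +10% throughput and
pct_change(59, 57) gives about -3.4% wall-clock time for MGLRU v9
relative to the 5.17-rc8 baseline.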

[1] https://chromium.googlesource.com/chromiumos/platform/tast-tests/+/refs/heads/main/src/chromiumos/tast/local/memory/mempressure/mempressure.go
[2] https://developer.android.com/topic/performance/memory-overview



Thread overview: 74+ messages
2022-02-08  8:18 [PATCH v7 00/12] Multigenerational LRU Framework Yu Zhao
2022-02-08  8:18 ` [PATCH v7 01/12] mm: x86, arm64: add arch_has_hw_pte_young() Yu Zhao
2022-02-08  8:24   ` Yu Zhao
2022-02-08 10:33   ` Will Deacon
2022-02-08  8:18 ` [PATCH v7 02/12] mm: x86: add CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG Yu Zhao
2022-02-08  8:27   ` Yu Zhao
2022-02-08  8:18 ` [PATCH v7 03/12] mm/vmscan.c: refactor shrink_node() Yu Zhao
2022-02-08  8:18 ` [PATCH v7 04/12] mm: multigenerational LRU: groundwork Yu Zhao
2022-02-08  8:28   ` Yu Zhao
2022-02-10 20:41   ` Johannes Weiner
2022-02-15  9:43     ` Yu Zhao
2022-02-15 21:53       ` Johannes Weiner
2022-02-21  8:14         ` Yu Zhao
2022-02-23 21:18           ` Yu Zhao
2022-02-25 16:34             ` Minchan Kim
2022-03-03 15:29           ` Johannes Weiner
2022-03-03 19:26             ` Yu Zhao
2022-03-03 21:43               ` Johannes Weiner
2022-03-11 10:16       ` Barry Song
2022-03-11 23:45         ` Yu Zhao
2022-03-12 10:37           ` Barry Song
2022-03-12 21:11             ` Yu Zhao
2022-03-13  4:57               ` Barry Song
2022-03-14 11:11                 ` Barry Song
2022-03-14 16:45                   ` Yu Zhao
2022-03-14 23:38                     ` Barry Song
     [not found]                       ` <CAOUHufa9eY44QadfGTzsxa2=hEvqwahXd7Canck5Gt-N6c4UKA@mail.gmail.com>
     [not found]                         ` <CAGsJ_4zvj5rmz7DkW-kJx+jmUT9G8muLJ9De--NZma9ey0Oavw@mail.gmail.com>
2022-03-15 10:29                           ` Barry Song
2022-03-16  2:46                             ` Yu Zhao [this message]
2022-03-16  4:37                               ` Barry Song
2022-03-16  5:44                                 ` Yu Zhao
2022-03-16  6:06                                   ` Barry Song
2022-03-16 21:37                                     ` Yu Zhao
2022-02-10 21:37   ` Matthew Wilcox
2022-02-13 21:16     ` Yu Zhao
2022-02-08  8:18 ` [PATCH v7 05/12] mm: multigenerational LRU: minimal implementation Yu Zhao
2022-02-08  8:33   ` Yu Zhao
2022-02-08 16:50   ` Johannes Weiner
2022-02-10  2:53     ` Yu Zhao
2022-02-13 10:04   ` Hillf Danton
2022-02-17  0:13     ` Yu Zhao
2022-02-23  8:27   ` Huang, Ying
2022-02-23  9:36     ` Yu Zhao
2022-02-24  0:59       ` Huang, Ying
2022-02-24  1:34         ` Yu Zhao
2022-02-24  3:31           ` Huang, Ying
2022-02-24  4:09             ` Yu Zhao
2022-02-24  5:27               ` Huang, Ying
2022-02-24  5:35                 ` Yu Zhao
2022-02-08  8:18 ` [PATCH v7 06/12] mm: multigenerational LRU: exploit locality in rmap Yu Zhao
2022-02-08  8:40   ` Yu Zhao
2022-02-08  8:18 ` [PATCH v7 07/12] mm: multigenerational LRU: support page table walks Yu Zhao
2022-02-08  8:39   ` Yu Zhao
2022-02-08  8:18 ` [PATCH v7 08/12] mm: multigenerational LRU: optimize multiple memcgs Yu Zhao
2022-02-08  8:18 ` [PATCH v7 09/12] mm: multigenerational LRU: runtime switch Yu Zhao
2022-02-08  8:42   ` Yu Zhao
2022-02-08  8:19 ` [PATCH v7 10/12] mm: multigenerational LRU: thrashing prevention Yu Zhao
2022-02-08  8:43   ` Yu Zhao
2022-02-08  8:19 ` [PATCH v7 11/12] mm: multigenerational LRU: debugfs interface Yu Zhao
2022-02-18 18:56   ` [page-reclaim] " David Rientjes
2022-02-08  8:19 ` [PATCH v7 12/12] mm: multigenerational LRU: documentation Yu Zhao
2022-02-08  8:44   ` Yu Zhao
2022-02-14 10:28   ` Mike Rapoport
2022-02-16  3:22     ` Yu Zhao
2022-02-21  9:01       ` Mike Rapoport
2022-02-22  1:47         ` Yu Zhao
2022-02-23 10:58           ` Mike Rapoport
2022-02-23 21:20             ` Yu Zhao
2022-02-08 10:11 ` [PATCH v7 00/12] Multigenerational LRU Framework Oleksandr Natalenko
2022-02-08 11:14   ` Michal Hocko
2022-02-08 11:23     ` Oleksandr Natalenko
2022-02-11 20:12 ` Alexey Avramov
2022-02-12 21:01   ` Yu Zhao
2022-03-03  6:06 ` Vaibhav Jain
2022-03-03  6:47   ` Yu Zhao
