From: Barry Song <21cnbao@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
    Shijie Huang <shijie@amperemail.onmicrosoft.com>,
    Andrew Morton <akpm@linux-foundation.org>,
    Linux-MM <linux-mm@kvack.org>,
    "Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>,
    Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
    Frank Wang <zwang@amperecomputing.com>
Subject: Re: Is it possible to implement the per-node page cache for programs/libraries?
Date: Thu, 2 Sep 2021 10:56:20 +1200
Message-ID: <CAGsJ_4yLrGv2izZ2z4QWnBbDOhEjHygHDFBthfFqW0XEkMP-ag@mail.gmail.com>
In-Reply-To: <CAHk-=wjAPEs3HRGswJ-AE1R048j2MBsBtMfg3GOsaFykHoeKsg@mail.gmail.com>

On Thu, Sep 2, 2021 at 5:31 AM Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> On Wed, Sep 1, 2021 at 10:24 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > But what you could do, if you wanted to, would be to catch the
> > situation where you have lots of expensive NUMA accesses either using
> > our VM infrastructure or performance counters, and when the mapping is
> > a MAP_PRIVATE you just do a COW fault on them.
> >
> > Sounds entirely doable, and has absolutely nothing to do with the page
> > cache. It would literally just be an "over-eager COW fault triggered
> > by NUMA access counters".
>
> Note how it would work perfectly fine for anonymous mappings too. Just
> to reinforce the point that this has nothing to do with any page cache
> issues.
>
> Of course, if you want to actually then *share* pages within a node
> (rather than replicate them for each process), that gets more
> exciting.
>
> But I suspect that this is mainly only useful for long-running big
> processes (not least due to that node binding thing), so I question
> the need for that kind of excitement.

In Linux server scenarios, it would be quite common to have long-running
big processes constantly running on one machine, for example, web,
database etc.
This kind of process can span a couple of NUMA nodes, using all CPUs in
a server to achieve maximum throughput.

SGI/HPE has a numatool with a "dplace" command to help deploy processes
with replicated text in either libraries or the binary (a.out) [1]:

    dplace [-e] [-c cpu_numbers] [-s skip_count] [-n process_name] \
           [-x skip_mask] [-r [l|b|t]] [-o log_file] [-v 1|2] \
           command [command-args]

    The dplace command accepts the following options:
    ...
    -r: Specifies that text should be replicated on the node or nodes
        where the application is running. In some cases, replication
        will improve performance by reducing the need to make offnode
        memory references for code. The replication option applies to
        all programs placed by the dplace command. See the dplace man
        page for additional information on text replication. The
        replication options are a string of one or more of the
        following characters:

        l - Replicate library text
        b - Replicate binary (a.out) text
        t - Thread round-robin option

On the other hand, it would also be interesting to investigate whether
kernel text replication can help improve performance. MIPS does have
REPLICATE_KTEXT support in the kernel:

    config REPLICATE_KTEXT
        bool "Kernel text replication support"
        depends on SGI_IP27
        select MAPPED_KERNEL
        help
          Say Y here to enable replicating the kernel text across multiple
          nodes in a NUMA cluster. This trades memory for speed.

I am not quite sure how it would benefit x86 and ARM64, though it seems
concurrent-rt has a solution and benchmark data for RedHawk Linux [2].

[1] http://www.nacad.ufrj.br/online/sgi/007-5646-002/sgi_html/ch05.html
[2] https://www.concurrent-rt.com/wp-content/uploads/2016/11/kernel-page-replication.pdf

>
> Linus

Thanks
Barry
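
For readers who want to see the MAP_PRIVATE copy-on-write behaviour that the quoted "over-eager COW fault" idea relies on, here is a minimal user-space sketch. It is only an illustration under stated assumptions, not the kernel-side mechanism discussed in the thread: nothing here is driven by NUMA access counters, and `cow_touch` and `demo` are hypothetical helper names. Python's `mmap.ACCESS_COPY` requests a private copy-on-write mapping; storing a byte into each page forces a private copy of that page, which under the default memory policy would normally be allocated near the CPU taking the fault.

```python
import mmap
import os
import tempfile


def cow_touch(path):
    """Map `path` privately and dirty every page so each gets a private copy.

    Hypothetical helper for illustration; returns (mapping, pages_touched).
    """
    with open(path, "rb") as f:
        # ACCESS_COPY asks for a private copy-on-write mapping
        # (MAP_PRIVATE on Unix); writes never reach the file.
        m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)
    touched = 0
    for off in range(0, len(m), mmap.PAGESIZE):
        # Store the same byte back: contents are unchanged, but the write
        # makes the kernel give this mapping its own copy of the page.
        m[off] = m[off]
        touched += 1
    return m, touched


def demo():
    """Show that writes stay private to the mapping and the file is untouched."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"x" * (2 * mmap.PAGESIZE))
        os.close(fd)
        m, touched = cow_touch(path)
        m[0] = ord("y")  # visible only through this mapping
        with open(path, "rb") as f:
            on_disk = f.read(1)
        in_mapping = m[0:1]
        m.close()
        return touched, in_mapping, on_disk
    finally:
        os.unlink(path)
```

Calling `demo()` maps a two-page temporary file, touches both pages, and returns the page count together with the first byte as seen through the mapping and as stored on disk. Text mappings are normally not writable, so a real implementation would have to trigger the copy from inside the kernel, as the quoted proposal suggests.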