From: Huang Shijie <shijie@os.amperecomputing.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
	Shijie Huang <shijie@amperemail.onmicrosoft.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux-MM <linux-mm@kvack.org>,
	"Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Frank Wang <zwang@amperecomputing.com>
Subject: Re: Is it possible to implement the per-node page cache for programs/libraries?
Date: Thu, 2 Sep 2021 10:08:06 +0000
Message-ID: <YTCihsPZL0HtO2lp@hsj>
In-Reply-To: <CAHk-=wjAPEs3HRGswJ-AE1R048j2MBsBtMfg3GOsaFykHoeKsg@mail.gmail.com>

Hi Linus,
On Wed, Sep 01, 2021 at 10:29:01AM -0700, Linus Torvalds wrote:
> On Wed, Sep 1, 2021 at 10:24 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > But what you could do, if  you wanted to, would be to catch the
> > situation where you have lots of expensive NUMA accesses either using
> > our VM infrastructure or performance counters, and when the mapping is
> > a MAP_PRIVATE you just do a COW fault on them.
> >
> > Sounds entirely doable, and has absolutely nothing to do with the page
> > cache. It would literally just be an "over-eager COW fault triggered
> > by NUMA access counters".
Yes. You are right, we can use COW. :)

Actually, there are _TWO_ levels at which we can optimize NUMA remote access:
   1.) the page cache, which is independent of any process;
   2.) the process address space (page table).

   For 2.), we can use the over-eager COW:
        2.1) I have finished a user-space patch for glibc that uses "over-eager COW" to do
             text replication on NUMA.
        2.2) There is also a kernel patch that uses "over-eager COW" to replicate the
             program itself on NUMA. (We may split this out into a separate topic.)
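The "over-eager COW" idea in 2.1) can be sketched in user space. This is a minimal illustration under stated assumptions, not the glibc patch mentioned above: it assumes a page-aligned MAP_PRIVATE file mapping (for example a library's text segment), temporarily adds PROT_WRITE, touches each page to take a write fault, and then restores the original protection. Each fault replaces the shared page-cache page with a private anonymous page, which the default memory policy normally allocates on the node of the faulting CPU. `eager_cow_replicate()` is a hypothetical name, and security policies that forbid writable+executable mappings may reject the mprotect() call for text.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the glibc patch): force a private COW copy of
 * every page in a page-aligned MAP_PRIVATE file mapping [addr, addr + len).
 * orig_prot is the mapping's current protection, restored at the end.
 * Returns 0 on success, -1 if mprotect() fails (errno is set).
 */
static int eager_cow_replicate(void *addr, size_t len, int orig_prot)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    /* Add write permission so that a store takes a COW fault. Keeping
     * orig_prot's bits (e.g. PROT_EXEC) may be refused by policies that
     * forbid writable+executable mappings. */
    if (mprotect(addr, len, orig_prot | PROT_READ | PROT_WRITE) != 0)
        return -1;

    /* Read and write back one byte per page: the contents stay the same,
     * but each write fault swaps the shared page-cache page for a new
     * private anonymous page (normally on the faulting CPU's node under
     * the default memory policy). */
    for (size_t off = 0; off < len; off += (size_t)page) {
        volatile unsigned char *p = (volatile unsigned char *)addr + off;
        *p = *p;
    }

    /* Drop write permission again. */
    return mprotect(addr, len, orig_prot);
}
```

A caller would mmap() the file with MAP_PRIVATE, then call `eager_cow_replicate(addr, len, PROT_READ | PROT_EXEC)` from a thread running on the target node. Every process pays for its own copy this way, which is exactly the duplication discussed below.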
> 
> Note how it would work perfectly fine for anonymous mappings too. Just
> to reinforce the point that this has nothing to do with any page cache
> issues.
> 
> Of course, if you want to actually then *share* pages within a node
> (rather than replicate them for each process), that gets more
> exciting.
Do we really need to change the page cache?
          The approach in 2.1) above produces one copy of the shared-library pages (such as
          glibc.so) for each process. Even on the same NUMA node 0, we may run two identical
          processes, which yields two copies of glibc.so. If we run five identical processes
          on NUMA node 0, we end up with five copies of glibc.so.

          But with a per-node page cache for glibc.so, we could do the following:
          (1) disable the "over-eager COW" for the process;
          (2) hand out the per-node page cache's pages to the different processes on the
              _SAME_ NUMA node, so that all processes on that node share a single copy of
              each page;
          (3) let processes on other NUMA nodes use the pages belonging to their own node.

          This way, we can save many pages and still get faster, node-local access on NUMA.
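To make the page-count argument concrete, here is a toy user-space model of the per-node page cache described above. Everything in it is hypothetical (`struct toy_page_cache`, `toy_cache_lookup()`, the fixed `MAX_NODES` and `TOY_PAGE`); it is not the kernel's page cache API, and malloc() merely stands in for a node-local page allocation. It only models the lookup rule in steps (2) and (3): processes on the same node share one replica per file page, and the first lookup on a node creates that node's replica from the master copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NODES 4    /* assumption: small fixed node count for the toy model */
#define TOY_PAGE  4096 /* assumption: toy page size in bytes */

/* Hypothetical per-file cache: a master copy plus at most one replica per node. */
struct toy_page_cache {
    size_t nr_pages;
    unsigned char *master;              /* nr_pages * TOY_PAGE bytes */
    unsigned char **replica[MAX_NODES]; /* replica[node][index], NULL until used */
    size_t pages_allocated;             /* replica pages created so far */
};

static void toy_cache_free(struct toy_page_cache *c)
{
    for (int n = 0; n < MAX_NODES; n++) {
        if (c->replica[n]) {
            for (size_t i = 0; i < c->nr_pages; i++)
                free(c->replica[n][i]);
            free(c->replica[n]);
        }
    }
    free(c->master);
    memset(c, 0, sizeof(*c));
}

/* Returns 0 on success, -1 on allocation failure (call toy_cache_free either way). */
static int toy_cache_init(struct toy_page_cache *c, const unsigned char *data,
                          size_t nr_pages)
{
    memset(c, 0, sizeof(*c));
    c->nr_pages = nr_pages;
    c->master = malloc(nr_pages * TOY_PAGE);
    if (!c->master)
        return -1;
    memcpy(c->master, data, nr_pages * TOY_PAGE);
    for (int n = 0; n < MAX_NODES; n++) {
        c->replica[n] = calloc(nr_pages, sizeof(unsigned char *));
        if (!c->replica[n])
            return -1;
    }
    return 0;
}

/* Page `index` as seen by any process running on `node`: all such processes
 * get the same replica; the node's first lookup copies it from the master. */
static const unsigned char *toy_cache_lookup(struct toy_page_cache *c, int node,
                                             size_t index)
{
    assert(node >= 0 && node < MAX_NODES && index < c->nr_pages);
    unsigned char **slot = &c->replica[node][index];
    if (*slot == NULL) {
        *slot = malloc(TOY_PAGE); /* stand-in for a page allocated on `node` */
        if (*slot == NULL)
            return NULL;
        memcpy(*slot, c->master + index * TOY_PAGE, TOY_PAGE);
        c->pages_allocated++;
    }
    return *slot;
}
```

With five identical processes on node 0, the per-process COW approach of 2.1) ends up with five private copies of each touched glibc.so page, while this model allocates one replica for node 0 plus one for each other node that actually maps the page.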

Thanks
Huang Shijie
