From: Matthew Wilcox <willy@infradead.org>
To: David Hildenbrand <david@redhat.com>
Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com>,
	linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org,
	"xuyu@linux.alibaba.com" <xuyu@linux.alibaba.com>
Subject: Re: [PATCH RFC v2 0/4] Add support for sharing page tables across processes (Previously mshare)
Date: Mon, 31 Jul 2023 17:54:14 +0100
Message-ID: <ZMfnNpQIkXXs1W02@casper.infradead.org>
In-Reply-To: <c1f3c78d-b1eb-5c1c-83aa-35901800498f@redhat.com>

On Mon, Jul 31, 2023 at 06:48:47PM +0200, David Hildenbrand wrote:
> On 31.07.23 18:38, Matthew Wilcox wrote:
> > On Mon, Jul 31, 2023 at 06:30:22PM +0200, David Hildenbrand wrote:
> > > Assume we do do the page table sharing at mmap time, if the flags are right.
> > > Let's focus on the most common:
> > > 
> > > mmap(memfd, PROT_READ | PROT_WRITE, MAP_SHARED)
> > > 
> > > And doing the same in each and every process.
> > 
> > That may be the most common in your usage, but for a database, you're
> > looking at two usage scenarios.  Postgres calls mmap() on the database
> > file itself so that all processes share the kernel page cache.
> > Some Commercial Databases call mmap() on a hugetlbfs file so that all
processes share the same userspace buffer cache.  Other Commercial
> > Databases call shmget() / shmat() with SHM_HUGETLB for the exact
> > same reason.
> 
> I remember you said that postgres might be looking into using shmem as well;
> maybe I am wrong.

No, I said that postgres was also interested in sharing page tables.
I don't think they have any use for shmem.
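
For concreteness, a minimal sketch of the shmget()/shmat() + SHM_HUGETLB
pattern mentioned above; the segment size and permissions are arbitrary,
and the hugetlb pool must already be reserved (e.g. via
/proc/sys/vm/nr_hugepages):

    #include <stdio.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    #ifndef SHM_HUGETLB
    #define SHM_HUGETLB 04000   /* kernel UAPI value, include/uapi/linux/shm.h */
    #endif

    int main(void)
    {
            size_t len = 512UL << 20;   /* 512 MiB buffer cache, arbitrary */

            /* One process creates the hugetlb-backed segment; the id is
             * then inherited across fork() or otherwise handed to the
             * other database processes. */
            int shmid = shmget(IPC_PRIVATE, len,
                               IPC_CREAT | SHM_HUGETLB | 0600);
            if (shmid < 0) {
                    perror("shmget");
                    return 1;
            }

            /* Every process attaches the same segment, so they all share
             * one userspace buffer cache backed by huge pages. */
            void *buf = shmat(shmid, NULL, 0);
            if (buf == (void *)-1) {
                    perror("shmat");
                    return 1;
            }

            shmdt(buf);
            return 0;
    }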

> memfd/hugetlb/shmem could all be handled alike, just "arbitrary filesystems"
> would require more work.

But arbitrary filesystems were one of the original use cases: the
database is stored on a persistent memory filesystem, and neither the
kernel nor userspace has a cache.  The Postgres & Commercial Database
use-cases collapse into the same case, and we want to mmap the files
directly and share the page tables.
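
To make the persistent-memory case concrete, a minimal sketch, assuming
the file lives on a filesystem mounted with DAX (the path is a
placeholder):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
            /* Hypothetical file on a DAX-mounted filesystem. */
            int fd = open("/mnt/pmem/db.dat", O_RDWR);
            if (fd < 0) {
                    perror("open");
                    return 1;
            }

            struct stat st;
            if (fstat(fd, &st) < 0) {
                    perror("fstat");
                    return 1;
            }

            /* With DAX there is no page-cache copy: the mapping points
             * straight at the persistent memory, so every process that
             * maps the file like this is a candidate for sharing the
             * same page tables. */
            void *db = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
            if (db == MAP_FAILED) {
                    perror("mmap");
                    return 1;
            }

            munmap(db, st.st_size);
            close(fd);
            return 0;
    }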

> > This is why I proposed mshare().  Anyone can use it for anything.
> > We have such a diverse set of users who want to do stuff with shared
> > page tables that we should not be tying it to memfd or any other
> > filesystem.  Not to mention that it's more flexible; you can map
> > individual 4kB files into it and still get page table sharing.
> 
> That's not what the current proposal does, or am I wrong?

I think you're wrong, but I haven't had time to read the latest patches.

> Also, I'm curious, is that a real requirement in the database world?

I don't know.  It's definitely an advantage that falls out of the design
of mshare.
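
For reference, the baseline pattern this whole thread is about: the same
object mapped MAP_SHARED by multiple processes, which today shares the
pages but not the page tables.  A minimal sketch; MAP_SHARED_PT is the
flag proposed by the RFC series below, it only exists with that series'
patched headers and kernel, and its exact semantics are whatever the
series defines, so it is guarded here:

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
            size_t len = 2UL << 20;     /* 2 MiB, arbitrary */
            int flags = MAP_SHARED;

            /* Anonymous shared object, inherited by the child over fork(). */
            int fd = memfd_create("dbcache", 0);
            if (fd < 0 || ftruncate(fd, len) < 0) {
                    perror("memfd_create/ftruncate");
                    return 1;
            }

    #ifdef MAP_SHARED_PT
            /* Only with the RFC's patched headers and kernel: ask for the
             * page tables to be shared as well as the pages. */
            flags |= MAP_SHARED_PT;
    #endif

            if (fork() == 0) {
                    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                   flags, fd, 0);
                    if (p != MAP_FAILED)
                            strcpy(p, "written by the child");
                    _exit(0);
            }
            wait(NULL);

            /* Parent maps the same object: the data is shared, but without
             * something like MAP_SHARED_PT each process still builds its
             * own page tables for it. */
            char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, fd, 0);
            if (p != MAP_FAILED)
                    printf("%s\n", p);
            return 0;
    }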

Thread overview: 25+ messages
2023-04-26 16:49 [PATCH RFC v2 0/4] Add support for sharing page tables across processes (Previously mshare) Khalid Aziz
2023-04-26 16:49 ` [PATCH RFC v2 1/4] mm/ptshare: Add vm flag for shared PTE Khalid Aziz
2023-04-26 16:49 ` [PATCH RFC v2 2/4] mm/ptshare: Add flag MAP_SHARED_PT to mmap() Khalid Aziz
2023-04-27 11:17   ` kernel test robot
2023-04-29  4:41   ` kernel test robot
2023-04-26 16:49 ` [PATCH RFC v2 3/4] mm/ptshare: Create new mm struct for page table sharing Khalid Aziz
2023-06-26  8:08   ` Karim Manaouil
2023-04-26 16:49 ` [PATCH RFC v2 4/4] mm/ptshare: Add page fault handling for page table shared regions Khalid Aziz
2023-04-27  0:24   ` kernel test robot
2023-04-29 14:07   ` kernel test robot
2023-04-26 21:27 ` [PATCH RFC v2 0/4] Add support for sharing page tables across processes (Previously mshare) Mike Kravetz
2023-04-27 16:40   ` Khalid Aziz
2023-06-12 16:25 ` Peter Xu
2023-06-30 11:29 ` Rongwei Wang
2023-07-31  4:35 ` Rongwei Wang
2023-07-31 12:25   ` Matthew Wilcox
2023-07-31 12:50     ` David Hildenbrand
2023-07-31 16:19       ` Rongwei Wang
2023-07-31 16:30         ` David Hildenbrand
2023-07-31 16:38           ` Matthew Wilcox
2023-07-31 16:48             ` David Hildenbrand
2023-07-31 16:54               ` Matthew Wilcox [this message]
2023-07-31 17:06                 ` David Hildenbrand
2023-08-01  6:53             ` Rongwei Wang
2023-08-01 19:28               ` Matthew Wilcox
