Linux-RISC-V Archive on lore.kernel.org
 help / color / Atom feed
From: Mike Rapoport <rppt@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Mark Rutland <mark.rutland@arm.com>,
	David Hildenbrand <david@redhat.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	linux-mm@kvack.org, Will Deacon <will@kernel.org>,
	linux-kselftest@vger.kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	Christopher Lameter <cl@linux.com>,
	Idan Yaniv <idan.yaniv@ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Elena Reshetova <elena.reshetova@intel.com>,
	linux-arch@vger.kernel.org, Tycho Andersen <tycho@tycho.ws>,
	linux-nvdimm@lists.01.org, Shuah Khan <shuah@kernel.org>,
	x86@kernel.org, Matthew Wilcox <willy@infradead.org>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Ingo Molnar <mingo@redhat.com>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	Arnd Bergmann <arnd@arndb.de>,
	James Bottomley <jejb@linux.ibm.com>,
	Borislav Petkov <bp@alien8.de>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andy Lutomirski <luto@kernel.org>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Dan Williams <dan.j.williams@intel.com>,
	linux-arm-kernel@lists.infradead.org, linux-api@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org,
	Palmer Dabbelt <palmer@dabbelt.com>,
	linux-fsdevel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v6 5/6] mm: secretmem: use PMD-size pages to amortize direct map fragmentation
Date: Wed, 30 Sep 2020 13:20:31 +0300
Message-ID: <20200930102031.GJ2142832@kernel.org> (raw)
In-Reply-To: <20200929141216.GO2628@hirez.programming.kicks-ass.net>

On Tue, Sep 29, 2020 at 04:12:16PM +0200, Peter Zijlstra wrote:
> On Tue, Sep 29, 2020 at 04:05:29PM +0300, Mike Rapoport wrote:
> > On Fri, Sep 25, 2020 at 09:41:25AM +0200, Peter Zijlstra wrote:
> > > On Thu, Sep 24, 2020 at 04:29:03PM +0300, Mike Rapoport wrote:
> > > > From: Mike Rapoport <rppt@linux.ibm.com>
> > > > 
> > > > Removing a PAGE_SIZE page from the direct map every time such page is
> > > > allocated for a secret memory mapping will cause severe fragmentation of
> > > > the direct map. This fragmentation can be reduced by using PMD-size pages
> > > > as a pool for small pages for secret memory mappings.
> > > > 
> > > > Add a gen_pool per secretmem inode and lazily populate this pool with
> > > > PMD-size pages.
> > > 
> > > What's the actual efficacy of this? Since the pmd is per inode, all I
> > > need is a lot of inodes and we're in business to destroy the directmap,
> > > no?
> > > 
> > > Afaict there's no privs needed to use this, all a process needs is to
> > > stay below the mlock limit, so a 'fork-bomb' that maps a single secret
> > > page will utterly destroy the direct map.
> > 
> > This indeed will cause 1G pages in the direct map to be split into 2M
> > chunks, but I disagree with 'destroy' term here. Citing the cover letter
> > of an earlier version of this series:
> 
> It will drop them down to 4k pages. Given enough inodes, and allocating
> only a single sekrit page per pmd, we'll shatter the directmap into 4k.
> 
> >   I've tried to find some numbers that show the benefit of using larger
> >   pages in the direct map, but I couldn't find anything so I've run a
> >   couple of benchmarks from phoronix-test-suite on my laptop (i7-8650U
> >   with 32G RAM).
> 
> Existing benchmarks suck at this, but FB had a load that had a

I tried to dig the regression report in the mailing list, and the best I
could find is

https://lore.kernel.org/lkml/20190823052335.572133-1-songliubraving@fb.com/

which does not mention the actual performance regression but it only
complaints about kernel text mapping being split into 4K pages.

Any chance you have the regression report handy? 

> deterministic enough performance regression to bisect to a directmap
> issue, fixed by:
> 
>   7af0145067bc ("x86/mm/cpa: Prevent large page split when ftrace flips RW on kernel text")

This commit talks about large page split for the text and mentions iTLB
performance.
Could it be that for data the behavoiur is different?

> >   I've tested three variants: the default with 28G of the physical
> >   memory covered with 1G pages, then I disabled 1G pages using
> >   "nogbpages" in the kernel command line and at last I've forced the
> >   entire direct map to use 4K pages using a simple patch to
> >   arch/x86/mm/init.c.  I've made runs of the benchmarks with SSD and
> >   tmpfs.
> >   
> >   Surprisingly, the results does not show huge advantage for large
> >   pages. For instance, here the results for kernel build with
> >   'make -j8', in seconds:
> 
> Your benchmark should stress the TLB of your uarch, such that additional
> pressure added by the shattered directmap shows up.

I understand that the benchmark should stress the TLB, but it's not that
we can add something like random access to a large working set as a
kernel module and insmod it. The userspace should do something that will
cause the stress to the TLB so that entries corresponding to the direct
map will be evicted frequently. And, frankly, 

> And no, I don't have one either.
> 
> >                         |  1G    |  2M    |  4K
> >   ----------------------+--------+--------+---------
> >   ssd, mitigations=on	| 308.75 | 317.37 | 314.9
> >   ssd, mitigations=off	| 305.25 | 295.32 | 304.92
> >   ram, mitigations=on	| 301.58 | 322.49 | 306.54
> >   ram, mitigations=off	| 299.32 | 288.44 | 310.65
> 
> These results lack error data, but assuming the reults are significant,
> then this very much makes a case for 1G mappings. 5s on a kernel builds
> is pretty good.

The standard error for those are between 2.5 and 4.5 out of 3 runs for
each variant. 

For kernel build 1G mappings perform better, but here 5s is only 1.6% of
300s and the direct map fragmentation was taken to the extreme here.
I'm not saying that the direct map fragmentation comes with no cost, but
the cost is not so big to dismiss features that cause the fragmentation
out of hand.

There were also benchmarks that actually performed better with 2M pages
in the direct map, so I'm still not convinced that 1G pages in the
direct map are the clear cut winner.

-- 
Sincerely yours,
Mike.

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

  parent reply index

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-24 13:28 [PATCH v6 0/6] mm: introduce memfd_secret system call to create "secret" memory areas Mike Rapoport
2020-09-24 13:28 ` [PATCH v6 1/6] mm: add definition of PMD_PAGE_ORDER Mike Rapoport
2020-09-24 13:29 ` [PATCH v6 2/6] mmap: make mlock_future_check() global Mike Rapoport
2020-09-24 13:29 ` [PATCH v6 3/6] mm: introduce memfd_secret system call to create "secret" memory areas Mike Rapoport
2020-09-29  4:58   ` Edgecombe, Rick P
2020-09-29 13:06     ` Mike Rapoport
2020-09-29 20:06       ` Edgecombe, Rick P
2020-09-30 10:35         ` Mike Rapoport
2020-09-30 20:11           ` Edgecombe, Rick P
2020-10-11  9:42             ` Mike Rapoport
2020-09-24 13:29 ` [PATCH v6 4/6] arch, mm: wire up memfd_secret system call were relevant Mike Rapoport
2020-09-24 13:29 ` [PATCH v6 5/6] mm: secretmem: use PMD-size pages to amortize direct map fragmentation Mike Rapoport
2020-09-25  7:41   ` Peter Zijlstra
2020-09-25  9:00     ` David Hildenbrand
2020-09-25  9:50       ` Peter Zijlstra
2020-09-25 10:31         ` Mark Rutland
2020-09-25 14:57           ` Tycho Andersen
2020-09-29 14:04           ` Mike Rapoport
2020-09-29 13:07         ` Mike Rapoport
2020-09-29 13:06       ` Mike Rapoport
2020-09-29 13:05     ` Mike Rapoport
2020-09-29 14:12       ` Peter Zijlstra
2020-09-29 14:31         ` Dave Hansen
2020-09-29 14:58         ` Mike Rapoport
2020-09-29 15:15           ` Peter Zijlstra
2020-09-30 10:27             ` Mike Rapoport
2020-09-30 14:39               ` James Bottomley
2020-09-30 14:45                 ` David Hildenbrand
2020-09-30 15:17                   ` James Bottomley
2020-09-30 15:25                     ` David Hildenbrand
2020-09-30 15:09               ` Matthew Wilcox
2020-10-01  8:14                 ` Mike Rapoport
2020-09-29 15:03         ` James Bottomley
2020-09-30 10:20         ` Mike Rapoport [this message]
2020-09-30 10:43           ` Peter Zijlstra
2020-09-24 13:29 ` [PATCH v6 6/6] secretmem: test: add basic selftest for memfd_secret(2) Mike Rapoport
2020-09-24 13:35 ` [PATCH] man2: new page describing memfd_secret() system call Mike Rapoport
2020-09-24 14:55   ` Alejandro Colomar
2020-10-03  9:32     ` Alejandro Colomar
2020-10-05  7:32       ` Mike Rapoport
2020-11-16 21:01         ` [PATCH v2] memfd_secret.2: New " Alejandro Colomar
2020-11-17  6:26           ` Mike Rapoport
2020-09-25  2:34 ` [PATCH v6 0/6] mm: introduce memfd_secret system call to create "secret" memory areas Andrew Morton
2020-09-25  6:42   ` Mike Rapoport
2020-11-01 11:09 ` Hagen Paul Pfeifer
2020-11-02 15:40   ` Mike Rapoport
2020-11-03 13:52     ` Hagen Paul Pfeifer
2020-11-03 16:30       ` Mike Rapoport
2020-11-04 11:39         ` Hagen Paul Pfeifer
2020-11-04 17:02           ` Mike Rapoport
2020-11-09 10:41             ` Hagen Paul Pfeifer
2020-11-02  9:11 ` David Hildenbrand
2020-11-02  9:31   ` David Hildenbrand
2020-11-02 17:43   ` Mike Rapoport
2020-11-02 17:51     ` David Hildenbrand
2020-11-03  9:52       ` Mike Rapoport
2020-11-03 10:11         ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200930102031.GJ2142832@kernel.org \
    --to=rppt@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=bp@alien8.de \
    --cc=catalin.marinas@arm.com \
    --cc=cl@linux.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=elena.reshetova@intel.com \
    --cc=hpa@zytor.com \
    --cc=idan.yaniv@ibm.com \
    --cc=jejb@linux.ibm.com \
    --cc=kirill@shutemov.name \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=luto@kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mingo@redhat.com \
    --cc=mtk.manpages@gmail.com \
    --cc=palmer@dabbelt.com \
    --cc=paul.walmsley@sifive.com \
    --cc=peterz@infradead.org \
    --cc=rppt@linux.ibm.com \
    --cc=shuah@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=tycho@tycho.ws \
    --cc=viro@zeniv.linux.org.uk \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-RISC-V Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-riscv/0 linux-riscv/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-riscv linux-riscv/ https://lore.kernel.org/linux-riscv \
		linux-riscv@lists.infradead.org
	public-inbox-index linux-riscv

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.infradead.lists.linux-riscv


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git