Re: [PATCH v2 2/2] mm: speed up mremap by 500x on large regions

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: [PATCH v2 2/2] mm: speed up mremap by 500x on large regions
       [not found]   ` <20181012113056.gxhcbrqyu7k7xnyv@kshutemo-mobl1>
@ 2018-10-12 18:02     ` David Miller
       [not found]     ` <20181012125046.GA170912@joelaf.mtv.corp.google.com>
  1 sibling, 0 replies; 9+ messages in thread
From: David Miller @ 2018-10-12 18:02 UTC (permalink / raw)
  To: kirill
  Cc: joel, linux-kernel, kernel-team, minchan, pantin, hughd,
	lokeshgidra, dancol, mhocko, akpm, aryabinin, luto, bp,
	catalin.marinas, chris, dave.hansen, elfring, fenghua.yu, geert,
	gxt, deller, mingo, jejb, jdike, jonas, Julia.Lawall, kasan-dev,
	kvmarm, lftan, linux-alpha, linux-arm-kernel, linux-hexagon,
	linux-ia64, linux-m68k, linux-mips, linux-mm, linux-parisc,
	linuxppc-dev, linux-riscv, linux-s390, linux-sh, linux-snps-arc,
	linux-um, linux-xtensa, jcmvbkbc, nios2-dev, openrisc, peterz,
	richard

From: "Kirill A. Shutemov" <kirill@shutemov.name>
Date: Fri, 12 Oct 2018 14:30:56 +0300

> I looked into the code more and noticed move_pte() helper called from
> move_ptes(). It changes PTE entry to suite new address.
> 
> It is only defined in non-trivial way on Sparc. I don't know much about
> Sparc and it's hard for me to say if the optimization will break anything
> there.
> 
> I think it worth to disable the optimization if __HAVE_ARCH_MOVE_PTE is
> defined. Or make architectures state explicitely that the optimization is
> safe.

What sparc is doing in move_pte() is flushing the data-cache
(synchronously) if the virtual address color of the mapping changes.

Hope this helps.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/2] mm: speed up mremap by 500x on large regions
       [not found]     ` <20181012125046.GA170912@joelaf.mtv.corp.google.com>
@ 2018-10-12 18:18       ` David Miller
  2018-10-13  1:35         ` Joel Fernandes
  0 siblings, 1 reply; 9+ messages in thread
From: David Miller @ 2018-10-12 18:18 UTC (permalink / raw)
  To: joel
  Cc: kirill, linux-kernel, kernel-team, minchan, pantin, hughd,
	lokeshgidra, dancol, mhocko, akpm, aryabinin, luto, bp,
	catalin.marinas, chris, dave.hansen, elfring, fenghua.yu, geert,
	gxt, deller, mingo, jejb, jdike, jonas, Julia.Lawall, kasan-dev,
	kvmarm, lftan, linux-alpha, linux-arm-kernel, linux-hexagon,
	linux-ia64, linux-m68k, linux-mips, linux-mm, linux-parisc,
	linuxppc-dev, linux-riscv, linux-s390, linux-sh, linux-snps-arc,
	linux-um, linux-xtensa, jcmvbkbc, nios2-dev, openrisc, peterz,
	richard

From: Joel Fernandes <joel@joelfernandes.org>
Date: Fri, 12 Oct 2018 05:50:46 -0700

> If its an issue, then how do transparent huge pages work on Sparc?  I don't
> see the huge page code (move_huge_pages) during mremap doing anything special
> for Sparc architecture when moving PMDs..

This is because all huge pages are larger than SHMLBA.  So no cache flushing
necessary.

> Also, do we not flush the caches from any path when we munmap
> address space?  We do call do_munmap on the old mapping from mremap
> after moving to the new one.

Sparc makes sure that shared mapping have consistent colors.  Therefore
all that's left are private mappings and those will be initialized by
block stores to clear the page out or similar.

Also, when creating new mappings, we flush the D-cache when necessary
in update_mmu_cache().

We also maintain a bit in the page struct to track when a page which
was potentially written to on one cpu ends up mapped into another
address space and flush as necessary.

The cache is write-through, which simplifies the preconditions we have
to maintain.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/2] mm: speed up mremap by 500x on large regions
  2018-10-12 18:18       ` David Miller
@ 2018-10-13  1:35         ` Joel Fernandes
  2018-10-13  1:39           ` Daniel Colascione
  0 siblings, 1 reply; 9+ messages in thread
From: Joel Fernandes @ 2018-10-13  1:35 UTC (permalink / raw)
  To: David Miller
  Cc: kirill, linux-kernel, kernel-team, minchan, pantin, hughd,
	lokeshgidra, dancol, mhocko, akpm, aryabinin, luto, bp,
	catalin.marinas, chris, dave.hansen, elfring, fenghua.yu, geert,
	gxt, deller, mingo, jejb, jdike, jonas, Julia.Lawall, kasan-dev,
	kvmarm, lftan, linux-alpha, linux-hexagon, linux-ia64,
	linux-m68k, linux-mips, linux-mm, linux-parisc, linuxppc-dev,
	linux-riscv, linux-s390, linux-sh, linux-snps-arc, linux-um,
	linux-xtensa, jcmvbkbc, nios2-dev, peterz, richard

On Fri, Oct 12, 2018 at 11:18:36AM -0700, David Miller wrote:
> From: Joel Fernandes <joel@joelfernandes.org>
[...]
> > Also, do we not flush the caches from any path when we munmap
> > address space?  We do call do_munmap on the old mapping from mremap
> > after moving to the new one.
> 
> Sparc makes sure that shared mapping have consistent colors.  Therefore
> all that's left are private mappings and those will be initialized by
> block stores to clear the page out or similar.
> 
> Also, when creating new mappings, we flush the D-cache when necessary
> in update_mmu_cache().
> 
> We also maintain a bit in the page struct to track when a page which
> was potentially written to on one cpu ends up mapped into another
> address space and flush as necessary.
> 
> The cache is write-through, which simplifies the preconditions we have
> to maintain.

Makes sense, thanks. For the moment I sent patches to enable this on arm64
and x86. We can enable it on sparc as well at a later time as it sounds it
could be a safe optimization to apply to that architecture as well.

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/2] mm: speed up mremap by 500x on large regions
  2018-10-13  1:35         ` Joel Fernandes
@ 2018-10-13  1:39           ` Daniel Colascione
  2018-10-13  1:44             ` Joel Fernandes
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Colascione @ 2018-10-13  1:39 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: David Miller, kirill, linux-kernel, kernel-team, Minchan Kim,
	Ramon Pantin, hughd, Lokesh Gidra, Michal Hocko, Andrew Morton,
	aryabinin, luto, bp, catalin.marinas, chris, dave.hansen,
	elfring, fenghua.yu, geert, gxt, deller, mingo, jejb, jdike,
	jonas, Julia.Lawall, kasan-dev, kvmarm, lftan, linux-alpha,
	linux-hexagon, linux-ia64, linux-m68k, linux-mips, linux-mm,
	linux-parisc, linuxppc-dev, linux-riscv, linux-s390, linux-sh,
	linux-snps-arc, linux-um, linux-xtensa, jcmvbkbc, nios2-dev,
	Peter Zijlstra, richard

Not 32-bit ARM?

On Fri, Oct 12, 2018 at 6:35 PM, Joel Fernandes <joel@joelfernandes.org> wrote:
> On Fri, Oct 12, 2018 at 11:18:36AM -0700, David Miller wrote:
>> From: Joel Fernandes <joel@joelfernandes.org>
> [...]
>> > Also, do we not flush the caches from any path when we munmap
>> > address space?  We do call do_munmap on the old mapping from mremap
>> > after moving to the new one.
>>
>> Sparc makes sure that shared mapping have consistent colors.  Therefore
>> all that's left are private mappings and those will be initialized by
>> block stores to clear the page out or similar.
>>
>> Also, when creating new mappings, we flush the D-cache when necessary
>> in update_mmu_cache().
>>
>> We also maintain a bit in the page struct to track when a page which
>> was potentially written to on one cpu ends up mapped into another
>> address space and flush as necessary.
>>
>> The cache is write-through, which simplifies the preconditions we have
>> to maintain.
>
> Makes sense, thanks. For the moment I sent patches to enable this on arm64
> and x86. We can enable it on sparc as well at a later time as it sounds it
> could be a safe optimization to apply to that architecture as well.
>
> thanks,
>
>  - Joel
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/2] mm: speed up mremap by 500x on large regions
  2018-10-13  1:39           ` Daniel Colascione
@ 2018-10-13  1:44             ` Joel Fernandes
  2018-10-13  1:54               ` Daniel Colascione
  0 siblings, 1 reply; 9+ messages in thread
From: Joel Fernandes @ 2018-10-13  1:44 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: David Miller, kirill, linux-kernel, kernel-team, Minchan Kim,
	Ramon Pantin, hughd, Lokesh Gidra, Michal Hocko, Andrew Morton,
	aryabinin, luto, bp, catalin.marinas, chris, dave.hansen,
	elfring, fenghua.yu, geert, gxt, deller, mingo, jejb, jdike,
	jonas, Julia.Lawall, kasan-dev, kvmarm, lftan, linux-alpha,
	linux-hexagon, linux-ia64, linux-m68k, linux-mips, linux-mm,
	linux-parisc, linuxppc-dev, linux-riscv, linux-s390, linux-sh,
	linux-snps-arc, linux-um, linux-xtensa, jcmvbkbc, nios2-dev,
	Peter Zijlstra, richard

On Fri, Oct 12, 2018 at 06:39:45PM -0700, Daniel Colascione wrote:
> Not 32-bit ARM?

Well, I didn't want to enable every possible architecture we could in a
single go. Certainly arm32 can be a follow on enablement as can be other
architectures. The point of this series is to upstream this feature and
enable a hand-picked few architectures as a first step.

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/2] mm: speed up mremap by 500x on large regions
  2018-10-13  1:44             ` Joel Fernandes
@ 2018-10-13  1:54               ` Daniel Colascione
  2018-10-13  2:10                 ` Joel Fernandes
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Colascione @ 2018-10-13  1:54 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: David Miller, kirill, linux-kernel, kernel-team, Minchan Kim,
	Ramon Pantin, hughd, Lokesh Gidra, Michal Hocko, Andrew Morton,
	aryabinin, luto, bp, catalin.marinas, chris, dave.hansen,
	elfring, fenghua.yu, geert, gxt, deller, mingo, jejb, jdike,
	jonas, Julia.Lawall, kasan-dev, kvmarm, lftan, linux-alpha,
	linux-hexagon, linux-ia64, linux-m68k, linux-mips, linux-mm,
	linux-parisc, linuxppc-dev, linux-riscv, linux-s390, linux-sh,
	linux-snps-arc, linux-um, linux-xtensa, jcmvbkbc, nios2-dev,
	Peter Zijlstra, richard

I wonder whether it makes sense to expose to userspace somehow whether
mremap is "fast" for a particular architecture. If a feature relies on
fast mremap, it might be better for some userland component to disable
that feature entirely rather than blindly use mremap and end up
performing very poorly. If we're disabling fast mremap when THP is
enabled, the userland component can't just rely on an architecture
switch and some kind of runtime feature detection becomes even more
important.

On Fri, Oct 12, 2018 at 6:44 PM, Joel Fernandes <joel@joelfernandes.org> wrote:
> On Fri, Oct 12, 2018 at 06:39:45PM -0700, Daniel Colascione wrote:
>> Not 32-bit ARM?
>
> Well, I didn't want to enable every possible architecture we could in a
> single go. Certainly arm32 can be a follow on enablement as can be other
> architectures. The point of this series is to upstream this feature and
> enable a hand-picked few architectures as a first step.
>
> thanks,
>
>  - Joel
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/2] mm: speed up mremap by 500x on large regions
  2018-10-13  1:54               ` Daniel Colascione
@ 2018-10-13  2:10                 ` Joel Fernandes
  2018-10-13  2:25                   ` Daniel Colascione
  0 siblings, 1 reply; 9+ messages in thread
From: Joel Fernandes @ 2018-10-13  2:10 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: David Miller, kirill, linux-kernel, kernel-team, Minchan Kim,
	Ramon Pantin, hughd, Lokesh Gidra, Michal Hocko, Andrew Morton,
	aryabinin, luto, bp, catalin.marinas, chris, dave.hansen,
	elfring, fenghua.yu, geert, gxt, deller, mingo, jejb, jdike,
	jonas, Julia.Lawall, kasan-dev, kvmarm, lftan, linux-alpha,
	linux-hexagon, linux-ia64, linux-m68k, linux-mips, linux-mm,
	linux-parisc, linuxppc-dev, linux-riscv, linux-s390, linux-sh,
	linux-snps-arc, linux-um, linux-xtensa, jcmvbkbc, nios2-dev,
	Peter Zijlstra, richard

On Fri, Oct 12, 2018 at 06:54:33PM -0700, Daniel Colascione wrote:
> I wonder whether it makes sense to expose to userspace somehow whether
> mremap is "fast" for a particular architecture. If a feature relies on
> fast mremap, it might be better for some userland component to disable
> that feature entirely rather than blindly use mremap and end up
> performing very poorly. If we're disabling fast mremap when THP is
> enabled, the userland component can't just rely on an architecture
> switch and some kind of runtime feature detection becomes even more
> important.

I hate to point out that its forbidden to top post on LKML :-)
https://kernelnewbies.org/mailinglistguidelines
So don't that Mr. Dan! :D

But anyway, I think this runtime detection thing is not needed. THP is
actually expected to be as fast as this anyway, so if that's available then
we should already be as fast. This is for non-THP where THP cannot be enabled
and there is still room for some improvement. Most/all architectures will be
just fine with this. This flag is more of a safety-net type of thing where in
the future if there is this one or two weird architectures that don't play
well, then they can turn it off at the architecture level by not selecting
the flag. See my latest patches for the per-architecture compile-time
controls. Ideally we'd like to blanket turn it on on all, but this is just
playing it extra safe as Kirill and me were discussing on other threads.

thanks!

- Joel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/2] mm: speed up mremap by 500x on large regions
  2018-10-13  2:10                 ` Joel Fernandes
@ 2018-10-13  2:25                   ` Daniel Colascione
  2018-10-13 17:50                     ` Joel Fernandes
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Colascione @ 2018-10-13  2:25 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: David Miller, kirill, linux-kernel, kernel-team, Minchan Kim,
	Ramon Pantin, Hugh Dickins, Lokesh Gidra, Michal Hocko,
	Andrew Morton, aryabinin, luto, bp, catalin.marinas,
	Chris Zankel, dave.hansen, elfring, fenghua.yu, geert, gxt,
	deller, mingo, jejb, jdike, Jonas Bonn, Julia Lawall, kasan-dev,
	kvmarm, lftan, linux-alpha, linux-hexagon, linux-ia64,
	linux-m68k, linux-mips, linux-mm, linux-parisc, linuxppc-dev,
	linux-riscv, linux-s390, linux-sh, linux-snps-arc, linux-um,
	linux-xtensa, Max Filippov, nios2-dev, Peter Zijlstra, richard

On Fri, Oct 12, 2018 at 7:10 PM, Joel Fernandes <joel@joelfernandes.org> wrote:
> On Fri, Oct 12, 2018 at 06:54:33PM -0700, Daniel Colascione wrote:
>> I wonder whether it makes sense to expose to userspace somehow whether
>> mremap is "fast" for a particular architecture. If a feature relies on
>> fast mremap, it might be better for some userland component to disable
>> that feature entirely rather than blindly use mremap and end up
>> performing very poorly. If we're disabling fast mremap when THP is
>> enabled, the userland component can't just rely on an architecture
>> switch and some kind of runtime feature detection becomes even more
>> important.
>
> I hate to point out that its forbidden to top post on LKML :-)
> https://kernelnewbies.org/mailinglistguidelines
> So don't that Mr. Dan! :D

Guilty as charged. I really should switch back to Gnus. :-)

> But anyway, I think this runtime detection thing is not needed. THP is
> actually expected to be as fast as this anyway, so if that's available then
> we should already be as fast.

Ah, I think the commit message is confusing. (Or else I'm misreading
the patch now.) It's not quite that we're disabling the feature when
THP is enabled anywhere, but rather that we use the move_huge_pmd path
for huge PMDs and use the new code only for non-huge PMDs. (Right?) If
that's the case, the commit message shouldn't say "Incase THP is
enabled, the optimization is skipped". Even if THP is enabled on a
system generally, we might use the new PMD-moving code for mapping
types that don't support THP-ization, right?

> This is for non-THP where THP cannot be enabled
> and there is still room for some improvement. Most/all architectures will be
> just fine with this. This flag is more of a safety-net type of thing where in
> the future if there is this one or two weird architectures that don't play
> well, then they can turn it off at the architecture level by not selecting
> the flag. See my latest patches for the per-architecture compile-time
> controls. Ideally we'd like to blanket turn it on on all, but this is just
> playing it extra safe as Kirill and me were discussing on other threads.

Sure. I'm just pointing out that the 500x performance different turns
the operation into a qualitatively different feature, so if we expect
to actually ship a mainstream architecture without support for this
thing, we should make it explicit. If we're not, we shouldn't.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/2] mm: speed up mremap by 500x on large regions
  2018-10-13  2:25                   ` Daniel Colascione
@ 2018-10-13 17:50                     ` Joel Fernandes
  0 siblings, 0 replies; 9+ messages in thread
From: Joel Fernandes @ 2018-10-13 17:50 UTC (permalink / raw)
  To: dancol
  Cc: David Miller, kirill, linux-kernel, kernel-team, Minchan Kim,
	Ramon Pantin, Hugh Dickins, Lokesh Gidra, Michal Hocko,
	Andrew Morton, aryabinin, luto, bp, catalin.marinas,
	Chris Zankel, dave.hansen, elfring, fenghua.yu, geert, gxt,
	deller, mingo, jejb, jdike, Jonas Bonn, Julia Lawall, kasan-dev,
	kvmarm, lftan, linux-alpha, linux-hexagon, linux-ia64,
	linux-m68k, linux-mips, linux-mm, linux-parisc, linuxppc-dev,
	linux-riscv, linux-s390, linux-sh, linux-snps-arc, linux-um,
	linux-xtensa, Max Filippov, nios2-dev, Peter Zijlstra, richard

On Fri, Oct 12, 2018 at 07:25:08PM -0700, Daniel Colascione wrote:
[...] 
> > But anyway, I think this runtime detection thing is not needed. THP is
> > actually expected to be as fast as this anyway, so if that's available then
> > we should already be as fast.
> 
> Ah, I think the commit message is confusing. (Or else I'm misreading
> the patch now.) It's not quite that we're disabling the feature when
> THP is enabled anywhere, but rather that we use the move_huge_pmd path
> for huge PMDs and use the new code only for non-huge PMDs. (Right?) If
> that's the case, the commit message shouldn't say "Incase THP is
> enabled, the optimization is skipped". Even if THP is enabled on a
> system generally, we might use the new PMD-moving code for mapping
> types that don't support THP-ization, right?

That is true. Ok, I guess I can update the commit message to be more accurate
about that.

> > This is for non-THP where THP cannot be enabled
> > and there is still room for some improvement. Most/all architectures will be
> > just fine with this. This flag is more of a safety-net type of thing where in
> > the future if there is this one or two weird architectures that don't play
> > well, then they can turn it off at the architecture level by not selecting
> > the flag. See my latest patches for the per-architecture compile-time
> > controls. Ideally we'd like to blanket turn it on on all, but this is just
> > playing it extra safe as Kirill and me were discussing on other threads.
> 
> Sure. I'm just pointing out that the 500x performance different turns
> the operation into a qualitatively different feature, so if we expect
> to actually ship a mainstream architecture without support for this
> thing, we should make it explicit. If we're not, we shouldn't.

We can make it explicit by enabling it in such a mainstream architecture is
my point. Also if the optimization is not doing what its supposed to, then
userspace will also just know by measuring the time.

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2018-10-13 17:50 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20181012013756.11285-1-joel@joelfernandes.org>
     [not found] ` <20181012013756.11285-2-joel@joelfernandes.org>
     [not found]   ` <20181012113056.gxhcbrqyu7k7xnyv@kshutemo-mobl1>
2018-10-12 18:02     ` [PATCH v2 2/2] mm: speed up mremap by 500x on large regions David Miller
     [not found]     ` <20181012125046.GA170912@joelaf.mtv.corp.google.com>
2018-10-12 18:18       ` David Miller
2018-10-13  1:35         ` Joel Fernandes
2018-10-13  1:39           ` Daniel Colascione
2018-10-13  1:44             ` Joel Fernandes
2018-10-13  1:54               ` Daniel Colascione
2018-10-13  2:10                 ` Joel Fernandes
2018-10-13  2:25                   ` Daniel Colascione
2018-10-13 17:50                     ` Joel Fernandes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).