From: Catalin Marinas <catalin.marinas@arm.com>
To: Arnd Bergmann <arnd@arndb.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Mark Langsdorf <mlangsdo@redhat.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: Linux 3.19-rc3
Date: Mon, 12 Jan 2015 12:18:15 +0000
Message-ID: <20150112121815.GB19807@e104818-lin.cambridge.arm.com>
In-Reply-To: <10028397.kdbz8TfPck@wuerfel>

On Sat, Jan 10, 2015 at 09:36:13PM +0000, Arnd Bergmann wrote:
> On Saturday 10 January 2015 13:00:27 Linus Torvalds wrote:
> > > IIRC, AIX works great with 64k pages, but only because of two
> > > reasons that don't apply on Linux:
> > 
> > .. there's a few other ones:
> > 
> >  (c) nobody really runs AIX on desktops. It's very much a DB load
> > environment, with historically some HPC.
> > 
> >  (d) the powerpc TLB fill/buildup/teardown costs are horrible, so on
> > AIX the cost of lots of small pages is much higher too.
> 
> I think (d) applies to ARM as well, since it has no hardware
> dirty/referenced bit tracking and requires the OS to mark the
> pages as invalid/readonly until the first access. ARMv8.1
> has a fix for that, but it's optional and we haven't seen any
> implementations yet.
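A toy C model of the extra faults that software access/dirty tracking can cause (an illustration under simplifying assumptions, not kernel code and not measured data). It assumes the worst case, where each page takes one fault to set the access flag and one more on its first write to mark it dirty. The function name extra_tracking_faults() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy upper-bound model (an assumption, not kernel code): count the extra
 * faults taken when a region is first read and then written, page by page,
 * if every PTE starts with the access flag clear and writable pages start
 * read-only so that the OS can track "referenced" and "dirty" in software.
 * With hardware tracking, neither of these extra faults is needed.
 */
static size_t extra_tracking_faults(size_t region_bytes, size_t page_bytes,
				    int hw_tracking)
{
	assert(page_bytes != 0);

	/* Pages touched, rounding a partial last page up. */
	size_t pages = (region_bytes + page_bytes - 1) / page_bytes;

	if (hw_tracking)
		return 0;	/* hardware sets access/dirty bits itself */

	/* One fault to set the access flag, one to mark the page dirty. */
	return 2 * pages;
}
```

For a 1MB region, the model gives 512 extra faults with 4KB pages, 32 with 64KB pages, and none with hardware tracking. The kernel can often set these bits when it first installs a PTE, so real counts are usually below this bound.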

Do you happen to have any data on how significantly the lack of hardware
dirty/access bits impacts performance? I think it may affect user
process start-up time a bit, but at run-time it shouldn't be that bad.

If it is that significant, we could optimise it further in the arch
code. For example, we could add a fast exception path for faults that
only need to mark the pte dirty. This would be handled by arch code
without even calling handle_pte_fault().

> > so I feel pretty confident in saying it won't happen. It's just too
> > much of a bother, for little to no actual upside. It's likely a much
> > better approach to try to instead use THP for anonymous mappings.
> 
> arm64 already supports 2MB transparent hugepages. I guess it
> wouldn't be too hard to change it so that an existing hugepage
> on an anonymous mapping that gets split up into 4KB pages gets
> split along 64KB boundaries with the contiguous mapping bit set.
> 
> Having full support for multiple hugepage sizes (64KB, 2MB and 32MB
> in case of ARM64 with 4KB PAGE_SIZE) would be even better and
> probably negate any benefits of 64KB PAGE_SIZE, but requires more
> changes to common mm code.
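The sizes listed above follow from the 4KB-granule geometry, assuming the contiguous bit groups 16 naturally aligned entries at both the PTE and PMD level. The C sketch below spells out that arithmetic and the alignment check a 64KB split of a 2MB anonymous hugepage would need. The constant and helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define GRANULE_SHIFT	12	/* 4KB pages */
#define TABLE_ENTRIES	512	/* 4KB table / 8-byte descriptors */
#define CONT_ENTRIES	16	/* contiguous-bit run length, 4KB granule */

static uint64_t cont_pte_size(void)
{
	return (uint64_t)CONT_ENTRIES << GRANULE_SHIFT;		/* 64KB */
}

static uint64_t pmd_block_size(void)
{
	return (uint64_t)TABLE_ENTRIES << GRANULE_SHIFT;	/* 2MB */
}

static uint64_t cont_pmd_size(void)
{
	return CONT_ENTRIES * pmd_block_size();			/* 32MB */
}

/* Number of 16-entry runs when a 2MB block is split into 4KB PTEs. */
static uint64_t cont_runs_per_pmd_block(void)
{
	assert(pmd_block_size() % cont_pte_size() == 0);
	return pmd_block_size() / cont_pte_size();		/* 32 */
}

/*
 * A run may set the contiguous bit only if both VA and PA are aligned to
 * the run size (the entries must also map physically contiguous memory
 * with identical attributes, which is not modelled here).
 */
static int can_use_cont_pte(uint64_t va, uint64_t pa)
{
	return ((va | pa) & (cont_pte_size() - 1)) == 0;
}
```

So splitting one 2MB hugepage along 64KB boundaries would yield 32 runs of 16 PTEs, each of which could carry the contiguous bit.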

As I replied to your other email, I don't think that's simple for the
transparent huge pages case.

The main advantage I see with 64KB pages is not the reduced TLB pressure
but the number of levels of page tables. Take the AMD Seattle board for
example: with 4KB pages you need 4 levels, but with 64KB pages only 2
levels are needed (42-bit VA). Larger TLBs and improved walk caches
(caching the VA -> pmd entry translation rather than walking all the way
to the pte/PA) make things better, but you still have the warm-up time
for every fork/new process, as they don't share the same TLB entries.
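The level counts above follow from simple arithmetic, assuming 8-byte descriptors so that each table level resolves (page_shift - 3) VA bits: levels = ceil((va_bits - page_shift) / (page_shift - 3)). A hypothetical helper:

```c
#include <assert.h>

/*
 * Translation levels needed to map va_bits of virtual address space with
 * pages of 2^page_shift bytes, assuming 8-byte descriptors so that each
 * table (one page in size) resolves page_shift - 3 bits of the VA.
 */
static unsigned int pgtable_levels(unsigned int va_bits, unsigned int page_shift)
{
	assert(page_shift > 3 && va_bits > page_shift);

	unsigned int bits_per_level = page_shift - 3;
	unsigned int bits_to_resolve = va_bits - page_shift;

	/* Round up: a partially used top level still costs a walk step. */
	return (bits_to_resolve + bits_per_level - 1) / bits_per_level;
}
```

pgtable_levels(42, 12) gives 4 and pgtable_levels(42, 16) gives 2, so a cold TLB miss walks up to four descriptors with 4KB pages but only two with 64KB pages, before any walk caching helps.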

But, as Linus already said, the trade-off against memory wastage
is highly dependent on the target workload.
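A rough way to see the memory-wastage side of the trade-off is to sum the rounding loss when objects such as small page-cache files are padded to a whole number of pages. The function and the example sizes are illustrative assumptions, not measurements from any workload.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bytes wasted when each object of sizes[i] bytes is rounded up to a
 * whole number of pages of page_bytes each.
 */
static size_t rounding_waste(const size_t *sizes, size_t n, size_t page_bytes)
{
	assert(page_bytes != 0);

	size_t waste = 0;
	for (size_t i = 0; i < n; i++) {
		size_t rem = sizes[i] % page_bytes;
		if (rem)
			waste += page_bytes - rem;	/* padding in the last page */
	}
	return waste;
}
```

For sizes of 1000, 5000 and 70000 bytes, the sketch reports 10016 bytes of padding with 4KB pages and 186144 bytes with 64KB pages; a workload dominated by many small files pays far more than one dominated by large anonymous mappings.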

-- 
Catalin
