Re: Linux 3.19-rc3 - Kirill A. Shutemov

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Mark Langsdorf <mlangsdo@redhat.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: Linux 3.19-rc3
Date: Sat, 10 Jan 2015 02:35:40 +0200	[thread overview]
Message-ID: <20150110003540.GA32037@node.dhcp.inet.fi> (raw)
In-Reply-To: <20150109232707.GA6325@e104818-lin.cambridge.arm.com>

On Fri, Jan 09, 2015 at 11:27:07PM +0000, Catalin Marinas wrote:
> On Thu, Jan 08, 2015 at 07:21:02PM +0000, Linus Torvalds wrote:
> > The only excuse for 64kB pages is "my hardware TLB is complete crap,
> > and I have very specialized server-only loads".
> 
> I would make a slight correction: s/and/or/.
> 
> I agree that for a general purpose system (and even systems like web
> hosting servers), 64KB is overkill; 16KB may be a better compromise.
> 
> There are however some specialised loads that benefit from this. The
> main example here is virtualisation where if both guest and host use 4
> levels of page tables each (that's what you may get with 4KB pages on
> arm64), a full TLB miss in both stages of translation (the ARM
> terminology for nested page tables) needs up to _24_ memory accesses
> (though cached). Of course, once the TLB warms up, there will be much
> less but for new mmaps you always get some misses.
> 
> With 64KB pages (in the host usually), you can reduce the page table
> levels to three or two (the latter for 42-bit VA) or you could even
> couple this with some insanely huge pages (512MB, the next up from 64KB)
> to decrease the number of levels further.
> 
> I see three main advantages: the usual reduced TLB pressure (which
> arguably can be solved with bigger TLBs), less TLB misses and, pretty
> important with virtualisation, the cost of the TLB miss due to a reduced
> number of levels. But that's for the user to balance the advantages and
> disadvantages you already mentioned based on the planned workload (e.g.
> host configured with 64KB pages while guests use 4KB).
> 
> Another aspect on ARM is the TLB flushing on (large) MP systems. With a
> larger page size, we reduce the number of TLB operation (in-hardware)
> broadcasting between CPUs (we could use non-broadcasting ops and IPIs,
> not sure they are any faster though).

With bigger page size there's also reduction in number of entities to
handle by kernel: less memory occupied by struct pages, fewer pages on
lru, etc.

Managing a lot of memory (TiB scale) with 4k chunks is just insane.
We will need to find a way to cluster memory together to manage it
reasonably. Whether it bigger base page size or some other mechanism.
Maybe THP? ;)

-- 
 Kirill A. Shutemov

WARNING: multiple messages have this Message-ID (diff)

From: kirill@shutemov.name (Kirill A. Shutemov)
To: linux-arm-kernel@lists.infradead.org
Subject: Linux 3.19-rc3
Date: Sat, 10 Jan 2015 02:35:40 +0200	[thread overview]
Message-ID: <20150110003540.GA32037@node.dhcp.inet.fi> (raw)
In-Reply-To: <20150109232707.GA6325@e104818-lin.cambridge.arm.com>

On Fri, Jan 09, 2015 at 11:27:07PM +0000, Catalin Marinas wrote:
> On Thu, Jan 08, 2015 at 07:21:02PM +0000, Linus Torvalds wrote:
> > The only excuse for 64kB pages is "my hardware TLB is complete crap,
> > and I have very specialized server-only loads".
> 
> I would make a slight correction: s/and/or/.
> 
> I agree that for a general purpose system (and even systems like web
> hosting servers), 64KB is overkill; 16KB may be a better compromise.
> 
> There are however some specialised loads that benefit from this. The
> main example here is virtualisation where if both guest and host use 4
> levels of page tables each (that's what you may get with 4KB pages on
> arm64), a full TLB miss in both stages of translation (the ARM
> terminology for nested page tables) needs up to _24_ memory accesses
> (though cached). Of course, once the TLB warms up, there will be much
> less but for new mmaps you always get some misses.
> 
> With 64KB pages (in the host usually), you can reduce the page table
> levels to three or two (the latter for 42-bit VA) or you could even
> couple this with some insanely huge pages (512MB, the next up from 64KB)
> to decrease the number of levels further.
> 
> I see three main advantages: the usual reduced TLB pressure (which
> arguably can be solved with bigger TLBs), less TLB misses and, pretty
> important with virtualisation, the cost of the TLB miss due to a reduced
> number of levels. But that's for the user to balance the advantages and
> disadvantages you already mentioned based on the planned workload (e.g.
> host configured with 64KB pages while guests use 4KB).
> 
> Another aspect on ARM is the TLB flushing on (large) MP systems. With a
> larger page size, we reduce the number of TLB operation (in-hardware)
> broadcasting between CPUs (we could use non-broadcasting ops and IPIs,
> not sure they are any faster though).

With bigger page size there's also reduction in number of entities to
handle by kernel: less memory occupied by struct pages, fewer pages on
lru, etc.

Managing a lot of memory (TiB scale) with 4k chunks is just insane.
We will need to find a way to cluster memory together to manage it
reasonably. Whether it bigger base page size or some other mechanism.
Maybe THP? ;)

-- 
 Kirill A. Shutemov

next prev parent reply	other threads:[~2015-01-10  0:38 UTC|newest]

Thread overview: 154+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-01-06  1:46 Linux 3.19-rc3 Linus Torvalds
2015-01-06  2:46 ` Dave Jones
2015-01-06  8:18   ` Takashi Iwai
2015-01-06  9:45   ` Jiri Kosina
2015-01-08 12:51 ` Mark Langsdorf
2015-01-08 12:51   ` Mark Langsdorf
2015-01-08 13:45   ` Catalin Marinas
2015-01-08 13:45     ` Catalin Marinas
2015-01-08 17:29     ` Mark Langsdorf
2015-01-08 17:29       ` Mark Langsdorf
2015-01-08 17:34       ` Catalin Marinas
2015-01-08 17:34         ` Catalin Marinas
2015-01-08 18:48         ` Mark Langsdorf
2015-01-08 18:48           ` Mark Langsdorf
2015-01-08 19:21           ` Linus Torvalds
2015-01-08 19:21             ` Linus Torvalds
2015-01-09 23:27             ` Catalin Marinas
2015-01-09 23:27               ` Catalin Marinas
2015-01-10  0:35               ` Kirill A. Shutemov [this message]
2015-01-10  0:35                 ` Kirill A. Shutemov
2015-01-10  2:27                 ` Linus Torvalds
2015-01-10  2:27                   ` Linus Torvalds
2015-01-10  2:51                   ` David Lang
2015-01-10  2:51                     ` David Lang
2015-01-10  3:06                     ` Linus Torvalds
2015-01-10  3:06                       ` Linus Torvalds
2015-01-10 10:46                       ` Andreas Mohr
2015-01-10 10:46                         ` Andreas Mohr
2015-01-10 19:42                         ` Linus Torvalds
2015-01-10 19:42                           ` Linus Torvalds
2015-01-13  3:33                     ` Rik van Riel
2015-01-13  3:33                       ` Rik van Riel
2015-01-13 10:28                       ` Catalin Marinas
2015-01-13 10:28                         ` Catalin Marinas
2015-01-10  3:17                   ` Tony Luck
2015-01-10  3:17                     ` Tony Luck
2015-01-10 20:16                   ` Arnd Bergmann
2015-01-10 20:16                     ` Arnd Bergmann
2015-01-10 21:00                     ` Linus Torvalds
2015-01-10 21:00                       ` Linus Torvalds
2015-01-10 21:36                       ` Arnd Bergmann
2015-01-10 21:36                         ` Arnd Bergmann
2015-01-10 21:48                         ` Linus Torvalds
2015-01-10 21:48                           ` Linus Torvalds
2015-01-12 11:37                         ` Kirill A. Shutemov
2015-01-12 11:37                           ` Kirill A. Shutemov
2015-01-12 12:18                         ` Catalin Marinas
2015-01-12 12:18                           ` Catalin Marinas
2015-01-12 13:57                           ` Arnd Bergmann
2015-01-12 13:57                             ` Arnd Bergmann
2015-01-12 14:23                             ` Catalin Marinas
2015-01-12 14:23                               ` Catalin Marinas
2015-01-12 15:42                               ` Arnd Bergmann
2015-01-12 15:42                                 ` Arnd Bergmann
2015-01-12 11:53                     ` Catalin Marinas
2015-01-12 11:53                       ` Catalin Marinas
2015-01-12 13:15                       ` Arnd Bergmann
2015-01-12 13:15                         ` Arnd Bergmann
2015-01-08 15:08   ` Michal Hocko
2015-01-08 15:08     ` Michal Hocko
2015-01-08 15:08     ` Michal Hocko
2015-01-08 16:37     ` Mark Langsdorf
2015-01-08 16:37       ` Mark Langsdorf
2015-01-08 16:37       ` Mark Langsdorf
2015-01-09 15:56       ` Michal Hocko
2015-01-09 15:56         ` Michal Hocko
2015-01-09 15:56         ` Michal Hocko
2015-01-09 12:13   ` Mark Rutland
2015-01-09 12:13     ` Mark Rutland
2015-01-09 14:19     ` Steve Capper
2015-01-09 14:19       ` Steve Capper
2015-01-09 14:27       ` Mark Langsdorf
2015-01-09 14:27         ` Mark Langsdorf
2015-01-09 17:57         ` Mark Rutland
2015-01-09 17:57           ` Mark Rutland
2015-01-09 18:37           ` Marc Zyngier
2015-01-09 18:37             ` Marc Zyngier
2015-01-09 19:43             ` Will Deacon
2015-01-09 19:43               ` Will Deacon
2015-01-10  3:29               ` Laszlo Ersek
2015-01-10  3:29                 ` Laszlo Ersek
2015-01-10  4:39                 ` Linus Torvalds
2015-01-10  4:39                   ` Linus Torvalds
2015-01-10 13:37                   ` Will Deacon
2015-01-10 13:37                     ` Will Deacon
2015-01-10 19:47                     ` Laszlo Ersek
2015-01-10 19:47                       ` Laszlo Ersek
2015-01-10 19:56                       ` Linus Torvalds
2015-01-10 19:56                         ` Linus Torvalds
2015-01-10 20:08                         ` Laszlo Ersek
2015-01-10 20:08                           ` Laszlo Ersek
2015-01-10 19:51                     ` Linus Torvalds
2015-01-10 19:51                       ` Linus Torvalds
2015-01-12 12:42                       ` Will Deacon
2015-01-12 12:42                         ` Will Deacon
2015-01-12 13:22                         ` Mark Langsdorf
2015-01-12 13:22                           ` Mark Langsdorf
2015-01-12 19:03                         ` Dave Hansen
2015-01-12 19:03                           ` Dave Hansen
2015-01-12 19:06                         ` Linus Torvalds
2015-01-12 19:06                           ` Linus Torvalds
2015-01-12 19:07                           ` Linus Torvalds
2015-01-12 19:07                             ` Linus Torvalds
2015-01-12 19:24                             ` Will Deacon
2015-01-12 19:24                               ` Will Deacon
2015-01-10 15:22                 ` Kyle McMartin
2015-01-10 15:22                   ` Kyle McMartin
2015-01-06  4:49 Sedat Dilek
2015-01-06  9:34 ` Sedat Dilek
2015-01-06  9:56   ` Takashi Iwai
2015-01-06 10:06     ` Sedat Dilek
2015-01-06 10:28       ` Takashi Iwai
2015-01-06 10:31         ` Sedat Dilek
2015-01-06 10:37           ` Takashi Iwai
2015-01-06 10:42             ` Sedat Dilek
2015-01-06  9:59   ` Peter Zijlstra
2015-01-06  9:40 ` Peter Zijlstra
2015-01-06  9:42   ` Sedat Dilek
2015-01-06  9:57     ` Sedat Dilek
2015-01-06 10:06       ` Peter Zijlstra
2015-01-06 10:18         ` Sedat Dilek
2015-01-06 11:01           ` Peter Zijlstra
2015-01-06 11:07             ` Kent Overstreet
2015-01-06 11:25               ` Sedat Dilek
2015-01-06 11:40                 ` Kent Overstreet
2015-01-06 12:51                   ` Sedat Dilek
2015-01-06 11:42               ` Peter Zijlstra
2015-01-06 11:48                 ` Peter Zijlstra
2015-01-06 12:01                   ` Kent Overstreet
2015-01-06 12:20                     ` Peter Zijlstra
2015-01-06 12:45                       ` Kent Overstreet
2015-01-06 12:55                       ` Peter Hurley
2015-01-06 17:38                         ` Paul E. McKenney
2015-01-06 17:58                           ` Peter Hurley
2015-01-06 19:25                             ` Paul E. McKenney
2015-01-06 19:57                               ` Peter Hurley
2015-01-06 20:47                                 ` Paul E. McKenney
2015-01-20  0:30                                   ` Paul E. McKenney
2015-01-20 14:03                                     ` Peter Hurley
2015-02-02 16:11                                       ` Paul E. McKenney
2015-02-02 19:03                                         ` Peter Hurley
2015-02-02 19:33                                           ` Paul E. McKenney
2015-01-06 11:56                 ` Kent Overstreet
2015-01-06 12:16                   ` Peter Zijlstra
2015-01-06 12:43                     ` Kent Overstreet
2015-01-06 13:03                       ` Peter Zijlstra
2015-01-06 13:28                         ` Kent Overstreet
2015-01-13 15:23                           ` Peter Zijlstra
2015-01-06 11:58               ` Peter Zijlstra
2015-01-06 12:18                 ` Kent Overstreet
2015-01-16 16:56               ` Peter Hurley
2015-01-16 17:00                 ` Chris Mason
2015-01-16 18:58                   ` Peter Hurley
2015-01-06 10:29   ` Sedat Dilek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150110003540.GA32037@node.dhcp.inet.fi \
    --to=kirill@shutemov.name \
    --cc=catalin.marinas@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mlangsdo@redhat.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.