linux-kernel.vger.kernel.org archive mirror
* [RFC, PATCHv1 00/28] 5-level paging
@ 2016-12-08 16:21 Kirill A. Shutemov
From: Kirill A. Shutemov @ 2016-12-08 16:21 UTC
  To: Linus Torvalds, Andrew Morton, x86, Thomas Gleixner, Ingo Molnar,
	Arnd Bergmann, H. Peter Anvin
  Cc: Andi Kleen, Dave Hansen, Andy Lutomirski, linux-arch, linux-mm,
	linux-kernel, Kirill A. Shutemov

x86-64 is currently limited to 256 TiB of virtual address space and 64 TiB
of physical address space. We are already bumping into this limit: some
vendors offer servers with 64 TiB of memory today.

To overcome the limitation, upcoming hardware will introduce support for
5-level paging[1]. It is a straightforward extension of the current page
table structure, adding one more layer of translation.

It bumps the limits to 128 PiB of virtual address space and 4 PiB of
physical address space. This "ought to be enough for anybody" ©.

This patchset is still very early. There are a number of things missing
that we have to do before asking anyone to merge it (listed below).
It would be great if folks could start testing applications now (in QEMU)
to look for breakage.
Any early comments on the design or the patches would be appreciated as
well.

More details on the design and what’s left to implement are below.

  - Linux MM now uses 5-level paging abstraction.

    New page table level is p4d, just below pgd.

  - All architectures converted to folded 5-level paging.

    I added <asm-generic/5level-fixup.h>. It uses the same basic
    approach as the <asm-generic/4level-fixup.h> hack.

  - x86 is converted to new <asm-generic/pgtable-nop4d.h>

    All existing paging modes (2-, 3-, 4-level) on x86 are converted to
    pgtable-nop4d.h.

    The new header provides the basics for a properly folded additional
    page table level. The idea is the same as with the other
    pgtable-nop?d.h headers.

  - Implement 5-level paging in x86.

    CONFIG_X86_5LEVEL=y will enable new 5-level paging mode.

The patchset is built on top of v4.8.

I've also included a QEMU patch which enables 5-level paging in the
emulator, so anybody can play with the feature.

There is still work to do:

  - Boot-time switch between 4- and 5-level paging.

    We assume that distributions will be keen to avoid returning to the
    i386 days where we shipped one kernel binary for each page table
    layout.

    As the page table format is the same for 4- and 5-level paging, it
    should be possible to have a single kernel binary and switch between
    them at boot time without too much hassle.

    For now I have only implemented a compile-time switch.

    I hope to bring this feature in a separate patchset once basic
    enabling is upstream.

    Is it okay?

  - Handle opt-in wider address space for userspace.

    Not all userspace is ready to handle addresses wider than the current
    47 bits. At least some JIT compilers make use of the upper bits to
    encode their own information.

    We need an interface for userspace to opt in to wider addresses, to
    avoid regressions.

    For now, I've included a testing-only patch which bumps TASK_SIZE to
    56 bits. This can be handy for testing, to see what breaks if we max
    out the size of the virtual address space.

  - CONFIG_XEN is broken.

    Paravirt Xen MMU support hasn't yet been adjusted to work with 5-level
    paging. It's a legacy feature and I'm not sure we really need to
    support it with the new paging mode, but it blocks Xen drivers too.

    I haven't got around to setting up a testing environment for Xen, so
    I've left it broken for now.

    I would appreciate help with the code.

  - Split patches further.

    In some cases it's not trivial to split patches into reasonable pieces
    without breaking bisectability.

  - Validation.

    I haven't done much testing beyond basic boot.

Git:
	git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git la57/v1

Any comments are welcome.

[1] https://software.intel.com/sites/default/files/managed/2b/80/5-level_paging_white_paper.pdf

Kirill A. Shutemov (28):
  asm-generic: introduce 5level-fixup.h
  asm-generic: introduce __ARCH_USE_5LEVEL_HACK
  arch, mm: convert all architectures to use 5level-fixup.h
  asm-generic: introduce <asm-generic/pgtable-nop4d.h>
  mm: convert generic code to 5-level paging
  x86: basic changes into headers for 5-level paging
  x86: trivial portion of 5-level paging conversion
  x86/gup: add 5-level paging support
  x86/ident_map: add 5-level paging support
  x86/mm: add support of p4d_t in vmalloc_fault()
  x86/power: support p4d_t in hibernate code
  x86/kexec: support p4d_t
  x86: convert the rest of the code to support p4d_t
  mm: introduce __p4d_alloc()
  x86: detect 5-level paging support
  x86/asm: remove __VIRTUAL_MASK_SHIFT==47 assert
  x86/mm: define virtual memory map for 5-level paging
  x86/paravirt: make paravirt code support 5-level paging
  x86/mm: basic defines/helpers for CONFIG_X86_5LEVEL
  x86/dump_pagetables: support 5-level paging
  x86/mm: extend kasan to support 5-level paging
  x86/espfix: support 5-level paging
  x86/mm: add support of additional page table level during early boot
  x86/mm: add sync_global_pgds() for configuration with 5-level paging
  x86/mm: make kernel_physical_mapping_init() support 5-level paging
  x86/mm: add support for 5-level paging for KASLR
  x86: enable la57 support
  TESTING-ONLY: bump TASK_SIZE_MAX

 Documentation/x86/x86_64/mm.txt                  |  23 +-
 arch/arc/include/asm/hugepage.h                  |   1 +
 arch/arc/include/asm/pgtable.h                   |   1 +
 arch/arm/include/asm/pgtable.h                   |   1 +
 arch/arm64/include/asm/pgtable-types.h           |   4 +
 arch/avr32/include/asm/pgtable-2level.h          |   1 +
 arch/cris/include/asm/pgtable.h                  |   1 +
 arch/frv/include/asm/pgtable.h                   |   1 +
 arch/h8300/include/asm/pgtable.h                 |   1 +
 arch/hexagon/include/asm/pgtable.h               |   1 +
 arch/ia64/include/asm/pgtable.h                  |   2 +
 arch/metag/include/asm/pgtable.h                 |   1 +
 arch/mips/include/asm/pgtable-32.h               |   1 +
 arch/mips/include/asm/pgtable-64.h               |   1 +
 arch/mn10300/include/asm/page.h                  |   1 +
 arch/nios2/include/asm/pgtable.h                 |   1 +
 arch/openrisc/include/asm/pgtable.h              |   1 +
 arch/powerpc/include/asm/book3s/32/pgtable.h     |   1 +
 arch/powerpc/include/asm/book3s/64/pgtable.h     |   2 +
 arch/powerpc/include/asm/nohash/32/pgtable.h     |   1 +
 arch/powerpc/include/asm/nohash/64/pgtable-4k.h  |   3 +
 arch/powerpc/include/asm/nohash/64/pgtable-64k.h |   1 +
 arch/s390/include/asm/pgtable.h                  |   1 +
 arch/score/include/asm/pgtable.h                 |   1 +
 arch/sh/include/asm/pgtable-2level.h             |   1 +
 arch/sh/include/asm/pgtable-3level.h             |   1 +
 arch/sparc/include/asm/pgtable_64.h              |   1 +
 arch/tile/include/asm/pgtable_32.h               |   1 +
 arch/tile/include/asm/pgtable_64.h               |   1 +
 arch/um/include/asm/pgtable-2level.h             |   1 +
 arch/um/include/asm/pgtable-3level.h             |   1 +
 arch/unicore32/include/asm/pgtable.h             |   1 +
 arch/x86/Kconfig                                 |   7 +
 arch/x86/boot/compressed/head_64.S               |  23 +-
 arch/x86/boot/cpucheck.c                         |   9 +
 arch/x86/boot/cpuflags.c                         |  16 ++
 arch/x86/entry/entry_64.S                        |   7 +-
 arch/x86/include/asm/cpufeatures.h               |   1 +
 arch/x86/include/asm/disabled-features.h         |   8 +-
 arch/x86/include/asm/kasan.h                     |   9 +-
 arch/x86/include/asm/kexec.h                     |   1 +
 arch/x86/include/asm/page_64_types.h             |  10 +
 arch/x86/include/asm/paravirt.h                  |  64 +++++-
 arch/x86/include/asm/paravirt_types.h            |  17 +-
 arch/x86/include/asm/pgalloc.h                   |  36 ++-
 arch/x86/include/asm/pgtable-2level_types.h      |   1 +
 arch/x86/include/asm/pgtable-3level_types.h      |   1 +
 arch/x86/include/asm/pgtable.h                   |  91 +++++++-
 arch/x86/include/asm/pgtable_64.h                |  29 ++-
 arch/x86/include/asm/pgtable_64_types.h          |  27 +++
 arch/x86/include/asm/pgtable_types.h             |  42 +++-
 arch/x86/include/asm/processor.h                 |   3 +-
 arch/x86/include/asm/required-features.h         |   8 +-
 arch/x86/include/asm/sparsemem.h                 |   9 +-
 arch/x86/include/uapi/asm/processor-flags.h      |   2 +
 arch/x86/kernel/espfix_64.c                      |  43 +++-
 arch/x86/kernel/head64.c                         |  40 +++-
 arch/x86/kernel/head_64.S                        |  58 +++--
 arch/x86/kernel/machine_kexec_32.c               |   4 +-
 arch/x86/kernel/machine_kexec_64.c               |  14 +-
 arch/x86/kernel/paravirt.c                       |  13 +-
 arch/x86/kernel/tboot.c                          |   6 +-
 arch/x86/kernel/vm86_32.c                        |   6 +-
 arch/x86/mm/dump_pagetables.c                    |  51 ++++-
 arch/x86/mm/fault.c                              |  57 ++++-
 arch/x86/mm/gup.c                                |  33 ++-
 arch/x86/mm/ident_map.c                          |  42 +++-
 arch/x86/mm/init_32.c                            |  22 +-
 arch/x86/mm/init_64.c                            | 274 +++++++++++++++++++----
 arch/x86/mm/ioremap.c                            |   3 +-
 arch/x86/mm/kasan_init_64.c                      |  42 +++-
 arch/x86/mm/kaslr.c                              |  82 +++++--
 arch/x86/mm/pageattr.c                           |  56 +++--
 arch/x86/mm/pgtable.c                            |  38 +++-
 arch/x86/mm/pgtable_32.c                         |   8 +-
 arch/x86/platform/efi/efi_64.c                   |  21 +-
 arch/x86/power/hibernate_32.c                    |   7 +-
 arch/x86/power/hibernate_64.c                    |  35 +--
 arch/x86/realmode/init.c                         |   2 +-
 arch/x86/xen/Kconfig                             |   1 +
 arch/xtensa/include/asm/pgtable.h                |   1 +
 drivers/misc/sgi-gru/grufault.c                  |   9 +-
 fs/userfaultfd.c                                 |   6 +-
 include/asm-generic/4level-fixup.h               |   3 +-
 include/asm-generic/5level-fixup.h               |  41 ++++
 include/asm-generic/pgtable-nop4d-hack.h         |  62 +++++
 include/asm-generic/pgtable-nop4d.h              |  56 +++++
 include/asm-generic/pgtable-nopud.h              |  48 ++--
 include/asm-generic/pgtable.h                    |  48 +++-
 include/asm-generic/tlb.h                        |  12 +-
 include/linux/hugetlb.h                          |   5 +-
 include/linux/kasan.h                            |   1 +
 include/linux/mm.h                               |  32 ++-
 lib/ioremap.c                                    |  39 +++-
 mm/gup.c                                         |  46 +++-
 mm/huge_memory.c                                 |   7 +-
 mm/hugetlb.c                                     |  29 ++-
 mm/kasan/kasan_init.c                            |  35 ++-
 mm/memory.c                                      | 230 ++++++++++++++++---
 mm/mlock.c                                       |   1 +
 mm/mprotect.c                                    |  26 ++-
 mm/mremap.c                                      |  13 +-
 mm/pagewalk.c                                    |  32 ++-
 mm/pgtable-generic.c                             |   6 +
 mm/rmap.c                                        |  13 +-
 mm/sparse-vmemmap.c                              |  22 +-
 mm/swapfile.c                                    |  26 ++-
 mm/userfaultfd.c                                 |  23 +-
 mm/vmalloc.c                                     |  81 +++++--
 109 files changed, 2027 insertions(+), 366 deletions(-)
 create mode 100644 include/asm-generic/5level-fixup.h
 create mode 100644 include/asm-generic/pgtable-nop4d-hack.h
 create mode 100644 include/asm-generic/pgtable-nop4d.h

-- 
2.10.2


Thread overview: 64+ messages
2016-12-08 16:21 [RFC, PATCHv1 00/28] 5-level paging Kirill A. Shutemov
2016-12-08 16:21 ` [QEMU, PATCH] x86: implement la57 paging mode Kirill A. Shutemov
2016-12-08 16:48   ` [Qemu-devel] " no-reply
2016-12-08 16:21 ` [RFC, PATCHv1 01/28] asm-generic: introduce 5level-fixup.h Kirill A. Shutemov
2016-12-08 16:21 ` [RFC, PATCHv1 02/28] asm-generic: introduce __ARCH_USE_5LEVEL_HACK Kirill A. Shutemov
2016-12-08 16:21 ` [RFC, PATCHv1 03/28] arch, mm: convert all architectures to use 5level-fixup.h Kirill A. Shutemov
2016-12-08 16:21 ` [RFC, PATCHv1 04/28] asm-generic: introduce <asm-generic/pgtable-nop4d.h> Kirill A. Shutemov
2016-12-08 16:21 ` [RFC, PATCHv1 05/28] mm: convert generic code to 5-level paging Kirill A. Shutemov
2016-12-08 16:21 ` [RFC, PATCHv1 06/28] x86: basic changes into headers for " Kirill A. Shutemov
2016-12-08 16:21 ` [RFC, PATCHv1 07/28] x86: trivial portion of 5-level paging conversion Kirill A. Shutemov
2016-12-08 16:21 ` [RFC, PATCHv1 08/28] x86/gup: add 5-level paging support Kirill A. Shutemov
2016-12-08 16:21 ` [RFC, PATCHv1 09/28] x86/ident_map: " Kirill A. Shutemov
2016-12-08 16:21 ` [RFC, PATCHv1 10/28] x86/mm: add support of p4d_t in vmalloc_fault() Kirill A. Shutemov
2016-12-08 16:21 ` [RFC, PATCHv1 11/28] x86/power: support p4d_t in hibernate code Kirill A. Shutemov
2016-12-08 16:21 ` [RFC, PATCHv1 12/28] x86/kexec: support p4d_t Kirill A. Shutemov
2016-12-08 16:21 ` [RFC, PATCHv1 13/28] x86: convert the rest of the code to " Kirill A. Shutemov
2016-12-08 16:21 ` [RFC, PATCHv1 14/28] mm: introduce __p4d_alloc() Kirill A. Shutemov
2016-12-08 16:21 ` [RFC, PATCHv1 15/28] x86: detect 5-level paging support Kirill A. Shutemov
2016-12-08 20:05   ` Borislav Petkov
2016-12-08 20:08     ` Linus Torvalds
2016-12-08 20:20       ` Borislav Petkov
2016-12-13 22:44         ` H. Peter Anvin
2016-12-13 23:07           ` Boris Petkov
2016-12-15 14:39             ` Borislav Petkov
2016-12-15 17:52               ` hpa
2016-12-15 19:09                 ` Borislav Petkov
2016-12-15 19:20                   ` Andi Kleen
2016-12-15 20:52                     ` hpa
2016-12-15 20:57                     ` hpa
2016-12-09 15:32     ` Kirill A. Shutemov
2016-12-09 16:33       ` Borislav Petkov
2016-12-13 22:50       ` H. Peter Anvin
2016-12-08 16:21 ` [RFC, PATCHv1 16/28] x86/asm: remove __VIRTUAL_MASK_SHIFT==47 assert Kirill A. Shutemov
2016-12-08 18:39   ` Andy Lutomirski
2016-12-08 19:22     ` Kirill A. Shutemov
2016-12-08 16:21 ` [RFC, PATCHv1 17/28] x86/mm: define virtual memory map for 5-level paging Kirill A. Shutemov
2016-12-08 18:56   ` Randy Dunlap
2016-12-08 19:24     ` Kirill A. Shutemov
2016-12-08 16:21 ` [RFC, PATCHv1 18/28] x86/paravirt: make paravirt code support " Kirill A. Shutemov
2016-12-08 16:21 ` [RFC, PATCHv1 19/28] x86/mm: basic defines/helpers for CONFIG_X86_5LEVEL Kirill A. Shutemov
2016-12-08 16:21 ` [RFC, PATCHv1 20/28] x86/dump_pagetables: support 5-level paging Kirill A. Shutemov
2016-12-08 16:21 ` [RFC, PATCHv1 21/28] x86/mm: extend kasan to " Kirill A. Shutemov
2016-12-08 16:21 ` [RFC, PATCHv1 22/28] x86/espfix: " Kirill A. Shutemov
2016-12-08 18:40   ` Andy Lutomirski
2016-12-12 14:22     ` Kirill A. Shutemov
2016-12-08 16:21 ` [RFC, PATCHv1 23/28] x86/mm: add support of additional page table level during early boot Kirill A. Shutemov
2016-12-08 16:21 ` [RFC, PATCHv1 24/28] x86/mm: add sync_global_pgds() for configuration with 5-level paging Kirill A. Shutemov
2016-12-08 18:42   ` Andy Lutomirski
2016-12-08 19:33     ` Kirill A. Shutemov
2016-12-08 16:21 ` [RFC, PATCHv1 25/28] x86/mm: make kernel_physical_mapping_init() support " Kirill A. Shutemov
2016-12-08 16:21 ` [RFC, PATCHv1 26/28] x86/mm: add support for 5-level paging for KASLR Kirill A. Shutemov
2016-12-08 16:21 ` [RFC, PATCHv1 27/28] x86: enable la57 support Kirill A. Shutemov
2016-12-08 16:21 ` [RFC, PATCHv1 28/28] TESTING-ONLY: bump TASK_SIZE_MAX Kirill A. Shutemov
2016-12-08 18:16 ` [RFC, PATCHv1 00/28] 5-level paging Linus Torvalds
2016-12-08 18:26   ` hpa
2016-12-08 19:20   ` Kirill A. Shutemov
2016-12-09  5:01 ` Ingo Molnar
2016-12-09 10:24   ` Arnd Bergmann
2016-12-09 10:51     ` Catalin Marinas
2016-12-09 10:37   ` Kirill A. Shutemov
2016-12-09 16:40     ` Andi Kleen
2016-12-09 17:21       ` Kirill A. Shutemov
2016-12-09 16:49     ` Dave Hansen
2016-12-13 21:06   ` Dave Hansen
