* [PATCH v3 00/21] arm64: implement support for KASLR
From: Ard Biesheuvel @ 2016-01-11 13:18 UTC
  To: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel
  Cc: stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall, Ard Biesheuvel

This series implements KASLR for arm64, by building the kernel as a PIE
executable that can relocate itself at runtime, and moving it to a random
offset in the vmalloc area. v2 and up also implement physical randomization,
i.e., it allows the kernel to deal with being loaded at any physical offset
(modulo the required alignment), and invokes the EFI_RNG_PROTOCOL from the
UEFI stub to obtain random bits and perform the actual randomization of the
physical load address.

Changes since v2:
- Incorporated feedback from Marc Zyngier into the KVM patch (#5)
- Dropped the pgdir section and the patch that memblock_reserve()'s the kernel
  sections at a smaller granularity; the latter is no longer necessary with the
  pgdir section gone. This also fixes an issue spotted by James Morse where the
  fixmap page tables were not zeroed correctly; these have been moved back to
  the .bss section.
- Got rid of all ifdef'ery regarding the number of translation levels in the
  changed .c files, by introducing new definitions in pgtable.h (#3, #6)
- Fixed KASAN support, which was broken in all earlier versions.
- Moved module region along with the virtually randomized kernel, so that module
  addresses become unpredictable as well, and we only have to rely on veneers in
  the PLTs when the module region is exhausted (which is somewhat more likely
  since the module region is now shared with other uses of the vmalloc area)
- Added support for the 'nokaslr' command line option. This affects the
  randomization performed by the stub, and results in a warning if passed while
  the bootloader also presented a random seed for virtual KASLR in register x1.
- The .text/.rodata sections of the kernel are no longer aliased in the linear
  region with a writable mapping.
- Added a separate image header flag for kernel images that may be loaded at any
  2 MB aligned offset (+ TEXT_OFFSET)
- The KASLR displacement is now corrected if it results in the kernel image
  intersecting a PUD/PMD boundary (4k and 16k/64k granule kernels, respectively)
- Split out UEFI stub random routines into separate patches.
- Implemented a weight-based EFI random allocation routine so that each suitable
  offset in available memory is equally likely to be selected (as suggested by
  Kees Cook)
- Reused CONFIG_RELOCATABLE and CONFIG_RANDOMIZE_BASE instead of introducing
  new Kconfig symbols to describe the same functionality.
- Reimplemented mem= logic so memory is clipped from the top first.

Changes since v1/RFC:
- This series now implements fully independent virtual and physical address
  randomization at load time. I have recycled some patches from this series:
  http://thread.gmane.org/gmane.linux.ports.arm.kernel/455151, and updated the
  final UEFI stub patch to randomize the physical address as well.
- Added a patch to deal with the way KVM on arm64 makes assumptions about the
  relation between kernel symbols and the linear mapping (on which the HYP
  mapping is based), as these assumptions cease to be valid once we move the
  kernel Image out of the linear mapping.
- Updated the module PLT patch so it works on BE kernels as well.
- Moved the constant Image header values to head.S, and updated the linker
  script to provide the kernel size using an R_AARCH64_ABS32 relocation rather
  than an R_AARCH64_ABS64 relocation, since the former is always resolved at
  build time. This allows me to get rid of the post-build perl script to swab
  header values on BE kernels.
- Minor style tweaks.

Notes:
- These patches apply on top of Mark Rutland's pagetable rework series:
  http://thread.gmane.org/gmane.linux.ports.arm.kernel/462438
- The arm64 Image is uncompressed by default, and the Elf64_Rela format uses
  24 bytes per relocation entry. This results in considerable bloat (i.e., a
  couple of MBs worth of relocation data in an .init section). However, no
  build-time postprocessing is required; we rely fully on the toolchain to
  produce the image.
- We have to rely on the bootloader to supply some randomness in register x1
  upon kernel entry. Since we have no decompressor, it is simply not feasible
  to collect randomness in the head.S code path before mapping the kernel and
  enabling the MMU.
- The EFI_RNG_PROTOCOL that is invoked in patch #21 to supply randomness on
  UEFI systems is not universally available. A QEMU/KVM firmware image that
  implements a pseudo-random version is available here:
  http://people.linaro.org/~ard.biesheuvel/QEMU_EFI.fd.aarch64-rng.bz2
  (requires access to PMCCNTR_EL0 and support for AES instructions)
  See below for instructions on how to run the pseudo-random version on real
  hardware.
- Only mildly tested. Help appreciated.

Code can be found here:
git://git.linaro.org/people/ard.biesheuvel/linux-arm.git arm64-kaslr-v3
https://git.linaro.org/people/ard.biesheuvel/linux-arm.git/shortlog/refs/heads/arm64-kaslr-v3

Patch #1 updates the OF code to allow the minimum memblock physical address to
be overridden by the arch.

Patch #2 introduces KIMAGE_VADDR as the base of the kernel virtual region.

Patch #3 introduces dummy pud_index() and pmd_index() macros that are intended
to be optimized away if the configured number of translation levels does not
actually use them.

Patch #4 rewrites early_fixmap_init() so it does not rely on the linear mapping
(i.e., the use of phys_to_virt() is avoided).

Patch #5 updates KVM on arm64 so it can deal with kernel symbols whose addresses
are not covered by the linear mapping.

Patch #6 introduces pte_offset_kimg(), pmd_offset_kimg() and pud_offset_kimg()
that allow statically allocated page tables (i.e., by fixmap and kasan) to be
traversed before the linear mapping is installed.
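
As a rough sketch of the idea (the only name below taken from this series is
__phys_to_kimg(), introduced in patch #2; the other macro and helper names are
illustrative assumptions, not the patch's actual definitions), such an accessor
maps the table address found in the upper-level entry through the kernel image
mapping instead of going through phys_to_virt():

  /*
   * Illustrative only: walk from a pud entry to the pmd table it points
   * to via the kernel image mapping, which works before the linear
   * mapping (and hence phys_to_virt()) is available.
   */
  #define pmd_offset_kimg(pud, addr) \
          ((pmd_t *)__phys_to_kimg(pud_val(*(pud)) & PHYS_MASK & PAGE_MASK) + \
           pmd_index(addr))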

Patch #7 moves the kernel virtual mapping to the vmalloc area, along with the
module region which is kept right below it, as before.

Patch #8 adds support for PLTs in modules so that relative branches can be
resolved via a PLT if the target is out of range. This is required for KASLR,
since modules may be loaded far away from the core kernel.
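
To illustrate the mechanism only (this is not the code from patch #8; the
struct layout and instruction encodings below are hand-written assumptions), a
veneer can load the full 64-bit target into the AAPCS64 scratch register x16
(IP0) and branch to it whenever a B/BL branch (range +/- 128 MB) cannot reach
its target directly:

  #include <stdint.h>

  /* hypothetical sketch of a PLT veneer: movz/movk x16, ...; br x16 */
  struct plt_veneer {
          uint32_t insn[5];
  };

  static struct plt_veneer make_veneer(uint64_t target)
  {
          struct plt_veneer v;

          v.insn[0] = 0xd2800010 | (uint32_t)((target & 0xffff) << 5);         /* movz x16, #g0          */
          v.insn[1] = 0xf2a00010 | (uint32_t)(((target >> 16) & 0xffff) << 5); /* movk x16, #g1, lsl #16 */
          v.insn[2] = 0xf2c00010 | (uint32_t)(((target >> 32) & 0xffff) << 5); /* movk x16, #g2, lsl #32 */
          v.insn[3] = 0xf2e00010 | (uint32_t)(((target >> 48) & 0xffff) << 5); /* movk x16, #g3, lsl #48 */
          v.insn[4] = 0xd61f0200;                                              /* br   x16               */
          return v;
  }

Using x16 is what makes such a veneer safe to insert between caller and callee:
the procedure call standard reserves it as an intra-procedure-call scratch
register.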

Patches #9 and #10 move arm64 to a new generic relative version of the extable
implementation so that it no longer contains absolute addresses that require
fixing up at relocation time, but uses relative offsets instead.
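
For reference, the relative scheme can be pictured as below (field names are
illustrative; the generic implementation added by patch #9 may differ in
detail). Each entry stores 32-bit offsets relative to the entry itself, so the
table contains no absolute addresses and therefore needs no dynamic
relocations:

  /* each field holds an offset from its own address to the target */
  struct exception_table_entry {
          int insn;       /* offset to the faulting instruction */
          int fixup;      /* offset to the fixup code */
  };

  static inline unsigned long ex_to_insn(const struct exception_table_entry *x)
  {
          return (unsigned long)&x->insn + x->insn;
  }

  static inline unsigned long ex_to_fixup(const struct exception_table_entry *x)
  {
          return (unsigned long)&x->fixup + x->fixup;
  }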

Patch #11 reverts some changes to the Image header population code so we no
longer depend on the linker to populate the header fields. This is necessary
since the R_AARCH64_ABS64 relocations that are emitted for these fields are not
resolved at build time for PIE executables.

Patch #12 updates the code in head.S that needs to execute before relocation to
avoid the use of values that are subject to dynamic relocation. These values
will not be populated in PIE executables.

Patch #13 allows the kernel Image to be loaded anywhere in physical memory, by
decoupling PHYS_OFFSET from the base of the kernel image.

Patch #14 redefines SWAPPER_TABLE_SHIFT in a way that allows it to be used from
assembler code regardless of the number of configured translation levels.

Patch #15 (from Mark Rutland) moves the ELF relocation type #defines to a
separate file so we can use them from head.S later.

Patch #16 updates scripts/sortextable.c so it accepts ET_DYN (relocatable)
executables as well as ET_EXEC (static) executables.

Patch #17 implements the core KASLR, by taking randomness supplied in register
x1 and using it to move the kernel inside the vmalloc area.
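
Purely as an illustration of the arithmetic (the constants, masking and bounds
used by the actual patch may differ), the seed can be reduced to a 2 MB aligned
displacement within the randomization window:

  #include <stdint.h>

  #define SZ_2M   0x200000UL

  /* sketch: map a 64-bit seed to a 2 MB aligned offset in [0, window) */
  static uint64_t kaslr_displacement(uint64_t seed, uint64_t window)
  {
          uint64_t slots = window / SZ_2M;

          return (seed % slots) * SZ_2M;
  }

As noted in the changelog above, the resulting displacement is additionally
corrected if it would make the kernel image intersect a PUD/PMD boundary.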

Patch #18 implements efi_get_random_bytes() based on the EFI_RNG_PROTOCOL.
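
As a minimal sketch, assuming a struct whose members mirror the UEFI spec's
GetInfo()/GetRNG() interfaces and a boot services LocateProtocol() call (the
stub's actual types, wrappers and GUID definitions are not shown here and may
differ), the helper boils down to:

  /* sketch of the EFI_RNG_PROTOCOL interface, as defined by the UEFI spec */
  struct efi_rng_protocol {
          efi_status_t (*get_info)(struct efi_rng_protocol *this,
                                   unsigned long *algo_list_size,
                                   efi_guid_t *algo_list);
          efi_status_t (*get_rng)(struct efi_rng_protocol *this,
                                  efi_guid_t *algo,  /* NULL selects the default */
                                  unsigned long size,
                                  u8 *out);
  };

  static efi_status_t stub_get_random_bytes(efi_system_table_t *sys_table,
                                            efi_guid_t *rng_guid,
                                            unsigned long size, u8 *out)
  {
          struct efi_rng_protocol *rng;
          efi_status_t status;

          status = sys_table->boottime->locate_protocol(rng_guid, NULL,
                                                        (void **)&rng);
          if (status != EFI_SUCCESS)
                  return status;

          return rng->get_rng(rng, NULL, size, out);
  }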

Patch #19 implements efi_random_alloc().
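
The weight-based allocation mentioned under "Changes since v2" can be pictured
as the two-pass scheme sketched here (names and types are illustrative, not the
stub's actual API): count every aligned slot in usable memory large enough to
hold the allocation, pick one slot index uniformly at random, then walk the
memory map again to turn that index back into an address.

  #include <stdint.h>

  struct mem_region { uint64_t start, size; };

  /* number of aligned slots in one region that can hold alloc_size bytes */
  static uint64_t region_slots(const struct mem_region *r,
                               uint64_t align, uint64_t alloc_size)
  {
          uint64_t first = (r->start + align - 1) & ~(align - 1);
          uint64_t end = r->start + r->size;

          if (first > end || end - first < alloc_size)
                  return 0;
          return (end - first - alloc_size) / align + 1;
  }

  /* pick a base address so that every suitable slot is equally likely */
  static uint64_t pick_random_base(const struct mem_region *map, int nr,
                                   uint64_t align, uint64_t alloc_size,
                                   uint64_t random)
  {
          uint64_t total = 0, target;
          int i;

          for (i = 0; i < nr; i++)                /* pass 1: count slots */
                  total += region_slots(&map[i], align, alloc_size);
          if (!total)
                  return 0;

          target = random % total;                /* uniform slot index */

          for (i = 0; i < nr; i++) {              /* pass 2: locate it */
                  uint64_t slots = region_slots(&map[i], align, alloc_size);

                  if (target < slots)
                          return ((map[i].start + align - 1) & ~(align - 1)) +
                                 target * align;
                  target -= slots;
          }
          return 0;
  }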

Patch #20 moves the allocation for the converted command line (UTF-16 to ASCII)
away from the base of memory. This is necessary since the command line needs to
be parsed (e.g., for 'nokaslr') before the kernel itself is allocated, and a low
allocation could interfere with the kernel's placement.

Patch #21 implements the actual randomization in the UEFI stub, by randomizing
the kernel physical address, and passing entropy in x1 so that the kernel proper
can relocate itself virtually.

Ard Biesheuvel (20):
  of/fdt: make memblock minimum physical address arch configurable
  arm64: introduce KIMAGE_VADDR as the virtual base of the kernel region
  arm64: pgtable: add dummy pud_index() and pmd_index() definitions
  arm64: decouple early fixmap init from linear mapping
  arm64: kvm: deal with kernel symbols outside of linear mapping
  arm64: pgtable: implement static [pte|pmd|pud]_offset variants
  arm64: move kernel image to base of vmalloc area
  arm64: add support for module PLTs
  extable: add support for relative extables to search and sort routines
  arm64: switch to relative exception tables
  arm64: avoid R_AARCH64_ABS64 relocations for Image header fields
  arm64: avoid dynamic relocations in early boot code
  arm64: allow kernel Image to be loaded anywhere in physical memory
  arm64: redefine SWAPPER_TABLE_SHIFT for use in asm code
  scripts/sortextable: add support for ET_DYN binaries
  arm64: add support for a relocatable kernel and KASLR
  efi: stub: implement efi_get_random_bytes() based on EFI_RNG_PROTOCOL
  efi: stub: add implementation of efi_random_alloc()
  efi: stub: use high allocation for converted command line
  arm64: efi: invoke EFI_RNG_PROTOCOL to supply KASLR randomness

Mark Rutland (1):
  arm64: split elf relocs into a separate header.

 Documentation/arm64/booting.txt                |  34 ++++-
 arch/arm/include/asm/kvm_asm.h                 |   2 +
 arch/arm/include/asm/kvm_mmu.h                 |   2 +
 arch/arm/kvm/arm.c                             |   5 +-
 arch/arm/kvm/mmu.c                             |   8 +-
 arch/arm64/Kconfig                             |  40 +++++
 arch/arm64/Makefile                            |  10 +-
 arch/arm64/include/asm/assembler.h             |  30 +++-
 arch/arm64/include/asm/boot.h                  |   6 +
 arch/arm64/include/asm/elf.h                   |  54 +------
 arch/arm64/include/asm/elf_relocs.h            |  75 ++++++++++
 arch/arm64/include/asm/futex.h                 |  12 +-
 arch/arm64/include/asm/kasan.h                 |  20 +--
 arch/arm64/include/asm/kernel-pgtable.h        |  20 ++-
 arch/arm64/include/asm/kvm_asm.h               |  19 ++-
 arch/arm64/include/asm/kvm_host.h              |   8 +-
 arch/arm64/include/asm/kvm_mmu.h               |   2 +
 arch/arm64/include/asm/memory.h                |  38 +++--
 arch/arm64/include/asm/module.h                |  11 ++
 arch/arm64/include/asm/pgtable.h               |  22 ++-
 arch/arm64/include/asm/uaccess.h               |  30 ++--
 arch/arm64/include/asm/virt.h                  |   4 -
 arch/arm64/include/asm/word-at-a-time.h        |   7 +-
 arch/arm64/kernel/Makefile                     |   1 +
 arch/arm64/kernel/armv8_deprecated.c           |   7 +-
 arch/arm64/kernel/efi-entry.S                  |   9 +-
 arch/arm64/kernel/head.S                       | 155 +++++++++++++++++---
 arch/arm64/kernel/image.h                      |  37 ++---
 arch/arm64/kernel/module-plts.c                | 137 +++++++++++++++++
 arch/arm64/kernel/module.c                     |  15 +-
 arch/arm64/kernel/module.lds                   |   4 +
 arch/arm64/kernel/setup.c                      |  44 +++++-
 arch/arm64/kernel/vmlinux.lds.S                |  13 +-
 arch/arm64/kvm/debug.c                         |   1 +
 arch/arm64/kvm/hyp.S                           |   6 +-
 arch/arm64/mm/dump.c                           |  12 +-
 arch/arm64/mm/extable.c                        |   2 +-
 arch/arm64/mm/init.c                           |  91 ++++++++++--
 arch/arm64/mm/kasan_init.c                     |  21 ++-
 arch/arm64/mm/mmu.c                            |  95 +++++++-----
 arch/x86/include/asm/efi.h                     |   2 +
 drivers/firmware/efi/libstub/Makefile          |   2 +-
 drivers/firmware/efi/libstub/arm-stub.c        |  17 ++-
 drivers/firmware/efi/libstub/arm64-stub.c      |  67 +++++++--
 drivers/firmware/efi/libstub/efi-stub-helper.c |  24 ++-
 drivers/firmware/efi/libstub/efistub.h         |   9 ++
 drivers/firmware/efi/libstub/random.c          | 120 +++++++++++++++
 drivers/of/fdt.c                               |   5 +-
 include/linux/efi.h                            |   5 +-
 lib/extable.c                                  |  50 +++++--
 scripts/sortextable.c                          |  10 +-
 51 files changed, 1111 insertions(+), 309 deletions(-)
 create mode 100644 arch/arm64/include/asm/elf_relocs.h
 create mode 100644 arch/arm64/kernel/module-plts.c
 create mode 100644 arch/arm64/kernel/module.lds
 create mode 100644 drivers/firmware/efi/libstub/random.c

EFI_RNG_PROTOCOL on real hardware
=================================

To test whether your UEFI implements the EFI_RNG_PROTOCOL, download the
following executable and run it from the UEFI Shell:
http://people.linaro.org/~ard.biesheuvel/RngTest.efi

FS0:\> rngtest
UEFI RNG Protocol Testing :
----------------------------
 -- Locate UEFI RNG Protocol : [Fail - Status = Not Found]

If your UEFI does not implement the EFI_RNG_PROTOCOL, you can download and
install the pseudo-random version that uses the generic timer and PMCCNTR_EL0
values and permutes them using a couple of rounds of AES.
http://people.linaro.org/~ard.biesheuvel/RngDxe.efi

NOTE: not for production!! This is a quick and dirty hack to test the KASLR
code, and is not suitable for anything else.

FS0:\> rngdxe
FS0:\> rngtest
UEFI RNG Protocol Testing :
----------------------------
 -- Locate UEFI RNG Protocol : [Pass]
 -- Call RNG->GetInfo() interface :
     >> Supported RNG Algorithm (Count = 2) :
          0) 44F0DE6E-4D8C-4045-A8C7-4DD168856B9E
          1) E43176D7-B6E8-4827-B784-7FFDC4B68561
 -- Call RNG->GetRNG() interface :
     >> RNG with default algorithm : [Pass]
     >> RNG with SP800-90-HMAC-256 : [Fail - Status = Unsupported]
     >> RNG with SP800-90-Hash-256 : [Fail - Status = Unsupported]
     >> RNG with SP800-90-CTR-256 : [Pass]
     >> RNG with X9.31-3DES : [Fail - Status = Unsupported]
     >> RNG with X9.31-AES : [Fail - Status = Unsupported]
     >> RNG with RAW Entropy : [Pass]
 -- Random Number Generation Test with default RNG Algorithm (20 Rounds):
          01) - 27
          02) - 61E8
          03) - 496FD8
          04) - DDD793BF
          05) - B6C37C8E23
          06) - 4D183C604A96
          07) - 9363311DB61298
          08) - 5715A7294F4E436E
          09) - F0D4D7BAA0DD52318E
          10) - C88C6EBCF4C0474D87C3
          11) - B5594602B482A643932172
          12) - CA7573F704B2089B726B9CF1
          13) - A93E9451CB533DCFBA87B97C33
          14) - 45AA7B83DB6044F7BBAB031F0D24
          15) - 3DD7A4D61F34ADCB400B5976730DCF
          16) - 4DD168D21FAB8F59708330D6A9BEB021
          17) - 4BBB225E61C465F174254159467E65939F
          18) - 030A156C9616337A20070941E702827DA8E1
          19) - AB0FC11C9A4E225011382A9D164D9D55CA2B64
          20) - 72B9B4735DC445E5DA6AF88DE965B7E87CB9A23C

* [PATCH v3 01/21] of/fdt: make memblock minimum physical address arch configurable
From: Ard Biesheuvel @ 2016-01-11 13:18 UTC
  To: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel
  Cc: stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall, Ard Biesheuvel

By default, early_init_dt_add_memory_arch() ignores memory below
the base of the kernel image since it won't be addressable via the
linear mapping. However, this is not appropriate anymore once we
decouple the kernel text mapping from the linear mapping, so archs
may want to drop the low limit entirely. So allow the minimum to be
overridden by setting MIN_MEMBLOCK_ADDR.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 drivers/of/fdt.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index d2430298a309..0455564f8cbc 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -971,13 +971,16 @@ int __init early_init_dt_scan_chosen(unsigned long node, const char *uname,
 }
 
 #ifdef CONFIG_HAVE_MEMBLOCK
+#ifndef MIN_MEMBLOCK_ADDR
+#define MIN_MEMBLOCK_ADDR	__pa(PAGE_OFFSET)
+#endif
 #ifndef MAX_MEMBLOCK_ADDR
 #define MAX_MEMBLOCK_ADDR	((phys_addr_t)~0)
 #endif
 
 void __init __weak early_init_dt_add_memory_arch(u64 base, u64 size)
 {
-	const u64 phys_offset = __pa(PAGE_OFFSET);
+	const u64 phys_offset = MIN_MEMBLOCK_ADDR;
 
 	if (!PAGE_ALIGNED(base)) {
 		if (size < PAGE_SIZE - (base & ~PAGE_MASK)) {
-- 
2.5.0

* [PATCH v3 02/21] arm64: introduce KIMAGE_VADDR as the virtual base of the kernel region
From: Ard Biesheuvel @ 2016-01-11 13:18 UTC
  To: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel
  Cc: stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall, Ard Biesheuvel

This introduces the preprocessor symbol KIMAGE_VADDR which will serve as
the symbolic virtual base of the kernel region, i.e., the kernel's virtual
offset will be KIMAGE_VADDR + TEXT_OFFSET. For now, we define it as being
equal to PAGE_OFFSET, but in the future, it will be moved below it once
we move the kernel virtual mapping out of the linear mapping.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/memory.h | 10 ++++++++--
 arch/arm64/kernel/head.S        |  2 +-
 arch/arm64/kernel/vmlinux.lds.S |  4 ++--
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 853953cd1f08..bea9631b34a8 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -51,7 +51,8 @@
 #define VA_BITS			(CONFIG_ARM64_VA_BITS)
 #define VA_START		(UL(0xffffffffffffffff) << VA_BITS)
 #define PAGE_OFFSET		(UL(0xffffffffffffffff) << (VA_BITS - 1))
-#define MODULES_END		(PAGE_OFFSET)
+#define KIMAGE_VADDR		(PAGE_OFFSET)
+#define MODULES_END		(KIMAGE_VADDR)
 #define MODULES_VADDR		(MODULES_END - SZ_64M)
 #define PCI_IO_END		(MODULES_VADDR - SZ_2M)
 #define PCI_IO_START		(PCI_IO_END - PCI_IO_SIZE)
@@ -75,8 +76,13 @@
  * private definitions which should NOT be used outside memory.h
  * files.  Use virt_to_phys/phys_to_virt/__pa/__va instead.
  */
-#define __virt_to_phys(x)	(((phys_addr_t)(x) - PAGE_OFFSET + PHYS_OFFSET))
+#define __virt_to_phys(x) ({						\
+	phys_addr_t __x = (phys_addr_t)(x);				\
+	__x >= PAGE_OFFSET ? (__x - PAGE_OFFSET + PHYS_OFFSET) :	\
+			     (__x - KIMAGE_VADDR + PHYS_OFFSET); })
+
 #define __phys_to_virt(x)	((unsigned long)((x) - PHYS_OFFSET + PAGE_OFFSET))
+#define __phys_to_kimg(x)	((unsigned long)((x) - PHYS_OFFSET + KIMAGE_VADDR))
 
 /*
  * Convert a page to/from a physical address
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 63335fa68426..350515276541 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -389,7 +389,7 @@ __create_page_tables:
 	 * Map the kernel image (starting with PHYS_OFFSET).
 	 */
 	mov	x0, x26				// swapper_pg_dir
-	mov	x5, #PAGE_OFFSET
+	ldr	x5, =KIMAGE_VADDR
 	create_pgd_entry x0, x5, x3, x6
 	ldr	x6, =KERNEL_END			// __va(KERNEL_END)
 	mov	x3, x24				// phys offset
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 7de6c39858a5..ced0dedcabcc 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -88,7 +88,7 @@ SECTIONS
 		*(.discard.*)
 	}
 
-	. = PAGE_OFFSET + TEXT_OFFSET;
+	. = KIMAGE_VADDR + TEXT_OFFSET;
 
 	.head.text : {
 		_text = .;
@@ -185,4 +185,4 @@ ASSERT(__idmap_text_end - (__idmap_text_start & ~(SZ_4K - 1)) <= SZ_4K,
 /*
  * If padding is applied before .head.text, virt<->phys conversions will fail.
  */
-ASSERT(_text == (PAGE_OFFSET + TEXT_OFFSET), "HEAD is misaligned")
+ASSERT(_text == (KIMAGE_VADDR + TEXT_OFFSET), "HEAD is misaligned")
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [PATCH v3 03/21] arm64: pgtable: add dummy pud_index() and pmd_index() definitions
  2016-01-11 13:18 ` Ard Biesheuvel
  (?)
@ 2016-01-11 13:18   ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:18 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel
  Cc: stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall, Ard Biesheuvel

Add definitions of pud_index() and pmd_index() for configurations with
fewer than 4 and fewer than 3 translation levels, respectively. This
makes it easier to keep their users (e.g., the fixmap init code) generic.
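
As an illustration (not part of the patch), the pattern this enables is
sketched below; the bm_pmd[] table and pgd_offset_k() are borrowed from
the fixmap code changed later in this series. Because CONFIG_PGTABLE_LEVELS
is a compile-time constant, the branch containing the BUILD_BUG() expansion
of pmd_index() is discarded before it can trigger, so only an unguarded use
on a folded level breaks the build.

/*
 * Sketch only: a generic helper that compiles for every page table
 * configuration.  With CONFIG_PGTABLE_LEVELS <= 2, pmd_index() expands
 * to ({ BUILD_BUG(); 0; }), but that branch is dead code and is
 * eliminated at compile time, so the BUILD_BUG() never fires.
 */
static inline pmd_t *generic_pmd_slot(unsigned long addr)
{
	return (CONFIG_PGTABLE_LEVELS > 2) ? &bm_pmd[pmd_index(addr)]
					   : (pmd_t *)pgd_offset_k(addr);
}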

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/pgtable.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index fe9bf47db5d3..6129f6755081 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -495,6 +495,7 @@ static inline phys_addr_t pud_page_paddr(pud_t pud)
 #else
 
 #define pud_page_paddr(pud)	({ BUILD_BUG(); 0; })
+#define pmd_index(addr)		({ BUILD_BUG(); 0; })
 
 /* Match pmd_offset folding in <asm/generic/pgtable-nopmd.h> */
 #define pmd_set_fixmap(addr)		NULL
@@ -542,6 +543,7 @@ static inline phys_addr_t pgd_page_paddr(pgd_t pgd)
 #else
 
 #define pgd_page_paddr(pgd)	({ BUILD_BUG(); 0;})
+#define pud_index(addr)		({ BUILD_BUG(); 0;})
 
 /* Match pud_offset folding in <asm/generic/pgtable-nopud.h> */
 #define pud_set_fixmap(addr)		NULL
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [PATCH v3 04/21] arm64: decouple early fixmap init from linear mapping
  2016-01-11 13:18 ` Ard Biesheuvel
  (?)
@ 2016-01-11 13:18   ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:18 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel
  Cc: stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall, Ard Biesheuvel

Since the early fixmap page tables are populated using pages that are
part of the static footprint of the kernel, they are covered by the
initial kernel mapping, and we can refer to them without using __va/__pa
translations, which are tied to the linear mapping.

Since the fixmap page tables are disjoint from the kernel mapping up
to the top level pgd entry, we can refer to bm_pte[] directly, and there
is no need to walk the page tables and perform __pa()/__va() translations
at each step.
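
A minimal sketch of the difference, using only symbols that appear in
this patch (illustration, not part of the diff):

/*
 * Old approach: pte_offset_kernel() reads the physical address from
 * *pmd and converts it with __va(), which yields a linear-mapping
 * address and is therefore only valid once that mapping exists.
 *
 * New approach: bm_pte[] is an ordinary object in .bss, covered by the
 * initial kernel mapping, so it can simply be indexed.
 */
static pte_t *early_fixmap_pte_sketch(unsigned long addr)
{
	return &bm_pte[pte_index(addr)];	/* no __va()/__pa() involved */
}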

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/mm/mmu.c | 32 ++++++--------------
 1 file changed, 9 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 7711554a94f4..75b5f0dc3bdc 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -570,38 +570,24 @@ void vmemmap_free(unsigned long start, unsigned long end)
 #endif	/* CONFIG_SPARSEMEM_VMEMMAP */
 
 static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
-#if CONFIG_PGTABLE_LEVELS > 2
 static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
-#endif
-#if CONFIG_PGTABLE_LEVELS > 3
 static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
-#endif
 
 static inline pud_t * fixmap_pud(unsigned long addr)
 {
-	pgd_t *pgd = pgd_offset_k(addr);
-
-	BUG_ON(pgd_none(*pgd) || pgd_bad(*pgd));
-
-	return pud_offset(pgd, addr);
+	return (CONFIG_PGTABLE_LEVELS > 3) ? &bm_pud[pud_index(addr)]
+					   : (pud_t *)pgd_offset_k(addr);
 }
 
-static inline pmd_t * fixmap_pmd(unsigned long addr)
+static inline pmd_t * fixmap_pmd(unsigned long addr)
 {
-	pud_t *pud = fixmap_pud(addr);
-
-	BUG_ON(pud_none(*pud) || pud_bad(*pud));
-
-	return pmd_offset(pud, addr);
+	return (CONFIG_PGTABLE_LEVELS > 2) ? &bm_pmd[pmd_index(addr)]
+					   : (pmd_t *)pgd_offset_k(addr);
 }
 
 static inline pte_t * fixmap_pte(unsigned long addr)
 {
-	pmd_t *pmd = fixmap_pmd(addr);
-
-	BUG_ON(pmd_none(*pmd) || pmd_bad(*pmd));
-
-	return pte_offset_kernel(pmd, addr);
+	return &bm_pte[pte_index(addr)];
 }
 
 void __init early_fixmap_init(void)
@@ -613,14 +599,14 @@ void __init early_fixmap_init(void)
 
 	pgd = pgd_offset_k(addr);
 	pgd_populate(&init_mm, pgd, bm_pud);
-	pud = pud_offset(pgd, addr);
+	pud = fixmap_pud(addr);
 	pud_populate(&init_mm, pud, bm_pmd);
-	pmd = pmd_offset(pud, addr);
+	pmd = fixmap_pmd(addr);
 	pmd_populate_kernel(&init_mm, pmd, bm_pte);
 
 	/*
 	 * The boot-ioremap range spans multiple pmds, for which
-	 * we are not preparted:
+	 * we are not prepared:
 	 */
 	BUILD_BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT)
 		     != (__fix_to_virt(FIX_BTMAP_END) >> PMD_SHIFT));
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [PATCH v3 05/21] arm64: kvm: deal with kernel symbols outside of linear mapping
  2016-01-11 13:18 ` Ard Biesheuvel
  (?)
@ 2016-01-11 13:18   ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:18 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel
  Cc: stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall, Ard Biesheuvel

KVM on arm64 uses a fixed offset between the linear mapping at EL1 and
the HYP mapping at EL2. Before we can move the kernel virtual mapping
out of the linear mapping, we have to make sure that references to kernel
symbols that are accessed via the HYP mapping are translated to their
linear equivalent.

To prevent inadvertent direct references from sneaking in later, change
the type of all extern declarations of HYP kernel symbols to the opaque
'struct kvm_ksym', which does not decay to a pointer type the way char
arrays and function references do. This is not bulletproof, but it at
least forces the user to take the address explicitly rather than
reference the symbol directly.
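
A hedged sketch of the effect on callers (the helper below is
hypothetical; __kvm_hyp_vector and kvm_ksym_ref() are taken from this
patch):

/* Mirrors the change to cpu_init_hyp_mode() in this patch. */
static unsigned long hyp_vector_ptr(void)
{
	/*
	 * return (unsigned long)__kvm_hyp_vector;
	 *
	 * ... no longer compiles: an opaque struct kvm_ksym does not
	 * convert to a pointer or integer the way a char array does.
	 */
	return (unsigned long)kvm_ksym_ref(__kvm_hyp_vector);
}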

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm/include/asm/kvm_asm.h    |  2 ++
 arch/arm/include/asm/kvm_mmu.h    |  2 ++
 arch/arm/kvm/arm.c                |  5 +++--
 arch/arm/kvm/mmu.c                |  8 +++-----
 arch/arm64/include/asm/kvm_asm.h  | 19 ++++++++++++-------
 arch/arm64/include/asm/kvm_host.h |  8 +++++---
 arch/arm64/include/asm/kvm_mmu.h  |  2 ++
 arch/arm64/include/asm/virt.h     |  4 ----
 arch/arm64/kvm/debug.c            |  1 +
 arch/arm64/kvm/hyp.S              |  6 +++---
 10 files changed, 33 insertions(+), 24 deletions(-)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 194c91b610ff..484ffdf7c70b 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -99,6 +99,8 @@ extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
 extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
+
+extern char __hyp_idmap_text_start[], __hyp_idmap_text_end[];
 #endif
 
 #endif /* __ARM_KVM_ASM_H__ */
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 405aa1883307..412b363f79e9 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -30,6 +30,8 @@
 #define HYP_PAGE_OFFSET		PAGE_OFFSET
 #define KERN_TO_HYP(kva)	(kva)
 
+#define kvm_ksym_ref(kva)	(kva)
+
 /*
  * Our virtual mapping for the boot-time MMU-enable code. Must be
  * shared across all the page-tables. Conveniently, we use the vectors
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index e06fd299de08..70e6d557c75f 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -969,7 +969,7 @@ static void cpu_init_hyp_mode(void *dummy)
 	pgd_ptr = kvm_mmu_get_httbr();
 	stack_page = __this_cpu_read(kvm_arm_hyp_stack_page);
 	hyp_stack_ptr = stack_page + PAGE_SIZE;
-	vector_ptr = (unsigned long)__kvm_hyp_vector;
+	vector_ptr = (unsigned long)kvm_ksym_ref(__kvm_hyp_vector);
 
 	__cpu_init_hyp_mode(boot_pgd_ptr, pgd_ptr, hyp_stack_ptr, vector_ptr);
 
@@ -1061,7 +1061,8 @@ static int init_hyp_mode(void)
 	/*
 	 * Map the Hyp-code called directly from the host
 	 */
-	err = create_hyp_mappings(__kvm_hyp_code_start, __kvm_hyp_code_end);
+	err = create_hyp_mappings(kvm_ksym_ref(__kvm_hyp_code_start),
+				  kvm_ksym_ref(__kvm_hyp_code_end));
 	if (err) {
 		kvm_err("Cannot map world-switch code\n");
 		goto out_free_mappings;
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 7dace909d5cf..9ab9e4b6376e 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -31,8 +31,6 @@
 
 #include "trace.h"
 
-extern char  __hyp_idmap_text_start[], __hyp_idmap_text_end[];
-
 static pgd_t *boot_hyp_pgd;
 static pgd_t *hyp_pgd;
 static pgd_t *merged_hyp_pgd;
@@ -1647,9 +1645,9 @@ int kvm_mmu_init(void)
 {
 	int err;
 
-	hyp_idmap_start = kvm_virt_to_phys(__hyp_idmap_text_start);
-	hyp_idmap_end = kvm_virt_to_phys(__hyp_idmap_text_end);
-	hyp_idmap_vector = kvm_virt_to_phys(__kvm_hyp_init);
+	hyp_idmap_start = kvm_virt_to_phys(&__hyp_idmap_text_start);
+	hyp_idmap_end = kvm_virt_to_phys(&__hyp_idmap_text_end);
+	hyp_idmap_vector = kvm_virt_to_phys(&__kvm_hyp_init);
 
 	/*
 	 * We rely on the linker script to ensure at build time that the HYP
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 5e377101f919..e3865845d3e1 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -105,24 +105,29 @@
 #ifndef __ASSEMBLY__
 struct kvm;
 struct kvm_vcpu;
+struct kvm_ksym;
 
 extern char __kvm_hyp_init[];
 extern char __kvm_hyp_init_end[];
 
-extern char __kvm_hyp_vector[];
+extern struct kvm_ksym __kvm_hyp_vector;
 
 #define	__kvm_hyp_code_start	__hyp_text_start
 #define	__kvm_hyp_code_end	__hyp_text_end
+extern struct kvm_ksym __hyp_text_start;
+extern struct kvm_ksym __hyp_text_end;
 
-extern void __kvm_flush_vm_context(void);
-extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
-extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
+extern struct kvm_ksym __kvm_flush_vm_context;
+extern struct kvm_ksym __kvm_tlb_flush_vmid_ipa;
+extern struct kvm_ksym __kvm_tlb_flush_vmid;
 
-extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
+extern struct kvm_ksym __kvm_vcpu_run;
 
-extern u64 __vgic_v3_get_ich_vtr_el2(void);
+extern struct kvm_ksym __hyp_idmap_text_start, __hyp_idmap_text_end;
 
-extern u32 __kvm_get_mdcr_el2(void);
+extern struct kvm_ksym __vgic_v3_get_ich_vtr_el2;
+
+extern struct kvm_ksym __kvm_get_mdcr_el2;
 
 #endif
 
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index a35ce7266aac..90c6368ad7c8 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -222,7 +222,7 @@ static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
 struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
 struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void);
 
-u64 kvm_call_hyp(void *hypfn, ...);
+u64 __kvm_call_hyp(void *hypfn, ...);
 void force_vm_exit(const cpumask_t *mask);
 void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
 
@@ -243,8 +243,8 @@ static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
 	 * Call initialization code, and switch to the full blown
 	 * HYP code.
 	 */
-	kvm_call_hyp((void *)boot_pgd_ptr, pgd_ptr,
-		     hyp_stack_ptr, vector_ptr);
+	__kvm_call_hyp((void *)boot_pgd_ptr, pgd_ptr,
+		       hyp_stack_ptr, vector_ptr);
 }
 
 static inline void kvm_arch_hardware_disable(void) {}
@@ -258,4 +258,6 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
 
+#define kvm_call_hyp(f, ...) __kvm_call_hyp(kvm_ksym_ref(f), ##__VA_ARGS__)
+
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 61505676d085..0899026a2821 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -73,6 +73,8 @@
 
 #define KERN_TO_HYP(kva)	((unsigned long)kva - PAGE_OFFSET + HYP_PAGE_OFFSET)
 
+#define kvm_ksym_ref(sym)	((void *)&sym - KIMAGE_VADDR + PAGE_OFFSET)
+
 /*
  * We currently only support a 40bit IPA.
  */
diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
index 7a5df5252dd7..215ad4649dd7 100644
--- a/arch/arm64/include/asm/virt.h
+++ b/arch/arm64/include/asm/virt.h
@@ -50,10 +50,6 @@ static inline bool is_hyp_mode_mismatched(void)
 	return __boot_cpu_mode[0] != __boot_cpu_mode[1];
 }
 
-/* The section containing the hypervisor text */
-extern char __hyp_text_start[];
-extern char __hyp_text_end[];
-
 #endif /* __ASSEMBLY__ */
 
 #endif /* ! __ASM__VIRT_H */
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index 47e5f0feaee8..f73d8c9b999b 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -24,6 +24,7 @@
 #include <asm/kvm_asm.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_mmu.h>
 
 #include "trace.h"
 
diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index 86c289832272..309e3479dc2c 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -923,7 +923,7 @@ __hyp_panic_str:
 	.align	2
 
 /*
- * u64 kvm_call_hyp(void *hypfn, ...);
+ * u64 __kvm_call_hyp(void *hypfn, ...);
  *
  * This is not really a variadic function in the classic C-way and care must
  * be taken when calling this to ensure parameters are passed in registers
@@ -940,10 +940,10 @@ __hyp_panic_str:
  * used to implement __hyp_get_vectors in the same way as in
  * arch/arm64/kernel/hyp_stub.S.
  */
-ENTRY(kvm_call_hyp)
+ENTRY(__kvm_call_hyp)
 	hvc	#0
 	ret
-ENDPROC(kvm_call_hyp)
+ENDPROC(__kvm_call_hyp)
 
 .macro invalid_vector	label, target
 	.align	2
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [PATCH v3 06/21] arm64: pgtable: implement static [pte|pmd|pud]_offset variants
  2016-01-11 13:18 ` Ard Biesheuvel
  (?)
@ 2016-01-11 13:18   ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:18 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel
  Cc: stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall, Ard Biesheuvel

The page table accessors pte_offset(), pud_offset() and pmd_offset()
rely on __va translations, so they can only be used after the linear
mapping has been installed. For the early fixmap and kasan init routines,
whose page tables are allocated statically in the kernel image, these
functions will return bogus values. So implement pte_offset_kimg(),
pmd_offset_kimg() and pud_offset_kimg(), which can be used instead
before any page tables have been allocated dynamically.
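
Spelled out with the definitions from earlier in this series
(__phys_to_kimg() was added in patch 2), the difference is only in
which virtual alias of the next-level table is returned:

/*
 * pmd_offset(pud, addr)       ~  __va(phys)
 *                             =  phys - PHYS_OFFSET + PAGE_OFFSET   (linear map)
 * pmd_offset_kimg(pud, addr)  ~  __phys_to_kimg(phys)
 *                             =  phys - PHYS_OFFSET + KIMAGE_VADDR  (kernel image)
 *
 * where phys is the table address taken from *pud.  Before the linear
 * mapping is up, only the kernel-image alias is guaranteed to be mapped.
 */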

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/pgtable.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 6129f6755081..7b4e16068c9f 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -449,6 +449,9 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
 
 #define pmd_page(pmd)		pfn_to_page(__phys_to_pfn(pmd_val(pmd) & PHYS_MASK))
 
+/* use ONLY for statically allocated translation tables */
+#define pte_offset_kimg(dir,addr)	((pte_t *)__phys_to_kimg(pte_offset_phys((dir), (addr))))
+
 /*
  * Conversion functions: convert a page and protection to a page entry,
  * and a page entry and page directory to the page they refer to.
@@ -492,6 +495,9 @@ static inline phys_addr_t pud_page_paddr(pud_t pud)
 
 #define pud_page(pud)		pfn_to_page(__phys_to_pfn(pud_val(pud) & PHYS_MASK))
 
+/* use ONLY for statically allocated translation tables */
+#define pmd_offset_kimg(dir,addr)	((pmd_t *)__phys_to_kimg(pmd_offset_phys((dir), (addr))))
+
 #else
 
 #define pud_page_paddr(pud)	({ BUILD_BUG(); 0; })
@@ -502,6 +508,8 @@ static inline phys_addr_t pud_page_paddr(pud_t pud)
 #define pmd_set_fixmap_offset(pudp, addr)	((pmd_t *)pudp)
 #define pmd_clear_fixmap()
 
+#define pmd_offset_kimg(dir,addr)	((pmd_t *)dir)
+
 #endif	/* CONFIG_PGTABLE_LEVELS > 2 */
 
 #if CONFIG_PGTABLE_LEVELS > 3
@@ -540,6 +548,9 @@ static inline phys_addr_t pgd_page_paddr(pgd_t pgd)
 
 #define pgd_page(pgd)		pfn_to_page(__phys_to_pfn(pgd_val(pgd) & PHYS_MASK))
 
+/* use ONLY for statically allocated translation tables */
+#define pud_offset_kimg(dir,addr)	((pud_t *)__phys_to_kimg(pud_offset_phys((dir), (addr))))
+
 #else
 
 #define pgd_page_paddr(pgd)	({ BUILD_BUG(); 0;})
@@ -550,6 +561,8 @@ static inline phys_addr_t pgd_page_paddr(pgd_t pgd)
 #define pud_set_fixmap_offset(pgdp, addr)	((pud_t *)pgdp)
 #define pud_clear_fixmap()
 
+#define pud_offset_kimg(dir,addr)	((pud_t *)dir)
+
 #endif  /* CONFIG_PGTABLE_LEVELS > 3 */
 
 #define pgd_ERROR(pgd)		__pgd_error(__FILE__, __LINE__, pgd_val(pgd))
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [PATCH v3 07/21] arm64: move kernel image to base of vmalloc area
  2016-01-11 13:18 ` Ard Biesheuvel
  (?)
@ 2016-01-11 13:19   ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel
  Cc: stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall, Ard Biesheuvel

This moves the module area to right before the vmalloc area, and
moves the kernel image to the base of the vmalloc area. This is
an intermediate step towards implementing kASLR, where the kernel
image can be located anywhere in the vmalloc area.
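
For orientation, a rough sketch of the resulting layout in the
!CONFIG_KASAN case, as defined by the memory.h hunk below (ascending
addresses):

/*
 * VA_START                   MODULES_VADDR            64 MB module area
 * MODULES_VADDR + SZ_64M     MODULES_END
 *                            == KIMAGE_VADDR          kernel image base
 *                            == VMALLOC_START         (image at the bottom
 *                                                      of the vmalloc area)
 *   ...                      rest of the vmalloc area
 * PAGE_OFFSET                start of the linear mapping
 */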

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/kasan.h          | 20 ++++---
 arch/arm64/include/asm/kernel-pgtable.h |  5 +-
 arch/arm64/include/asm/memory.h         | 18 ++++--
 arch/arm64/include/asm/pgtable.h        |  7 ---
 arch/arm64/kernel/setup.c               | 12 ++++
 arch/arm64/mm/dump.c                    | 12 ++--
 arch/arm64/mm/init.c                    | 20 +++----
 arch/arm64/mm/kasan_init.c              | 21 +++++--
 arch/arm64/mm/mmu.c                     | 62 ++++++++++++++------
 9 files changed, 118 insertions(+), 59 deletions(-)

diff --git a/arch/arm64/include/asm/kasan.h b/arch/arm64/include/asm/kasan.h
index de0d21211c34..2c583dbf4746 100644
--- a/arch/arm64/include/asm/kasan.h
+++ b/arch/arm64/include/asm/kasan.h
@@ -1,20 +1,16 @@
 #ifndef __ASM_KASAN_H
 #define __ASM_KASAN_H
 
-#ifndef __ASSEMBLY__
-
 #ifdef CONFIG_KASAN
 
 #include <linux/linkage.h>
-#include <asm/memory.h>
-#include <asm/pgtable-types.h>
 
 /*
  * KASAN_SHADOW_START: beginning of the kernel virtual addresses.
  * KASAN_SHADOW_END: KASAN_SHADOW_START + 1/8 of kernel virtual addresses.
  */
-#define KASAN_SHADOW_START      (VA_START)
-#define KASAN_SHADOW_END        (KASAN_SHADOW_START + (1UL << (VA_BITS - 3)))
+#define KASAN_SHADOW_START	(VA_START)
+#define KASAN_SHADOW_END	(KASAN_SHADOW_START + (_AC(1, UL) << (VA_BITS - 3)))
 
 /*
  * This value is used to map an address to the corresponding shadow
@@ -26,16 +22,22 @@
  * should satisfy the following equation:
  *      KASAN_SHADOW_OFFSET = KASAN_SHADOW_END - (1ULL << 61)
  */
-#define KASAN_SHADOW_OFFSET     (KASAN_SHADOW_END - (1ULL << (64 - 3)))
+#define KASAN_SHADOW_OFFSET	(KASAN_SHADOW_END - (_AC(1, ULL) << (64 - 3)))
+
+#ifndef __ASSEMBLY__
+#include <asm/pgtable-types.h>
 
 void kasan_init(void);
 void kasan_copy_shadow(pgd_t *pgdir);
 asmlinkage void kasan_early_init(void);
+#endif
 
 #else
+
+#ifndef __ASSEMBLY__
 static inline void kasan_init(void) { }
 static inline void kasan_copy_shadow(pgd_t *pgdir) { }
 #endif
 
-#endif
-#endif
+#endif /* CONFIG_KASAN */
+#endif /* __ASM_KASAN_H */
diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
index a459714ee29e..daa8a7b9917a 100644
--- a/arch/arm64/include/asm/kernel-pgtable.h
+++ b/arch/arm64/include/asm/kernel-pgtable.h
@@ -70,8 +70,9 @@
 /*
  * Initial memory map attributes.
  */
-#define SWAPPER_PTE_FLAGS	(PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
-#define SWAPPER_PMD_FLAGS	(PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
+#define SWAPPER_PTE_FLAGS	(PTE_TYPE_PAGE | PTE_AF | PTE_SHARED | PTE_UXN)
+#define SWAPPER_PMD_FLAGS	(PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S | \
+				 PMD_SECT_UXN)
 
 #if ARM64_SWAPPER_USES_SECTION_MAPS
 #define SWAPPER_MM_MMUFLAGS	(PMD_ATTRINDX(MT_NORMAL) | SWAPPER_PMD_FLAGS)
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index bea9631b34a8..e45d3141ad98 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -51,14 +51,24 @@
 #define VA_BITS			(CONFIG_ARM64_VA_BITS)
 #define VA_START		(UL(0xffffffffffffffff) << VA_BITS)
 #define PAGE_OFFSET		(UL(0xffffffffffffffff) << (VA_BITS - 1))
-#define KIMAGE_VADDR		(PAGE_OFFSET)
-#define MODULES_END		(KIMAGE_VADDR)
-#define MODULES_VADDR		(MODULES_END - SZ_64M)
-#define PCI_IO_END		(MODULES_VADDR - SZ_2M)
+#define PCI_IO_END		(PAGE_OFFSET - SZ_2M)
 #define PCI_IO_START		(PCI_IO_END - PCI_IO_SIZE)
 #define FIXADDR_TOP		(PCI_IO_START - SZ_2M)
 #define TASK_SIZE_64		(UL(1) << VA_BITS)
 
+#ifndef CONFIG_KASAN
+#define MODULES_VADDR		(VA_START)
+#else
+#include <asm/kasan.h>
+#define MODULES_VADDR		(KASAN_SHADOW_END)
+#endif
+
+#define MODULES_VSIZE		(SZ_64M)
+#define MODULES_END		(MODULES_VADDR + MODULES_VSIZE)
+
+#define KIMAGE_VADDR		(MODULES_END)
+#define VMALLOC_START		(MODULES_END)
+
 #ifdef CONFIG_COMPAT
 #define TASK_SIZE_32		UL(0x100000000)
 #define TASK_SIZE		(test_thread_flag(TIF_32BIT) ? \
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 7b4e16068c9f..a910a44d7ab3 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -42,13 +42,6 @@
  */
 #define VMEMMAP_SIZE		ALIGN((1UL << (VA_BITS - PAGE_SHIFT)) * sizeof(struct page), PUD_SIZE)
 
-#ifndef CONFIG_KASAN
-#define VMALLOC_START		(VA_START)
-#else
-#include <asm/kasan.h>
-#define VMALLOC_START		(KASAN_SHADOW_END + SZ_64K)
-#endif
-
 #define VMALLOC_END		(PAGE_OFFSET - PUD_SIZE - VMEMMAP_SIZE - SZ_64K)
 
 #define vmemmap			((struct page *)(VMALLOC_END + SZ_64K))
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index cfed56f0ad26..c67ba4453ec6 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -53,6 +53,7 @@
 #include <asm/cpufeature.h>
 #include <asm/cpu_ops.h>
 #include <asm/kasan.h>
+#include <asm/kernel-pgtable.h>
 #include <asm/sections.h>
 #include <asm/setup.h>
 #include <asm/smp_plat.h>
@@ -291,6 +292,17 @@ u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
 
 void __init setup_arch(char **cmdline_p)
 {
+	static struct vm_struct vmlinux_vm;
+
+	vmlinux_vm.addr		= (void *)KIMAGE_VADDR;
+	vmlinux_vm.size		= round_up((u64)_end - KIMAGE_VADDR,
+					   SWAPPER_BLOCK_SIZE);
+	vmlinux_vm.phys_addr	= __pa(KIMAGE_VADDR);
+	vmlinux_vm.flags	= VM_MAP;
+	vmlinux_vm.caller	= setup_arch;
+
+	vm_area_add_early(&vmlinux_vm);
+
 	pr_info("Boot CPU: AArch64 Processor [%08x]\n", read_cpuid_id());
 
 	sprintf(init_utsname()->machine, ELF_PLATFORM);
diff --git a/arch/arm64/mm/dump.c b/arch/arm64/mm/dump.c
index 5a22a119a74c..e83ffb00560c 100644
--- a/arch/arm64/mm/dump.c
+++ b/arch/arm64/mm/dump.c
@@ -35,7 +35,9 @@ struct addr_marker {
 };
 
 enum address_markers_idx {
-	VMALLOC_START_NR = 0,
+	MODULES_START_NR = 0,
+	MODULES_END_NR,
+	VMALLOC_START_NR,
 	VMALLOC_END_NR,
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 	VMEMMAP_START_NR,
@@ -45,12 +47,12 @@ enum address_markers_idx {
 	FIXADDR_END_NR,
 	PCI_START_NR,
 	PCI_END_NR,
-	MODULES_START_NR,
-	MODUELS_END_NR,
 	KERNEL_SPACE_NR,
 };
 
 static struct addr_marker address_markers[] = {
+	{ MODULES_VADDR,	"Modules start" },
+	{ MODULES_END,		"Modules end" },
 	{ VMALLOC_START,	"vmalloc() Area" },
 	{ VMALLOC_END,		"vmalloc() End" },
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
@@ -61,9 +63,7 @@ static struct addr_marker address_markers[] = {
 	{ FIXADDR_TOP,		"Fixmap end" },
 	{ PCI_IO_START,		"PCI I/O start" },
 	{ PCI_IO_END,		"PCI I/O end" },
-	{ MODULES_VADDR,	"Modules start" },
-	{ MODULES_END,		"Modules end" },
-	{ PAGE_OFFSET,		"Kernel Mapping" },
+	{ PAGE_OFFSET,		"Linear Mapping" },
 	{ -1,			NULL },
 };
 
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index f3b061e67bfe..baa923bda651 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -302,22 +302,26 @@ void __init mem_init(void)
 #ifdef CONFIG_KASAN
 		  "    kasan   : 0x%16lx - 0x%16lx   (%6ld GB)\n"
 #endif
+		  "    modules : 0x%16lx - 0x%16lx   (%6ld MB)\n"
 		  "    vmalloc : 0x%16lx - 0x%16lx   (%6ld GB)\n"
+		  "      .init : 0x%p" " - 0x%p" "   (%6ld KB)\n"
+		  "      .text : 0x%p" " - 0x%p" "   (%6ld KB)\n"
+		  "      .data : 0x%p" " - 0x%p" "   (%6ld KB)\n"
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 		  "    vmemmap : 0x%16lx - 0x%16lx   (%6ld GB maximum)\n"
 		  "              0x%16lx - 0x%16lx   (%6ld MB actual)\n"
 #endif
 		  "    fixed   : 0x%16lx - 0x%16lx   (%6ld KB)\n"
 		  "    PCI I/O : 0x%16lx - 0x%16lx   (%6ld MB)\n"
-		  "    modules : 0x%16lx - 0x%16lx   (%6ld MB)\n"
-		  "    memory  : 0x%16lx - 0x%16lx   (%6ld MB)\n"
-		  "      .init : 0x%p" " - 0x%p" "   (%6ld KB)\n"
-		  "      .text : 0x%p" " - 0x%p" "   (%6ld KB)\n"
-		  "      .data : 0x%p" " - 0x%p" "   (%6ld KB)\n",
+		  "    memory  : 0x%16lx - 0x%16lx   (%6ld MB)\n",
 #ifdef CONFIG_KASAN
 		  MLG(KASAN_SHADOW_START, KASAN_SHADOW_END),
 #endif
+		  MLM(MODULES_VADDR, MODULES_END),
 		  MLG(VMALLOC_START, VMALLOC_END),
+		  MLK_ROUNDUP(__init_begin, __init_end),
+		  MLK_ROUNDUP(_text, _etext),
+		  MLK_ROUNDUP(_sdata, _edata),
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 		  MLG((unsigned long)vmemmap,
 		      (unsigned long)vmemmap + VMEMMAP_SIZE),
@@ -326,11 +330,7 @@ void __init mem_init(void)
 #endif
 		  MLK(FIXADDR_START, FIXADDR_TOP),
 		  MLM(PCI_IO_START, PCI_IO_END),
-		  MLM(MODULES_VADDR, MODULES_END),
-		  MLM(PAGE_OFFSET, (unsigned long)high_memory),
-		  MLK_ROUNDUP(__init_begin, __init_end),
-		  MLK_ROUNDUP(_text, _etext),
-		  MLK_ROUNDUP(_sdata, _edata));
+		  MLM(PAGE_OFFSET, (unsigned long)high_memory));
 
 #undef MLK
 #undef MLM
diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
index 0ca411fc5ea3..acdd1ac166ec 100644
--- a/arch/arm64/mm/kasan_init.c
+++ b/arch/arm64/mm/kasan_init.c
@@ -17,9 +17,11 @@
 #include <linux/start_kernel.h>
 
 #include <asm/mmu_context.h>
+#include <asm/kernel-pgtable.h>
 #include <asm/page.h>
 #include <asm/pgalloc.h>
 #include <asm/pgtable.h>
+#include <asm/sections.h>
 #include <asm/tlbflush.h>
 
 static pgd_t tmp_pg_dir[PTRS_PER_PGD] __initdata __aligned(PGD_SIZE);
@@ -33,7 +35,7 @@ static void __init kasan_early_pte_populate(pmd_t *pmd, unsigned long addr,
 	if (pmd_none(*pmd))
 		pmd_populate_kernel(&init_mm, pmd, kasan_zero_pte);
 
-	pte = pte_offset_kernel(pmd, addr);
+	pte = pte_offset_kimg(pmd, addr);
 	do {
 		next = addr + PAGE_SIZE;
 		set_pte(pte, pfn_pte(virt_to_pfn(kasan_zero_page),
@@ -51,7 +53,7 @@ static void __init kasan_early_pmd_populate(pud_t *pud,
 	if (pud_none(*pud))
 		pud_populate(&init_mm, pud, kasan_zero_pmd);
 
-	pmd = pmd_offset(pud, addr);
+	pmd = pmd_offset_kimg(pud, addr);
 	do {
 		next = pmd_addr_end(addr, end);
 		kasan_early_pte_populate(pmd, addr, next);
@@ -68,7 +70,7 @@ static void __init kasan_early_pud_populate(pgd_t *pgd,
 	if (pgd_none(*pgd))
 		pgd_populate(&init_mm, pgd, kasan_zero_pud);
 
-	pud = pud_offset(pgd, addr);
+	pud = pud_offset_kimg(pgd, addr);
 	do {
 		next = pud_addr_end(addr, end);
 		kasan_early_pmd_populate(pud, addr, next);
@@ -126,8 +128,14 @@ static void __init clear_pgds(unsigned long start,
 
 void __init kasan_init(void)
 {
+	u64 kimg_shadow_start, kimg_shadow_end;
 	struct memblock_region *reg;
 
+	kimg_shadow_start = round_down((u64)kasan_mem_to_shadow(_text),
+				       SWAPPER_BLOCK_SIZE);
+	kimg_shadow_end = round_up((u64)kasan_mem_to_shadow(_end),
+				   SWAPPER_BLOCK_SIZE);
+
 	/*
 	 * We are going to perform proper setup of shadow memory.
 	 * At first we should unmap early shadow (clear_pgds() call bellow).
@@ -141,8 +149,13 @@ void __init kasan_init(void)
 
 	clear_pgds(KASAN_SHADOW_START, KASAN_SHADOW_END);
 
+	vmemmap_populate(kimg_shadow_start, kimg_shadow_end,
+			 pfn_to_nid(virt_to_pfn(kimg_shadow_start)));
+
 	kasan_populate_zero_shadow((void *)KASAN_SHADOW_START,
-			kasan_mem_to_shadow((void *)MODULES_VADDR));
+				   (void *)kimg_shadow_start);
+	kasan_populate_zero_shadow((void *)kimg_shadow_end,
+				   kasan_mem_to_shadow((void *)PAGE_OFFSET));
 
 	for_each_memblock(memory, reg) {
 		void *start = (void *)__phys_to_virt(reg->base);
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 75b5f0dc3bdc..0b28f1469f9b 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -53,6 +53,10 @@ u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
 unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)] __page_aligned_bss;
 EXPORT_SYMBOL(empty_zero_page);
 
+static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
+static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
+static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
+
 pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 			      unsigned long size, pgprot_t vma_prot)
 {
@@ -349,14 +353,14 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
 {
 
 	unsigned long kernel_start = __pa(_stext);
-	unsigned long kernel_end = __pa(_end);
+	unsigned long kernel_end = __pa(_etext);
 
 	/*
-	 * The kernel itself is mapped at page granularity. Map all other
-	 * memory, making sure we don't overwrite the existing kernel mappings.
+	 * Take care not to create a writable alias for the
+	 * read-only text and rodata sections of the kernel image.
 	 */
 
-	/* No overlap with the kernel. */
+	/* No overlap with the kernel text */
 	if (end < kernel_start || start >= kernel_end) {
 		__create_pgd_mapping(pgd, start, __phys_to_virt(start),
 				     end - start, PAGE_KERNEL,
@@ -365,7 +369,7 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
 	}
 
 	/*
-	 * This block overlaps the kernel mapping. Map the portion(s) which
+	 * This block overlaps the kernel text mapping. Map the portion(s) which
 	 * don't overlap.
 	 */
 	if (start < kernel_start)
@@ -438,12 +442,29 @@ static void __init map_kernel(pgd_t *pgd)
 	map_kernel_chunk(pgd, __init_begin, __init_end, PAGE_KERNEL_EXEC);
 	map_kernel_chunk(pgd, _data, _end, PAGE_KERNEL);
 
-	/*
-	 * The fixmap falls in a separate pgd to the kernel, and doesn't live
-	 * in the carveout for the swapper_pg_dir. We can simply re-use the
-	 * existing dir for the fixmap.
-	 */
-	set_pgd(pgd_offset_raw(pgd, FIXADDR_START), *pgd_offset_k(FIXADDR_START));
+	if (pgd_index(FIXADDR_START) != pgd_index((u64)_end)) {
+		/*
+		 * The fixmap falls in a separate pgd to the kernel, and doesn't
+		 * live in the carveout for the swapper_pg_dir. We can simply
+		 * re-use the existing dir for the fixmap.
+		 */
+		set_pgd(pgd_offset_raw(pgd, FIXADDR_START),
+			*pgd_offset_k(FIXADDR_START));
+	} else if (CONFIG_PGTABLE_LEVELS > 3) {
+		/*
+		 * The fixmap shares its top level pgd entry with the kernel
+		 * mapping. This can really only occur when we are running
+		 * with 16k/4 levels, so we can simply reuse the pud level
+		 * entry instead.
+		 */
+		BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES));
+
+		set_pud(pud_set_fixmap_offset(pgd, FIXADDR_START),
+			__pud(__pa(bm_pmd) | PUD_TYPE_TABLE));
+		pud_clear_fixmap();
+	} else {
+		BUG();
+	}
 
 	kasan_copy_shadow(pgd);
 }
@@ -569,10 +590,6 @@ void vmemmap_free(unsigned long start, unsigned long end)
 }
 #endif	/* CONFIG_SPARSEMEM_VMEMMAP */
 
-static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
-static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
-static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
-
 static inline pud_t * fixmap_pud(unsigned long addr)
 {
 	return (CONFIG_PGTABLE_LEVELS > 3) ? &bm_pud[pud_index(addr)]
@@ -598,8 +615,19 @@ void __init early_fixmap_init(void)
 	unsigned long addr = FIXADDR_START;
 
 	pgd = pgd_offset_k(addr);
-	pgd_populate(&init_mm, pgd, bm_pud);
-	pud = fixmap_pud(addr);
+	if (CONFIG_PGTABLE_LEVELS > 3 && !pgd_none(*pgd)) {
+		/*
+		 * We only end up here if the kernel mapping and the fixmap
+		 * share the top level pgd entry, which should only happen on
+		 * 16k/4 levels configurations.
+		 */
+		BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES));
+		pud = pud_offset_kimg(pgd, addr);
+		memblock_free(__pa(bm_pud), sizeof(bm_pud));
+	} else {
+		pgd_populate(&init_mm, pgd, bm_pud);
+		pud = fixmap_pud(addr);
+	}
 	pud_populate(&init_mm, pud, bm_pmd);
 	pmd = fixmap_pmd(addr);
 	pmd_populate_kernel(&init_mm, pmd, bm_pte);
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [PATCH v3 08/21] arm64: add support for module PLTs
  2016-01-11 13:18 ` Ard Biesheuvel
@ 2016-01-11 13:19   ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel
  Cc: stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall, Ard Biesheuvel

This adds support for emitting PLTs at module load time for relative
branches that are out of range. This is a prerequisite for KASLR, which
may place the kernel and the modules anywhere in the vmalloc area,
making it more likely that branch target offsets exceed the maximum
range of +/- 128 MB.
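
As a side note (not part of the patch): the +/- 128 MB figure comes from the
signed 26-bit immediate encoded in B/BL instructions, which is scaled by 4.
A rough sketch, under that assumption, of the range check that decides when a
veneer becomes necessary:

#include <stdbool.h>
#include <stdint.h>

/*
 * B/BL encode a signed 26-bit immediate in units of 4 bytes, so a direct
 * branch reaches offsets in [-2^27, 2^27 - 4]. Displacements outside that
 * window are what the PLT veneers introduced below are for.
 */
bool branch_in_range(uint64_t insn_addr, uint64_t target)
{
        int64_t offset = (int64_t)(target - insn_addr);

        return offset >= -(1LL << 27) && offset <= (1LL << 27) - 4;
}

In the patch itself this condition is not open-coded; apply_relocate_add()
detects the out-of-range case via reloc_insn_imm() returning -ERANGE, as the
module.c hunk shows.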

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/Kconfig              |   9 ++
 arch/arm64/Makefile             |   6 +-
 arch/arm64/include/asm/module.h |  11 ++
 arch/arm64/kernel/Makefile      |   1 +
 arch/arm64/kernel/module-plts.c | 137 ++++++++++++++++++++
 arch/arm64/kernel/module.c      |  12 ++
 arch/arm64/kernel/module.lds    |   4 +
 7 files changed, 179 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index ffa3c549a4ba..778df20bf623 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -363,6 +363,7 @@ config ARM64_ERRATUM_843419
 	bool "Cortex-A53: 843419: A load or store might access an incorrect address"
 	depends on MODULES
 	default y
+	select ARM64_MODULE_CMODEL_LARGE
 	help
 	  This option builds kernel modules using the large memory model in
 	  order to avoid the use of the ADRP instruction, which can cause
@@ -702,6 +703,14 @@ config ARM64_LSE_ATOMICS
 
 endmenu
 
+config ARM64_MODULE_CMODEL_LARGE
+	bool
+
+config ARM64_MODULE_PLTS
+	bool
+	select ARM64_MODULE_CMODEL_LARGE
+	select HAVE_MOD_ARCH_SPECIFIC
+
 endmenu
 
 menu "Boot options"
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index cd822d8454c0..db462980c6be 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -41,10 +41,14 @@ endif
 
 CHECKFLAGS	+= -D__aarch64__
 
-ifeq ($(CONFIG_ARM64_ERRATUM_843419), y)
+ifeq ($(CONFIG_ARM64_MODULE_CMODEL_LARGE), y)
 KBUILD_CFLAGS_MODULE	+= -mcmodel=large
 endif
 
+ifeq ($(CONFIG_ARM64_MODULE_PLTS),y)
+KBUILD_LDFLAGS_MODULE	+= -T $(srctree)/arch/arm64/kernel/module.lds
+endif
+
 # Default value
 head-y		:= arch/arm64/kernel/head.o
 
diff --git a/arch/arm64/include/asm/module.h b/arch/arm64/include/asm/module.h
index e80e232b730e..7b8cd3dc9d8e 100644
--- a/arch/arm64/include/asm/module.h
+++ b/arch/arm64/include/asm/module.h
@@ -20,4 +20,15 @@
 
 #define MODULE_ARCH_VERMAGIC	"aarch64"
 
+#ifdef CONFIG_ARM64_MODULE_PLTS
+struct mod_arch_specific {
+	struct elf64_shdr	*core_plt;
+	struct elf64_shdr	*init_plt;
+	int			core_plt_count;
+	int			init_plt_count;
+};
+#endif
+
+u64 get_module_plt(struct module *mod, void *loc, u64 val);
+
 #endif /* __ASM_MODULE_H */
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 474691f8b13a..f42b0fff607f 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -30,6 +30,7 @@ arm64-obj-$(CONFIG_COMPAT)		+= sys32.o kuser32.o signal32.o 	\
 					   ../../arm/kernel/opcodes.o
 arm64-obj-$(CONFIG_FUNCTION_TRACER)	+= ftrace.o entry-ftrace.o
 arm64-obj-$(CONFIG_MODULES)		+= arm64ksyms.o module.o
+arm64-obj-$(CONFIG_ARM64_MODULE_PLTS)	+= module-plts.o
 arm64-obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o perf_callchain.o
 arm64-obj-$(CONFIG_HW_PERF_EVENTS)	+= perf_event.o
 arm64-obj-$(CONFIG_HAVE_HW_BREAKPOINT)	+= hw_breakpoint.o
diff --git a/arch/arm64/kernel/module-plts.c b/arch/arm64/kernel/module-plts.c
new file mode 100644
index 000000000000..4a8ef9ea01ee
--- /dev/null
+++ b/arch/arm64/kernel/module-plts.c
@@ -0,0 +1,137 @@
+/*
+ * Copyright (C) 2014-2015 Linaro Ltd. <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/elf.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+
+struct plt_entry {
+	__le32	mov0;	/* movn	x16, #0x....			*/
+	__le32	mov1;	/* movk	x16, #0x...., lsl #16		*/
+	__le32	mov2;	/* movk	x16, #0x...., lsl #32		*/
+	__le32	br;	/* br	x16				*/
+} __aligned(8);
+
+static bool in_init(const struct module *mod, void *addr)
+{
+	return (u64)addr - (u64)mod->module_init < mod->init_size;
+}
+
+u64 get_module_plt(struct module *mod, void *loc, u64 val)
+{
+	struct plt_entry entry = {
+		cpu_to_le32(0x92800010 | (((~val      ) & 0xffff)) << 5),
+		cpu_to_le32(0xf2a00010 | ((( val >> 16) & 0xffff)) << 5),
+		cpu_to_le32(0xf2c00010 | ((( val >> 32) & 0xffff)) << 5),
+		cpu_to_le32(0xd61f0200)
+	}, *plt;
+	int i, *count;
+
+	if (in_init(mod, loc)) {
+		plt = (struct plt_entry *)mod->arch.init_plt->sh_addr;
+		count = &mod->arch.init_plt_count;
+	} else {
+		plt = (struct plt_entry *)mod->arch.core_plt->sh_addr;
+		count = &mod->arch.core_plt_count;
+	}
+
+	/* Look for an existing entry pointing to 'val' */
+	for (i = 0; i < *count; i++)
+		if (plt[i].mov0 == entry.mov0 &&
+		    plt[i].mov1 == entry.mov1 &&
+		    plt[i].mov2 == entry.mov2)
+			return (u64)&plt[i];
+
+	i = (*count)++;
+	plt[i] = entry;
+	return (u64)&plt[i];
+}
+
+static int duplicate_rel(Elf64_Addr base, const Elf64_Rela *rela, int num)
+{
+	int i;
+
+	for (i = 0; i < num; i++) {
+		if (rela[i].r_info == rela[num].r_info &&
+		    rela[i].r_addend == rela[num].r_addend)
+			return 1;
+	}
+	return 0;
+}
+
+/* Count how many PLT entries we may need */
+static unsigned int count_plts(Elf64_Addr base, const Elf64_Rela *rela, int num)
+{
+	unsigned int ret = 0;
+	int i;
+
+	/*
+	 * Sure, this is order(n^2), but it's usually short, and not
+	 * time critical
+	 */
+	for (i = 0; i < num; i++)
+		switch (ELF64_R_TYPE(rela[i].r_info)) {
+		case R_AARCH64_JUMP26:
+		case R_AARCH64_CALL26:
+			if (!duplicate_rel(base, rela, i))
+				ret++;
+			break;
+		}
+	return ret;
+}
+
+int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
+			      char *secstrings, struct module *mod)
+{
+	unsigned long core_plts = 0, init_plts = 0;
+	Elf64_Shdr *s, *sechdrs_end = sechdrs + ehdr->e_shnum;
+
+	/*
+	 * To store the PLTs, we expand the .text section for core module code
+	 * and the .init.text section for initialization code.
+	 */
+	for (s = sechdrs; s < sechdrs_end; ++s)
+		if (strcmp(".core.plt", secstrings + s->sh_name) == 0)
+			mod->arch.core_plt = s;
+		else if (strcmp(".init.plt", secstrings + s->sh_name) == 0)
+			mod->arch.init_plt = s;
+
+	if (!mod->arch.core_plt || !mod->arch.init_plt) {
+		pr_err("%s: sections missing\n", mod->name);
+		return -ENOEXEC;
+	}
+
+	for (s = sechdrs + 1; s < sechdrs_end; ++s) {
+		const Elf64_Rela *rels = (void *)ehdr + s->sh_offset;
+		int numrels = s->sh_size / sizeof(Elf64_Rela);
+		Elf64_Shdr *dstsec = sechdrs + s->sh_info;
+
+		if (s->sh_type != SHT_RELA)
+			continue;
+
+		if (strstr(secstrings + s->sh_name, ".init"))
+			init_plts += count_plts(dstsec->sh_addr, rels, numrels);
+		else
+			core_plts += count_plts(dstsec->sh_addr, rels, numrels);
+	}
+
+	mod->arch.core_plt->sh_type = SHT_NOBITS;
+	mod->arch.core_plt->sh_flags = SHF_EXECINSTR | SHF_ALLOC;
+	mod->arch.core_plt->sh_addralign = L1_CACHE_BYTES;
+	mod->arch.core_plt->sh_size = core_plts * sizeof(struct plt_entry);
+	mod->arch.core_plt_count = 0;
+
+	mod->arch.init_plt->sh_type = SHT_NOBITS;
+	mod->arch.init_plt->sh_flags = SHF_EXECINSTR | SHF_ALLOC;
+	mod->arch.init_plt->sh_addralign = L1_CACHE_BYTES;
+	mod->arch.init_plt->sh_size = init_plts * sizeof(struct plt_entry);
+	mod->arch.init_plt_count = 0;
+	pr_debug("%s: core.plt=%lld, init.plt=%lld\n", __func__,
+		 mod->arch.core_plt->sh_size, mod->arch.init_plt->sh_size);
+	return 0;
+}
diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index 93e970231ca9..3a298b0e21bb 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -38,6 +38,11 @@ void *module_alloc(unsigned long size)
 				GFP_KERNEL, PAGE_KERNEL_EXEC, 0,
 				NUMA_NO_NODE, __builtin_return_address(0));
 
+	if (!p && IS_ENABLED(CONFIG_ARM64_MODULE_PLTS))
+		p = __vmalloc_node_range(size, MODULE_ALIGN, VMALLOC_START,
+				VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_EXEC, 0,
+				NUMA_NO_NODE, __builtin_return_address(0));
+
 	if (p && (kasan_module_alloc(p, size) < 0)) {
 		vfree(p);
 		return NULL;
@@ -361,6 +366,13 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 		case R_AARCH64_CALL26:
 			ovf = reloc_insn_imm(RELOC_OP_PREL, loc, val, 2, 26,
 					     AARCH64_INSN_IMM_26);
+
+			if (IS_ENABLED(CONFIG_ARM64_MODULE_PLTS) &&
+			    ovf == -ERANGE) {
+				val = get_module_plt(me, loc, val);
+				ovf = reloc_insn_imm(RELOC_OP_PREL, loc, val, 2,
+						     26, AARCH64_INSN_IMM_26);
+			}
 			break;
 
 		default:
diff --git a/arch/arm64/kernel/module.lds b/arch/arm64/kernel/module.lds
new file mode 100644
index 000000000000..3682fa107918
--- /dev/null
+++ b/arch/arm64/kernel/module.lds
@@ -0,0 +1,4 @@
+SECTIONS {
+        .core.plt : { BYTE(0) }
+        .init.plt : { BYTE(0) }
+}
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [PATCH v3 08/21] arm64: add support for module PLTs
@ 2016-01-11 13:19   ` Ard Biesheuvel
  0 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel

This adds support for emitting PLTs at module load time for relative
branches that are out of range. This is a prerequisite for KASLR, which
may place the kernel and the modules anywhere in the vmalloc area,
making it more likely that branch target offsets exceed the maximum
range of +/- 128 MB.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/Kconfig              |   9 ++
 arch/arm64/Makefile             |   6 +-
 arch/arm64/include/asm/module.h |  11 ++
 arch/arm64/kernel/Makefile      |   1 +
 arch/arm64/kernel/module-plts.c | 137 ++++++++++++++++++++
 arch/arm64/kernel/module.c      |  12 ++
 arch/arm64/kernel/module.lds    |   4 +
 7 files changed, 179 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index ffa3c549a4ba..778df20bf623 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -363,6 +363,7 @@ config ARM64_ERRATUM_843419
 	bool "Cortex-A53: 843419: A load or store might access an incorrect address"
 	depends on MODULES
 	default y
+	select ARM64_MODULE_CMODEL_LARGE
 	help
 	  This option builds kernel modules using the large memory model in
 	  order to avoid the use of the ADRP instruction, which can cause
@@ -702,6 +703,14 @@ config ARM64_LSE_ATOMICS
 
 endmenu
 
+config ARM64_MODULE_CMODEL_LARGE
+	bool
+
+config ARM64_MODULE_PLTS
+	bool
+	select ARM64_MODULE_CMODEL_LARGE
+	select HAVE_MOD_ARCH_SPECIFIC
+
 endmenu
 
 menu "Boot options"
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index cd822d8454c0..db462980c6be 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -41,10 +41,14 @@ endif
 
 CHECKFLAGS	+= -D__aarch64__
 
-ifeq ($(CONFIG_ARM64_ERRATUM_843419), y)
+ifeq ($(CONFIG_ARM64_MODULE_CMODEL_LARGE), y)
 KBUILD_CFLAGS_MODULE	+= -mcmodel=large
 endif
 
+ifeq ($(CONFIG_ARM64_MODULE_PLTS),y)
+KBUILD_LDFLAGS_MODULE	+= -T $(srctree)/arch/arm64/kernel/module.lds
+endif
+
 # Default value
 head-y		:= arch/arm64/kernel/head.o
 
diff --git a/arch/arm64/include/asm/module.h b/arch/arm64/include/asm/module.h
index e80e232b730e..7b8cd3dc9d8e 100644
--- a/arch/arm64/include/asm/module.h
+++ b/arch/arm64/include/asm/module.h
@@ -20,4 +20,15 @@
 
 #define MODULE_ARCH_VERMAGIC	"aarch64"
 
+#ifdef CONFIG_ARM64_MODULE_PLTS
+struct mod_arch_specific {
+	struct elf64_shdr	*core_plt;
+	struct elf64_shdr	*init_plt;
+	int			core_plt_count;
+	int			init_plt_count;
+};
+#endif
+
+u64 get_module_plt(struct module *mod, void *loc, u64 val);
+
 #endif /* __ASM_MODULE_H */
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 474691f8b13a..f42b0fff607f 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -30,6 +30,7 @@ arm64-obj-$(CONFIG_COMPAT)		+= sys32.o kuser32.o signal32.o 	\
 					   ../../arm/kernel/opcodes.o
 arm64-obj-$(CONFIG_FUNCTION_TRACER)	+= ftrace.o entry-ftrace.o
 arm64-obj-$(CONFIG_MODULES)		+= arm64ksyms.o module.o
+arm64-obj-$(CONFIG_ARM64_MODULE_PLTS)	+= module-plts.o
 arm64-obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o perf_callchain.o
 arm64-obj-$(CONFIG_HW_PERF_EVENTS)	+= perf_event.o
 arm64-obj-$(CONFIG_HAVE_HW_BREAKPOINT)	+= hw_breakpoint.o
diff --git a/arch/arm64/kernel/module-plts.c b/arch/arm64/kernel/module-plts.c
new file mode 100644
index 000000000000..4a8ef9ea01ee
--- /dev/null
+++ b/arch/arm64/kernel/module-plts.c
@@ -0,0 +1,137 @@
+/*
+ * Copyright (C) 2014-2015 Linaro Ltd. <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/elf.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+
+struct plt_entry {
+	__le32	mov0;	/* movn	x16, #0x....			*/
+	__le32	mov1;	/* movk	x16, #0x...., lsl #16		*/
+	__le32	mov2;	/* movk	x16, #0x...., lsl #32		*/
+	__le32	br;	/* br	x16				*/
+} __aligned(8);
+
+static bool in_init(const struct module *mod, void *addr)
+{
+	return (u64)addr - (u64)mod->module_init < mod->init_size;
+}
+
+u64 get_module_plt(struct module *mod, void *loc, u64 val)
+{
+	struct plt_entry entry = {
+		cpu_to_le32(0x92800010 | (((~val      ) & 0xffff)) << 5),
+		cpu_to_le32(0xf2a00010 | ((( val >> 16) & 0xffff)) << 5),
+		cpu_to_le32(0xf2c00010 | ((( val >> 32) & 0xffff)) << 5),
+		cpu_to_le32(0xd61f0200)
+	}, *plt;
+	int i, *count;
+
+	if (in_init(mod, loc)) {
+		plt = (struct plt_entry *)mod->arch.init_plt->sh_addr;
+		count = &mod->arch.init_plt_count;
+	} else {
+		plt = (struct plt_entry *)mod->arch.core_plt->sh_addr;
+		count = &mod->arch.core_plt_count;
+	}
+
+	/* Look for an existing entry pointing to 'val' */
+	for (i = 0; i < *count; i++)
+		if (plt[i].mov0 == entry.mov0 &&
+		    plt[i].mov1 == entry.mov1 &&
+		    plt[i].mov2 == entry.mov2)
+			return (u64)&plt[i];
+
+	i = (*count)++;
+	plt[i] = entry;
+	return (u64)&plt[i];
+}
+
+static int duplicate_rel(Elf64_Addr base, const Elf64_Rela *rela, int num)
+{
+	int i;
+
+	for (i = 0; i < num; i++) {
+		if (rela[i].r_info == rela[num].r_info &&
+		    rela[i].r_addend == rela[num].r_addend)
+			return 1;
+	}
+	return 0;
+}
+
+/* Count how many PLT entries we may need */
+static unsigned int count_plts(Elf64_Addr base, const Elf64_Rela *rela, int num)
+{
+	unsigned int ret = 0;
+	int i;
+
+	/*
+	 * Sure, this is order(n^2), but it's usually short, and not
+	 * time critical
+	 */
+	for (i = 0; i < num; i++)
+		switch (ELF64_R_TYPE(rela[i].r_info)) {
+		case R_AARCH64_JUMP26:
+		case R_AARCH64_CALL26:
+			if (!duplicate_rel(base, rela, i))
+				ret++;
+			break;
+		}
+	return ret;
+}
+
+int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
+			      char *secstrings, struct module *mod)
+{
+	unsigned long core_plts = 0, init_plts = 0;
+	Elf64_Shdr *s, *sechdrs_end = sechdrs + ehdr->e_shnum;
+
+	/*
+	 * To store the PLTs, we expand the .text section for core module code
+	 * and the .init.text section for initialization code.
+	 */
+	for (s = sechdrs; s < sechdrs_end; ++s)
+		if (strcmp(".core.plt", secstrings + s->sh_name) == 0)
+			mod->arch.core_plt = s;
+		else if (strcmp(".init.plt", secstrings + s->sh_name) == 0)
+			mod->arch.init_plt = s;
+
+	if (!mod->arch.core_plt || !mod->arch.init_plt) {
+		pr_err("%s: sections missing\n", mod->name);
+		return -ENOEXEC;
+	}
+
+	for (s = sechdrs + 1; s < sechdrs_end; ++s) {
+		const Elf64_Rela *rels = (void *)ehdr + s->sh_offset;
+		int numrels = s->sh_size / sizeof(Elf64_Rela);
+		Elf64_Shdr *dstsec = sechdrs + s->sh_info;
+
+		if (s->sh_type != SHT_RELA)
+			continue;
+
+		if (strstr(secstrings + s->sh_name, ".init"))
+			init_plts += count_plts(dstsec->sh_addr, rels, numrels);
+		else
+			core_plts += count_plts(dstsec->sh_addr, rels, numrels);
+	}
+
+	mod->arch.core_plt->sh_type = SHT_NOBITS;
+	mod->arch.core_plt->sh_flags = SHF_EXECINSTR | SHF_ALLOC;
+	mod->arch.core_plt->sh_addralign = L1_CACHE_BYTES;
+	mod->arch.core_plt->sh_size = core_plts * sizeof(struct plt_entry);
+	mod->arch.core_plt_count = 0;
+
+	mod->arch.init_plt->sh_type = SHT_NOBITS;
+	mod->arch.init_plt->sh_flags = SHF_EXECINSTR | SHF_ALLOC;
+	mod->arch.init_plt->sh_addralign = L1_CACHE_BYTES;
+	mod->arch.init_plt->sh_size = init_plts * sizeof(struct plt_entry);
+	mod->arch.init_plt_count = 0;
+	pr_debug("%s: core.plt=%lld, init.plt=%lld\n", __func__,
+		 mod->arch.core_plt->sh_size, mod->arch.init_plt->sh_size);
+	return 0;
+}
diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index 93e970231ca9..3a298b0e21bb 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -38,6 +38,11 @@ void *module_alloc(unsigned long size)
 				GFP_KERNEL, PAGE_KERNEL_EXEC, 0,
 				NUMA_NO_NODE, __builtin_return_address(0));
 
+	if (!p && IS_ENABLED(CONFIG_ARM64_MODULE_PLTS))
+		p = __vmalloc_node_range(size, MODULE_ALIGN, VMALLOC_START,
+				VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_EXEC, 0,
+				NUMA_NO_NODE, __builtin_return_address(0));
+
 	if (p && (kasan_module_alloc(p, size) < 0)) {
 		vfree(p);
 		return NULL;
@@ -361,6 +366,13 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 		case R_AARCH64_CALL26:
 			ovf = reloc_insn_imm(RELOC_OP_PREL, loc, val, 2, 26,
 					     AARCH64_INSN_IMM_26);
+
+			if (IS_ENABLED(CONFIG_ARM64_MODULE_PLTS) &&
+			    ovf == -ERANGE) {
+				val = get_module_plt(me, loc, val);
+				ovf = reloc_insn_imm(RELOC_OP_PREL, loc, val, 2,
+						     26, AARCH64_INSN_IMM_26);
+			}
 			break;
 
 		default:
diff --git a/arch/arm64/kernel/module.lds b/arch/arm64/kernel/module.lds
new file mode 100644
index 000000000000..3682fa107918
--- /dev/null
+++ b/arch/arm64/kernel/module.lds
@@ -0,0 +1,4 @@
+SECTIONS {
+        .core.plt : { BYTE(0) }
+        .init.plt : { BYTE(0) }
+}
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [kernel-hardening] [PATCH v3 08/21] arm64: add support for module PLTs
@ 2016-01-11 13:19   ` Ard Biesheuvel
  0 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel
  Cc: stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall, Ard Biesheuvel

This adds support for emitting PLTs at module load time for relative
branches that are out of range. This is a prerequisite for KASLR, which
may place the kernel and the modules anywhere in the vmalloc area,
making it more likely that branch target offsets exceed the maximum
range of +/- 128 MB.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/Kconfig              |   9 ++
 arch/arm64/Makefile             |   6 +-
 arch/arm64/include/asm/module.h |  11 ++
 arch/arm64/kernel/Makefile      |   1 +
 arch/arm64/kernel/module-plts.c | 137 ++++++++++++++++++++
 arch/arm64/kernel/module.c      |  12 ++
 arch/arm64/kernel/module.lds    |   4 +
 7 files changed, 179 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index ffa3c549a4ba..778df20bf623 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -363,6 +363,7 @@ config ARM64_ERRATUM_843419
 	bool "Cortex-A53: 843419: A load or store might access an incorrect address"
 	depends on MODULES
 	default y
+	select ARM64_MODULE_CMODEL_LARGE
 	help
 	  This option builds kernel modules using the large memory model in
 	  order to avoid the use of the ADRP instruction, which can cause
@@ -702,6 +703,14 @@ config ARM64_LSE_ATOMICS
 
 endmenu
 
+config ARM64_MODULE_CMODEL_LARGE
+	bool
+
+config ARM64_MODULE_PLTS
+	bool
+	select ARM64_MODULE_CMODEL_LARGE
+	select HAVE_MOD_ARCH_SPECIFIC
+
 endmenu
 
 menu "Boot options"
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index cd822d8454c0..db462980c6be 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -41,10 +41,14 @@ endif
 
 CHECKFLAGS	+= -D__aarch64__
 
-ifeq ($(CONFIG_ARM64_ERRATUM_843419), y)
+ifeq ($(CONFIG_ARM64_MODULE_CMODEL_LARGE), y)
 KBUILD_CFLAGS_MODULE	+= -mcmodel=large
 endif
 
+ifeq ($(CONFIG_ARM64_MODULE_PLTS),y)
+KBUILD_LDFLAGS_MODULE	+= -T $(srctree)/arch/arm64/kernel/module.lds
+endif
+
 # Default value
 head-y		:= arch/arm64/kernel/head.o
 
diff --git a/arch/arm64/include/asm/module.h b/arch/arm64/include/asm/module.h
index e80e232b730e..7b8cd3dc9d8e 100644
--- a/arch/arm64/include/asm/module.h
+++ b/arch/arm64/include/asm/module.h
@@ -20,4 +20,15 @@
 
 #define MODULE_ARCH_VERMAGIC	"aarch64"
 
+#ifdef CONFIG_ARM64_MODULE_PLTS
+struct mod_arch_specific {
+	struct elf64_shdr	*core_plt;
+	struct elf64_shdr	*init_plt;
+	int			core_plt_count;
+	int			init_plt_count;
+};
+#endif
+
+u64 get_module_plt(struct module *mod, void *loc, u64 val);
+
 #endif /* __ASM_MODULE_H */
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 474691f8b13a..f42b0fff607f 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -30,6 +30,7 @@ arm64-obj-$(CONFIG_COMPAT)		+= sys32.o kuser32.o signal32.o 	\
 					   ../../arm/kernel/opcodes.o
 arm64-obj-$(CONFIG_FUNCTION_TRACER)	+= ftrace.o entry-ftrace.o
 arm64-obj-$(CONFIG_MODULES)		+= arm64ksyms.o module.o
+arm64-obj-$(CONFIG_ARM64_MODULE_PLTS)	+= module-plts.o
 arm64-obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o perf_callchain.o
 arm64-obj-$(CONFIG_HW_PERF_EVENTS)	+= perf_event.o
 arm64-obj-$(CONFIG_HAVE_HW_BREAKPOINT)	+= hw_breakpoint.o
diff --git a/arch/arm64/kernel/module-plts.c b/arch/arm64/kernel/module-plts.c
new file mode 100644
index 000000000000..4a8ef9ea01ee
--- /dev/null
+++ b/arch/arm64/kernel/module-plts.c
@@ -0,0 +1,137 @@
+/*
+ * Copyright (C) 2014-2015 Linaro Ltd. <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/elf.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+
+struct plt_entry {
+	__le32	mov0;	/* movn	x16, #0x....			*/
+	__le32	mov1;	/* movk	x16, #0x...., lsl #16		*/
+	__le32	mov2;	/* movk	x16, #0x...., lsl #32		*/
+	__le32	br;	/* br	x16				*/
+} __aligned(8);
+
+static bool in_init(const struct module *mod, void *addr)
+{
+	return (u64)addr - (u64)mod->module_init < mod->init_size;
+}
+
+u64 get_module_plt(struct module *mod, void *loc, u64 val)
+{
+	struct plt_entry entry = {
+		cpu_to_le32(0x92800010 | (((~val      ) & 0xffff)) << 5),
+		cpu_to_le32(0xf2a00010 | ((( val >> 16) & 0xffff)) << 5),
+		cpu_to_le32(0xf2c00010 | ((( val >> 32) & 0xffff)) << 5),
+		cpu_to_le32(0xd61f0200)
+	}, *plt;
+	int i, *count;
+
+	if (in_init(mod, loc)) {
+		plt = (struct plt_entry *)mod->arch.init_plt->sh_addr;
+		count = &mod->arch.init_plt_count;
+	} else {
+		plt = (struct plt_entry *)mod->arch.core_plt->sh_addr;
+		count = &mod->arch.core_plt_count;
+	}
+
+	/* Look for an existing entry pointing to 'val' */
+	for (i = 0; i < *count; i++)
+		if (plt[i].mov0 == entry.mov0 &&
+		    plt[i].mov1 == entry.mov1 &&
+		    plt[i].mov2 == entry.mov2)
+			return (u64)&plt[i];
+
+	i = (*count)++;
+	plt[i] = entry;
+	return (u64)&plt[i];
+}
+
+static int duplicate_rel(Elf64_Addr base, const Elf64_Rela *rela, int num)
+{
+	int i;
+
+	for (i = 0; i < num; i++) {
+		if (rela[i].r_info == rela[num].r_info &&
+		    rela[i].r_addend == rela[num].r_addend)
+			return 1;
+	}
+	return 0;
+}
+
+/* Count how many PLT entries we may need */
+static unsigned int count_plts(Elf64_Addr base, const Elf64_Rela *rela, int num)
+{
+	unsigned int ret = 0;
+	int i;
+
+	/*
+	 * Sure, this is order(n^2), but it's usually short, and not
+	 * time critical
+	 */
+	for (i = 0; i < num; i++)
+		switch (ELF64_R_TYPE(rela[i].r_info)) {
+		case R_AARCH64_JUMP26:
+		case R_AARCH64_CALL26:
+			if (!duplicate_rel(base, rela, i))
+				ret++;
+			break;
+		}
+	return ret;
+}
+
+int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
+			      char *secstrings, struct module *mod)
+{
+	unsigned long core_plts = 0, init_plts = 0;
+	Elf64_Shdr *s, *sechdrs_end = sechdrs + ehdr->e_shnum;
+
+	/*
+	 * To store the PLTs, we expand the .text section for core module code
+	 * and the .init.text section for initialization code.
+	 */
+	for (s = sechdrs; s < sechdrs_end; ++s)
+		if (strcmp(".core.plt", secstrings + s->sh_name) == 0)
+			mod->arch.core_plt = s;
+		else if (strcmp(".init.plt", secstrings + s->sh_name) == 0)
+			mod->arch.init_plt = s;
+
+	if (!mod->arch.core_plt || !mod->arch.init_plt) {
+		pr_err("%s: sections missing\n", mod->name);
+		return -ENOEXEC;
+	}
+
+	for (s = sechdrs + 1; s < sechdrs_end; ++s) {
+		const Elf64_Rela *rels = (void *)ehdr + s->sh_offset;
+		int numrels = s->sh_size / sizeof(Elf64_Rela);
+		Elf64_Shdr *dstsec = sechdrs + s->sh_info;
+
+		if (s->sh_type != SHT_RELA)
+			continue;
+
+		if (strstr(secstrings + s->sh_name, ".init"))
+			init_plts += count_plts(dstsec->sh_addr, rels, numrels);
+		else
+			core_plts += count_plts(dstsec->sh_addr, rels, numrels);
+	}
+
+	mod->arch.core_plt->sh_type = SHT_NOBITS;
+	mod->arch.core_plt->sh_flags = SHF_EXECINSTR | SHF_ALLOC;
+	mod->arch.core_plt->sh_addralign = L1_CACHE_BYTES;
+	mod->arch.core_plt->sh_size = core_plts * sizeof(struct plt_entry);
+	mod->arch.core_plt_count = 0;
+
+	mod->arch.init_plt->sh_type = SHT_NOBITS;
+	mod->arch.init_plt->sh_flags = SHF_EXECINSTR | SHF_ALLOC;
+	mod->arch.init_plt->sh_addralign = L1_CACHE_BYTES;
+	mod->arch.init_plt->sh_size = init_plts * sizeof(struct plt_entry);
+	mod->arch.init_plt_count = 0;
+	pr_debug("%s: core.plt=%lld, init.plt=%lld\n", __func__,
+		 mod->arch.core_plt->sh_size, mod->arch.init_plt->sh_size);
+	return 0;
+}
diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index 93e970231ca9..3a298b0e21bb 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -38,6 +38,11 @@ void *module_alloc(unsigned long size)
 				GFP_KERNEL, PAGE_KERNEL_EXEC, 0,
 				NUMA_NO_NODE, __builtin_return_address(0));
 
+	if (!p && IS_ENABLED(CONFIG_ARM64_MODULE_PLTS))
+		p = __vmalloc_node_range(size, MODULE_ALIGN, VMALLOC_START,
+				VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_EXEC, 0,
+				NUMA_NO_NODE, __builtin_return_address(0));
+
 	if (p && (kasan_module_alloc(p, size) < 0)) {
 		vfree(p);
 		return NULL;
@@ -361,6 +366,13 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 		case R_AARCH64_CALL26:
 			ovf = reloc_insn_imm(RELOC_OP_PREL, loc, val, 2, 26,
 					     AARCH64_INSN_IMM_26);
+
+			if (IS_ENABLED(CONFIG_ARM64_MODULE_PLTS) &&
+			    ovf == -ERANGE) {
+				val = get_module_plt(me, loc, val);
+				ovf = reloc_insn_imm(RELOC_OP_PREL, loc, val, 2,
+						     26, AARCH64_INSN_IMM_26);
+			}
 			break;
 
 		default:
diff --git a/arch/arm64/kernel/module.lds b/arch/arm64/kernel/module.lds
new file mode 100644
index 000000000000..3682fa107918
--- /dev/null
+++ b/arch/arm64/kernel/module.lds
@@ -0,0 +1,4 @@
+SECTIONS {
+        .core.plt : { BYTE(0) }
+        .init.plt : { BYTE(0) }
+}
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [PATCH v3 09/21] extable: add support for relative extables to search and sort routines
  2016-01-11 13:18 ` Ard Biesheuvel
  (?)
@ 2016-01-11 13:19   ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel
  Cc: stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall, Ard Biesheuvel

This adds support to the generic search_extable() and sort_extable()
implementations for dealing with exception table entries whose fields
contain relative offsets rather than absolute addresses.
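
To make the offset arithmetic explicit (an editor's sketch with
hypothetical names, restating the generic case of ex_to_insn() and
swap_ex() below): a relative entry stores "target - &field", so the
absolute address is recovered by adding the field's own address back,
and when sort() relocates an entry its stored offsets must be rebased
by the distance the entry moved.

  struct rel_extable_entry {
          int insn, fixup;        /* offsets relative to the field itself */
  };

  static unsigned long entry_insn_addr(const struct rel_extable_entry *e)
  {
          return (unsigned long)&e->insn + e->insn;
  }

  static void swap_entries(struct rel_extable_entry *x,
                           struct rel_extable_entry *y)
  {
          struct rel_extable_entry tmp = *x;
          int delta = (int)((char *)y - (char *)x);

          /* the entry moving from y to x lands 'delta' bytes lower in
           * memory, so its offsets must grow by 'delta' to keep pointing
           * at the same instructions, and vice versa */
          x->insn  = y->insn  + delta;
          x->fixup = y->fixup + delta;
          y->insn  = tmp.insn  - delta;
          y->fixup = tmp.fixup - delta;
  }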

Acked-by: Helge Deller <deller@gmx.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 lib/extable.c | 50 ++++++++++++++++----
 1 file changed, 41 insertions(+), 9 deletions(-)

diff --git a/lib/extable.c b/lib/extable.c
index 4cac81ec225e..0be02ad561e9 100644
--- a/lib/extable.c
+++ b/lib/extable.c
@@ -14,7 +14,37 @@
 #include <linux/sort.h>
 #include <asm/uaccess.h>
 
+#ifndef ARCH_HAS_RELATIVE_EXTABLE
+#define ex_to_insn(x)	((x)->insn)
+#else
+static inline unsigned long ex_to_insn(const struct exception_table_entry *x)
+{
+	return (unsigned long)&x->insn + x->insn;
+}
+#endif
+
 #ifndef ARCH_HAS_SORT_EXTABLE
+#ifndef ARCH_HAS_RELATIVE_EXTABLE
+#define swap_ex		NULL
+#else
+static void swap_ex(void *a, void *b, int size)
+{
+	struct exception_table_entry *x = a, *y = b, tmp;
+	int delta = b - a;
+
+	tmp = *x;
+	x->insn = y->insn + delta;
+	y->insn = tmp.insn - delta;
+
+#ifdef swap_ex_entry_fixup
+	swap_ex_entry_fixup(x, y, tmp, delta);
+#else
+	x->fixup = y->fixup + delta;
+	y->fixup = tmp.fixup - delta;
+#endif
+}
+#endif /* ARCH_HAS_RELATIVE_EXTABLE */
+
 /*
  * The exception table needs to be sorted so that the binary
  * search that we use to find entries in it works properly.
@@ -26,9 +56,9 @@ static int cmp_ex(const void *a, const void *b)
 	const struct exception_table_entry *x = a, *y = b;
 
 	/* avoid overflow */
-	if (x->insn > y->insn)
+	if (ex_to_insn(x) > ex_to_insn(y))
 		return 1;
-	if (x->insn < y->insn)
+	if (ex_to_insn(x) < ex_to_insn(y))
 		return -1;
 	return 0;
 }
@@ -37,7 +67,7 @@ void sort_extable(struct exception_table_entry *start,
 		  struct exception_table_entry *finish)
 {
 	sort(start, finish - start, sizeof(struct exception_table_entry),
-	     cmp_ex, NULL);
+	     cmp_ex, swap_ex);
 }
 
 #ifdef CONFIG_MODULES
@@ -48,13 +78,15 @@ void sort_extable(struct exception_table_entry *start,
 void trim_init_extable(struct module *m)
 {
 	/*trim the beginning*/
-	while (m->num_exentries && within_module_init(m->extable[0].insn, m)) {
+	while (m->num_exentries &&
+	       within_module_init(ex_to_insn(&m->extable[0]), m)) {
 		m->extable++;
 		m->num_exentries--;
 	}
 	/*trim the end*/
 	while (m->num_exentries &&
-		within_module_init(m->extable[m->num_exentries-1].insn, m))
+	       within_module_init(ex_to_insn(&m->extable[m->num_exentries - 1]),
+				  m))
 		m->num_exentries--;
 }
 #endif /* CONFIG_MODULES */
@@ -81,13 +113,13 @@ search_extable(const struct exception_table_entry *first,
 		 * careful, the distance between value and insn
 		 * can be larger than MAX_LONG:
 		 */
-		if (mid->insn < value)
+		if (ex_to_insn(mid) < value)
 			first = mid + 1;
-		else if (mid->insn > value)
+		else if (ex_to_insn(mid) > value)
 			last = mid - 1;
 		else
 			return mid;
-        }
-        return NULL;
+	}
+	return NULL;
 }
 #endif
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [PATCH v3 09/21] extable: add support for relative extables to search and sort routines
@ 2016-01-11 13:19   ` Ard Biesheuvel
  0 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel

This adds support to the generic search_extable() and sort_extable()
implementations for dealing with exception table entries whose fields
contain relative offsets rather than absolute addresses.

Acked-by: Helge Deller <deller@gmx.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 lib/extable.c | 50 ++++++++++++++++----
 1 file changed, 41 insertions(+), 9 deletions(-)

diff --git a/lib/extable.c b/lib/extable.c
index 4cac81ec225e..0be02ad561e9 100644
--- a/lib/extable.c
+++ b/lib/extable.c
@@ -14,7 +14,37 @@
 #include <linux/sort.h>
 #include <asm/uaccess.h>
 
+#ifndef ARCH_HAS_RELATIVE_EXTABLE
+#define ex_to_insn(x)	((x)->insn)
+#else
+static inline unsigned long ex_to_insn(const struct exception_table_entry *x)
+{
+	return (unsigned long)&x->insn + x->insn;
+}
+#endif
+
 #ifndef ARCH_HAS_SORT_EXTABLE
+#ifndef ARCH_HAS_RELATIVE_EXTABLE
+#define swap_ex		NULL
+#else
+static void swap_ex(void *a, void *b, int size)
+{
+	struct exception_table_entry *x = a, *y = b, tmp;
+	int delta = b - a;
+
+	tmp = *x;
+	x->insn = y->insn + delta;
+	y->insn = tmp.insn - delta;
+
+#ifdef swap_ex_entry_fixup
+	swap_ex_entry_fixup(x, y, tmp, delta);
+#else
+	x->fixup = y->fixup + delta;
+	y->fixup = tmp.fixup - delta;
+#endif
+}
+#endif /* ARCH_HAS_RELATIVE_EXTABLE */
+
 /*
  * The exception table needs to be sorted so that the binary
  * search that we use to find entries in it works properly.
@@ -26,9 +56,9 @@ static int cmp_ex(const void *a, const void *b)
 	const struct exception_table_entry *x = a, *y = b;
 
 	/* avoid overflow */
-	if (x->insn > y->insn)
+	if (ex_to_insn(x) > ex_to_insn(y))
 		return 1;
-	if (x->insn < y->insn)
+	if (ex_to_insn(x) < ex_to_insn(y))
 		return -1;
 	return 0;
 }
@@ -37,7 +67,7 @@ void sort_extable(struct exception_table_entry *start,
 		  struct exception_table_entry *finish)
 {
 	sort(start, finish - start, sizeof(struct exception_table_entry),
-	     cmp_ex, NULL);
+	     cmp_ex, swap_ex);
 }
 
 #ifdef CONFIG_MODULES
@@ -48,13 +78,15 @@ void sort_extable(struct exception_table_entry *start,
 void trim_init_extable(struct module *m)
 {
 	/*trim the beginning*/
-	while (m->num_exentries && within_module_init(m->extable[0].insn, m)) {
+	while (m->num_exentries &&
+	       within_module_init(ex_to_insn(&m->extable[0]), m)) {
 		m->extable++;
 		m->num_exentries--;
 	}
 	/*trim the end*/
 	while (m->num_exentries &&
-		within_module_init(m->extable[m->num_exentries-1].insn, m))
+	       within_module_init(ex_to_insn(&m->extable[m->num_exentries - 1]),
+				  m))
 		m->num_exentries--;
 }
 #endif /* CONFIG_MODULES */
@@ -81,13 +113,13 @@ search_extable(const struct exception_table_entry *first,
 		 * careful, the distance between value and insn
 		 * can be larger than MAX_LONG:
 		 */
-		if (mid->insn < value)
+		if (ex_to_insn(mid) < value)
 			first = mid + 1;
-		else if (mid->insn > value)
+		else if (ex_to_insn(mid) > value)
 			last = mid - 1;
 		else
 			return mid;
-        }
-        return NULL;
+	}
+	return NULL;
 }
 #endif
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [kernel-hardening] [PATCH v3 09/21] extable: add support for relative extables to search and sort routines
@ 2016-01-11 13:19   ` Ard Biesheuvel
  0 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel
  Cc: stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall, Ard Biesheuvel

This adds support to the generic search_extable() and sort_extable()
implementations for dealing with exception table entries whose fields
contain relative offsets rather than absolute addresses.

Acked-by: Helge Deller <deller@gmx.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 lib/extable.c | 50 ++++++++++++++++----
 1 file changed, 41 insertions(+), 9 deletions(-)

diff --git a/lib/extable.c b/lib/extable.c
index 4cac81ec225e..0be02ad561e9 100644
--- a/lib/extable.c
+++ b/lib/extable.c
@@ -14,7 +14,37 @@
 #include <linux/sort.h>
 #include <asm/uaccess.h>
 
+#ifndef ARCH_HAS_RELATIVE_EXTABLE
+#define ex_to_insn(x)	((x)->insn)
+#else
+static inline unsigned long ex_to_insn(const struct exception_table_entry *x)
+{
+	return (unsigned long)&x->insn + x->insn;
+}
+#endif
+
 #ifndef ARCH_HAS_SORT_EXTABLE
+#ifndef ARCH_HAS_RELATIVE_EXTABLE
+#define swap_ex		NULL
+#else
+static void swap_ex(void *a, void *b, int size)
+{
+	struct exception_table_entry *x = a, *y = b, tmp;
+	int delta = b - a;
+
+	tmp = *x;
+	x->insn = y->insn + delta;
+	y->insn = tmp.insn - delta;
+
+#ifdef swap_ex_entry_fixup
+	swap_ex_entry_fixup(x, y, tmp, delta);
+#else
+	x->fixup = y->fixup + delta;
+	y->fixup = tmp.fixup - delta;
+#endif
+}
+#endif /* ARCH_HAS_RELATIVE_EXTABLE */
+
 /*
  * The exception table needs to be sorted so that the binary
  * search that we use to find entries in it works properly.
@@ -26,9 +56,9 @@ static int cmp_ex(const void *a, const void *b)
 	const struct exception_table_entry *x = a, *y = b;
 
 	/* avoid overflow */
-	if (x->insn > y->insn)
+	if (ex_to_insn(x) > ex_to_insn(y))
 		return 1;
-	if (x->insn < y->insn)
+	if (ex_to_insn(x) < ex_to_insn(y))
 		return -1;
 	return 0;
 }
@@ -37,7 +67,7 @@ void sort_extable(struct exception_table_entry *start,
 		  struct exception_table_entry *finish)
 {
 	sort(start, finish - start, sizeof(struct exception_table_entry),
-	     cmp_ex, NULL);
+	     cmp_ex, swap_ex);
 }
 
 #ifdef CONFIG_MODULES
@@ -48,13 +78,15 @@ void sort_extable(struct exception_table_entry *start,
 void trim_init_extable(struct module *m)
 {
 	/*trim the beginning*/
-	while (m->num_exentries && within_module_init(m->extable[0].insn, m)) {
+	while (m->num_exentries &&
+	       within_module_init(ex_to_insn(&m->extable[0]), m)) {
 		m->extable++;
 		m->num_exentries--;
 	}
 	/*trim the end*/
 	while (m->num_exentries &&
-		within_module_init(m->extable[m->num_exentries-1].insn, m))
+	       within_module_init(ex_to_insn(&m->extable[m->num_exentries - 1]),
+				  m))
 		m->num_exentries--;
 }
 #endif /* CONFIG_MODULES */
@@ -81,13 +113,13 @@ search_extable(const struct exception_table_entry *first,
 		 * careful, the distance between value and insn
 		 * can be larger than MAX_LONG:
 		 */
-		if (mid->insn < value)
+		if (ex_to_insn(mid) < value)
 			first = mid + 1;
-		else if (mid->insn > value)
+		else if (ex_to_insn(mid) > value)
 			last = mid - 1;
 		else
 			return mid;
-        }
-        return NULL;
+	}
+	return NULL;
 }
 #endif
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [PATCH v3 10/21] arm64: switch to relative exception tables
  2016-01-11 13:18 ` Ard Biesheuvel
  (?)
@ 2016-01-11 13:19   ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel
  Cc: stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall, Ard Biesheuvel

Instead of using absolute addresses for both the exception location
and the fixup, use offsets relative to the exception table entry values.
Not only does this cut the size of the exception table in half, it is
also a prerequisite for KASLR, since absolute exception table entries
are subject to dynamic relocation, which is incompatible with the sorting
of the exception table that occurs at build time.

This patch also introduces the _ASM_EXTABLE preprocessor macro (which
exists on x86 as well) and its _asm_extable assembly counterpart, as
shorthands to emit exception table entries.
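
For clarity (an editor's sketch, not part of the patch): with relative
entries each field holds "target - &field", so an entry shrinks from two
64-bit absolute addresses to two 32-bit offsets, and the fault handler
recovers the absolute fixup address with the same addition that the
change to arch/arm64/mm/extable.c below performs. The helper name here
is hypothetical.

  struct exception_table_entry {
          int insn, fixup;                /* 8 bytes instead of 16 */
  };

  static unsigned long resolve_fixup(const struct exception_table_entry *e)
  {
          /* "&e->fixup + e->fixup" turns the stored offset back into the
           * absolute address the faulting context should resume at */
          return (unsigned long)&e->fixup + e->fixup;
  }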

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/assembler.h      | 15 +++++++---
 arch/arm64/include/asm/futex.h          | 12 +++-----
 arch/arm64/include/asm/uaccess.h        | 30 +++++++++++---------
 arch/arm64/include/asm/word-at-a-time.h |  7 ++---
 arch/arm64/kernel/armv8_deprecated.c    |  7 ++---
 arch/arm64/mm/extable.c                 |  2 +-
 scripts/sortextable.c                   |  2 +-
 7 files changed, 38 insertions(+), 37 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index bb7b72734c24..d8bfcc1ce923 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -94,12 +94,19 @@
 	dmb	\opt
 	.endm
 
+/*
+ * Emit an entry into the exception table
+ */
+	.macro		_asm_extable, from, to
+	.pushsection	__ex_table, "a"
+	.align		3
+	.long		(\from - .), (\to - .)
+	.popsection
+	.endm
+
 #define USER(l, x...)				\
 9999:	x;					\
-	.section __ex_table,"a";		\
-	.align	3;				\
-	.quad	9999b,l;			\
-	.previous
+	_asm_extable	9999b, l
 
 /*
  * Register aliases.
diff --git a/arch/arm64/include/asm/futex.h b/arch/arm64/include/asm/futex.h
index 007a69fc4f40..1ab15a3b5a0e 100644
--- a/arch/arm64/include/asm/futex.h
+++ b/arch/arm64/include/asm/futex.h
@@ -42,10 +42,8 @@
 "4:	mov	%w0, %w5\n"						\
 "	b	3b\n"							\
 "	.popsection\n"							\
-"	.pushsection __ex_table,\"a\"\n"				\
-"	.align	3\n"							\
-"	.quad	1b, 4b, 2b, 4b\n"					\
-"	.popsection\n"							\
+	_ASM_EXTABLE(1b, 4b)						\
+	_ASM_EXTABLE(2b, 4b)						\
 	ALTERNATIVE("nop", SET_PSTATE_PAN(1), ARM64_HAS_PAN,		\
 		    CONFIG_ARM64_PAN)					\
 	: "=&r" (ret), "=&r" (oldval), "+Q" (*uaddr), "=&r" (tmp)	\
@@ -133,10 +131,8 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
 "4:	mov	%w0, %w6\n"
 "	b	3b\n"
 "	.popsection\n"
-"	.pushsection __ex_table,\"a\"\n"
-"	.align	3\n"
-"	.quad	1b, 4b, 2b, 4b\n"
-"	.popsection\n"
+	_ASM_EXTABLE(1b, 4b)
+	_ASM_EXTABLE(2b, 4b)
 	: "+r" (ret), "=&r" (val), "+Q" (*uaddr), "=&r" (tmp)
 	: "r" (oldval), "r" (newval), "Ir" (-EFAULT)
 	: "memory");
diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index b2ede967fe7d..dc11577fab7e 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -36,11 +36,11 @@
 #define VERIFY_WRITE 1
 
 /*
- * The exception table consists of pairs of addresses: the first is the
- * address of an instruction that is allowed to fault, and the second is
- * the address at which the program should continue.  No registers are
- * modified, so it is entirely up to the continuation code to figure out
- * what to do.
+ * The exception table consists of pairs of relative offsets: the first
+ * is the relative offset to an instruction that is allowed to fault,
+ * and the second is the relative offset at which the program should
+ * continue. No registers are modified, so it is entirely up to the
+ * continuation code to figure out what to do.
  *
  * All the routines below use bits of fixup code that are out of line
  * with the main instruction path.  This means when everything is well,
@@ -50,9 +50,11 @@
 
 struct exception_table_entry
 {
-	unsigned long insn, fixup;
+	int insn, fixup;
 };
 
+#define ARCH_HAS_RELATIVE_EXTABLE
+
 extern int fixup_exception(struct pt_regs *regs);
 
 #define KERNEL_DS	(-1UL)
@@ -105,6 +107,12 @@ static inline void set_fs(mm_segment_t fs)
 #define access_ok(type, addr, size)	__range_ok(addr, size)
 #define user_addr_max			get_fs
 
+#define _ASM_EXTABLE(from, to)						\
+	"	.pushsection	__ex_table, \"a\"\n"			\
+	"	.align		3\n"					\
+	"	.long		(" #from " - .), (" #to " - .)\n"	\
+	"	.popsection\n"
+
 /*
  * The "__xxx" versions of the user access functions do not verify the address
  * space - it must have been done previously with a separate "access_ok()"
@@ -123,10 +131,7 @@ static inline void set_fs(mm_segment_t fs)
 	"	mov	%1, #0\n"					\
 	"	b	2b\n"						\
 	"	.previous\n"						\
-	"	.section __ex_table,\"a\"\n"				\
-	"	.align	3\n"						\
-	"	.quad	1b, 3b\n"					\
-	"	.previous"						\
+	_ASM_EXTABLE(1b, 3b)						\
 	: "+r" (err), "=&r" (x)						\
 	: "r" (addr), "i" (-EFAULT))
 
@@ -190,10 +195,7 @@ do {									\
 	"3:	mov	%w0, %3\n"					\
 	"	b	2b\n"						\
 	"	.previous\n"						\
-	"	.section __ex_table,\"a\"\n"				\
-	"	.align	3\n"						\
-	"	.quad	1b, 3b\n"					\
-	"	.previous"						\
+	_ASM_EXTABLE(1b, 3b)						\
 	: "+r" (err)							\
 	: "r" (x), "r" (addr), "i" (-EFAULT))
 
diff --git a/arch/arm64/include/asm/word-at-a-time.h b/arch/arm64/include/asm/word-at-a-time.h
index aab5bf09e9d9..2b79b8a89457 100644
--- a/arch/arm64/include/asm/word-at-a-time.h
+++ b/arch/arm64/include/asm/word-at-a-time.h
@@ -16,6 +16,8 @@
 #ifndef __ASM_WORD_AT_A_TIME_H
 #define __ASM_WORD_AT_A_TIME_H
 
+#include <asm/uaccess.h>
+
 #ifndef __AARCH64EB__
 
 #include <linux/kernel.h>
@@ -81,10 +83,7 @@ static inline unsigned long load_unaligned_zeropad(const void *addr)
 #endif
 	"	b	2b\n"
 	"	.popsection\n"
-	"	.pushsection __ex_table,\"a\"\n"
-	"	.align	3\n"
-	"	.quad	1b, 3b\n"
-	"	.popsection"
+	_ASM_EXTABLE(1b, 3b)
 	: "=&r" (ret), "=&r" (offset)
 	: "r" (addr), "Q" (*(unsigned long *)addr));
 
diff --git a/arch/arm64/kernel/armv8_deprecated.c b/arch/arm64/kernel/armv8_deprecated.c
index 3e01207917b1..c37202c0c838 100644
--- a/arch/arm64/kernel/armv8_deprecated.c
+++ b/arch/arm64/kernel/armv8_deprecated.c
@@ -297,11 +297,8 @@ static void __init register_insn_emulation_sysctl(struct ctl_table *table)
 	"4:	mov		%w0, %w5\n"			\
 	"	b		3b\n"				\
 	"	.popsection"					\
-	"	.pushsection	 __ex_table,\"a\"\n"		\
-	"	.align		3\n"				\
-	"	.quad		0b, 4b\n"			\
-	"	.quad		1b, 4b\n"			\
-	"	.popsection\n"					\
+	_ASM_EXTABLE(0b, 4b)					\
+	_ASM_EXTABLE(1b, 4b)					\
 	ALTERNATIVE("nop", SET_PSTATE_PAN(1), ARM64_HAS_PAN,	\
 		CONFIG_ARM64_PAN)				\
 	: "=&r" (res), "+r" (data), "=&r" (temp)		\
diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c
index 79444279ba8c..81acd4706878 100644
--- a/arch/arm64/mm/extable.c
+++ b/arch/arm64/mm/extable.c
@@ -11,7 +11,7 @@ int fixup_exception(struct pt_regs *regs)
 
 	fixup = search_exception_tables(instruction_pointer(regs));
 	if (fixup)
-		regs->pc = fixup->fixup;
+		regs->pc = (unsigned long)&fixup->fixup + fixup->fixup;
 
 	return fixup != NULL;
 }
diff --git a/scripts/sortextable.c b/scripts/sortextable.c
index c2423d913b46..af247c70fb66 100644
--- a/scripts/sortextable.c
+++ b/scripts/sortextable.c
@@ -282,12 +282,12 @@ do_file(char const *const fname)
 	case EM_386:
 	case EM_X86_64:
 	case EM_S390:
+	case EM_AARCH64:
 		custom_sort = sort_relative_table;
 		break;
 	case EM_ARCOMPACT:
 	case EM_ARCV2:
 	case EM_ARM:
-	case EM_AARCH64:
 	case EM_MICROBLAZE:
 	case EM_MIPS:
 	case EM_XTENSA:
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [PATCH v3 10/21] arm64: switch to relative exception tables
@ 2016-01-11 13:19   ` Ard Biesheuvel
  0 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel

Instead of using absolute addresses for both the exception location
and the fixup, use offsets relative to the exception table entry values.
Not only does this cut the size of the exception table in half, it is
also a prerequisite for KASLR, since absolute exception table entries
are subject to dynamic relocation, which is incompatible with the sorting
of the exception table that occurs at build time.

This patch also introduces the _ASM_EXTABLE preprocessor macro (which
exists on x86 as well) and its _asm_extable assembly counterpart, as
shorthands to emit exception table entries.

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/assembler.h      | 15 +++++++---
 arch/arm64/include/asm/futex.h          | 12 +++-----
 arch/arm64/include/asm/uaccess.h        | 30 +++++++++++---------
 arch/arm64/include/asm/word-at-a-time.h |  7 ++---
 arch/arm64/kernel/armv8_deprecated.c    |  7 ++---
 arch/arm64/mm/extable.c                 |  2 +-
 scripts/sortextable.c                   |  2 +-
 7 files changed, 38 insertions(+), 37 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index bb7b72734c24..d8bfcc1ce923 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -94,12 +94,19 @@
 	dmb	\opt
 	.endm
 
+/*
+ * Emit an entry into the exception table
+ */
+	.macro		_asm_extable, from, to
+	.pushsection	__ex_table, "a"
+	.align		3
+	.long		(\from - .), (\to - .)
+	.popsection
+	.endm
+
 #define USER(l, x...)				\
 9999:	x;					\
-	.section __ex_table,"a";		\
-	.align	3;				\
-	.quad	9999b,l;			\
-	.previous
+	_asm_extable	9999b, l
 
 /*
  * Register aliases.
diff --git a/arch/arm64/include/asm/futex.h b/arch/arm64/include/asm/futex.h
index 007a69fc4f40..1ab15a3b5a0e 100644
--- a/arch/arm64/include/asm/futex.h
+++ b/arch/arm64/include/asm/futex.h
@@ -42,10 +42,8 @@
 "4:	mov	%w0, %w5\n"						\
 "	b	3b\n"							\
 "	.popsection\n"							\
-"	.pushsection __ex_table,\"a\"\n"				\
-"	.align	3\n"							\
-"	.quad	1b, 4b, 2b, 4b\n"					\
-"	.popsection\n"							\
+	_ASM_EXTABLE(1b, 4b)						\
+	_ASM_EXTABLE(2b, 4b)						\
 	ALTERNATIVE("nop", SET_PSTATE_PAN(1), ARM64_HAS_PAN,		\
 		    CONFIG_ARM64_PAN)					\
 	: "=&r" (ret), "=&r" (oldval), "+Q" (*uaddr), "=&r" (tmp)	\
@@ -133,10 +131,8 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
 "4:	mov	%w0, %w6\n"
 "	b	3b\n"
 "	.popsection\n"
-"	.pushsection __ex_table,\"a\"\n"
-"	.align	3\n"
-"	.quad	1b, 4b, 2b, 4b\n"
-"	.popsection\n"
+	_ASM_EXTABLE(1b, 4b)
+	_ASM_EXTABLE(2b, 4b)
 	: "+r" (ret), "=&r" (val), "+Q" (*uaddr), "=&r" (tmp)
 	: "r" (oldval), "r" (newval), "Ir" (-EFAULT)
 	: "memory");
diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index b2ede967fe7d..dc11577fab7e 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -36,11 +36,11 @@
 #define VERIFY_WRITE 1
 
 /*
- * The exception table consists of pairs of addresses: the first is the
- * address of an instruction that is allowed to fault, and the second is
- * the address at which the program should continue.  No registers are
- * modified, so it is entirely up to the continuation code to figure out
- * what to do.
+ * The exception table consists of pairs of relative offsets: the first
+ * is the relative offset to an instruction that is allowed to fault,
+ * and the second is the relative offset at which the program should
+ * continue. No registers are modified, so it is entirely up to the
+ * continuation code to figure out what to do.
  *
  * All the routines below use bits of fixup code that are out of line
  * with the main instruction path.  This means when everything is well,
@@ -50,9 +50,11 @@
 
 struct exception_table_entry
 {
-	unsigned long insn, fixup;
+	int insn, fixup;
 };
 
+#define ARCH_HAS_RELATIVE_EXTABLE
+
 extern int fixup_exception(struct pt_regs *regs);
 
 #define KERNEL_DS	(-1UL)
@@ -105,6 +107,12 @@ static inline void set_fs(mm_segment_t fs)
 #define access_ok(type, addr, size)	__range_ok(addr, size)
 #define user_addr_max			get_fs
 
+#define _ASM_EXTABLE(from, to)						\
+	"	.pushsection	__ex_table, \"a\"\n"			\
+	"	.align		3\n"					\
+	"	.long		(" #from " - .), (" #to " - .)\n"	\
+	"	.popsection\n"
+
 /*
  * The "__xxx" versions of the user access functions do not verify the address
  * space - it must have been done previously with a separate "access_ok()"
@@ -123,10 +131,7 @@ static inline void set_fs(mm_segment_t fs)
 	"	mov	%1, #0\n"					\
 	"	b	2b\n"						\
 	"	.previous\n"						\
-	"	.section __ex_table,\"a\"\n"				\
-	"	.align	3\n"						\
-	"	.quad	1b, 3b\n"					\
-	"	.previous"						\
+	_ASM_EXTABLE(1b, 3b)						\
 	: "+r" (err), "=&r" (x)						\
 	: "r" (addr), "i" (-EFAULT))
 
@@ -190,10 +195,7 @@ do {									\
 	"3:	mov	%w0, %3\n"					\
 	"	b	2b\n"						\
 	"	.previous\n"						\
-	"	.section __ex_table,\"a\"\n"				\
-	"	.align	3\n"						\
-	"	.quad	1b, 3b\n"					\
-	"	.previous"						\
+	_ASM_EXTABLE(1b, 3b)						\
 	: "+r" (err)							\
 	: "r" (x), "r" (addr), "i" (-EFAULT))
 
diff --git a/arch/arm64/include/asm/word-at-a-time.h b/arch/arm64/include/asm/word-at-a-time.h
index aab5bf09e9d9..2b79b8a89457 100644
--- a/arch/arm64/include/asm/word-at-a-time.h
+++ b/arch/arm64/include/asm/word-at-a-time.h
@@ -16,6 +16,8 @@
 #ifndef __ASM_WORD_AT_A_TIME_H
 #define __ASM_WORD_AT_A_TIME_H
 
+#include <asm/uaccess.h>
+
 #ifndef __AARCH64EB__
 
 #include <linux/kernel.h>
@@ -81,10 +83,7 @@ static inline unsigned long load_unaligned_zeropad(const void *addr)
 #endif
 	"	b	2b\n"
 	"	.popsection\n"
-	"	.pushsection __ex_table,\"a\"\n"
-	"	.align	3\n"
-	"	.quad	1b, 3b\n"
-	"	.popsection"
+	_ASM_EXTABLE(1b, 3b)
 	: "=&r" (ret), "=&r" (offset)
 	: "r" (addr), "Q" (*(unsigned long *)addr));
 
diff --git a/arch/arm64/kernel/armv8_deprecated.c b/arch/arm64/kernel/armv8_deprecated.c
index 3e01207917b1..c37202c0c838 100644
--- a/arch/arm64/kernel/armv8_deprecated.c
+++ b/arch/arm64/kernel/armv8_deprecated.c
@@ -297,11 +297,8 @@ static void __init register_insn_emulation_sysctl(struct ctl_table *table)
 	"4:	mov		%w0, %w5\n"			\
 	"	b		3b\n"				\
 	"	.popsection"					\
-	"	.pushsection	 __ex_table,\"a\"\n"		\
-	"	.align		3\n"				\
-	"	.quad		0b, 4b\n"			\
-	"	.quad		1b, 4b\n"			\
-	"	.popsection\n"					\
+	_ASM_EXTABLE(0b, 4b)					\
+	_ASM_EXTABLE(1b, 4b)					\
 	ALTERNATIVE("nop", SET_PSTATE_PAN(1), ARM64_HAS_PAN,	\
 		CONFIG_ARM64_PAN)				\
 	: "=&r" (res), "+r" (data), "=&r" (temp)		\
diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c
index 79444279ba8c..81acd4706878 100644
--- a/arch/arm64/mm/extable.c
+++ b/arch/arm64/mm/extable.c
@@ -11,7 +11,7 @@ int fixup_exception(struct pt_regs *regs)
 
 	fixup = search_exception_tables(instruction_pointer(regs));
 	if (fixup)
-		regs->pc = fixup->fixup;
+		regs->pc = (unsigned long)&fixup->fixup + fixup->fixup;
 
 	return fixup != NULL;
 }
diff --git a/scripts/sortextable.c b/scripts/sortextable.c
index c2423d913b46..af247c70fb66 100644
--- a/scripts/sortextable.c
+++ b/scripts/sortextable.c
@@ -282,12 +282,12 @@ do_file(char const *const fname)
 	case EM_386:
 	case EM_X86_64:
 	case EM_S390:
+	case EM_AARCH64:
 		custom_sort = sort_relative_table;
 		break;
 	case EM_ARCOMPACT:
 	case EM_ARCV2:
 	case EM_ARM:
-	case EM_AARCH64:
 	case EM_MICROBLAZE:
 	case EM_MIPS:
 	case EM_XTENSA:
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [kernel-hardening] [PATCH v3 10/21] arm64: switch to relative exception tables
@ 2016-01-11 13:19   ` Ard Biesheuvel
  0 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel
  Cc: stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall, Ard Biesheuvel

Instead of using absolute addresses for both the exception location
and the fixup, use offsets relative to the exception table entry values.
Not only does this cut the size of the exception table in half, it is
also a prerequisite for KASLR, since absolute exception table entries
are subject to dynamic relocation, which is incompatible with the sorting
of the exception table that occurs at build time.

This patch also introduces the _ASM_EXTABLE preprocessor macro (which
exists on x86 as well) and its _asm_extable assembly counterpart, as
shorthands to emit exception table entries.

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/assembler.h      | 15 +++++++---
 arch/arm64/include/asm/futex.h          | 12 +++-----
 arch/arm64/include/asm/uaccess.h        | 30 +++++++++++---------
 arch/arm64/include/asm/word-at-a-time.h |  7 ++---
 arch/arm64/kernel/armv8_deprecated.c    |  7 ++---
 arch/arm64/mm/extable.c                 |  2 +-
 scripts/sortextable.c                   |  2 +-
 7 files changed, 38 insertions(+), 37 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index bb7b72734c24..d8bfcc1ce923 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -94,12 +94,19 @@
 	dmb	\opt
 	.endm
 
+/*
+ * Emit an entry into the exception table
+ */
+	.macro		_asm_extable, from, to
+	.pushsection	__ex_table, "a"
+	.align		3
+	.long		(\from - .), (\to - .)
+	.popsection
+	.endm
+
 #define USER(l, x...)				\
 9999:	x;					\
-	.section __ex_table,"a";		\
-	.align	3;				\
-	.quad	9999b,l;			\
-	.previous
+	_asm_extable	9999b, l
 
 /*
  * Register aliases.
diff --git a/arch/arm64/include/asm/futex.h b/arch/arm64/include/asm/futex.h
index 007a69fc4f40..1ab15a3b5a0e 100644
--- a/arch/arm64/include/asm/futex.h
+++ b/arch/arm64/include/asm/futex.h
@@ -42,10 +42,8 @@
 "4:	mov	%w0, %w5\n"						\
 "	b	3b\n"							\
 "	.popsection\n"							\
-"	.pushsection __ex_table,\"a\"\n"				\
-"	.align	3\n"							\
-"	.quad	1b, 4b, 2b, 4b\n"					\
-"	.popsection\n"							\
+	_ASM_EXTABLE(1b, 4b)						\
+	_ASM_EXTABLE(2b, 4b)						\
 	ALTERNATIVE("nop", SET_PSTATE_PAN(1), ARM64_HAS_PAN,		\
 		    CONFIG_ARM64_PAN)					\
 	: "=&r" (ret), "=&r" (oldval), "+Q" (*uaddr), "=&r" (tmp)	\
@@ -133,10 +131,8 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
 "4:	mov	%w0, %w6\n"
 "	b	3b\n"
 "	.popsection\n"
-"	.pushsection __ex_table,\"a\"\n"
-"	.align	3\n"
-"	.quad	1b, 4b, 2b, 4b\n"
-"	.popsection\n"
+	_ASM_EXTABLE(1b, 4b)
+	_ASM_EXTABLE(2b, 4b)
 	: "+r" (ret), "=&r" (val), "+Q" (*uaddr), "=&r" (tmp)
 	: "r" (oldval), "r" (newval), "Ir" (-EFAULT)
 	: "memory");
diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index b2ede967fe7d..dc11577fab7e 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -36,11 +36,11 @@
 #define VERIFY_WRITE 1
 
 /*
- * The exception table consists of pairs of addresses: the first is the
- * address of an instruction that is allowed to fault, and the second is
- * the address at which the program should continue.  No registers are
- * modified, so it is entirely up to the continuation code to figure out
- * what to do.
+ * The exception table consists of pairs of relative offsets: the first
+ * is the relative offset to an instruction that is allowed to fault,
+ * and the second is the relative offset at which the program should
+ * continue. No registers are modified, so it is entirely up to the
+ * continuation code to figure out what to do.
  *
  * All the routines below use bits of fixup code that are out of line
  * with the main instruction path.  This means when everything is well,
@@ -50,9 +50,11 @@
 
 struct exception_table_entry
 {
-	unsigned long insn, fixup;
+	int insn, fixup;
 };
 
+#define ARCH_HAS_RELATIVE_EXTABLE
+
 extern int fixup_exception(struct pt_regs *regs);
 
 #define KERNEL_DS	(-1UL)
@@ -105,6 +107,12 @@ static inline void set_fs(mm_segment_t fs)
 #define access_ok(type, addr, size)	__range_ok(addr, size)
 #define user_addr_max			get_fs
 
+#define _ASM_EXTABLE(from, to)						\
+	"	.pushsection	__ex_table, \"a\"\n"			\
+	"	.align		3\n"					\
+	"	.long		(" #from " - .), (" #to " - .)\n"	\
+	"	.popsection\n"
+
 /*
  * The "__xxx" versions of the user access functions do not verify the address
  * space - it must have been done previously with a separate "access_ok()"
@@ -123,10 +131,7 @@ static inline void set_fs(mm_segment_t fs)
 	"	mov	%1, #0\n"					\
 	"	b	2b\n"						\
 	"	.previous\n"						\
-	"	.section __ex_table,\"a\"\n"				\
-	"	.align	3\n"						\
-	"	.quad	1b, 3b\n"					\
-	"	.previous"						\
+	_ASM_EXTABLE(1b, 3b)						\
 	: "+r" (err), "=&r" (x)						\
 	: "r" (addr), "i" (-EFAULT))
 
@@ -190,10 +195,7 @@ do {									\
 	"3:	mov	%w0, %3\n"					\
 	"	b	2b\n"						\
 	"	.previous\n"						\
-	"	.section __ex_table,\"a\"\n"				\
-	"	.align	3\n"						\
-	"	.quad	1b, 3b\n"					\
-	"	.previous"						\
+	_ASM_EXTABLE(1b, 3b)						\
 	: "+r" (err)							\
 	: "r" (x), "r" (addr), "i" (-EFAULT))
 
diff --git a/arch/arm64/include/asm/word-at-a-time.h b/arch/arm64/include/asm/word-at-a-time.h
index aab5bf09e9d9..2b79b8a89457 100644
--- a/arch/arm64/include/asm/word-at-a-time.h
+++ b/arch/arm64/include/asm/word-at-a-time.h
@@ -16,6 +16,8 @@
 #ifndef __ASM_WORD_AT_A_TIME_H
 #define __ASM_WORD_AT_A_TIME_H
 
+#include <asm/uaccess.h>
+
 #ifndef __AARCH64EB__
 
 #include <linux/kernel.h>
@@ -81,10 +83,7 @@ static inline unsigned long load_unaligned_zeropad(const void *addr)
 #endif
 	"	b	2b\n"
 	"	.popsection\n"
-	"	.pushsection __ex_table,\"a\"\n"
-	"	.align	3\n"
-	"	.quad	1b, 3b\n"
-	"	.popsection"
+	_ASM_EXTABLE(1b, 3b)
 	: "=&r" (ret), "=&r" (offset)
 	: "r" (addr), "Q" (*(unsigned long *)addr));
 
diff --git a/arch/arm64/kernel/armv8_deprecated.c b/arch/arm64/kernel/armv8_deprecated.c
index 3e01207917b1..c37202c0c838 100644
--- a/arch/arm64/kernel/armv8_deprecated.c
+++ b/arch/arm64/kernel/armv8_deprecated.c
@@ -297,11 +297,8 @@ static void __init register_insn_emulation_sysctl(struct ctl_table *table)
 	"4:	mov		%w0, %w5\n"			\
 	"	b		3b\n"				\
 	"	.popsection"					\
-	"	.pushsection	 __ex_table,\"a\"\n"		\
-	"	.align		3\n"				\
-	"	.quad		0b, 4b\n"			\
-	"	.quad		1b, 4b\n"			\
-	"	.popsection\n"					\
+	_ASM_EXTABLE(0b, 4b)					\
+	_ASM_EXTABLE(1b, 4b)					\
 	ALTERNATIVE("nop", SET_PSTATE_PAN(1), ARM64_HAS_PAN,	\
 		CONFIG_ARM64_PAN)				\
 	: "=&r" (res), "+r" (data), "=&r" (temp)		\
diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c
index 79444279ba8c..81acd4706878 100644
--- a/arch/arm64/mm/extable.c
+++ b/arch/arm64/mm/extable.c
@@ -11,7 +11,7 @@ int fixup_exception(struct pt_regs *regs)
 
 	fixup = search_exception_tables(instruction_pointer(regs));
 	if (fixup)
-		regs->pc = fixup->fixup;
+		regs->pc = (unsigned long)&fixup->fixup + fixup->fixup;
 
 	return fixup != NULL;
 }
diff --git a/scripts/sortextable.c b/scripts/sortextable.c
index c2423d913b46..af247c70fb66 100644
--- a/scripts/sortextable.c
+++ b/scripts/sortextable.c
@@ -282,12 +282,12 @@ do_file(char const *const fname)
 	case EM_386:
 	case EM_X86_64:
 	case EM_S390:
+	case EM_AARCH64:
 		custom_sort = sort_relative_table;
 		break;
 	case EM_ARCOMPACT:
 	case EM_ARCV2:
 	case EM_ARM:
-	case EM_AARCH64:
 	case EM_MICROBLAZE:
 	case EM_MIPS:
 	case EM_XTENSA:
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [PATCH v3 11/21] arm64: avoid R_AARCH64_ABS64 relocations for Image header fields
  2016-01-11 13:18 ` Ard Biesheuvel
  (?)
@ 2016-01-11 13:19   ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel
  Cc: stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall, Ard Biesheuvel

Unfortunately, the current way of using the linker to emit build time
constants into the Image header will no longer work once we switch to
the use of PIE executables. The reason is that such constants are emitted
into the binary using R_AARCH64_ABS64 relocations, which we will resolve
at runtime, not at build time, and the places targeted by those
relocations will contain zeroes before that.

So move back to assembly-time constants or R_AARCH64_ABS32 relocations
(which, interestingly enough, do get resolved at build time).
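
To spell out the idea (an editor's sketch in C, mirroring the
le16/le32/le64 assembler macros added below): emitting a constant one
byte at a time fixes its in-memory byte order to little-endian
regardless of the endianness the kernel is built for, and because every
byte is an assembly-time constant, no R_AARCH64_ABS64 relocation is
generated for it. The function name is hypothetical.

  static void emit_le64(unsigned char out[8], unsigned long long val)
  {
          int i;

          /* byte i carries bits [8*i + 7 : 8*i] of the value */
          for (i = 0; i < 8; i++)
                  out[i] = (val >> (8 * i)) & 0xff;
  }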

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/assembler.h | 15 ++++++++
 arch/arm64/kernel/head.S           | 17 +++++++--
 arch/arm64/kernel/image.h          | 37 ++++++--------------
 3 files changed, 40 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index d8bfcc1ce923..e211af783a3d 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -222,4 +222,19 @@ lr	.req	x30		// link register
 	.size	__pi_##x, . - x;	\
 	ENDPROC(x)
 
+	.macro	le16, val
+	.byte	\val & 0xff
+	.byte	(\val >> 8) & 0xff
+	.endm
+
+	.macro	le32, val
+	le16	\val
+	le16	(\val >> 16)
+	.endm
+
+	.macro	le64, val
+	le32	\val
+	le32	(\val >> 32)
+	.endm
+
 #endif	/* __ASM_ASSEMBLER_H */
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 350515276541..211f75e673f4 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -51,6 +51,17 @@
 #define KERNEL_START	_text
 #define KERNEL_END	_end
 
+#ifdef CONFIG_CPU_BIG_ENDIAN
+#define __HEAD_FLAG_BE	1
+#else
+#define __HEAD_FLAG_BE	0
+#endif
+
+#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
+
+#define __HEAD_FLAGS	((__HEAD_FLAG_BE << 0) |	\
+			 (__HEAD_FLAG_PAGE_SIZE << 1))
+
 /*
  * Kernel startup entry point.
  * ---------------------------
@@ -83,9 +94,9 @@ efi_head:
 	b	stext				// branch to kernel start, magic
 	.long	0				// reserved
 #endif
-	.quad	_kernel_offset_le		// Image load offset from start of RAM, little-endian
-	.quad	_kernel_size_le			// Effective size of kernel image, little-endian
-	.quad	_kernel_flags_le		// Informative flags, little-endian
+	le64	TEXT_OFFSET			// Image load offset from start of RAM, little-endian
+	.long	_kernel_size_le, 0		// Effective size of kernel image, little-endian
+	le64	__HEAD_FLAGS			// Informative flags, little-endian
 	.quad	0				// reserved
 	.quad	0				// reserved
 	.quad	0				// reserved
diff --git a/arch/arm64/kernel/image.h b/arch/arm64/kernel/image.h
index bc2abb8b1599..bb6b0e69d0a4 100644
--- a/arch/arm64/kernel/image.h
+++ b/arch/arm64/kernel/image.h
@@ -26,41 +26,26 @@
  * There aren't any ELF relocations we can use to endian-swap values known only
  * at link time (e.g. the subtraction of two symbol addresses), so we must get
  * the linker to endian-swap certain values before emitting them.
+ * Note that this will not work for 64-bit values: these are resolved using
+ * R_AARCH64_ABS64 relocations, which are fixed up at runtime rather than at
+ * build time when building the PIE executable (for KASLR).
  */
 #ifdef CONFIG_CPU_BIG_ENDIAN
-#define DATA_LE64(data)					\
-	((((data) & 0x00000000000000ff) << 56) |	\
-	 (((data) & 0x000000000000ff00) << 40) |	\
-	 (((data) & 0x0000000000ff0000) << 24) |	\
-	 (((data) & 0x00000000ff000000) << 8)  |	\
-	 (((data) & 0x000000ff00000000) >> 8)  |	\
-	 (((data) & 0x0000ff0000000000) >> 24) |	\
-	 (((data) & 0x00ff000000000000) >> 40) |	\
-	 (((data) & 0xff00000000000000) >> 56))
+#define DATA_LE32(data)				\
+	((((data) & 0x000000ff) << 24) |	\
+	 (((data) & 0x0000ff00) << 8)  |	\
+	 (((data) & 0x00ff0000) >> 8)  |	\
+	 (((data) & 0xff000000) >> 24))
 #else
-#define DATA_LE64(data) ((data) & 0xffffffffffffffff)
+#define DATA_LE32(data) ((data) & 0xffffffff)
 #endif
 
-#ifdef CONFIG_CPU_BIG_ENDIAN
-#define __HEAD_FLAG_BE	1
-#else
-#define __HEAD_FLAG_BE	0
-#endif
-
-#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
-
-#define __HEAD_FLAGS	((__HEAD_FLAG_BE << 0) |	\
-			 (__HEAD_FLAG_PAGE_SIZE << 1))
-
 /*
  * These will output as part of the Image header, which should be little-endian
- * regardless of the endianness of the kernel. While constant values could be
- * endian swapped in head.S, all are done here for consistency.
+ * regardless of the endianness of the kernel.
  */
 #define HEAD_SYMBOLS						\
-	_kernel_size_le		= DATA_LE64(_end - _text);	\
-	_kernel_offset_le	= DATA_LE64(TEXT_OFFSET);	\
-	_kernel_flags_le	= DATA_LE64(__HEAD_FLAGS);
+	_kernel_size_le		= DATA_LE32(_end - _text);
 
 #ifdef CONFIG_EFI
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [PATCH v3 11/21] arm64: avoid R_AARCH64_ABS64 relocations for Image header fields
@ 2016-01-11 13:19   ` Ard Biesheuvel
  0 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel

Unfortunately, the current way of using the linker to emit build time
constants into the Image header will no longer work once we switch to
the use of PIE executables. The reason is that such constants are emitted
into the binary using R_AARCH64_ABS64 relocations, which we will resolve
at runtime, not at build time, and the places targeted by those
relocations will contain zeroes before that.

So move back to assembly-time constants or R_AARCH64_ABS32 relocations
(which, interestingly enough, do get resolved at build time).

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/assembler.h | 15 ++++++++
 arch/arm64/kernel/head.S           | 17 +++++++--
 arch/arm64/kernel/image.h          | 37 ++++++--------------
 3 files changed, 40 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index d8bfcc1ce923..e211af783a3d 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -222,4 +222,19 @@ lr	.req	x30		// link register
 	.size	__pi_##x, . - x;	\
 	ENDPROC(x)
 
+	.macro	le16, val
+	.byte	\val & 0xff
+	.byte	(\val >> 8) & 0xff
+	.endm
+
+	.macro	le32, val
+	le16	\val
+	le16	(\val >> 16)
+	.endm
+
+	.macro	le64, val
+	le32	\val
+	le32	(\val >> 32)
+	.endm
+
 #endif	/* __ASM_ASSEMBLER_H */
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 350515276541..211f75e673f4 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -51,6 +51,17 @@
 #define KERNEL_START	_text
 #define KERNEL_END	_end
 
+#ifdef CONFIG_CPU_BIG_ENDIAN
+#define __HEAD_FLAG_BE	1
+#else
+#define __HEAD_FLAG_BE	0
+#endif
+
+#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
+
+#define __HEAD_FLAGS	((__HEAD_FLAG_BE << 0) |	\
+			 (__HEAD_FLAG_PAGE_SIZE << 1))
+
 /*
  * Kernel startup entry point.
  * ---------------------------
@@ -83,9 +94,9 @@ efi_head:
 	b	stext				// branch to kernel start, magic
 	.long	0				// reserved
 #endif
-	.quad	_kernel_offset_le		// Image load offset from start of RAM, little-endian
-	.quad	_kernel_size_le			// Effective size of kernel image, little-endian
-	.quad	_kernel_flags_le		// Informative flags, little-endian
+	le64	TEXT_OFFSET			// Image load offset from start of RAM, little-endian
+	.long	_kernel_size_le, 0		// Effective size of kernel image, little-endian
+	le64	__HEAD_FLAGS			// Informative flags, little-endian
 	.quad	0				// reserved
 	.quad	0				// reserved
 	.quad	0				// reserved
diff --git a/arch/arm64/kernel/image.h b/arch/arm64/kernel/image.h
index bc2abb8b1599..bb6b0e69d0a4 100644
--- a/arch/arm64/kernel/image.h
+++ b/arch/arm64/kernel/image.h
@@ -26,41 +26,26 @@
  * There aren't any ELF relocations we can use to endian-swap values known only
  * at link time (e.g. the subtraction of two symbol addresses), so we must get
  * the linker to endian-swap certain values before emitting them.
+ * Note that this will not work for 64-bit values: these are resolved using
+ * R_AARCH64_ABS64 relocations, which are fixed up at runtime rather than at
+ * build time when building the PIE executable (for KASLR).
  */
 #ifdef CONFIG_CPU_BIG_ENDIAN
-#define DATA_LE64(data)					\
-	((((data) & 0x00000000000000ff) << 56) |	\
-	 (((data) & 0x000000000000ff00) << 40) |	\
-	 (((data) & 0x0000000000ff0000) << 24) |	\
-	 (((data) & 0x00000000ff000000) << 8)  |	\
-	 (((data) & 0x000000ff00000000) >> 8)  |	\
-	 (((data) & 0x0000ff0000000000) >> 24) |	\
-	 (((data) & 0x00ff000000000000) >> 40) |	\
-	 (((data) & 0xff00000000000000) >> 56))
+#define DATA_LE32(data)				\
+	((((data) & 0x000000ff) << 24) |	\
+	 (((data) & 0x0000ff00) << 8)  |	\
+	 (((data) & 0x00ff0000) >> 8)  |	\
+	 (((data) & 0xff000000) >> 24))
 #else
-#define DATA_LE64(data) ((data) & 0xffffffffffffffff)
+#define DATA_LE32(data) ((data) & 0xffffffff)
 #endif
 
-#ifdef CONFIG_CPU_BIG_ENDIAN
-#define __HEAD_FLAG_BE	1
-#else
-#define __HEAD_FLAG_BE	0
-#endif
-
-#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
-
-#define __HEAD_FLAGS	((__HEAD_FLAG_BE << 0) |	\
-			 (__HEAD_FLAG_PAGE_SIZE << 1))
-
 /*
  * These will output as part of the Image header, which should be little-endian
- * regardless of the endianness of the kernel. While constant values could be
- * endian swapped in head.S, all are done here for consistency.
+ * regardless of the endianness of the kernel.
  */
 #define HEAD_SYMBOLS						\
-	_kernel_size_le		= DATA_LE64(_end - _text);	\
-	_kernel_offset_le	= DATA_LE64(TEXT_OFFSET);	\
-	_kernel_flags_le	= DATA_LE64(__HEAD_FLAGS);
+	_kernel_size_le		= DATA_LE32(_end - _text);
 
 #ifdef CONFIG_EFI
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [kernel-hardening] [PATCH v3 11/21] arm64: avoid R_AARCH64_ABS64 relocations for Image header fields
@ 2016-01-11 13:19   ` Ard Biesheuvel
  0 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel
  Cc: stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall, Ard Biesheuvel

Unfortunately, the current way of using the linker to emit build time
constants into the Image header will no longer work once we switch to
the use of PIE executables. The reason is that such constants are emitted
into the binary using R_AARCH64_ABS64 relocations, which we will resolve
at runtime, not at build time, and the places targeted by those
relocations will contain zeroes before that.

So move back to assembly-time constants or R_AARCH64_ABS32 relocations
(which, interestingly enough, do get resolved at build time).

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/assembler.h | 15 ++++++++
 arch/arm64/kernel/head.S           | 17 +++++++--
 arch/arm64/kernel/image.h          | 37 ++++++--------------
 3 files changed, 40 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index d8bfcc1ce923..e211af783a3d 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -222,4 +222,19 @@ lr	.req	x30		// link register
 	.size	__pi_##x, . - x;	\
 	ENDPROC(x)
 
+	.macro	le16, val
+	.byte	\val & 0xff
+	.byte	(\val >> 8) & 0xff
+	.endm
+
+	.macro	le32, val
+	le16	\val
+	le16	(\val >> 16)
+	.endm
+
+	.macro	le64, val
+	le32	\val
+	le32	(\val >> 32)
+	.endm
+
 #endif	/* __ASM_ASSEMBLER_H */
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 350515276541..211f75e673f4 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -51,6 +51,17 @@
 #define KERNEL_START	_text
 #define KERNEL_END	_end
 
+#ifdef CONFIG_CPU_BIG_ENDIAN
+#define __HEAD_FLAG_BE	1
+#else
+#define __HEAD_FLAG_BE	0
+#endif
+
+#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
+
+#define __HEAD_FLAGS	((__HEAD_FLAG_BE << 0) |	\
+			 (__HEAD_FLAG_PAGE_SIZE << 1))
+
 /*
  * Kernel startup entry point.
  * ---------------------------
@@ -83,9 +94,9 @@ efi_head:
 	b	stext				// branch to kernel start, magic
 	.long	0				// reserved
 #endif
-	.quad	_kernel_offset_le		// Image load offset from start of RAM, little-endian
-	.quad	_kernel_size_le			// Effective size of kernel image, little-endian
-	.quad	_kernel_flags_le		// Informative flags, little-endian
+	le64	TEXT_OFFSET			// Image load offset from start of RAM, little-endian
+	.long	_kernel_size_le, 0		// Effective size of kernel image, little-endian
+	le64	__HEAD_FLAGS			// Informative flags, little-endian
 	.quad	0				// reserved
 	.quad	0				// reserved
 	.quad	0				// reserved
diff --git a/arch/arm64/kernel/image.h b/arch/arm64/kernel/image.h
index bc2abb8b1599..bb6b0e69d0a4 100644
--- a/arch/arm64/kernel/image.h
+++ b/arch/arm64/kernel/image.h
@@ -26,41 +26,26 @@
  * There aren't any ELF relocations we can use to endian-swap values known only
  * at link time (e.g. the subtraction of two symbol addresses), so we must get
  * the linker to endian-swap certain values before emitting them.
+ * Note that this will not work for 64-bit values: these are resolved using
+ * R_AARCH64_ABS64 relocations, which are fixed up at runtime rather than at
+ * build time when building the PIE executable (for KASLR).
  */
 #ifdef CONFIG_CPU_BIG_ENDIAN
-#define DATA_LE64(data)					\
-	((((data) & 0x00000000000000ff) << 56) |	\
-	 (((data) & 0x000000000000ff00) << 40) |	\
-	 (((data) & 0x0000000000ff0000) << 24) |	\
-	 (((data) & 0x00000000ff000000) << 8)  |	\
-	 (((data) & 0x000000ff00000000) >> 8)  |	\
-	 (((data) & 0x0000ff0000000000) >> 24) |	\
-	 (((data) & 0x00ff000000000000) >> 40) |	\
-	 (((data) & 0xff00000000000000) >> 56))
+#define DATA_LE32(data)				\
+	((((data) & 0x000000ff) << 24) |	\
+	 (((data) & 0x0000ff00) << 8)  |	\
+	 (((data) & 0x00ff0000) >> 8)  |	\
+	 (((data) & 0xff000000) >> 24))
 #else
-#define DATA_LE64(data) ((data) & 0xffffffffffffffff)
+#define DATA_LE32(data) ((data) & 0xffffffff)
 #endif
 
-#ifdef CONFIG_CPU_BIG_ENDIAN
-#define __HEAD_FLAG_BE	1
-#else
-#define __HEAD_FLAG_BE	0
-#endif
-
-#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
-
-#define __HEAD_FLAGS	((__HEAD_FLAG_BE << 0) |	\
-			 (__HEAD_FLAG_PAGE_SIZE << 1))
-
 /*
  * These will output as part of the Image header, which should be little-endian
- * regardless of the endianness of the kernel. While constant values could be
- * endian swapped in head.S, all are done here for consistency.
+ * regardless of the endianness of the kernel.
  */
 #define HEAD_SYMBOLS						\
-	_kernel_size_le		= DATA_LE64(_end - _text);	\
-	_kernel_offset_le	= DATA_LE64(TEXT_OFFSET);	\
-	_kernel_flags_le	= DATA_LE64(__HEAD_FLAGS);
+	_kernel_size_le		= DATA_LE32(_end - _text);
 
 #ifdef CONFIG_EFI
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [PATCH v3 12/21] arm64: avoid dynamic relocations in early boot code
  2016-01-11 13:18 ` Ard Biesheuvel
  (?)
@ 2016-01-11 13:19   ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel
  Cc: stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall, Ard Biesheuvel

Before implementing KASLR for arm64 by building a self-relocating PIE
executable, we have to ensure that values we use before the relocation
routine is executed are not subject to dynamic relocation themselves.
This applies not only to virtual addresses, but also to values that are
supplied by the linker at build time and relocated using R_AARCH64_ABS64
relocations.

So instead, use assembly-time constants, or force the use of static
relocations by folding the constants into the instructions.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
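Purely as a sketch of what the new literal after ENDPROC(stext) evaluates to
(the function below is a hypothetical stand-in; the symbol values would come
from the linker, not from C):

    #include <stdint.h>

    /*
     * Mirrors "__mmap_switched - (_head - TEXT_OFFSET) + KIMAGE_VADDR": the
     * target's offset within the loaded image, plus TEXT_OFFSET, is added to
     * the virtual base, yielding the virtual address of __mmap_switched from
     * a link-time constant rather than an absolute (dynamic) relocation.
     */
    uint64_t virt_jump_target(uint64_t sym, uint64_t head,
                              uint64_t text_offset, uint64_t kimage_vaddr)
    {
            return sym - (head - text_offset) + kimage_vaddr;
    }
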
 arch/arm64/kernel/efi-entry.S |  2 +-
 arch/arm64/kernel/head.S      | 39 +++++++++++++-------
 2 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/kernel/efi-entry.S b/arch/arm64/kernel/efi-entry.S
index a773db92908b..f82036e02485 100644
--- a/arch/arm64/kernel/efi-entry.S
+++ b/arch/arm64/kernel/efi-entry.S
@@ -61,7 +61,7 @@ ENTRY(entry)
 	 */
 	mov	x20, x0		// DTB address
 	ldr	x0, [sp, #16]	// relocated _text address
-	ldr	x21, =stext_offset
+	movz	x21, #:abs_g0:stext_offset
 	add	x21, x0, x21
 
 	/*
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 211f75e673f4..5dc8079cef77 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -78,12 +78,11 @@
  * in the entry routines.
  */
 	__HEAD
-
+_head:
 	/*
 	 * DO NOT MODIFY. Image header expected by Linux boot-loaders.
 	 */
 #ifdef CONFIG_EFI
-efi_head:
 	/*
 	 * This add instruction has no meaningful effect except that
 	 * its opcode forms the magic "MZ" signature required by UEFI.
@@ -105,14 +104,14 @@ efi_head:
 	.byte	0x4d
 	.byte	0x64
 #ifdef CONFIG_EFI
-	.long	pe_header - efi_head		// Offset to the PE header.
+	.long	pe_header - _head		// Offset to the PE header.
 #else
 	.word	0				// reserved
 #endif
 
 #ifdef CONFIG_EFI
 	.globl	__efistub_stext_offset
-	.set	__efistub_stext_offset, stext - efi_head
+	.set	__efistub_stext_offset, stext - _head
 	.align 3
 pe_header:
 	.ascii	"PE"
@@ -135,7 +134,7 @@ optional_header:
 	.long	_end - stext			// SizeOfCode
 	.long	0				// SizeOfInitializedData
 	.long	0				// SizeOfUninitializedData
-	.long	__efistub_entry - efi_head	// AddressOfEntryPoint
+	.long	__efistub_entry - _head		// AddressOfEntryPoint
 	.long	__efistub_stext_offset		// BaseOfCode
 
 extra_header_fields:
@@ -150,7 +149,7 @@ extra_header_fields:
 	.short	0				// MinorSubsystemVersion
 	.long	0				// Win32VersionValue
 
-	.long	_end - efi_head			// SizeOfImage
+	.long	_end - _head			// SizeOfImage
 
 	// Everything before the kernel image is considered part of the header
 	.long	__efistub_stext_offset		// SizeOfHeaders
@@ -230,11 +229,13 @@ ENTRY(stext)
 	 * On return, the CPU will be ready for the MMU to be turned on and
 	 * the TCR will have been set.
 	 */
-	ldr	x27, =__mmap_switched		// address to jump to after
+	ldr	x27, 0f				// address to jump to after
 						// MMU has been enabled
 	adr_l	lr, __enable_mmu		// return (PIC) address
 	b	__cpu_setup			// initialise processor
 ENDPROC(stext)
+	.align	3
+0:	.quad	__mmap_switched - (_head - TEXT_OFFSET) + KIMAGE_VADDR
 
 /*
  * Preserve the arguments passed by the bootloader in x0 .. x3
@@ -402,7 +403,8 @@ __create_page_tables:
 	mov	x0, x26				// swapper_pg_dir
 	ldr	x5, =KIMAGE_VADDR
 	create_pgd_entry x0, x5, x3, x6
-	ldr	x6, =KERNEL_END			// __va(KERNEL_END)
+	ldr	w6, kernel_img_size
+	add	x6, x6, x5
 	mov	x3, x24				// phys offset
 	create_block_map x0, x7, x3, x5, x6
 
@@ -419,6 +421,9 @@ __create_page_tables:
 	mov	lr, x27
 	ret
 ENDPROC(__create_page_tables)
+
+kernel_img_size:
+	.long	_end - (_head - TEXT_OFFSET)
 	.ltorg
 
 /*
@@ -426,6 +431,10 @@ ENDPROC(__create_page_tables)
  */
 	.set	initial_sp, init_thread_union + THREAD_START_SP
 __mmap_switched:
+	adr_l	x8, vectors			// load VBAR_EL1 with virtual
+	msr	vbar_el1, x8			// vector table address
+	isb
+
 	// Clear BSS
 	adr_l	x0, __bss_start
 	mov	x1, xzr
@@ -612,13 +621,19 @@ ENTRY(secondary_startup)
 	adrp	x26, swapper_pg_dir
 	bl	__cpu_setup			// initialise processor
 
-	ldr	x21, =secondary_data
-	ldr	x27, =__secondary_switched	// address to jump to after enabling the MMU
+	ldr	x8, =KIMAGE_VADDR
+	ldr	w9, 0f
+	sub	x27, x8, w9, sxtw		// address to jump to after enabling the MMU
 	b	__enable_mmu
 ENDPROC(secondary_startup)
+0:	.long	(_text - TEXT_OFFSET) - __secondary_switched
 
 ENTRY(__secondary_switched)
-	ldr	x0, [x21]			// get secondary_data.stack
+	adr_l	x5, vectors
+	msr	vbar_el1, x5
+	isb
+
+	ldr_l	x0, secondary_data		// get secondary_data.stack
 	mov	sp, x0
 	and	x0, x0, #~(THREAD_SIZE - 1)
 	msr	sp_el0, x0			// save thread_info
@@ -643,8 +658,6 @@ __enable_mmu:
 	ubfx	x2, x1, #ID_AA64MMFR0_TGRAN_SHIFT, 4
 	cmp	x2, #ID_AA64MMFR0_TGRAN_SUPPORTED
 	b.ne	__no_granule_support
-	ldr	x5, =vectors
-	msr	vbar_el1, x5
 	msr	ttbr0_el1, x25			// load TTBR0
 	msr	ttbr1_el1, x26			// load TTBR1
 	isb
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [PATCH v3 12/21] arm64: avoid dynamic relocations in early boot code
@ 2016-01-11 13:19   ` Ard Biesheuvel
  0 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel

Before implementing KASLR for arm64 by building a self-relocating PIE
executable, we have to ensure that values we use before the relocation
routine is executed are not subject to dynamic relocation themselves.
This applies not only to virtual addresses, but also to values that are
supplied by the linker at build time and relocated using R_AARCH64_ABS64
relocations.

So instead, use assembly-time constants, or force the use of static
relocations by folding the constants into the instructions.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/kernel/efi-entry.S |  2 +-
 arch/arm64/kernel/head.S      | 39 +++++++++++++-------
 2 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/kernel/efi-entry.S b/arch/arm64/kernel/efi-entry.S
index a773db92908b..f82036e02485 100644
--- a/arch/arm64/kernel/efi-entry.S
+++ b/arch/arm64/kernel/efi-entry.S
@@ -61,7 +61,7 @@ ENTRY(entry)
 	 */
 	mov	x20, x0		// DTB address
 	ldr	x0, [sp, #16]	// relocated _text address
-	ldr	x21, =stext_offset
+	movz	x21, #:abs_g0:stext_offset
 	add	x21, x0, x21
 
 	/*
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 211f75e673f4..5dc8079cef77 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -78,12 +78,11 @@
  * in the entry routines.
  */
 	__HEAD
-
+_head:
 	/*
 	 * DO NOT MODIFY. Image header expected by Linux boot-loaders.
 	 */
 #ifdef CONFIG_EFI
-efi_head:
 	/*
 	 * This add instruction has no meaningful effect except that
 	 * its opcode forms the magic "MZ" signature required by UEFI.
@@ -105,14 +104,14 @@ efi_head:
 	.byte	0x4d
 	.byte	0x64
 #ifdef CONFIG_EFI
-	.long	pe_header - efi_head		// Offset to the PE header.
+	.long	pe_header - _head		// Offset to the PE header.
 #else
 	.word	0				// reserved
 #endif
 
 #ifdef CONFIG_EFI
 	.globl	__efistub_stext_offset
-	.set	__efistub_stext_offset, stext - efi_head
+	.set	__efistub_stext_offset, stext - _head
 	.align 3
 pe_header:
 	.ascii	"PE"
@@ -135,7 +134,7 @@ optional_header:
 	.long	_end - stext			// SizeOfCode
 	.long	0				// SizeOfInitializedData
 	.long	0				// SizeOfUninitializedData
-	.long	__efistub_entry - efi_head	// AddressOfEntryPoint
+	.long	__efistub_entry - _head		// AddressOfEntryPoint
 	.long	__efistub_stext_offset		// BaseOfCode
 
 extra_header_fields:
@@ -150,7 +149,7 @@ extra_header_fields:
 	.short	0				// MinorSubsystemVersion
 	.long	0				// Win32VersionValue
 
-	.long	_end - efi_head			// SizeOfImage
+	.long	_end - _head			// SizeOfImage
 
 	// Everything before the kernel image is considered part of the header
 	.long	__efistub_stext_offset		// SizeOfHeaders
@@ -230,11 +229,13 @@ ENTRY(stext)
 	 * On return, the CPU will be ready for the MMU to be turned on and
 	 * the TCR will have been set.
 	 */
-	ldr	x27, =__mmap_switched		// address to jump to after
+	ldr	x27, 0f				// address to jump to after
 						// MMU has been enabled
 	adr_l	lr, __enable_mmu		// return (PIC) address
 	b	__cpu_setup			// initialise processor
 ENDPROC(stext)
+	.align	3
+0:	.quad	__mmap_switched - (_head - TEXT_OFFSET) + KIMAGE_VADDR
 
 /*
  * Preserve the arguments passed by the bootloader in x0 .. x3
@@ -402,7 +403,8 @@ __create_page_tables:
 	mov	x0, x26				// swapper_pg_dir
 	ldr	x5, =KIMAGE_VADDR
 	create_pgd_entry x0, x5, x3, x6
-	ldr	x6, =KERNEL_END			// __va(KERNEL_END)
+	ldr	w6, kernel_img_size
+	add	x6, x6, x5
 	mov	x3, x24				// phys offset
 	create_block_map x0, x7, x3, x5, x6
 
@@ -419,6 +421,9 @@ __create_page_tables:
 	mov	lr, x27
 	ret
 ENDPROC(__create_page_tables)
+
+kernel_img_size:
+	.long	_end - (_head - TEXT_OFFSET)
 	.ltorg
 
 /*
@@ -426,6 +431,10 @@ ENDPROC(__create_page_tables)
  */
 	.set	initial_sp, init_thread_union + THREAD_START_SP
 __mmap_switched:
+	adr_l	x8, vectors			// load VBAR_EL1 with virtual
+	msr	vbar_el1, x8			// vector table address
+	isb
+
 	// Clear BSS
 	adr_l	x0, __bss_start
 	mov	x1, xzr
@@ -612,13 +621,19 @@ ENTRY(secondary_startup)
 	adrp	x26, swapper_pg_dir
 	bl	__cpu_setup			// initialise processor
 
-	ldr	x21, =secondary_data
-	ldr	x27, =__secondary_switched	// address to jump to after enabling the MMU
+	ldr	x8, =KIMAGE_VADDR
+	ldr	w9, 0f
+	sub	x27, x8, w9, sxtw		// address to jump to after enabling the MMU
 	b	__enable_mmu
 ENDPROC(secondary_startup)
+0:	.long	(_text - TEXT_OFFSET) - __secondary_switched
 
 ENTRY(__secondary_switched)
-	ldr	x0, [x21]			// get secondary_data.stack
+	adr_l	x5, vectors
+	msr	vbar_el1, x5
+	isb
+
+	ldr_l	x0, secondary_data		// get secondary_data.stack
 	mov	sp, x0
 	and	x0, x0, #~(THREAD_SIZE - 1)
 	msr	sp_el0, x0			// save thread_info
@@ -643,8 +658,6 @@ __enable_mmu:
 	ubfx	x2, x1, #ID_AA64MMFR0_TGRAN_SHIFT, 4
 	cmp	x2, #ID_AA64MMFR0_TGRAN_SUPPORTED
 	b.ne	__no_granule_support
-	ldr	x5, =vectors
-	msr	vbar_el1, x5
 	msr	ttbr0_el1, x25			// load TTBR0
 	msr	ttbr1_el1, x26			// load TTBR1
 	isb
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [kernel-hardening] [PATCH v3 12/21] arm64: avoid dynamic relocations in early boot code
@ 2016-01-11 13:19   ` Ard Biesheuvel
  0 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel
  Cc: stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall, Ard Biesheuvel

Before implementing KASLR for arm64 by building a self-relocating PIE
executable, we have to ensure that values we use before the relocation
routine is executed are not subject to dynamic relocation themselves.
This applies not only to virtual addresses, but also to values that are
supplied by the linker at build time and relocated using R_AARCH64_ABS64
relocations.

So instead, use assembly-time constants, or force the use of static
relocations by folding the constants into the instructions.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/kernel/efi-entry.S |  2 +-
 arch/arm64/kernel/head.S      | 39 +++++++++++++-------
 2 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/kernel/efi-entry.S b/arch/arm64/kernel/efi-entry.S
index a773db92908b..f82036e02485 100644
--- a/arch/arm64/kernel/efi-entry.S
+++ b/arch/arm64/kernel/efi-entry.S
@@ -61,7 +61,7 @@ ENTRY(entry)
 	 */
 	mov	x20, x0		// DTB address
 	ldr	x0, [sp, #16]	// relocated _text address
-	ldr	x21, =stext_offset
+	movz	x21, #:abs_g0:stext_offset
 	add	x21, x0, x21
 
 	/*
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 211f75e673f4..5dc8079cef77 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -78,12 +78,11 @@
  * in the entry routines.
  */
 	__HEAD
-
+_head:
 	/*
 	 * DO NOT MODIFY. Image header expected by Linux boot-loaders.
 	 */
 #ifdef CONFIG_EFI
-efi_head:
 	/*
 	 * This add instruction has no meaningful effect except that
 	 * its opcode forms the magic "MZ" signature required by UEFI.
@@ -105,14 +104,14 @@ efi_head:
 	.byte	0x4d
 	.byte	0x64
 #ifdef CONFIG_EFI
-	.long	pe_header - efi_head		// Offset to the PE header.
+	.long	pe_header - _head		// Offset to the PE header.
 #else
 	.word	0				// reserved
 #endif
 
 #ifdef CONFIG_EFI
 	.globl	__efistub_stext_offset
-	.set	__efistub_stext_offset, stext - efi_head
+	.set	__efistub_stext_offset, stext - _head
 	.align 3
 pe_header:
 	.ascii	"PE"
@@ -135,7 +134,7 @@ optional_header:
 	.long	_end - stext			// SizeOfCode
 	.long	0				// SizeOfInitializedData
 	.long	0				// SizeOfUninitializedData
-	.long	__efistub_entry - efi_head	// AddressOfEntryPoint
+	.long	__efistub_entry - _head		// AddressOfEntryPoint
 	.long	__efistub_stext_offset		// BaseOfCode
 
 extra_header_fields:
@@ -150,7 +149,7 @@ extra_header_fields:
 	.short	0				// MinorSubsystemVersion
 	.long	0				// Win32VersionValue
 
-	.long	_end - efi_head			// SizeOfImage
+	.long	_end - _head			// SizeOfImage
 
 	// Everything before the kernel image is considered part of the header
 	.long	__efistub_stext_offset		// SizeOfHeaders
@@ -230,11 +229,13 @@ ENTRY(stext)
 	 * On return, the CPU will be ready for the MMU to be turned on and
 	 * the TCR will have been set.
 	 */
-	ldr	x27, =__mmap_switched		// address to jump to after
+	ldr	x27, 0f				// address to jump to after
 						// MMU has been enabled
 	adr_l	lr, __enable_mmu		// return (PIC) address
 	b	__cpu_setup			// initialise processor
 ENDPROC(stext)
+	.align	3
+0:	.quad	__mmap_switched - (_head - TEXT_OFFSET) + KIMAGE_VADDR
 
 /*
  * Preserve the arguments passed by the bootloader in x0 .. x3
@@ -402,7 +403,8 @@ __create_page_tables:
 	mov	x0, x26				// swapper_pg_dir
 	ldr	x5, =KIMAGE_VADDR
 	create_pgd_entry x0, x5, x3, x6
-	ldr	x6, =KERNEL_END			// __va(KERNEL_END)
+	ldr	w6, kernel_img_size
+	add	x6, x6, x5
 	mov	x3, x24				// phys offset
 	create_block_map x0, x7, x3, x5, x6
 
@@ -419,6 +421,9 @@ __create_page_tables:
 	mov	lr, x27
 	ret
 ENDPROC(__create_page_tables)
+
+kernel_img_size:
+	.long	_end - (_head - TEXT_OFFSET)
 	.ltorg
 
 /*
@@ -426,6 +431,10 @@ ENDPROC(__create_page_tables)
  */
 	.set	initial_sp, init_thread_union + THREAD_START_SP
 __mmap_switched:
+	adr_l	x8, vectors			// load VBAR_EL1 with virtual
+	msr	vbar_el1, x8			// vector table address
+	isb
+
 	// Clear BSS
 	adr_l	x0, __bss_start
 	mov	x1, xzr
@@ -612,13 +621,19 @@ ENTRY(secondary_startup)
 	adrp	x26, swapper_pg_dir
 	bl	__cpu_setup			// initialise processor
 
-	ldr	x21, =secondary_data
-	ldr	x27, =__secondary_switched	// address to jump to after enabling the MMU
+	ldr	x8, =KIMAGE_VADDR
+	ldr	w9, 0f
+	sub	x27, x8, w9, sxtw		// address to jump to after enabling the MMU
 	b	__enable_mmu
 ENDPROC(secondary_startup)
+0:	.long	(_text - TEXT_OFFSET) - __secondary_switched
 
 ENTRY(__secondary_switched)
-	ldr	x0, [x21]			// get secondary_data.stack
+	adr_l	x5, vectors
+	msr	vbar_el1, x5
+	isb
+
+	ldr_l	x0, secondary_data		// get secondary_data.stack
 	mov	sp, x0
 	and	x0, x0, #~(THREAD_SIZE - 1)
 	msr	sp_el0, x0			// save thread_info
@@ -643,8 +658,6 @@ __enable_mmu:
 	ubfx	x2, x1, #ID_AA64MMFR0_TGRAN_SHIFT, 4
 	cmp	x2, #ID_AA64MMFR0_TGRAN_SUPPORTED
 	b.ne	__no_granule_support
-	ldr	x5, =vectors
-	msr	vbar_el1, x5
 	msr	ttbr0_el1, x25			// load TTBR0
 	msr	ttbr1_el1, x26			// load TTBR1
 	isb
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [PATCH v3 13/21] arm64: allow kernel Image to be loaded anywhere in physical memory
  2016-01-11 13:18 ` Ard Biesheuvel
  (?)
@ 2016-01-11 13:19   ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel
  Cc: stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall, Ard Biesheuvel

This relaxes the kernel Image placement requirements, so that it
may be placed at any 2 MB aligned offset in physical memory.

This is accomplished by ignoring PHYS_OFFSET when installing
memblocks, and accounting for the apparent virtual offset of
the kernel Image. As a result, virtual address references
below PAGE_OFFSET are correctly mapped onto physical references
into the kernel Image regardless of where it sits in memory.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
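As a rough model of the resulting address translation (illustrative only: the
constants below are made-up examples rather than the real layout, and the
helper name is hypothetical):

    #include <stdint.h>

    /* example values; the real PAGE_OFFSET/PHYS_OFFSET depend on the
     * configuration, and kimage_voffset is computed once during early boot
     * from KIMAGE_VADDR and the physical load address of the Image */
    #define EX_PAGE_OFFSET  UINT64_C(0xffffffc000000000)
    #define EX_PHYS_OFFSET  UINT64_C(0x0000000080000000)
    static uint64_t kimage_voffset;

    uint64_t ex_virt_to_phys(uint64_t x)
    {
            /* linear-map addresses translate via PHYS_OFFSET; kernel-image
             * addresses (below PAGE_OFFSET) translate via kimage_voffset */
            return x >= EX_PAGE_OFFSET ? x - EX_PAGE_OFFSET + EX_PHYS_OFFSET
                                       : x - kimage_voffset;
    }
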
 Documentation/arm64/booting.txt         | 20 ++++--
 arch/arm64/include/asm/boot.h           |  6 ++
 arch/arm64/include/asm/kernel-pgtable.h | 11 +++
 arch/arm64/include/asm/kvm_mmu.h        |  2 +-
 arch/arm64/include/asm/memory.h         | 15 +++--
 arch/arm64/kernel/head.S                | 19 ++++--
 arch/arm64/mm/init.c                    | 71 +++++++++++++++++++-
 arch/arm64/mm/mmu.c                     |  3 +
 8 files changed, 125 insertions(+), 22 deletions(-)

diff --git a/Documentation/arm64/booting.txt b/Documentation/arm64/booting.txt
index 701d39d3171a..67484067ce4f 100644
--- a/Documentation/arm64/booting.txt
+++ b/Documentation/arm64/booting.txt
@@ -109,7 +109,13 @@ Header notes:
 			1 - 4K
 			2 - 16K
 			3 - 64K
-  Bits 3-63:	Reserved.
+  Bit 3:	Kernel physical placement
+			0 - 2MB aligned base should be as close as possible
+			    to the base of DRAM, since memory below it is not
+			    accessible
+			1 - 2MB aligned base may be anywhere in physical
+			    memory
+  Bits 4-63:	Reserved.
 
 - When image_size is zero, a bootloader should attempt to keep as much
   memory as possible free for use by the kernel immediately after the
@@ -117,14 +123,14 @@ Header notes:
   depending on selected features, and is effectively unbound.
 
 The Image must be placed text_offset bytes from a 2MB aligned base
-address near the start of usable system RAM and called there. Memory
-below that base address is currently unusable by Linux, and therefore it
-is strongly recommended that this location is the start of system RAM.
-The region between the 2 MB aligned base address and the start of the
-image has no special significance to the kernel, and may be used for
-other purposes.
+address anywhere in usable system RAM and called there. The region
+between the 2 MB aligned base address and the start of the image has no
+special significance to the kernel, and may be used for other purposes.
 At least image_size bytes from the start of the image must be free for
 use by the kernel.
+NOTE: versions prior to v4.6 cannot make use of memory below the
+physical offset of the Image so it is recommended that the Image be
+placed as close as possible to the start of system RAM.
 
 Any memory described to the kernel (even that below the start of the
 image) which is not marked as reserved from the kernel (e.g., with a
diff --git a/arch/arm64/include/asm/boot.h b/arch/arm64/include/asm/boot.h
index 81151b67b26b..ebf2481889c3 100644
--- a/arch/arm64/include/asm/boot.h
+++ b/arch/arm64/include/asm/boot.h
@@ -11,4 +11,10 @@
 #define MIN_FDT_ALIGN		8
 #define MAX_FDT_SIZE		SZ_2M
 
+/*
+ * arm64 requires the kernel image to be placed
+ * TEXT_OFFSET bytes beyond a 2 MB aligned base
+ */
+#define MIN_KIMG_ALIGN		SZ_2M
+
 #endif
diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
index daa8a7b9917a..dfe4bae463b7 100644
--- a/arch/arm64/include/asm/kernel-pgtable.h
+++ b/arch/arm64/include/asm/kernel-pgtable.h
@@ -80,5 +80,16 @@
 #define SWAPPER_MM_MMUFLAGS	(PTE_ATTRINDX(MT_NORMAL) | SWAPPER_PTE_FLAGS)
 #endif
 
+/*
+ * To make optimal use of block mappings when laying out the linear mapping,
+ * round down the base of physical memory to a size that can be mapped
+ * efficiently, i.e., either PUD_SIZE (4k) or PMD_SIZE (64k), or a multiple that
+ * can be mapped using contiguous bits in the page tables: 32 * PMD_SIZE (16k)
+ */
+#ifdef CONFIG_ARM64_64K_PAGES
+#define ARM64_MEMSTART_ALIGN	SZ_512M
+#else
+#define ARM64_MEMSTART_ALIGN	SZ_1G
+#endif
 
 #endif	/* __ASM_KERNEL_PGTABLE_H */
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 0899026a2821..7e9516365b76 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -73,7 +73,7 @@
 
 #define KERN_TO_HYP(kva)	((unsigned long)kva - PAGE_OFFSET + HYP_PAGE_OFFSET)
 
-#define kvm_ksym_ref(sym)	((void *)&sym - KIMAGE_VADDR + PAGE_OFFSET)
+#define kvm_ksym_ref(sym)	phys_to_virt((u64)&sym - kimage_voffset)
 
 /*
  * We currently only support a 40bit IPA.
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index e45d3141ad98..758fb4a503ef 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -89,10 +89,10 @@
 #define __virt_to_phys(x) ({						\
 	phys_addr_t __x = (phys_addr_t)(x);				\
 	__x >= PAGE_OFFSET ? (__x - PAGE_OFFSET + PHYS_OFFSET) :	\
-			     (__x - KIMAGE_VADDR + PHYS_OFFSET); })
+			     (__x - kimage_voffset); })
 
 #define __phys_to_virt(x)	((unsigned long)((x) - PHYS_OFFSET + PAGE_OFFSET))
-#define __phys_to_kimg(x)	((unsigned long)((x) - PHYS_OFFSET + KIMAGE_VADDR))
+#define __phys_to_kimg(x)	((unsigned long)((x) + kimage_voffset))
 
 /*
  * Convert a page to/from a physical address
@@ -122,13 +122,14 @@ extern phys_addr_t		memstart_addr;
 /* PHYS_OFFSET - the physical address of the start of memory. */
 #define PHYS_OFFSET		({ memstart_addr; })
 
+/* the offset between the kernel virtual and physical mappings */
+extern u64			kimage_voffset;
+
 /*
- * The maximum physical address that the linear direct mapping
- * of system RAM can cover. (PAGE_OFFSET can be interpreted as
- * a 2's complement signed quantity and negated to derive the
- * maximum size of the linear mapping.)
+ * Allow all memory at the discovery stage. We will clip it later.
  */
-#define MAX_MEMBLOCK_ADDR	({ memstart_addr - PAGE_OFFSET - 1; })
+#define MIN_MEMBLOCK_ADDR	0
+#define MAX_MEMBLOCK_ADDR	U64_MAX
 
 /*
  * PFNs are used to describe any physical page; this means
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 5dc8079cef77..d66aee595170 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -52,15 +52,18 @@
 #define KERNEL_END	_end
 
 #ifdef CONFIG_CPU_BIG_ENDIAN
-#define __HEAD_FLAG_BE	1
+#define __HEAD_FLAG_BE		1
 #else
-#define __HEAD_FLAG_BE	0
+#define __HEAD_FLAG_BE		0
 #endif
 
-#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
+#define __HEAD_FLAG_PHYS_BASE	1
 
-#define __HEAD_FLAGS	((__HEAD_FLAG_BE << 0) |	\
-			 (__HEAD_FLAG_PAGE_SIZE << 1))
+#define __HEAD_FLAG_PAGE_SIZE	((PAGE_SHIFT - 10) / 2)
+
+#define __HEAD_FLAGS		((__HEAD_FLAG_BE << 0) |	\
+				 (__HEAD_FLAG_PAGE_SIZE << 1) |	\
+				 (__HEAD_FLAG_PHYS_BASE << 3))
 
 /*
  * Kernel startup entry point.
@@ -448,7 +451,11 @@ __mmap_switched:
 	and	x4, x4, #~(THREAD_SIZE - 1)
 	msr	sp_el0, x4			// Save thread_info
 	str_l	x21, __fdt_pointer, x5		// Save FDT pointer
-	str_l	x24, memstart_addr, x6		// Save PHYS_OFFSET
+
+	ldr	x0, =KIMAGE_VADDR		// Save the offset between
+	sub	x24, x0, x24			// the kernel virtual and
+	str_l	x24, kimage_voffset, x0		// physical mappings
+
 	mov	x29, #0
 #ifdef CONFIG_KASAN
 	bl	kasan_early_init
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index baa923bda651..9e89965a2fad 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -35,7 +35,9 @@
 #include <linux/efi.h>
 #include <linux/swiotlb.h>
 
+#include <asm/boot.h>
 #include <asm/fixmap.h>
+#include <asm/kernel-pgtable.h>
 #include <asm/memory.h>
 #include <asm/sections.h>
 #include <asm/setup.h>
@@ -157,9 +159,76 @@ static int __init early_mem(char *p)
 }
 early_param("mem", early_mem);
 
+/*
+ * clip_mem_range() - remove memblock memory between @min and @max until
+ *                    we meet the limit in 'memory_limit'.
+ */
+static void __init clip_mem_range(u64 min, u64 max)
+{
+	u64 mem_size, to_remove;
+	int i;
+
+again:
+	mem_size = memblock_phys_mem_size();
+	if (mem_size <= memory_limit || max <= min)
+		return;
+
+	to_remove = mem_size - memory_limit;
+
+	for (i = memblock.memory.cnt - 1; i >= 0; i--) {
+		struct memblock_region *r = memblock.memory.regions + i;
+		u64 start = max(min, r->base);
+		u64 end = min(max, r->base + r->size);
+
+		if (start >= max || end <= min)
+			continue;
+
+		if (end > min) {
+			u64 size = min(to_remove, end - max(start, min));
+
+			memblock_remove(end - size, size);
+		} else {
+			memblock_remove(start, min(max - start, to_remove));
+		}
+		goto again;
+	}
+}
+
 void __init arm64_memblock_init(void)
 {
-	memblock_enforce_memory_limit(memory_limit);
+	const s64 linear_region_size = -(s64)PAGE_OFFSET;
+
+	/*
+	 * Select a suitable value for the base of physical memory.
+	 */
+	memstart_addr = round_down(memblock_start_of_DRAM(),
+				   ARM64_MEMSTART_ALIGN);
+
+	/*
+	 * Remove the memory that we will not be able to cover
+	 * with the linear mapping.
+	 */
+	memblock_remove(memstart_addr + linear_region_size, ULLONG_MAX);
+
+	if (memory_limit != (phys_addr_t)ULLONG_MAX) {
+		u64 kbase = round_down(__pa(_text), MIN_KIMG_ALIGN);
+		u64 kend = PAGE_ALIGN(__pa(_end));
+		u64 const sz_4g = 0x100000000UL;
+
+		/*
+		 * Clip memory in order of preference:
+		 * - above the kernel and above 4 GB
+		 * - between 4 GB and the start of the kernel (if the kernel
+		 *   is loaded high in memory)
+		 * - between the kernel and 4 GB (if the kernel is loaded
+		 *   low in memory)
+		 * - below 4 GB
+		 */
+		clip_mem_range(max(sz_4g, kend), ULLONG_MAX);
+		clip_mem_range(sz_4g, kbase);
+		clip_mem_range(kend, sz_4g);
+		clip_mem_range(0, min(kbase, sz_4g));
+	}
 
 	/*
 	 * Register the kernel text, kernel data, initrd, and initial
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 0b28f1469f9b..a1fd3414a322 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -46,6 +46,9 @@
 
 u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
 
+u64 kimage_voffset __read_mostly;
+EXPORT_SYMBOL(kimage_voffset);
+
 /*
  * Empty_zero_page is a special page that is used for zero-initialized data
  * and COW.
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [PATCH v3 13/21] arm64: allow kernel Image to be loaded anywhere in physical memory
@ 2016-01-11 13:19   ` Ard Biesheuvel
  0 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel

This relaxes the kernel Image placement requirements, so that it
may be placed at any 2 MB aligned offset in physical memory.

This is accomplished by ignoring PHYS_OFFSET when installing
memblocks, and accounting for the apparent virtual offset of
the kernel Image. As a result, virtual address references
below PAGE_OFFSET are correctly mapped onto physical references
into the kernel Image regardless of where it sits in memory.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
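A worked example of the extended flags field, for the record (not part of the
patch): a little-endian, 4 KB granule kernel built with this change advertises
BE=0, PAGE_SIZE=1 and PHYS_BASE=1, i.e. __HEAD_FLAGS = (0 << 0) | (1 << 1) |
(1 << 3) = 0xa. A trivial C11 compile-time check of that arithmetic:

    #include <assert.h>

    /* flags arithmetic spelled out by hand; not taken from the headers */
    static_assert(((0u << 0) | (1u << 1) | (1u << 3)) == 0xa,
                  "LE, 4K granule, flexible physical base");
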
 Documentation/arm64/booting.txt         | 20 ++++--
 arch/arm64/include/asm/boot.h           |  6 ++
 arch/arm64/include/asm/kernel-pgtable.h | 11 +++
 arch/arm64/include/asm/kvm_mmu.h        |  2 +-
 arch/arm64/include/asm/memory.h         | 15 +++--
 arch/arm64/kernel/head.S                | 19 ++++--
 arch/arm64/mm/init.c                    | 71 +++++++++++++++++++-
 arch/arm64/mm/mmu.c                     |  3 +
 8 files changed, 125 insertions(+), 22 deletions(-)

diff --git a/Documentation/arm64/booting.txt b/Documentation/arm64/booting.txt
index 701d39d3171a..67484067ce4f 100644
--- a/Documentation/arm64/booting.txt
+++ b/Documentation/arm64/booting.txt
@@ -109,7 +109,13 @@ Header notes:
 			1 - 4K
 			2 - 16K
 			3 - 64K
-  Bits 3-63:	Reserved.
+  Bit 3:	Kernel physical placement
+			0 - 2MB aligned base should be as close as possible
+			    to the base of DRAM, since memory below it is not
+			    accessible
+			1 - 2MB aligned base may be anywhere in physical
+			    memory
+  Bits 4-63:	Reserved.
 
 - When image_size is zero, a bootloader should attempt to keep as much
   memory as possible free for use by the kernel immediately after the
@@ -117,14 +123,14 @@ Header notes:
   depending on selected features, and is effectively unbound.
 
 The Image must be placed text_offset bytes from a 2MB aligned base
-address near the start of usable system RAM and called there. Memory
-below that base address is currently unusable by Linux, and therefore it
-is strongly recommended that this location is the start of system RAM.
-The region between the 2 MB aligned base address and the start of the
-image has no special significance to the kernel, and may be used for
-other purposes.
+address anywhere in usable system RAM and called there. The region
+between the 2 MB aligned base address and the start of the image has no
+special significance to the kernel, and may be used for other purposes.
 At least image_size bytes from the start of the image must be free for
 use by the kernel.
+NOTE: versions prior to v4.6 cannot make use of memory below the
+physical offset of the Image so it is recommended that the Image be
+placed as close as possible to the start of system RAM.
 
 Any memory described to the kernel (even that below the start of the
 image) which is not marked as reserved from the kernel (e.g., with a
diff --git a/arch/arm64/include/asm/boot.h b/arch/arm64/include/asm/boot.h
index 81151b67b26b..ebf2481889c3 100644
--- a/arch/arm64/include/asm/boot.h
+++ b/arch/arm64/include/asm/boot.h
@@ -11,4 +11,10 @@
 #define MIN_FDT_ALIGN		8
 #define MAX_FDT_SIZE		SZ_2M
 
+/*
+ * arm64 requires the kernel image to be placed
+ * TEXT_OFFSET bytes beyond a 2 MB aligned base
+ */
+#define MIN_KIMG_ALIGN		SZ_2M
+
 #endif
diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
index daa8a7b9917a..dfe4bae463b7 100644
--- a/arch/arm64/include/asm/kernel-pgtable.h
+++ b/arch/arm64/include/asm/kernel-pgtable.h
@@ -80,5 +80,16 @@
 #define SWAPPER_MM_MMUFLAGS	(PTE_ATTRINDX(MT_NORMAL) | SWAPPER_PTE_FLAGS)
 #endif
 
+/*
+ * To make optimal use of block mappings when laying out the linear mapping,
+ * round down the base of physical memory to a size that can be mapped
+ * efficiently, i.e., either PUD_SIZE (4k) or PMD_SIZE (64k), or a multiple that
+ * can be mapped using contiguous bits in the page tables: 32 * PMD_SIZE (16k)
+ */
+#ifdef CONFIG_ARM64_64K_PAGES
+#define ARM64_MEMSTART_ALIGN	SZ_512M
+#else
+#define ARM64_MEMSTART_ALIGN	SZ_1G
+#endif
 
 #endif	/* __ASM_KERNEL_PGTABLE_H */
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 0899026a2821..7e9516365b76 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -73,7 +73,7 @@
 
 #define KERN_TO_HYP(kva)	((unsigned long)kva - PAGE_OFFSET + HYP_PAGE_OFFSET)
 
-#define kvm_ksym_ref(sym)	((void *)&sym - KIMAGE_VADDR + PAGE_OFFSET)
+#define kvm_ksym_ref(sym)	phys_to_virt((u64)&sym - kimage_voffset)
 
 /*
  * We currently only support a 40bit IPA.
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index e45d3141ad98..758fb4a503ef 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -89,10 +89,10 @@
 #define __virt_to_phys(x) ({						\
 	phys_addr_t __x = (phys_addr_t)(x);				\
 	__x >= PAGE_OFFSET ? (__x - PAGE_OFFSET + PHYS_OFFSET) :	\
-			     (__x - KIMAGE_VADDR + PHYS_OFFSET); })
+			     (__x - kimage_voffset); })
 
 #define __phys_to_virt(x)	((unsigned long)((x) - PHYS_OFFSET + PAGE_OFFSET))
-#define __phys_to_kimg(x)	((unsigned long)((x) - PHYS_OFFSET + KIMAGE_VADDR))
+#define __phys_to_kimg(x)	((unsigned long)((x) + kimage_voffset))
 
 /*
  * Convert a page to/from a physical address
@@ -122,13 +122,14 @@ extern phys_addr_t		memstart_addr;
 /* PHYS_OFFSET - the physical address of the start of memory. */
 #define PHYS_OFFSET		({ memstart_addr; })
 
+/* the offset between the kernel virtual and physical mappings */
+extern u64			kimage_voffset;
+
 /*
- * The maximum physical address that the linear direct mapping
- * of system RAM can cover. (PAGE_OFFSET can be interpreted as
- * a 2's complement signed quantity and negated to derive the
- * maximum size of the linear mapping.)
+ * Allow all memory at the discovery stage. We will clip it later.
  */
-#define MAX_MEMBLOCK_ADDR	({ memstart_addr - PAGE_OFFSET - 1; })
+#define MIN_MEMBLOCK_ADDR	0
+#define MAX_MEMBLOCK_ADDR	U64_MAX
 
 /*
  * PFNs are used to describe any physical page; this means
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 5dc8079cef77..d66aee595170 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -52,15 +52,18 @@
 #define KERNEL_END	_end
 
 #ifdef CONFIG_CPU_BIG_ENDIAN
-#define __HEAD_FLAG_BE	1
+#define __HEAD_FLAG_BE		1
 #else
-#define __HEAD_FLAG_BE	0
+#define __HEAD_FLAG_BE		0
 #endif
 
-#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
+#define __HEAD_FLAG_PHYS_BASE	1
 
-#define __HEAD_FLAGS	((__HEAD_FLAG_BE << 0) |	\
-			 (__HEAD_FLAG_PAGE_SIZE << 1))
+#define __HEAD_FLAG_PAGE_SIZE	((PAGE_SHIFT - 10) / 2)
+
+#define __HEAD_FLAGS		((__HEAD_FLAG_BE << 0) |	\
+				 (__HEAD_FLAG_PAGE_SIZE << 1) |	\
+				 (__HEAD_FLAG_PHYS_BASE << 3))
 
 /*
  * Kernel startup entry point.
@@ -448,7 +451,11 @@ __mmap_switched:
 	and	x4, x4, #~(THREAD_SIZE - 1)
 	msr	sp_el0, x4			// Save thread_info
 	str_l	x21, __fdt_pointer, x5		// Save FDT pointer
-	str_l	x24, memstart_addr, x6		// Save PHYS_OFFSET
+
+	ldr	x0, =KIMAGE_VADDR		// Save the offset between
+	sub	x24, x0, x24			// the kernel virtual and
+	str_l	x24, kimage_voffset, x0		// physical mappings
+
 	mov	x29, #0
 #ifdef CONFIG_KASAN
 	bl	kasan_early_init
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index baa923bda651..9e89965a2fad 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -35,7 +35,9 @@
 #include <linux/efi.h>
 #include <linux/swiotlb.h>
 
+#include <asm/boot.h>
 #include <asm/fixmap.h>
+#include <asm/kernel-pgtable.h>
 #include <asm/memory.h>
 #include <asm/sections.h>
 #include <asm/setup.h>
@@ -157,9 +159,76 @@ static int __init early_mem(char *p)
 }
 early_param("mem", early_mem);
 
+/*
+ * clip_mem_range() - remove memblock memory between @min and @max until
+ *                    we meet the limit in 'memory_limit'.
+ */
+static void __init clip_mem_range(u64 min, u64 max)
+{
+	u64 mem_size, to_remove;
+	int i;
+
+again:
+	mem_size = memblock_phys_mem_size();
+	if (mem_size <= memory_limit || max <= min)
+		return;
+
+	to_remove = mem_size - memory_limit;
+
+	for (i = memblock.memory.cnt - 1; i >= 0; i--) {
+		struct memblock_region *r = memblock.memory.regions + i;
+		u64 start = max(min, r->base);
+		u64 end = min(max, r->base + r->size);
+
+		if (start >= max || end <= min)
+			continue;
+
+		if (end > min) {
+			u64 size = min(to_remove, end - max(start, min));
+
+			memblock_remove(end - size, size);
+		} else {
+			memblock_remove(start, min(max - start, to_remove));
+		}
+		goto again;
+	}
+}
+
 void __init arm64_memblock_init(void)
 {
-	memblock_enforce_memory_limit(memory_limit);
+	const s64 linear_region_size = -(s64)PAGE_OFFSET;
+
+	/*
+	 * Select a suitable value for the base of physical memory.
+	 */
+	memstart_addr = round_down(memblock_start_of_DRAM(),
+				   ARM64_MEMSTART_ALIGN);
+
+	/*
+	 * Remove the memory that we will not be able to cover
+	 * with the linear mapping.
+	 */
+	memblock_remove(memstart_addr + linear_region_size, ULLONG_MAX);
+
+	if (memory_limit != (phys_addr_t)ULLONG_MAX) {
+		u64 kbase = round_down(__pa(_text), MIN_KIMG_ALIGN);
+		u64 kend = PAGE_ALIGN(__pa(_end));
+		u64 const sz_4g = 0x100000000UL;
+
+		/*
+		 * Clip memory in order of preference:
+		 * - above the kernel and above 4 GB
+		 * - between 4 GB and the start of the kernel (if the kernel
+		 *   is loaded high in memory)
+		 * - between the kernel and 4 GB (if the kernel is loaded
+		 *   low in memory)
+		 * - below 4 GB
+		 */
+		clip_mem_range(max(sz_4g, kend), ULLONG_MAX);
+		clip_mem_range(sz_4g, kbase);
+		clip_mem_range(kend, sz_4g);
+		clip_mem_range(0, min(kbase, sz_4g));
+	}
 
 	/*
 	 * Register the kernel text, kernel data, initrd, and initial
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 0b28f1469f9b..a1fd3414a322 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -46,6 +46,9 @@
 
 u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
 
+u64 kimage_voffset __read_mostly;
+EXPORT_SYMBOL(kimage_voffset);
+
 /*
  * Empty_zero_page is a special page that is used for zero-initialized data
  * and COW.
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [kernel-hardening] [PATCH v3 13/21] arm64: allow kernel Image to be loaded anywhere in physical memory
@ 2016-01-11 13:19   ` Ard Biesheuvel
  0 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel
  Cc: stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall, Ard Biesheuvel

This relaxes the kernel Image placement requirements, so that it
may be placed at any 2 MB aligned offset in physical memory.

This is accomplished by ignoring PHYS_OFFSET when installing
memblocks, and accounting for the apparent virtual offset of
the kernel Image. As a result, virtual address references
below PAGE_OFFSET are correctly mapped onto physical references
into the kernel Image regardless of where it sits in memory.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 Documentation/arm64/booting.txt         | 20 ++++--
 arch/arm64/include/asm/boot.h           |  6 ++
 arch/arm64/include/asm/kernel-pgtable.h | 11 +++
 arch/arm64/include/asm/kvm_mmu.h        |  2 +-
 arch/arm64/include/asm/memory.h         | 15 +++--
 arch/arm64/kernel/head.S                | 19 ++++--
 arch/arm64/mm/init.c                    | 71 +++++++++++++++++++-
 arch/arm64/mm/mmu.c                     |  3 +
 8 files changed, 125 insertions(+), 22 deletions(-)

diff --git a/Documentation/arm64/booting.txt b/Documentation/arm64/booting.txt
index 701d39d3171a..67484067ce4f 100644
--- a/Documentation/arm64/booting.txt
+++ b/Documentation/arm64/booting.txt
@@ -109,7 +109,13 @@ Header notes:
 			1 - 4K
 			2 - 16K
 			3 - 64K
-  Bits 3-63:	Reserved.
+  Bit 3:	Kernel physical placement
+			0 - 2MB aligned base should be as close as possible
+			    to the base of DRAM, since memory below it is not
+			    accessible
+			1 - 2MB aligned base may be anywhere in physical
+			    memory
+  Bits 4-63:	Reserved.
 
 - When image_size is zero, a bootloader should attempt to keep as much
   memory as possible free for use by the kernel immediately after the
@@ -117,14 +123,14 @@ Header notes:
   depending on selected features, and is effectively unbound.
 
 The Image must be placed text_offset bytes from a 2MB aligned base
-address near the start of usable system RAM and called there. Memory
-below that base address is currently unusable by Linux, and therefore it
-is strongly recommended that this location is the start of system RAM.
-The region between the 2 MB aligned base address and the start of the
-image has no special significance to the kernel, and may be used for
-other purposes.
+address anywhere in usable system RAM and called there. The region
+between the 2 MB aligned base address and the start of the image has no
+special significance to the kernel, and may be used for other purposes.
 At least image_size bytes from the start of the image must be free for
 use by the kernel.
+NOTE: versions prior to v4.6 cannot make use of memory below the
+physical offset of the Image so it is recommended that the Image be
+placed as close as possible to the start of system RAM.
 
 Any memory described to the kernel (even that below the start of the
 image) which is not marked as reserved from the kernel (e.g., with a
diff --git a/arch/arm64/include/asm/boot.h b/arch/arm64/include/asm/boot.h
index 81151b67b26b..ebf2481889c3 100644
--- a/arch/arm64/include/asm/boot.h
+++ b/arch/arm64/include/asm/boot.h
@@ -11,4 +11,10 @@
 #define MIN_FDT_ALIGN		8
 #define MAX_FDT_SIZE		SZ_2M
 
+/*
+ * arm64 requires the kernel image to be placed
+ * TEXT_OFFSET bytes beyond a 2 MB aligned base
+ */
+#define MIN_KIMG_ALIGN		SZ_2M
+
 #endif
diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
index daa8a7b9917a..dfe4bae463b7 100644
--- a/arch/arm64/include/asm/kernel-pgtable.h
+++ b/arch/arm64/include/asm/kernel-pgtable.h
@@ -80,5 +80,16 @@
 #define SWAPPER_MM_MMUFLAGS	(PTE_ATTRINDX(MT_NORMAL) | SWAPPER_PTE_FLAGS)
 #endif
 
+/*
+ * To make optimal use of block mappings when laying out the linear mapping,
+ * round down the base of physical memory to a size that can be mapped
+ * efficiently, i.e., either PUD_SIZE (4k) or PMD_SIZE (64k), or a multiple that
+ * can be mapped using contiguous bits in the page tables: 32 * PMD_SIZE (16k)
+ */
+#ifdef CONFIG_ARM64_64K_PAGES
+#define ARM64_MEMSTART_ALIGN	SZ_512M
+#else
+#define ARM64_MEMSTART_ALIGN	SZ_1G
+#endif
 
 #endif	/* __ASM_KERNEL_PGTABLE_H */
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 0899026a2821..7e9516365b76 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -73,7 +73,7 @@
 
 #define KERN_TO_HYP(kva)	((unsigned long)kva - PAGE_OFFSET + HYP_PAGE_OFFSET)
 
-#define kvm_ksym_ref(sym)	((void *)&sym - KIMAGE_VADDR + PAGE_OFFSET)
+#define kvm_ksym_ref(sym)	phys_to_virt((u64)&sym - kimage_voffset)
 
 /*
  * We currently only support a 40bit IPA.
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index e45d3141ad98..758fb4a503ef 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -89,10 +89,10 @@
 #define __virt_to_phys(x) ({						\
 	phys_addr_t __x = (phys_addr_t)(x);				\
 	__x >= PAGE_OFFSET ? (__x - PAGE_OFFSET + PHYS_OFFSET) :	\
-			     (__x - KIMAGE_VADDR + PHYS_OFFSET); })
+			     (__x - kimage_voffset); })
 
 #define __phys_to_virt(x)	((unsigned long)((x) - PHYS_OFFSET + PAGE_OFFSET))
-#define __phys_to_kimg(x)	((unsigned long)((x) - PHYS_OFFSET + KIMAGE_VADDR))
+#define __phys_to_kimg(x)	((unsigned long)((x) + kimage_voffset))
 
 /*
  * Convert a page to/from a physical address
@@ -122,13 +122,14 @@ extern phys_addr_t		memstart_addr;
 /* PHYS_OFFSET - the physical address of the start of memory. */
 #define PHYS_OFFSET		({ memstart_addr; })
 
+/* the offset between the kernel virtual and physical mappings */
+extern u64			kimage_voffset;
+
 /*
- * The maximum physical address that the linear direct mapping
- * of system RAM can cover. (PAGE_OFFSET can be interpreted as
- * a 2's complement signed quantity and negated to derive the
- * maximum size of the linear mapping.)
+ * Allow all memory at the discovery stage. We will clip it later.
  */
-#define MAX_MEMBLOCK_ADDR	({ memstart_addr - PAGE_OFFSET - 1; })
+#define MIN_MEMBLOCK_ADDR	0
+#define MAX_MEMBLOCK_ADDR	U64_MAX
 
 /*
  * PFNs are used to describe any physical page; this means
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 5dc8079cef77..d66aee595170 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -52,15 +52,18 @@
 #define KERNEL_END	_end
 
 #ifdef CONFIG_CPU_BIG_ENDIAN
-#define __HEAD_FLAG_BE	1
+#define __HEAD_FLAG_BE		1
 #else
-#define __HEAD_FLAG_BE	0
+#define __HEAD_FLAG_BE		0
 #endif
 
-#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
+#define __HEAD_FLAG_PHYS_BASE	1
 
-#define __HEAD_FLAGS	((__HEAD_FLAG_BE << 0) |	\
-			 (__HEAD_FLAG_PAGE_SIZE << 1))
+#define __HEAD_FLAG_PAGE_SIZE	((PAGE_SHIFT - 10) / 2)
+
+#define __HEAD_FLAGS		((__HEAD_FLAG_BE << 0) |	\
+				 (__HEAD_FLAG_PAGE_SIZE << 1) |	\
+				 (__HEAD_FLAG_PHYS_BASE << 3))
 
 /*
  * Kernel startup entry point.
@@ -448,7 +451,11 @@ __mmap_switched:
 	and	x4, x4, #~(THREAD_SIZE - 1)
 	msr	sp_el0, x4			// Save thread_info
 	str_l	x21, __fdt_pointer, x5		// Save FDT pointer
-	str_l	x24, memstart_addr, x6		// Save PHYS_OFFSET
+
+	ldr	x0, =KIMAGE_VADDR		// Save the offset between
+	sub	x24, x0, x24			// the kernel virtual and
+	str_l	x24, kimage_voffset, x0		// physical mappings
+
 	mov	x29, #0
 #ifdef CONFIG_KASAN
 	bl	kasan_early_init
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index baa923bda651..9e89965a2fad 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -35,7 +35,9 @@
 #include <linux/efi.h>
 #include <linux/swiotlb.h>
 
+#include <asm/boot.h>
 #include <asm/fixmap.h>
+#include <asm/kernel-pgtable.h>
 #include <asm/memory.h>
 #include <asm/sections.h>
 #include <asm/setup.h>
@@ -157,9 +159,76 @@ static int __init early_mem(char *p)
 }
 early_param("mem", early_mem);
 
+/*
+ * clip_mem_range() - remove memblock memory between @min and @max until
+ *                    we meet the limit in 'memory_limit'.
+ */
+static void __init clip_mem_range(u64 min, u64 max)
+{
+	u64 mem_size, to_remove;
+	int i;
+
+again:
+	mem_size = memblock_phys_mem_size();
+	if (mem_size <= memory_limit || max <= min)
+		return;
+
+	to_remove = mem_size - memory_limit;
+
+	for (i = memblock.memory.cnt - 1; i >= 0; i--) {
+		struct memblock_region *r = memblock.memory.regions + i;
+		u64 start = max(min, r->base);
+		u64 end = min(max, r->base + r->size);
+
+		if (start >= max || end <= min)
+			continue;
+
+		if (end > min) {
+			u64 size = min(to_remove, end - max(start, min));
+
+			memblock_remove(end - size, size);
+		} else {
+			memblock_remove(start, min(max - start, to_remove));
+		}
+		goto again;
+	}
+}
+
 void __init arm64_memblock_init(void)
 {
-	memblock_enforce_memory_limit(memory_limit);
+	const s64 linear_region_size = -(s64)PAGE_OFFSET;
+
+	/*
+	 * Select a suitable value for the base of physical memory.
+	 */
+	memstart_addr = round_down(memblock_start_of_DRAM(),
+				   ARM64_MEMSTART_ALIGN);
+
+	/*
+	 * Remove the memory that we will not be able to cover
+	 * with the linear mapping.
+	 */
+	memblock_remove(memstart_addr + linear_region_size, ULLONG_MAX);
+
+	if (memory_limit != (phys_addr_t)ULLONG_MAX) {
+		u64 kbase = round_down(__pa(_text), MIN_KIMG_ALIGN);
+		u64 kend = PAGE_ALIGN(__pa(_end));
+		u64 const sz_4g = 0x100000000UL;
+
+		/*
+		 * Clip memory in order of preference:
+		 * - above the kernel and above 4 GB
+		 * - between 4 GB and the start of the kernel (if the kernel
+		 *   is loaded high in memory)
+		 * - between the kernel and 4 GB (if the kernel is loaded
+		 *   low in memory)
+		 * - below 4 GB
+		 */
+		clip_mem_range(max(sz_4g, kend), ULLONG_MAX);
+		clip_mem_range(sz_4g, kbase);
+		clip_mem_range(kend, sz_4g);
+		clip_mem_range(0, min(kbase, sz_4g));
+	}
 
 	/*
 	 * Register the kernel text, kernel data, initrd, and initial
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 0b28f1469f9b..a1fd3414a322 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -46,6 +46,9 @@
 
 u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
 
+u64 kimage_voffset __read_mostly;
+EXPORT_SYMBOL(kimage_voffset);
+
 /*
  * Empty_zero_page is a special page that is used for zero-initialized data
  * and COW.
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [PATCH v3 14/21] arm64: redefine SWAPPER_TABLE_SHIFT for use in asm code
  2016-01-11 13:18 ` Ard Biesheuvel
  (?)
@ 2016-01-11 13:19   ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel
  Cc: stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall, Ard Biesheuvel

The current definition of SWAPPER_TABLE_SHIFT can only be used in
asm code if the configured number of translation levels defines
PUD_SHIFT and/or PMD_SHIFT natively (4KB and 16KB/64KB granule,
respectively). Otherwise, it depends on the nopmd/nopud fixup
headers, which can only be included in C code.

So redefine SWAPPER_TABLE_SHIFT in a way that is independent of the
number of configured translation levels.
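
For illustration, with 8-byte descriptors each table holds 2^(PAGE_SHIFT - 3)
entries, so the level above the swapper block level always sits at
SWAPPER_BLOCK_SHIFT + PAGE_SHIFT - 3. Plugging in the usual shift values:

  4K granule with section maps:    21 + 12 - 3 = 30 (PUD_SHIFT)
  64K granule without section maps: 16 + 16 - 3 = 29 (PMD_SHIFT)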

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/kernel-pgtable.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
index dfe4bae463b7..b1c96a29fad7 100644
--- a/arch/arm64/include/asm/kernel-pgtable.h
+++ b/arch/arm64/include/asm/kernel-pgtable.h
@@ -57,13 +57,13 @@
 #if ARM64_SWAPPER_USES_SECTION_MAPS
 #define SWAPPER_BLOCK_SHIFT	SECTION_SHIFT
 #define SWAPPER_BLOCK_SIZE	SECTION_SIZE
-#define SWAPPER_TABLE_SHIFT	PUD_SHIFT
 #else
 #define SWAPPER_BLOCK_SHIFT	PAGE_SHIFT
 #define SWAPPER_BLOCK_SIZE	PAGE_SIZE
-#define SWAPPER_TABLE_SHIFT	PMD_SHIFT
 #endif
 
+#define SWAPPER_TABLE_SHIFT	(SWAPPER_BLOCK_SHIFT + PAGE_SHIFT - 3)
+
 /* The size of the initial kernel direct mapping */
 #define SWAPPER_INIT_MAP_SIZE	(_AC(1, UL) << SWAPPER_TABLE_SHIFT)
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [PATCH v3 14/21] arm64: redefine SWAPPER_TABLE_SHIFT for use in asm code
@ 2016-01-11 13:19   ` Ard Biesheuvel
  0 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel

The current definition of SWAPPER_TABLE_SHIFT can only be used in
asm code if the configured number of translation levels defines
PUD_SHIFT and/or PMD_SHIFT natively (4KB and 16KB/64KB granule,
respectively). Otherwise, it depends on the nopmd/nopud fixup
headers, which can only be included in C code.

So redefine SWAPPER_TABLE_SHIFT in a way that is independent of the
number of configured translation levels.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/kernel-pgtable.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
index dfe4bae463b7..b1c96a29fad7 100644
--- a/arch/arm64/include/asm/kernel-pgtable.h
+++ b/arch/arm64/include/asm/kernel-pgtable.h
@@ -57,13 +57,13 @@
 #if ARM64_SWAPPER_USES_SECTION_MAPS
 #define SWAPPER_BLOCK_SHIFT	SECTION_SHIFT
 #define SWAPPER_BLOCK_SIZE	SECTION_SIZE
-#define SWAPPER_TABLE_SHIFT	PUD_SHIFT
 #else
 #define SWAPPER_BLOCK_SHIFT	PAGE_SHIFT
 #define SWAPPER_BLOCK_SIZE	PAGE_SIZE
-#define SWAPPER_TABLE_SHIFT	PMD_SHIFT
 #endif
 
+#define SWAPPER_TABLE_SHIFT	(SWAPPER_BLOCK_SHIFT + PAGE_SHIFT - 3)
+
 /* The size of the initial kernel direct mapping */
 #define SWAPPER_INIT_MAP_SIZE	(_AC(1, UL) << SWAPPER_TABLE_SHIFT)
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [kernel-hardening] [PATCH v3 14/21] arm64: redefine SWAPPER_TABLE_SHIFT for use in asm code
@ 2016-01-11 13:19   ` Ard Biesheuvel
  0 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel
  Cc: stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall, Ard Biesheuvel

The current definition of SWAPPER_TABLE_SHIFT can only be used in
asm code if the configured number of translation levels defines
PUD_SHIFT and/or PMD_SHIFT natively (4KB and 16KB/64KB granule,
respectively). Otherwise, it depends on the nopmd/nopud fixup
headers, which can only be included in C code.

So redefine SWAPPER_TABLE_SHIFT in a way that is independent of the
number of configured translation levels.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/kernel-pgtable.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
index dfe4bae463b7..b1c96a29fad7 100644
--- a/arch/arm64/include/asm/kernel-pgtable.h
+++ b/arch/arm64/include/asm/kernel-pgtable.h
@@ -57,13 +57,13 @@
 #if ARM64_SWAPPER_USES_SECTION_MAPS
 #define SWAPPER_BLOCK_SHIFT	SECTION_SHIFT
 #define SWAPPER_BLOCK_SIZE	SECTION_SIZE
-#define SWAPPER_TABLE_SHIFT	PUD_SHIFT
 #else
 #define SWAPPER_BLOCK_SHIFT	PAGE_SHIFT
 #define SWAPPER_BLOCK_SIZE	PAGE_SIZE
-#define SWAPPER_TABLE_SHIFT	PMD_SHIFT
 #endif
 
+#define SWAPPER_TABLE_SHIFT	(SWAPPER_BLOCK_SHIFT + PAGE_SHIFT - 3)
+
 /* The size of the initial kernel direct mapping */
 #define SWAPPER_INIT_MAP_SIZE	(_AC(1, UL) << SWAPPER_TABLE_SHIFT)
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [PATCH v3 14/21] arm64: [re]define SWAPPER_TABLE_[SHIFT|SIZE] for use in asm code
  2016-01-11 13:18 ` Ard Biesheuvel
  (?)
@ 2016-01-11 13:19   ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel
  Cc: stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall, Ard Biesheuvel

The current definition of SWAPPER_TABLE_SHIFT can only be used in
asm code if the configured number of translation levels defines
PUD_SHIFT and/or PMD_SHIFT natively (4KB and 16KB/64KB granule,
respectively). Otherwise, it depends on the nopmd/nopud fixup
headers, which can only be included in C code.

So redefine SWAPPER_TABLE_SHIFT in a way that is independent of the
number of configured translation levels. Define SWAPPER_TABLE_SIZE
as well; we will need it later.
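
For illustration, SWAPPER_TABLE_SIZE is the span covered by a single entry one
level above the swapper block mappings: 1 << 30 = 1 GB for 4K pages with
section maps, 1 << 25 = 32 MB for 16K and 1 << 29 = 512 MB for 64K, i.e. the
same alignment boundaries that the KASLR patch later in this series avoids
straddling with the kernel image.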

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/kernel-pgtable.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
index daa8a7b9917a..eaac46097359 100644
--- a/arch/arm64/include/asm/kernel-pgtable.h
+++ b/arch/arm64/include/asm/kernel-pgtable.h
@@ -57,13 +57,14 @@
 #if ARM64_SWAPPER_USES_SECTION_MAPS
 #define SWAPPER_BLOCK_SHIFT	SECTION_SHIFT
 #define SWAPPER_BLOCK_SIZE	SECTION_SIZE
-#define SWAPPER_TABLE_SHIFT	PUD_SHIFT
 #else
 #define SWAPPER_BLOCK_SHIFT	PAGE_SHIFT
 #define SWAPPER_BLOCK_SIZE	PAGE_SIZE
-#define SWAPPER_TABLE_SHIFT	PMD_SHIFT
 #endif
 
+#define SWAPPER_TABLE_SHIFT	(SWAPPER_BLOCK_SHIFT + PAGE_SHIFT - 3)
+#define SWAPPER_TABLE_SIZE	(1 << SWAPPER_TABLE_SHIFT)
+
 /* The size of the initial kernel direct mapping */
 #define SWAPPER_INIT_MAP_SIZE	(_AC(1, UL) << SWAPPER_TABLE_SHIFT)
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [PATCH v3 14/21] arm64: [re]define SWAPPER_TABLE_[SHIFT|SIZE] for use in asm code
@ 2016-01-11 13:19   ` Ard Biesheuvel
  0 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel

The current definition of SWAPPER_TABLE_SHIFT can only be used in
asm code if the configured number of translation levels defines
PUD_SHIFT and/or PMD_SHIFT natively (4KB and 16KB/64KB granule,
respectively). Otherwise, it depends on the nopmd/nopud fixup
headers, which can only be included in C code.

So redefine SWAPPER_TABLE_SHIFT in a way that is independent of the
number of configured translation levels. Define SWAPPER_TABLE_SIZE
as well; we will need it later.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/kernel-pgtable.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
index daa8a7b9917a..eaac46097359 100644
--- a/arch/arm64/include/asm/kernel-pgtable.h
+++ b/arch/arm64/include/asm/kernel-pgtable.h
@@ -57,13 +57,14 @@
 #if ARM64_SWAPPER_USES_SECTION_MAPS
 #define SWAPPER_BLOCK_SHIFT	SECTION_SHIFT
 #define SWAPPER_BLOCK_SIZE	SECTION_SIZE
-#define SWAPPER_TABLE_SHIFT	PUD_SHIFT
 #else
 #define SWAPPER_BLOCK_SHIFT	PAGE_SHIFT
 #define SWAPPER_BLOCK_SIZE	PAGE_SIZE
-#define SWAPPER_TABLE_SHIFT	PMD_SHIFT
 #endif
 
+#define SWAPPER_TABLE_SHIFT	(SWAPPER_BLOCK_SHIFT + PAGE_SHIFT - 3)
+#define SWAPPER_TABLE_SIZE	(1 << SWAPPER_TABLE_SHIFT)
+
 /* The size of the initial kernel direct mapping */
 #define SWAPPER_INIT_MAP_SIZE	(_AC(1, UL) << SWAPPER_TABLE_SHIFT)
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [kernel-hardening] [PATCH v3 14/21] arm64: [re]define SWAPPER_TABLE_[SHIFT|SIZE] for use in asm code
@ 2016-01-11 13:19   ` Ard Biesheuvel
  0 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel
  Cc: stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall, Ard Biesheuvel

The current definition of SWAPPER_TABLE_SHIFT can only be used in
asm code if the configured number of translation levels defines
PUD_SHIFT and/or PMD_SHIFT natively (4KB and 16KB/64KB granule,
respectively). Otherwise, it depends on the nopmd/nopud fixup
headers, which can only be included in C code.

So redefine SWAPPER_TABLE_SHIFT in a way that is independent of the
number of configured translation levels. Define SWAPPER_TABLE_SIZE
as well; we will need it later.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/kernel-pgtable.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
index daa8a7b9917a..eaac46097359 100644
--- a/arch/arm64/include/asm/kernel-pgtable.h
+++ b/arch/arm64/include/asm/kernel-pgtable.h
@@ -57,13 +57,14 @@
 #if ARM64_SWAPPER_USES_SECTION_MAPS
 #define SWAPPER_BLOCK_SHIFT	SECTION_SHIFT
 #define SWAPPER_BLOCK_SIZE	SECTION_SIZE
-#define SWAPPER_TABLE_SHIFT	PUD_SHIFT
 #else
 #define SWAPPER_BLOCK_SHIFT	PAGE_SHIFT
 #define SWAPPER_BLOCK_SIZE	PAGE_SIZE
-#define SWAPPER_TABLE_SHIFT	PMD_SHIFT
 #endif
 
+#define SWAPPER_TABLE_SHIFT	(SWAPPER_BLOCK_SHIFT + PAGE_SHIFT - 3)
+#define SWAPPER_TABLE_SIZE	(1 << SWAPPER_TABLE_SHIFT)
+
 /* The size of the initial kernel direct mapping */
 #define SWAPPER_INIT_MAP_SIZE	(_AC(1, UL) << SWAPPER_TABLE_SHIFT)
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [PATCH v3 15/21] arm64: split elf relocs into a separate header.
  2016-01-11 13:18 ` Ard Biesheuvel
  (?)
@ 2016-01-11 13:19   ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel
  Cc: stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall, Ard Biesheuvel

From: Mark Rutland <mark.rutland@arm.com>

Currently, asm/elf.h contains a mixture of simple constants, C structure
definitions, and some constants defined in terms of constants from other
headers (which are themselves mixtures).

To enable the use of AArch64 ELF reloc constants from assembly code (as
we will need for relocatable kernel support), we need an include without
C structure definitions or includes of other files with such definitions.

This patch factors out the relocs into a new header specifically for ELF
reloc types.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/elf.h        | 54 +--------------
 arch/arm64/include/asm/elf_relocs.h | 73 ++++++++++++++++++++
 2 files changed, 74 insertions(+), 53 deletions(-)

diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h
index faad6df49e5b..e4b3cdcaf597 100644
--- a/arch/arm64/include/asm/elf.h
+++ b/arch/arm64/include/asm/elf.h
@@ -16,6 +16,7 @@
 #ifndef __ASM_ELF_H
 #define __ASM_ELF_H
 
+#include <asm/elf_relocs.h>
 #include <asm/hwcap.h>
 
 /*
@@ -34,59 +35,6 @@ typedef elf_greg_t elf_gregset_t[ELF_NGREG];
 typedef struct user_fpsimd_state elf_fpregset_t;
 
 /*
- * AArch64 static relocation types.
- */
-
-/* Miscellaneous. */
-#define R_ARM_NONE			0
-#define R_AARCH64_NONE			256
-
-/* Data. */
-#define R_AARCH64_ABS64			257
-#define R_AARCH64_ABS32			258
-#define R_AARCH64_ABS16			259
-#define R_AARCH64_PREL64		260
-#define R_AARCH64_PREL32		261
-#define R_AARCH64_PREL16		262
-
-/* Instructions. */
-#define R_AARCH64_MOVW_UABS_G0		263
-#define R_AARCH64_MOVW_UABS_G0_NC	264
-#define R_AARCH64_MOVW_UABS_G1		265
-#define R_AARCH64_MOVW_UABS_G1_NC	266
-#define R_AARCH64_MOVW_UABS_G2		267
-#define R_AARCH64_MOVW_UABS_G2_NC	268
-#define R_AARCH64_MOVW_UABS_G3		269
-
-#define R_AARCH64_MOVW_SABS_G0		270
-#define R_AARCH64_MOVW_SABS_G1		271
-#define R_AARCH64_MOVW_SABS_G2		272
-
-#define R_AARCH64_LD_PREL_LO19		273
-#define R_AARCH64_ADR_PREL_LO21		274
-#define R_AARCH64_ADR_PREL_PG_HI21	275
-#define R_AARCH64_ADR_PREL_PG_HI21_NC	276
-#define R_AARCH64_ADD_ABS_LO12_NC	277
-#define R_AARCH64_LDST8_ABS_LO12_NC	278
-
-#define R_AARCH64_TSTBR14		279
-#define R_AARCH64_CONDBR19		280
-#define R_AARCH64_JUMP26		282
-#define R_AARCH64_CALL26		283
-#define R_AARCH64_LDST16_ABS_LO12_NC	284
-#define R_AARCH64_LDST32_ABS_LO12_NC	285
-#define R_AARCH64_LDST64_ABS_LO12_NC	286
-#define R_AARCH64_LDST128_ABS_LO12_NC	299
-
-#define R_AARCH64_MOVW_PREL_G0		287
-#define R_AARCH64_MOVW_PREL_G0_NC	288
-#define R_AARCH64_MOVW_PREL_G1		289
-#define R_AARCH64_MOVW_PREL_G1_NC	290
-#define R_AARCH64_MOVW_PREL_G2		291
-#define R_AARCH64_MOVW_PREL_G2_NC	292
-#define R_AARCH64_MOVW_PREL_G3		293
-
-/*
  * These are used to set parameters in the core dumps.
  */
 #define ELF_CLASS	ELFCLASS64
diff --git a/arch/arm64/include/asm/elf_relocs.h b/arch/arm64/include/asm/elf_relocs.h
new file mode 100644
index 000000000000..3f6b93099011
--- /dev/null
+++ b/arch/arm64/include/asm/elf_relocs.h
@@ -0,0 +1,73 @@
+/*
+ * Copyright (C) 2016 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_ELF_RELOCS_H
+#define __ASM_ELF_RELOCS_H
+
+/*
+ * AArch64 static relocation types.
+ */
+
+/* Miscellaneous. */
+#define R_ARM_NONE			0
+#define R_AARCH64_NONE			256
+
+/* Data. */
+#define R_AARCH64_ABS64			257
+#define R_AARCH64_ABS32			258
+#define R_AARCH64_ABS16			259
+#define R_AARCH64_PREL64		260
+#define R_AARCH64_PREL32		261
+#define R_AARCH64_PREL16		262
+
+/* Instructions. */
+#define R_AARCH64_MOVW_UABS_G0		263
+#define R_AARCH64_MOVW_UABS_G0_NC	264
+#define R_AARCH64_MOVW_UABS_G1		265
+#define R_AARCH64_MOVW_UABS_G1_NC	266
+#define R_AARCH64_MOVW_UABS_G2		267
+#define R_AARCH64_MOVW_UABS_G2_NC	268
+#define R_AARCH64_MOVW_UABS_G3		269
+
+#define R_AARCH64_MOVW_SABS_G0		270
+#define R_AARCH64_MOVW_SABS_G1		271
+#define R_AARCH64_MOVW_SABS_G2		272
+
+#define R_AARCH64_LD_PREL_LO19		273
+#define R_AARCH64_ADR_PREL_LO21		274
+#define R_AARCH64_ADR_PREL_PG_HI21	275
+#define R_AARCH64_ADR_PREL_PG_HI21_NC	276
+#define R_AARCH64_ADD_ABS_LO12_NC	277
+#define R_AARCH64_LDST8_ABS_LO12_NC	278
+
+#define R_AARCH64_TSTBR14		279
+#define R_AARCH64_CONDBR19		280
+#define R_AARCH64_JUMP26		282
+#define R_AARCH64_CALL26		283
+#define R_AARCH64_LDST16_ABS_LO12_NC	284
+#define R_AARCH64_LDST32_ABS_LO12_NC	285
+#define R_AARCH64_LDST64_ABS_LO12_NC	286
+#define R_AARCH64_LDST128_ABS_LO12_NC	299
+
+#define R_AARCH64_MOVW_PREL_G0		287
+#define R_AARCH64_MOVW_PREL_G0_NC	288
+#define R_AARCH64_MOVW_PREL_G1		289
+#define R_AARCH64_MOVW_PREL_G1_NC	290
+#define R_AARCH64_MOVW_PREL_G2		291
+#define R_AARCH64_MOVW_PREL_G2_NC	292
+#define R_AARCH64_MOVW_PREL_G3		293
+
+#endif /* __ASM_ELF_RELOCS_H */
+
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [PATCH v3 15/21] arm64: split elf relocs into a separate header.
@ 2016-01-11 13:19   ` Ard Biesheuvel
  0 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel

From: Mark Rutland <mark.rutland@arm.com>

Currently, asm/elf.h contains a mixture of simple constants, C structure
definitions, and some constants defined in terms of constants from other
headers (which are themselves mixtures).

To enable the use of AArch64 ELF reloc constants from assembly code (as
we will need for relocatable kernel support), we need an include without
C structure definitions or includes of other files with such definitions.

This patch factors out the relocs into a new header specifically for ELF
reloc types.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/elf.h        | 54 +--------------
 arch/arm64/include/asm/elf_relocs.h | 73 ++++++++++++++++++++
 2 files changed, 74 insertions(+), 53 deletions(-)

diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h
index faad6df49e5b..e4b3cdcaf597 100644
--- a/arch/arm64/include/asm/elf.h
+++ b/arch/arm64/include/asm/elf.h
@@ -16,6 +16,7 @@
 #ifndef __ASM_ELF_H
 #define __ASM_ELF_H
 
+#include <asm/elf_relocs.h>
 #include <asm/hwcap.h>
 
 /*
@@ -34,59 +35,6 @@ typedef elf_greg_t elf_gregset_t[ELF_NGREG];
 typedef struct user_fpsimd_state elf_fpregset_t;
 
 /*
- * AArch64 static relocation types.
- */
-
-/* Miscellaneous. */
-#define R_ARM_NONE			0
-#define R_AARCH64_NONE			256
-
-/* Data. */
-#define R_AARCH64_ABS64			257
-#define R_AARCH64_ABS32			258
-#define R_AARCH64_ABS16			259
-#define R_AARCH64_PREL64		260
-#define R_AARCH64_PREL32		261
-#define R_AARCH64_PREL16		262
-
-/* Instructions. */
-#define R_AARCH64_MOVW_UABS_G0		263
-#define R_AARCH64_MOVW_UABS_G0_NC	264
-#define R_AARCH64_MOVW_UABS_G1		265
-#define R_AARCH64_MOVW_UABS_G1_NC	266
-#define R_AARCH64_MOVW_UABS_G2		267
-#define R_AARCH64_MOVW_UABS_G2_NC	268
-#define R_AARCH64_MOVW_UABS_G3		269
-
-#define R_AARCH64_MOVW_SABS_G0		270
-#define R_AARCH64_MOVW_SABS_G1		271
-#define R_AARCH64_MOVW_SABS_G2		272
-
-#define R_AARCH64_LD_PREL_LO19		273
-#define R_AARCH64_ADR_PREL_LO21		274
-#define R_AARCH64_ADR_PREL_PG_HI21	275
-#define R_AARCH64_ADR_PREL_PG_HI21_NC	276
-#define R_AARCH64_ADD_ABS_LO12_NC	277
-#define R_AARCH64_LDST8_ABS_LO12_NC	278
-
-#define R_AARCH64_TSTBR14		279
-#define R_AARCH64_CONDBR19		280
-#define R_AARCH64_JUMP26		282
-#define R_AARCH64_CALL26		283
-#define R_AARCH64_LDST16_ABS_LO12_NC	284
-#define R_AARCH64_LDST32_ABS_LO12_NC	285
-#define R_AARCH64_LDST64_ABS_LO12_NC	286
-#define R_AARCH64_LDST128_ABS_LO12_NC	299
-
-#define R_AARCH64_MOVW_PREL_G0		287
-#define R_AARCH64_MOVW_PREL_G0_NC	288
-#define R_AARCH64_MOVW_PREL_G1		289
-#define R_AARCH64_MOVW_PREL_G1_NC	290
-#define R_AARCH64_MOVW_PREL_G2		291
-#define R_AARCH64_MOVW_PREL_G2_NC	292
-#define R_AARCH64_MOVW_PREL_G3		293
-
-/*
  * These are used to set parameters in the core dumps.
  */
 #define ELF_CLASS	ELFCLASS64
diff --git a/arch/arm64/include/asm/elf_relocs.h b/arch/arm64/include/asm/elf_relocs.h
new file mode 100644
index 000000000000..3f6b93099011
--- /dev/null
+++ b/arch/arm64/include/asm/elf_relocs.h
@@ -0,0 +1,73 @@
+/*
+ * Copyright (C) 2016 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_ELF_RELOCS_H
+#define __ASM_ELF_RELOCS_H
+
+/*
+ * AArch64 static relocation types.
+ */
+
+/* Miscellaneous. */
+#define R_ARM_NONE			0
+#define R_AARCH64_NONE			256
+
+/* Data. */
+#define R_AARCH64_ABS64			257
+#define R_AARCH64_ABS32			258
+#define R_AARCH64_ABS16			259
+#define R_AARCH64_PREL64		260
+#define R_AARCH64_PREL32		261
+#define R_AARCH64_PREL16		262
+
+/* Instructions. */
+#define R_AARCH64_MOVW_UABS_G0		263
+#define R_AARCH64_MOVW_UABS_G0_NC	264
+#define R_AARCH64_MOVW_UABS_G1		265
+#define R_AARCH64_MOVW_UABS_G1_NC	266
+#define R_AARCH64_MOVW_UABS_G2		267
+#define R_AARCH64_MOVW_UABS_G2_NC	268
+#define R_AARCH64_MOVW_UABS_G3		269
+
+#define R_AARCH64_MOVW_SABS_G0		270
+#define R_AARCH64_MOVW_SABS_G1		271
+#define R_AARCH64_MOVW_SABS_G2		272
+
+#define R_AARCH64_LD_PREL_LO19		273
+#define R_AARCH64_ADR_PREL_LO21		274
+#define R_AARCH64_ADR_PREL_PG_HI21	275
+#define R_AARCH64_ADR_PREL_PG_HI21_NC	276
+#define R_AARCH64_ADD_ABS_LO12_NC	277
+#define R_AARCH64_LDST8_ABS_LO12_NC	278
+
+#define R_AARCH64_TSTBR14		279
+#define R_AARCH64_CONDBR19		280
+#define R_AARCH64_JUMP26		282
+#define R_AARCH64_CALL26		283
+#define R_AARCH64_LDST16_ABS_LO12_NC	284
+#define R_AARCH64_LDST32_ABS_LO12_NC	285
+#define R_AARCH64_LDST64_ABS_LO12_NC	286
+#define R_AARCH64_LDST128_ABS_LO12_NC	299
+
+#define R_AARCH64_MOVW_PREL_G0		287
+#define R_AARCH64_MOVW_PREL_G0_NC	288
+#define R_AARCH64_MOVW_PREL_G1		289
+#define R_AARCH64_MOVW_PREL_G1_NC	290
+#define R_AARCH64_MOVW_PREL_G2		291
+#define R_AARCH64_MOVW_PREL_G2_NC	292
+#define R_AARCH64_MOVW_PREL_G3		293
+
+#endif /* __ASM_ELF_RELOCS_H */
+
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [kernel-hardening] [PATCH v3 15/21] arm64: split elf relocs into a separate header.
@ 2016-01-11 13:19   ` Ard Biesheuvel
  0 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel
  Cc: stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall, Ard Biesheuvel

From: Mark Rutland <mark.rutland@arm.com>

Currently, asm/elf.h contains a mixture of simple constants, C structure
definitions, and some constants defined in terms of constants from other
headers (which are themselves mixtures).

To enable the use of AArch64 ELF reloc constants from assembly code (as
we will need for relocatable kernel support), we need an include without
C structure definitions or includes of other files with such definitions.

This patch factors out the relocs into a new header specifically for ELF
reloc types.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/elf.h        | 54 +--------------
 arch/arm64/include/asm/elf_relocs.h | 73 ++++++++++++++++++++
 2 files changed, 74 insertions(+), 53 deletions(-)

diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h
index faad6df49e5b..e4b3cdcaf597 100644
--- a/arch/arm64/include/asm/elf.h
+++ b/arch/arm64/include/asm/elf.h
@@ -16,6 +16,7 @@
 #ifndef __ASM_ELF_H
 #define __ASM_ELF_H
 
+#include <asm/elf_relocs.h>
 #include <asm/hwcap.h>
 
 /*
@@ -34,59 +35,6 @@ typedef elf_greg_t elf_gregset_t[ELF_NGREG];
 typedef struct user_fpsimd_state elf_fpregset_t;
 
 /*
- * AArch64 static relocation types.
- */
-
-/* Miscellaneous. */
-#define R_ARM_NONE			0
-#define R_AARCH64_NONE			256
-
-/* Data. */
-#define R_AARCH64_ABS64			257
-#define R_AARCH64_ABS32			258
-#define R_AARCH64_ABS16			259
-#define R_AARCH64_PREL64		260
-#define R_AARCH64_PREL32		261
-#define R_AARCH64_PREL16		262
-
-/* Instructions. */
-#define R_AARCH64_MOVW_UABS_G0		263
-#define R_AARCH64_MOVW_UABS_G0_NC	264
-#define R_AARCH64_MOVW_UABS_G1		265
-#define R_AARCH64_MOVW_UABS_G1_NC	266
-#define R_AARCH64_MOVW_UABS_G2		267
-#define R_AARCH64_MOVW_UABS_G2_NC	268
-#define R_AARCH64_MOVW_UABS_G3		269
-
-#define R_AARCH64_MOVW_SABS_G0		270
-#define R_AARCH64_MOVW_SABS_G1		271
-#define R_AARCH64_MOVW_SABS_G2		272
-
-#define R_AARCH64_LD_PREL_LO19		273
-#define R_AARCH64_ADR_PREL_LO21		274
-#define R_AARCH64_ADR_PREL_PG_HI21	275
-#define R_AARCH64_ADR_PREL_PG_HI21_NC	276
-#define R_AARCH64_ADD_ABS_LO12_NC	277
-#define R_AARCH64_LDST8_ABS_LO12_NC	278
-
-#define R_AARCH64_TSTBR14		279
-#define R_AARCH64_CONDBR19		280
-#define R_AARCH64_JUMP26		282
-#define R_AARCH64_CALL26		283
-#define R_AARCH64_LDST16_ABS_LO12_NC	284
-#define R_AARCH64_LDST32_ABS_LO12_NC	285
-#define R_AARCH64_LDST64_ABS_LO12_NC	286
-#define R_AARCH64_LDST128_ABS_LO12_NC	299
-
-#define R_AARCH64_MOVW_PREL_G0		287
-#define R_AARCH64_MOVW_PREL_G0_NC	288
-#define R_AARCH64_MOVW_PREL_G1		289
-#define R_AARCH64_MOVW_PREL_G1_NC	290
-#define R_AARCH64_MOVW_PREL_G2		291
-#define R_AARCH64_MOVW_PREL_G2_NC	292
-#define R_AARCH64_MOVW_PREL_G3		293
-
-/*
  * These are used to set parameters in the core dumps.
  */
 #define ELF_CLASS	ELFCLASS64
diff --git a/arch/arm64/include/asm/elf_relocs.h b/arch/arm64/include/asm/elf_relocs.h
new file mode 100644
index 000000000000..3f6b93099011
--- /dev/null
+++ b/arch/arm64/include/asm/elf_relocs.h
@@ -0,0 +1,73 @@
+/*
+ * Copyright (C) 2016 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_ELF_RELOCS_H
+#define __ASM_ELF_RELOCS_H
+
+/*
+ * AArch64 static relocation types.
+ */
+
+/* Miscellaneous. */
+#define R_ARM_NONE			0
+#define R_AARCH64_NONE			256
+
+/* Data. */
+#define R_AARCH64_ABS64			257
+#define R_AARCH64_ABS32			258
+#define R_AARCH64_ABS16			259
+#define R_AARCH64_PREL64		260
+#define R_AARCH64_PREL32		261
+#define R_AARCH64_PREL16		262
+
+/* Instructions. */
+#define R_AARCH64_MOVW_UABS_G0		263
+#define R_AARCH64_MOVW_UABS_G0_NC	264
+#define R_AARCH64_MOVW_UABS_G1		265
+#define R_AARCH64_MOVW_UABS_G1_NC	266
+#define R_AARCH64_MOVW_UABS_G2		267
+#define R_AARCH64_MOVW_UABS_G2_NC	268
+#define R_AARCH64_MOVW_UABS_G3		269
+
+#define R_AARCH64_MOVW_SABS_G0		270
+#define R_AARCH64_MOVW_SABS_G1		271
+#define R_AARCH64_MOVW_SABS_G2		272
+
+#define R_AARCH64_LD_PREL_LO19		273
+#define R_AARCH64_ADR_PREL_LO21		274
+#define R_AARCH64_ADR_PREL_PG_HI21	275
+#define R_AARCH64_ADR_PREL_PG_HI21_NC	276
+#define R_AARCH64_ADD_ABS_LO12_NC	277
+#define R_AARCH64_LDST8_ABS_LO12_NC	278
+
+#define R_AARCH64_TSTBR14		279
+#define R_AARCH64_CONDBR19		280
+#define R_AARCH64_JUMP26		282
+#define R_AARCH64_CALL26		283
+#define R_AARCH64_LDST16_ABS_LO12_NC	284
+#define R_AARCH64_LDST32_ABS_LO12_NC	285
+#define R_AARCH64_LDST64_ABS_LO12_NC	286
+#define R_AARCH64_LDST128_ABS_LO12_NC	299
+
+#define R_AARCH64_MOVW_PREL_G0		287
+#define R_AARCH64_MOVW_PREL_G0_NC	288
+#define R_AARCH64_MOVW_PREL_G1		289
+#define R_AARCH64_MOVW_PREL_G1_NC	290
+#define R_AARCH64_MOVW_PREL_G2		291
+#define R_AARCH64_MOVW_PREL_G2_NC	292
+#define R_AARCH64_MOVW_PREL_G3		293
+
+#endif /* __ASM_ELF_RELOCS_H */
+
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [PATCH v3 16/21] scripts/sortextable: add support for ET_DYN binaries
  2016-01-11 13:18 ` Ard Biesheuvel
  (?)
@ 2016-01-11 13:19   ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel
  Cc: stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall, Ard Biesheuvel

Add support to scripts/sortextable for handling relocatable (PIE)
executables, whose ELF type is ET_DYN, not ET_EXEC. Other than adding
support for the new type, no changes are needed.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 scripts/sortextable.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/scripts/sortextable.c b/scripts/sortextable.c
index af247c70fb66..19d83647846c 100644
--- a/scripts/sortextable.c
+++ b/scripts/sortextable.c
@@ -266,9 +266,9 @@ do_file(char const *const fname)
 		break;
 	}  /* end switch */
 	if (memcmp(ELFMAG, ehdr->e_ident, SELFMAG) != 0
-	||  r2(&ehdr->e_type) != ET_EXEC
+	||  (r2(&ehdr->e_type) != ET_EXEC && r2(&ehdr->e_type) != ET_DYN)
 	||  ehdr->e_ident[EI_VERSION] != EV_CURRENT) {
-		fprintf(stderr, "unrecognized ET_EXEC file %s\n", fname);
+		fprintf(stderr, "unrecognized ET_EXEC/ET_DYN file %s\n", fname);
 		fail_file();
 	}
 
@@ -304,7 +304,7 @@ do_file(char const *const fname)
 		if (r2(&ehdr->e_ehsize) != sizeof(Elf32_Ehdr)
 		||  r2(&ehdr->e_shentsize) != sizeof(Elf32_Shdr)) {
 			fprintf(stderr,
-				"unrecognized ET_EXEC file: %s\n", fname);
+				"unrecognized ET_EXEC/ET_DYN file: %s\n", fname);
 			fail_file();
 		}
 		do32(ehdr, fname, custom_sort);
@@ -314,7 +314,7 @@ do_file(char const *const fname)
 		if (r2(&ghdr->e_ehsize) != sizeof(Elf64_Ehdr)
 		||  r2(&ghdr->e_shentsize) != sizeof(Elf64_Shdr)) {
 			fprintf(stderr,
-				"unrecognized ET_EXEC file: %s\n", fname);
+				"unrecognized ET_EXEC/ET_DYN file: %s\n", fname);
 			fail_file();
 		}
 		do64(ghdr, fname, custom_sort);
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [PATCH v3 16/21] scripts/sortextable: add support for ET_DYN binaries
@ 2016-01-11 13:19   ` Ard Biesheuvel
  0 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel

Add support to scripts/sortextable for handling relocatable (PIE)
executables, whose ELF type is ET_DYN, not ET_EXEC. Other than adding
support for the new type, no changes are needed.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 scripts/sortextable.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/scripts/sortextable.c b/scripts/sortextable.c
index af247c70fb66..19d83647846c 100644
--- a/scripts/sortextable.c
+++ b/scripts/sortextable.c
@@ -266,9 +266,9 @@ do_file(char const *const fname)
 		break;
 	}  /* end switch */
 	if (memcmp(ELFMAG, ehdr->e_ident, SELFMAG) != 0
-	||  r2(&ehdr->e_type) != ET_EXEC
+	||  (r2(&ehdr->e_type) != ET_EXEC && r2(&ehdr->e_type) != ET_DYN)
 	||  ehdr->e_ident[EI_VERSION] != EV_CURRENT) {
-		fprintf(stderr, "unrecognized ET_EXEC file %s\n", fname);
+		fprintf(stderr, "unrecognized ET_EXEC/ET_DYN file %s\n", fname);
 		fail_file();
 	}
 
@@ -304,7 +304,7 @@ do_file(char const *const fname)
 		if (r2(&ehdr->e_ehsize) != sizeof(Elf32_Ehdr)
 		||  r2(&ehdr->e_shentsize) != sizeof(Elf32_Shdr)) {
 			fprintf(stderr,
-				"unrecognized ET_EXEC file: %s\n", fname);
+				"unrecognized ET_EXEC/ET_DYN file: %s\n", fname);
 			fail_file();
 		}
 		do32(ehdr, fname, custom_sort);
@@ -314,7 +314,7 @@ do_file(char const *const fname)
 		if (r2(&ghdr->e_ehsize) != sizeof(Elf64_Ehdr)
 		||  r2(&ghdr->e_shentsize) != sizeof(Elf64_Shdr)) {
 			fprintf(stderr,
-				"unrecognized ET_EXEC file: %s\n", fname);
+				"unrecognized ET_EXEC/ET_DYN file: %s\n", fname);
 			fail_file();
 		}
 		do64(ghdr, fname, custom_sort);
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [kernel-hardening] [PATCH v3 16/21] scripts/sortextable: add support for ET_DYN binaries
@ 2016-01-11 13:19   ` Ard Biesheuvel
  0 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel
  Cc: stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall, Ard Biesheuvel

Add support to scripts/sortextable for handling relocatable (PIE)
executables, whose ELF type is ET_DYN, not ET_EXEC. Other than adding
support for the new type, no changes are needed.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 scripts/sortextable.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/scripts/sortextable.c b/scripts/sortextable.c
index af247c70fb66..19d83647846c 100644
--- a/scripts/sortextable.c
+++ b/scripts/sortextable.c
@@ -266,9 +266,9 @@ do_file(char const *const fname)
 		break;
 	}  /* end switch */
 	if (memcmp(ELFMAG, ehdr->e_ident, SELFMAG) != 0
-	||  r2(&ehdr->e_type) != ET_EXEC
+	||  (r2(&ehdr->e_type) != ET_EXEC && r2(&ehdr->e_type) != ET_DYN)
 	||  ehdr->e_ident[EI_VERSION] != EV_CURRENT) {
-		fprintf(stderr, "unrecognized ET_EXEC file %s\n", fname);
+		fprintf(stderr, "unrecognized ET_EXEC/ET_DYN file %s\n", fname);
 		fail_file();
 	}
 
@@ -304,7 +304,7 @@ do_file(char const *const fname)
 		if (r2(&ehdr->e_ehsize) != sizeof(Elf32_Ehdr)
 		||  r2(&ehdr->e_shentsize) != sizeof(Elf32_Shdr)) {
 			fprintf(stderr,
-				"unrecognized ET_EXEC file: %s\n", fname);
+				"unrecognized ET_EXEC/ET_DYN file: %s\n", fname);
 			fail_file();
 		}
 		do32(ehdr, fname, custom_sort);
@@ -314,7 +314,7 @@ do_file(char const *const fname)
 		if (r2(&ghdr->e_ehsize) != sizeof(Elf64_Ehdr)
 		||  r2(&ghdr->e_shentsize) != sizeof(Elf64_Shdr)) {
 			fprintf(stderr,
-				"unrecognized ET_EXEC file: %s\n", fname);
+				"unrecognized ET_EXEC/ET_DYN file: %s\n", fname);
 			fail_file();
 		}
 		do64(ghdr, fname, custom_sort);
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [PATCH v3 17/21] arm64: add support for a relocatable kernel and KASLR
  2016-01-11 13:18 ` Ard Biesheuvel
  (?)
@ 2016-01-11 13:19   ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel
  Cc: stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall, Ard Biesheuvel

This adds support for runtime relocation of the kernel Image, by
building it as a PIE (ET_DYN) executable and applying the dynamic
relocations in the early boot code.
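
In C terms, the R_AARCH64_RELATIVE part of that early relocation pass looks
roughly like the sketch below (illustrative names only; the real loop lives in
head.S, runs from asm and also handles R_AARCH64_ABS64 entries):

#include <elf.h>
#include <stdint.h>

/*
 * For each R_AARCH64_RELATIVE entry, store (link-time value + runtime
 * displacement) at r_offset, which must itself be adjusted by the same
 * displacement before being dereferenced.
 */
static void apply_relative_relocs(Elf64_Rela *rela, Elf64_Rela *end,
				  uint64_t disp)
{
	for (; rela < end; rela++) {
		if (ELF64_R_TYPE(rela->r_info) != R_AARCH64_RELATIVE)
			continue;
		*(uint64_t *)(rela->r_offset + disp) = rela->r_addend + disp;
	}
}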

On top of this, support for KASLR is implemented, based on entropy
provided by the bootloader in register x1 at kernel entry. Depending
on the size of the address space (VA_BITS) and the page size, the
entropy in the virtual displacement is up to 13 bits (16k/2 levels)
and up to 25 bits (all 4 levels), with the caveat that displacements
that result in the kernel image straddling a 1GB/32MB/512MB alignment
boundary (for 4KB/16KB/64KB granule kernels, respectively) are not
allowed.
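
(The entropy figures follow from the constants in the head.S change below:
the seed is masked to RANDOM_WIDTH = VA_BITS - 2 bits at RANDOM_ALIGN = 21 bit
alignment, i.e. VA_BITS - 23 random bits: 36 - 23 = 13 for a 16k/2-level
configuration and 48 - 23 = 25 when all 4 levels are used.)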

The same virtual offset is applied to the module region: this gives
almost the same security benefits, and keeps the modules in close
proximity to the kernel so we only have to rely on branches via PLTs
once the module region is exhausted (which is slightly more likely
to occur, as the relocated module region is shared with other uses
of the vmalloc area).

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 Documentation/arm64/booting.txt     | 16 +++-
 arch/arm64/Kconfig                  | 26 ++++++
 arch/arm64/Makefile                 |  4 +
 arch/arm64/include/asm/elf_relocs.h |  2 +
 arch/arm64/include/asm/memory.h     |  3 +
 arch/arm64/kernel/head.S            | 94 +++++++++++++++++++-
 arch/arm64/kernel/module.c          |  3 +-
 arch/arm64/kernel/setup.c           | 38 ++++++--
 arch/arm64/kernel/vmlinux.lds.S     |  9 ++
 9 files changed, 180 insertions(+), 15 deletions(-)

diff --git a/Documentation/arm64/booting.txt b/Documentation/arm64/booting.txt
index 67484067ce4f..0bd5ea83a54f 100644
--- a/Documentation/arm64/booting.txt
+++ b/Documentation/arm64/booting.txt
@@ -115,13 +115,25 @@ Header notes:
 			    accessible
 			1 - 2MB aligned base may be anywhere in physical
 			    memory
-  Bits 4-63:	Reserved.
+  Bit 4:	Virtual address space layout randomization (KASLR)
+			0 - kernel will execute from a fixed virtual offset
+			    that is decided at compile time, register x1 should
+			    be zero at kernel entry
+			1 - kernel will execute from a virtual offset that is
+			    randomized based on the contents of register x1 at
+			    kernel entry
+  Bits 5-63:	Reserved.
 
 - When image_size is zero, a bootloader should attempt to keep as much
   memory as possible free for use by the kernel immediately after the
   end of the kernel image. The amount of space required will vary
   depending on selected features, and is effectively unbound.
 
+- It is up to the bootloader to decide whether a KASLR capable kernel should
+  boot with randomization enabled. If this is the case, register x1 should
+  contain a strong random value. If the bootloader passes 'nokaslr' on the
+  kernel command line to disable randomization, it must also pass 0 in x1.
+
 The Image must be placed text_offset bytes from a 2MB aligned base
 address anywhere in usable system RAM and called there. The region
 between the 2 MB aligned base address and the start of the image has no
@@ -145,7 +157,7 @@ Before jumping into the kernel, the following conditions must be met:
 
 - Primary CPU general-purpose register settings
   x0 = physical address of device tree blob (dtb) in system RAM.
-  x1 = 0 (reserved for future use)
+  x1 = 0, unless bit 4 is set in the Image header
   x2 = 0 (reserved for future use)
   x3 = 0 (reserved for future use)
 
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 778df20bf623..7fa5b74ee80d 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -711,6 +711,32 @@ config ARM64_MODULE_PLTS
 	select ARM64_MODULE_CMODEL_LARGE
 	select HAVE_MOD_ARCH_SPECIFIC
 
+config RELOCATABLE
+	bool
+	help
+	  This builds the kernel as a Position Independent Executable (PIE),
+	  which retains all relocation metadata required to relocate the
+	  kernel binary at runtime to a different virtual address than the
+	  address it was linked at.
+	  Since AArch64 uses the RELA relocation format, this requires a
+	  relocation pass at runtime even if the kernel is loaded at the
+	  same address it was linked at.
+
+config RANDOMIZE_BASE
+	bool "Randomize the address of the kernel image"
+	select ARM64_MODULE_PLTS
+	select RELOCATABLE
+	help
+	  Randomizes the virtual address at which the kernel image is
+	  loaded, as a security feature that deters exploit attempts
+	  relying on knowledge of the location of kernel internals.
+
+	  It is the bootloader's job to provide entropy, by passing a
+	  random value in x1 at kernel entry.
+
+	  If unsure, say N.
+
+
 endmenu
 
 menu "Boot options"
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index db462980c6be..c3eaa03f9020 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -15,6 +15,10 @@ CPPFLAGS_vmlinux.lds = -DTEXT_OFFSET=$(TEXT_OFFSET)
 OBJCOPYFLAGS	:=-O binary -R .note -R .note.gnu.build-id -R .comment -S
 GZFLAGS		:=-9
 
+ifneq ($(CONFIG_RELOCATABLE),)
+LDFLAGS_vmlinux		+= -pie
+endif
+
 KBUILD_DEFCONFIG := defconfig
 
 # Check for binutils support for specific extensions
diff --git a/arch/arm64/include/asm/elf_relocs.h b/arch/arm64/include/asm/elf_relocs.h
index 3f6b93099011..e1316de840a5 100644
--- a/arch/arm64/include/asm/elf_relocs.h
+++ b/arch/arm64/include/asm/elf_relocs.h
@@ -69,5 +69,7 @@
 #define R_AARCH64_MOVW_PREL_G2_NC	292
 #define R_AARCH64_MOVW_PREL_G3		293
 
+#define R_AARCH64_RELATIVE		1027
+
 #endif /* __ASM_ELF_RELOCS_H */
 
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 758fb4a503ef..422a30a5f328 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -122,6 +122,9 @@ extern phys_addr_t		memstart_addr;
 /* PHYS_OFFSET - the physical address of the start of memory. */
 #define PHYS_OFFSET		({ memstart_addr; })
 
+/* the virtual base of the kernel image (minus TEXT_OFFSET) */
+extern u64			kimage_vaddr;
+
 /* the offset between the kernel virtual and physical mappings */
 extern u64			kimage_voffset;
 
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index d66aee595170..4bf6a5c9a24e 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -29,6 +29,7 @@
 #include <asm/asm-offsets.h>
 #include <asm/cache.h>
 #include <asm/cputype.h>
+#include <asm/elf_relocs.h>
 #include <asm/kernel-pgtable.h>
 #include <asm/memory.h>
 #include <asm/pgtable-hwdef.h>
@@ -61,9 +62,16 @@
 
 #define __HEAD_FLAG_PAGE_SIZE	((PAGE_SHIFT - 10) / 2)
 
+#ifdef CONFIG_RANDOMIZE_BASE
+#define __HEAD_FLAG_KASLR	1
+#else
+#define __HEAD_FLAG_KASLR	0
+#endif
+
 #define __HEAD_FLAGS		((__HEAD_FLAG_BE << 0) |	\
 				 (__HEAD_FLAG_PAGE_SIZE << 1) |	\
-				 (__HEAD_FLAG_PHYS_BASE << 3))
+				 (__HEAD_FLAG_PHYS_BASE << 3) |	\
+				 (__HEAD_FLAG_KASLR << 4))
 
 /*
  * Kernel startup entry point.
@@ -234,6 +242,7 @@ ENTRY(stext)
 	 */
 	ldr	x27, 0f				// address to jump to after
 						// MMU has been enabled
+	add	x27, x27, x23			// add KASLR displacement
 	adr_l	lr, __enable_mmu		// return (PIC) address
 	b	__cpu_setup			// initialise processor
 ENDPROC(stext)
@@ -245,6 +254,7 @@ ENDPROC(stext)
  */
 preserve_boot_args:
 	mov	x21, x0				// x21=FDT
+	mov	x22, x1				// x22=random seed
 
 	adr_l	x0, boot_args			// record the contents of
 	stp	x21, x1, [x0]			// x0 .. x3 at kernel entry
@@ -328,6 +338,40 @@ __create_page_tables:
 	adrp	x26, swapper_pg_dir
 	mov	x27, lr
 
+#ifdef CONFIG_RANDOMIZE_BASE
+	/*
+	 * Mask off the bits of the random value stored in x22 so it can serve
+	 * as a KASLR displacement value which will move the kernel image to a
+	 * random offset in the lower half of the VMALLOC area (VA_BITS - 2).
+	 * Even if we could randomize at page granularity for 16k and 64k
+	 * granule kernels, let's always preserve the 2 MB (21 bit) alignment
+	 * and not interfere with the ability to use ranges of contiguous PTEs.
+	 */
+	.set	RANDOM_WIDTH, VA_BITS - 2
+	.set	RANDOM_ALIGN, 21
+
+	mov	x10, ((1 << (RANDOM_WIDTH - RANDOM_ALIGN)) - 1) << RANDOM_ALIGN
+	and	x23, x22, x10
+
+	/*
+	 * The kernel Image should not extend across a 1GB/32MB/512MB alignment
+	 * boundary (for 4KB/16KB/64KB granule kernels, respectively). If this
+	 * happens, increase the KASLR displacement in x23 by the size of the
+	 * kernel image.
+	 */
+	ldr	w8, kernel_img_size
+	mov	x11, KIMAGE_VADDR & ((1 << SWAPPER_TABLE_SHIFT) - 1)
+	add	x11, x11, x23
+	add	x9, x8, x11
+	eor	x9, x9, x11
+	tbz	x9, SWAPPER_TABLE_SHIFT, 0f
+	add	x23, x23, x8
+	and	x23, x23, x10
+0:
+#else
+	mov	x23, xzr
+#endif
+
 	/*
 	 * Invalidate the idmap and swapper page tables to avoid potential
 	 * dirty cache lines being evicted.
@@ -405,6 +449,7 @@ __create_page_tables:
 	 */
 	mov	x0, x26				// swapper_pg_dir
 	ldr	x5, =KIMAGE_VADDR
+	add	x5, x5, x23			// add KASLR displacement
 	create_pgd_entry x0, x5, x3, x6
 	ldr	w6, kernel_img_size
 	add	x6, x6, x5
@@ -446,13 +491,52 @@ __mmap_switched:
 	bl	__pi_memset
 
 	dsb	ishst				// Make zero page visible to PTW
+
+#ifdef CONFIG_RELOCATABLE
+
+	/*
+	 * Iterate over each entry in the relocation table, and apply the
+	 * relocations in place.
+	 */
+	adr_l	x8, __dynsym_start		// start of symbol table
+	adr_l	x9, __reloc_start		// start of reloc table
+	adr_l	x10, __reloc_end		// end of reloc table
+
+0:	cmp	x9, x10
+	b.hs	2f
+	ldp	x11, x12, [x9], #24
+	ldr	x13, [x9, #-8]
+	cmp	w12, #R_AARCH64_RELATIVE
+	b.ne	1f
+	add	x13, x13, x23			// relocate
+	str	x13, [x11, x23]
+	b	0b
+
+1:	cmp	w12, #R_AARCH64_ABS64
+	b.ne	0b
+	add	x12, x12, x12, lsl #1		// symtab offset: 24x top word
+	add	x12, x8, x12, lsr #(32 - 3)	// ... shifted into bottom word
+	ldrsh	w14, [x12, #6]			// Elf64_Sym::st_shndx
+	ldr	x15, [x12, #8]			// Elf64_Sym::st_value
+	cmp	w14, #-0xf			// SHN_ABS (0xfff1) ?
+	add	x14, x15, x23			// relocate
+	csel	x15, x14, x15, ne
+	add	x15, x13, x15
+	str	x15, [x11, x23]
+	b	0b
+
+2:	adr_l	x8, kimage_vaddr		// make relocated kimage_vaddr
+	dc	cvac, x8			// value visible to secondaries
+	dsb	sy				// with MMU off
+#endif
+
 	adr_l	sp, initial_sp, x4
 	mov	x4, sp
 	and	x4, x4, #~(THREAD_SIZE - 1)
 	msr	sp_el0, x4			// Save thread_info
 	str_l	x21, __fdt_pointer, x5		// Save FDT pointer
 
-	ldr	x0, =KIMAGE_VADDR		// Save the offset between
+	ldr_l	x0, kimage_vaddr		// Save the offset between
 	sub	x24, x0, x24			// the kernel virtual and
 	str_l	x24, kimage_voffset, x0		// physical mappings
 
@@ -468,6 +552,10 @@ ENDPROC(__mmap_switched)
  * hotplug and needs to have the same protections as the text region
  */
 	.section ".text","ax"
+
+ENTRY(kimage_vaddr)
+	.quad		_text - TEXT_OFFSET
+
 /*
  * If we're fortunate enough to boot at EL2, ensure that the world is
  * sane before dropping to EL1.
@@ -628,7 +716,7 @@ ENTRY(secondary_startup)
 	adrp	x26, swapper_pg_dir
 	bl	__cpu_setup			// initialise processor
 
-	ldr	x8, =KIMAGE_VADDR
+	ldr	x8, kimage_vaddr
 	ldr	w9, 0f
 	sub	x27, x8, w9, sxtw		// address to jump to after enabling the MMU
 	b	__enable_mmu
diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index 3a298b0e21bb..d38662028200 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -34,7 +34,8 @@ void *module_alloc(unsigned long size)
 {
 	void *p;
 
-	p = __vmalloc_node_range(size, MODULE_ALIGN, MODULES_VADDR, MODULES_END,
+	p = __vmalloc_node_range(size, MODULE_ALIGN,
+				kimage_vaddr - MODULES_VSIZE, kimage_vaddr,
 				GFP_KERNEL, PAGE_KERNEL_EXEC, 0,
 				NUMA_NO_NODE, __builtin_return_address(0));
 
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index c67ba4453ec6..f8111894447c 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -288,16 +288,41 @@ static inline void __init relocate_initrd(void)
 }
 #endif
 
+static bool nokaslr;
+static int __init early_nokaslr(char *p)
+{
+	nokaslr = true;
+	return 0;
+}
+early_param("nokaslr", early_nokaslr);
+
+static void check_boot_args(void)
+{
+	if ((!IS_ENABLED(CONFIG_RANDOMIZE_BASE) && boot_args[1]) ||
+	    boot_args[2] || boot_args[3]) {
+		pr_err("WARNING: x1-x3 nonzero in violation of boot protocol:\n"
+			"\tx1: %016llx\n\tx2: %016llx\n\tx3: %016llx\n"
+			"This indicates a broken bootloader or old kernel\n",
+			boot_args[1], boot_args[2], boot_args[3]);
+	}
+	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && boot_args[1] && nokaslr) {
+		pr_err("WARNING: found KASLR entropy in x1 but 'nokaslr' was passed on the commmand line:\n"
+			"\tx1: %016llx\n"
+			"This indicates a broken bootloader\n",
+			boot_args[1]);
+	}
+}
+
 u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
 
 void __init setup_arch(char **cmdline_p)
 {
 	static struct vm_struct vmlinux_vm;
 
-	vmlinux_vm.addr		= (void *)KIMAGE_VADDR;
-	vmlinux_vm.size		= round_up((u64)_end - KIMAGE_VADDR,
+	vmlinux_vm.addr		= (void *)kimage_vaddr;
+	vmlinux_vm.size		= round_up((u64)_end - kimage_vaddr,
 					   SWAPPER_BLOCK_SIZE);
-	vmlinux_vm.phys_addr	= __pa(KIMAGE_VADDR);
+	vmlinux_vm.phys_addr	= __pa(kimage_vaddr);
 	vmlinux_vm.flags	= VM_MAP;
 	vmlinux_vm.caller	= setup_arch;
 
@@ -366,12 +391,7 @@ void __init setup_arch(char **cmdline_p)
 	conswitchp = &dummy_con;
 #endif
 #endif
-	if (boot_args[1] || boot_args[2] || boot_args[3]) {
-		pr_err("WARNING: x1-x3 nonzero in violation of boot protocol:\n"
-			"\tx1: %016llx\n\tx2: %016llx\n\tx3: %016llx\n"
-			"This indicates a broken bootloader or old kernel\n",
-			boot_args[1], boot_args[2], boot_args[3]);
-	}
+	check_boot_args();
 }
 
 static int __init arm64_device_init(void)
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index ced0dedcabcc..eddd234d7721 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -148,6 +148,15 @@ SECTIONS
 	.altinstr_replacement : {
 		*(.altinstr_replacement)
 	}
+	.rela : ALIGN(8) {
+		__reloc_start = .;
+		*(.rela .rela*)
+		__reloc_end = .;
+	}
+	.dynsym : ALIGN(8) {
+		__dynsym_start = .;
+		*(.dynsym)
+	}
 
 	. = ALIGN(PAGE_SIZE);
 	__init_end = .;
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [PATCH v3 17/21] arm64: add support for a relocatable kernel and KASLR
@ 2016-01-11 13:19   ` Ard Biesheuvel
  0 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel

This adds support for runtime relocation of the kernel Image, by
building it as a PIE (ET_DYN) executable and applying the dynamic
relocations in the early boot code.

On top of this, support for KASLR is implemented, based on entropy
provided by the bootloader in register x1 at kernel entry. Depending
on the size of the address space (VA_BITS) and the page size, the
entropy in the virtual displacement is up to 13 bits (16k/2 levels)
and up to 25 bits (all 4 levels), with the caveat that displacements
that result in the kernel image straddling a 1GB/32MB/512MB alignment
boundary (for 4KB/16KB/64KB granule kernels, respectively) are not
allowed.

The same virtual offset is applied to the module region: this gives
almost the same security benefits, and keeps the modules in close
proximity to the kernel so we only have to rely on branches via PLTs
once the module region is exhausted (which is slightly more likely
to occur, as the relocated module region is shared with other uses
of the vmalloc area).

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 Documentation/arm64/booting.txt     | 16 +++-
 arch/arm64/Kconfig                  | 26 ++++++
 arch/arm64/Makefile                 |  4 +
 arch/arm64/include/asm/elf_relocs.h |  2 +
 arch/arm64/include/asm/memory.h     |  3 +
 arch/arm64/kernel/head.S            | 94 +++++++++++++++++++-
 arch/arm64/kernel/module.c          |  3 +-
 arch/arm64/kernel/setup.c           | 38 ++++++--
 arch/arm64/kernel/vmlinux.lds.S     |  9 ++
 9 files changed, 180 insertions(+), 15 deletions(-)

diff --git a/Documentation/arm64/booting.txt b/Documentation/arm64/booting.txt
index 67484067ce4f..0bd5ea83a54f 100644
--- a/Documentation/arm64/booting.txt
+++ b/Documentation/arm64/booting.txt
@@ -115,13 +115,25 @@ Header notes:
 			    accessible
 			1 - 2MB aligned base may be anywhere in physical
 			    memory
-  Bits 4-63:	Reserved.
+  Bit 4:	Virtual address space layout randomization (KASLR)
+			0 - kernel will execute from a fixed virtual offset
+			    that is decided at compile time, register x1 should
+			    be zero at kernel entry
+			1 - kernel will execute from a virtual offset that is
+			    randomized based on the contents of register x1 at
+			    kernel entry
+  Bits 5-63:	Reserved.
 
 - When image_size is zero, a bootloader should attempt to keep as much
   memory as possible free for use by the kernel immediately after the
   end of the kernel image. The amount of space required will vary
   depending on selected features, and is effectively unbound.
 
+- It is up to the bootloader to decide whether a KASLR capable kernel should
+  boot with randomization enabled. If this is the case, register x1 should
+  contain a strong random value. If the bootloader passes 'nokaslr' on the
+  kernel command line to disable randomization, it must also pass 0 in x1.
+
 The Image must be placed text_offset bytes from a 2MB aligned base
 address anywhere in usable system RAM and called there. The region
 between the 2 MB aligned base address and the start of the image has no
@@ -145,7 +157,7 @@ Before jumping into the kernel, the following conditions must be met:
 
 - Primary CPU general-purpose register settings
   x0 = physical address of device tree blob (dtb) in system RAM.
-  x1 = 0 (reserved for future use)
+  x1 = 0, unless bit 4 is set in the Image header
   x2 = 0 (reserved for future use)
   x3 = 0 (reserved for future use)
 
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 778df20bf623..7fa5b74ee80d 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -711,6 +711,32 @@ config ARM64_MODULE_PLTS
 	select ARM64_MODULE_CMODEL_LARGE
 	select HAVE_MOD_ARCH_SPECIFIC
 
+config RELOCATABLE
+	bool
+	help
+	  This builds the kernel as a Position Independent Executable (PIE),
+	  which retains all relocation metadata required to relocate the
+	  kernel binary at runtime to a different virtual address than the
+	  address it was linked at.
+	  Since AArch64 uses the RELA relocation format, this requires a
+	  relocation pass at runtime even if the kernel is loaded at the
+	  same address it was linked at.
+
+config RANDOMIZE_BASE
+	bool "Randomize the address of the kernel image"
+	select ARM64_MODULE_PLTS
+	select RELOCATABLE
+	help
+	  Randomizes the virtual address at which the kernel image is
+	  loaded, as a security feature that deters exploit attempts
+	  relying on knowledge of the location of kernel internals.
+
+	  It is the bootloader's job to provide entropy, by passing a
+	  random value in x1 at kernel entry.
+
+	  If unsure, say N.
+
+
 endmenu
 
 menu "Boot options"
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index db462980c6be..c3eaa03f9020 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -15,6 +15,10 @@ CPPFLAGS_vmlinux.lds = -DTEXT_OFFSET=$(TEXT_OFFSET)
 OBJCOPYFLAGS	:=-O binary -R .note -R .note.gnu.build-id -R .comment -S
 GZFLAGS		:=-9
 
+ifneq ($(CONFIG_RELOCATABLE),)
+LDFLAGS_vmlinux		+= -pie
+endif
+
 KBUILD_DEFCONFIG := defconfig
 
 # Check for binutils support for specific extensions
diff --git a/arch/arm64/include/asm/elf_relocs.h b/arch/arm64/include/asm/elf_relocs.h
index 3f6b93099011..e1316de840a5 100644
--- a/arch/arm64/include/asm/elf_relocs.h
+++ b/arch/arm64/include/asm/elf_relocs.h
@@ -69,5 +69,7 @@
 #define R_AARCH64_MOVW_PREL_G2_NC	292
 #define R_AARCH64_MOVW_PREL_G3		293
 
+#define R_AARCH64_RELATIVE		1027
+
 #endif /* __ASM_ELF_RELOCS_H */
 
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 758fb4a503ef..422a30a5f328 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -122,6 +122,9 @@ extern phys_addr_t		memstart_addr;
 /* PHYS_OFFSET - the physical address of the start of memory. */
 #define PHYS_OFFSET		({ memstart_addr; })
 
+/* the virtual base of the kernel image (minus TEXT_OFFSET) */
+extern u64			kimage_vaddr;
+
 /* the offset between the kernel virtual and physical mappings */
 extern u64			kimage_voffset;
 
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index d66aee595170..4bf6a5c9a24e 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -29,6 +29,7 @@
 #include <asm/asm-offsets.h>
 #include <asm/cache.h>
 #include <asm/cputype.h>
+#include <asm/elf_relocs.h>
 #include <asm/kernel-pgtable.h>
 #include <asm/memory.h>
 #include <asm/pgtable-hwdef.h>
@@ -61,9 +62,16 @@
 
 #define __HEAD_FLAG_PAGE_SIZE	((PAGE_SHIFT - 10) / 2)
 
+#ifdef CONFIG_RANDOMIZE_BASE
+#define __HEAD_FLAG_KASLR	1
+#else
+#define __HEAD_FLAG_KASLR	0
+#endif
+
 #define __HEAD_FLAGS		((__HEAD_FLAG_BE << 0) |	\
 				 (__HEAD_FLAG_PAGE_SIZE << 1) |	\
-				 (__HEAD_FLAG_PHYS_BASE << 3))
+				 (__HEAD_FLAG_PHYS_BASE << 3) |	\
+				 (__HEAD_FLAG_KASLR << 4))
 
 /*
  * Kernel startup entry point.
@@ -234,6 +242,7 @@ ENTRY(stext)
 	 */
 	ldr	x27, 0f				// address to jump to after
 						// MMU has been enabled
+	add	x27, x27, x23			// add KASLR displacement
 	adr_l	lr, __enable_mmu		// return (PIC) address
 	b	__cpu_setup			// initialise processor
 ENDPROC(stext)
@@ -245,6 +254,7 @@ ENDPROC(stext)
  */
 preserve_boot_args:
 	mov	x21, x0				// x21=FDT
+	mov	x22, x1				// x22=random seed
 
 	adr_l	x0, boot_args			// record the contents of
 	stp	x21, x1, [x0]			// x0 .. x3 at kernel entry
@@ -328,6 +338,40 @@ __create_page_tables:
 	adrp	x26, swapper_pg_dir
 	mov	x27, lr
 
+#ifdef CONFIG_RANDOMIZE_BASE
+	/*
+	 * Mask off the bits of the random value stored in x22 so it can serve
+	 * as a KASLR displacement value which will move the kernel image to a
+	 * random offset in the lower half of the VMALLOC area (VA_BITS - 2).
+	 * Even if we could randomize at page granularity for 16k and 64k
+	 * granule kernels, let's always preserve the 2 MB (21 bit) alignment
+	 * and not interfere with the ability to use ranges of contiguous PTEs.
+	 */
+	.set	RANDOM_WIDTH, VA_BITS - 2
+	.set	RANDOM_ALIGN, 21
+
+	mov	x10, ((1 << (RANDOM_WIDTH - RANDOM_ALIGN)) - 1) << RANDOM_ALIGN
+	and	x23, x22, x10
+
+	/*
+	 * The kernel Image should not extend across a 1GB/32MB/512MB alignment
+	 * boundary (for 4KB/16KB/64KB granule kernels, respectively). If this
+	 * happens, increase the KASLR displacement in x23 by the size of the
+	 * kernel image.
+	 */
+	ldr	w8, kernel_img_size
+	mov	x11, KIMAGE_VADDR & ((1 << SWAPPER_TABLE_SHIFT) - 1)
+	add	x11, x11, x23
+	add	x9, x8, x11
+	eor	x9, x9, x11
+	tbz	x9, SWAPPER_TABLE_SHIFT, 0f
+	add	x23, x23, x8
+	and	x23, x23, x10
+0:
+#else
+	mov	x23, xzr
+#endif
+
 	/*
 	 * Invalidate the idmap and swapper page tables to avoid potential
 	 * dirty cache lines being evicted.
@@ -405,6 +449,7 @@ __create_page_tables:
 	 */
 	mov	x0, x26				// swapper_pg_dir
 	ldr	x5, =KIMAGE_VADDR
+	add	x5, x5, x23			// add KASLR displacement
 	create_pgd_entry x0, x5, x3, x6
 	ldr	w6, kernel_img_size
 	add	x6, x6, x5
@@ -446,13 +491,52 @@ __mmap_switched:
 	bl	__pi_memset
 
 	dsb	ishst				// Make zero page visible to PTW
+
+#ifdef CONFIG_RELOCATABLE
+
+	/*
+	 * Iterate over each entry in the relocation table, and apply the
+	 * relocations in place.
+	 */
+	adr_l	x8, __dynsym_start		// start of symbol table
+	adr_l	x9, __reloc_start		// start of reloc table
+	adr_l	x10, __reloc_end		// end of reloc table
+
+0:	cmp	x9, x10
+	b.hs	2f
+	ldp	x11, x12, [x9], #24
+	ldr	x13, [x9, #-8]
+	cmp	w12, #R_AARCH64_RELATIVE
+	b.ne	1f
+	add	x13, x13, x23			// relocate
+	str	x13, [x11, x23]
+	b	0b
+
+1:	cmp	w12, #R_AARCH64_ABS64
+	b.ne	0b
+	add	x12, x12, x12, lsl #1		// symtab offset: 24x top word
+	add	x12, x8, x12, lsr #(32 - 3)	// ... shifted into bottom word
+	ldrsh	w14, [x12, #6]			// Elf64_Sym::st_shndx
+	ldr	x15, [x12, #8]			// Elf64_Sym::st_value
+	cmp	w14, #-0xf			// SHN_ABS (0xfff1) ?
+	add	x14, x15, x23			// relocate
+	csel	x15, x14, x15, ne
+	add	x15, x13, x15
+	str	x15, [x11, x23]
+	b	0b
+
+2:	adr_l	x8, kimage_vaddr		// make relocated kimage_vaddr
+	dc	cvac, x8			// value visible to secondaries
+	dsb	sy				// with MMU off
+#endif
+
 	adr_l	sp, initial_sp, x4
 	mov	x4, sp
 	and	x4, x4, #~(THREAD_SIZE - 1)
 	msr	sp_el0, x4			// Save thread_info
 	str_l	x21, __fdt_pointer, x5		// Save FDT pointer
 
-	ldr	x0, =KIMAGE_VADDR		// Save the offset between
+	ldr_l	x0, kimage_vaddr		// Save the offset between
 	sub	x24, x0, x24			// the kernel virtual and
 	str_l	x24, kimage_voffset, x0		// physical mappings
 
@@ -468,6 +552,10 @@ ENDPROC(__mmap_switched)
  * hotplug and needs to have the same protections as the text region
  */
 	.section ".text","ax"
+
+ENTRY(kimage_vaddr)
+	.quad		_text - TEXT_OFFSET
+
 /*
 * If we're fortunate enough to boot at EL2, ensure that the world is
  * sane before dropping to EL1.
@@ -628,7 +716,7 @@ ENTRY(secondary_startup)
 	adrp	x26, swapper_pg_dir
 	bl	__cpu_setup			// initialise processor
 
-	ldr	x8, =KIMAGE_VADDR
+	ldr	x8, kimage_vaddr
 	ldr	w9, 0f
 	sub	x27, x8, w9, sxtw		// address to jump to after enabling the MMU
 	b	__enable_mmu
diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index 3a298b0e21bb..d38662028200 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -34,7 +34,8 @@ void *module_alloc(unsigned long size)
 {
 	void *p;
 
-	p = __vmalloc_node_range(size, MODULE_ALIGN, MODULES_VADDR, MODULES_END,
+	p = __vmalloc_node_range(size, MODULE_ALIGN,
+				kimage_vaddr - MODULES_VSIZE, kimage_vaddr,
 				GFP_KERNEL, PAGE_KERNEL_EXEC, 0,
 				NUMA_NO_NODE, __builtin_return_address(0));
 
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index c67ba4453ec6..f8111894447c 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -288,16 +288,41 @@ static inline void __init relocate_initrd(void)
 }
 #endif
 
+static bool nokaslr;
+static int __init early_nokaslr(char *p)
+{
+	nokaslr = true;
+	return 0;
+}
+early_param("nokaslr", early_nokaslr);
+
+static void check_boot_args(void)
+{
+	if ((!IS_ENABLED(CONFIG_RANDOMIZE_BASE) && boot_args[1]) ||
+	    boot_args[2] || boot_args[3]) {
+		pr_err("WARNING: x1-x3 nonzero in violation of boot protocol:\n"
+			"\tx1: %016llx\n\tx2: %016llx\n\tx3: %016llx\n"
+			"This indicates a broken bootloader or old kernel\n",
+			boot_args[1], boot_args[2], boot_args[3]);
+	}
+	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && boot_args[1] && nokaslr) {
+		pr_err("WARNING: found KASLR entropy in x1 but 'nokaslr' was passed on the command line:\n"
+			"\tx1: %016llx\n"
+			"This indicates a broken bootloader\n",
+			boot_args[1]);
+	}
+}
+
 u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
 
 void __init setup_arch(char **cmdline_p)
 {
 	static struct vm_struct vmlinux_vm;
 
-	vmlinux_vm.addr		= (void *)KIMAGE_VADDR;
-	vmlinux_vm.size		= round_up((u64)_end - KIMAGE_VADDR,
+	vmlinux_vm.addr		= (void *)kimage_vaddr;
+	vmlinux_vm.size		= round_up((u64)_end - kimage_vaddr,
 					   SWAPPER_BLOCK_SIZE);
-	vmlinux_vm.phys_addr	= __pa(KIMAGE_VADDR);
+	vmlinux_vm.phys_addr	= __pa(kimage_vaddr);
 	vmlinux_vm.flags	= VM_MAP;
 	vmlinux_vm.caller	= setup_arch;
 
@@ -366,12 +391,7 @@ void __init setup_arch(char **cmdline_p)
 	conswitchp = &dummy_con;
 #endif
 #endif
-	if (boot_args[1] || boot_args[2] || boot_args[3]) {
-		pr_err("WARNING: x1-x3 nonzero in violation of boot protocol:\n"
-			"\tx1: %016llx\n\tx2: %016llx\n\tx3: %016llx\n"
-			"This indicates a broken bootloader or old kernel\n",
-			boot_args[1], boot_args[2], boot_args[3]);
-	}
+	check_boot_args();
 }
 
 static int __init arm64_device_init(void)
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index ced0dedcabcc..eddd234d7721 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -148,6 +148,15 @@ SECTIONS
 	.altinstr_replacement : {
 		*(.altinstr_replacement)
 	}
+	.rela : ALIGN(8) {
+		__reloc_start = .;
+		*(.rela .rela*)
+		__reloc_end = .;
+	}
+	.dynsym : ALIGN(8) {
+		__dynsym_start = .;
+		*(.dynsym)
+	}
 
 	. = ALIGN(PAGE_SIZE);
 	__init_end = .;
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [kernel-hardening] [PATCH v3 17/21] arm64: add support for a relocatable kernel and KASLR
@ 2016-01-11 13:19   ` Ard Biesheuvel
  0 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel
  Cc: stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall, Ard Biesheuvel

This adds support for runtime relocation of the kernel Image, by
building it as a PIE (ET_DYN) executable and applying the dynamic
relocations in the early boot code.
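
In C terms, the RELA processing loop added to head.S below amounts to
roughly the following (a sketch for illustration only: the real code
runs in assembly with the MMU still off, walking the
__reloc_start/__reloc_end and __dynsym_start markers added to the
linker script; host <elf.h> types and constants are borrowed here for
brevity):

#include <elf.h>
#include <stdint.h>

void apply_relocations(Elf64_Rela *rela, Elf64_Rela *rela_end,
		       Elf64_Sym *dynsym, uint64_t delta)
{
	for (; rela < rela_end; rela++) {
		/* the place to patch, displaced by the KASLR offset */
		uint64_t *place = (uint64_t *)(rela->r_offset + delta);
		Elf64_Sym *sym;
		uint64_t val;

		switch (ELF64_R_TYPE(rela->r_info)) {
		case R_AARCH64_RELATIVE:
			*place = rela->r_addend + delta;
			break;
		case R_AARCH64_ABS64:
			sym = &dynsym[ELF64_R_SYM(rela->r_info)];
			val = sym->st_value;
			/* absolute (SHN_ABS) symbols are not displaced */
			if (sym->st_shndx != SHN_ABS)
				val += delta;
			*place = val + rela->r_addend;
			break;
		}
	}
}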

On top of this, support for KASLR is implemented, based on entropy
provided by the bootloader in register x1 at kernel entry. Depending
on the size of the address space (VA_BITS) and the page size, the
entropy in the virtual displacement is up to 13 bits (16k/2 levels)
and up to 25 bits (all 4 levels), with the caveat that displacements
that result in the kernel image straddling a 1GB/32MB/512MB alignment
boundary (for 4KB/16KB/64KB granule kernels, respectively) are not
allowed.
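
For example, with 4 KB pages and 39-bit VAs this gives 39 - 2 - 21 = 16
bits of virtual entropy. A standalone C model of the masking and
boundary check performed in __create_page_tables below (a sketch that
mirrors the assembly; the names are not the kernel's own) is:

#include <stdint.h>

/*
 * Reduce the seed from x1 to a 2 MB aligned displacement within the
 * lower half of the vmalloc area, and nudge it up by the image size
 * if the displaced image would straddle a swapper table boundary
 * (1GB/32MB/512MB for 4KB/16KB/64KB granules).
 */
uint64_t kaslr_displacement(uint64_t seed, uint64_t kimage_vaddr,
			    uint64_t image_size, unsigned int va_bits,
			    unsigned int swapper_table_shift)
{
	const unsigned int random_width = va_bits - 2;
	const unsigned int random_align = 21;	/* 2 MB */
	uint64_t mask, offset, start, end;

	mask = ((1ULL << (random_width - random_align)) - 1) << random_align;
	offset = seed & mask;

	start = (kimage_vaddr & ((1ULL << swapper_table_shift) - 1)) + offset;
	end = start + image_size;
	if ((start ^ end) & (1ULL << swapper_table_shift))
		offset = (offset + image_size) & mask;

	return offset;
}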

The same virtual offset is applied to the module region: this gives
almost the same security benefits, and keeps the modules in close
proximity to the kernel so we only have to rely on branches via PLTs
once the module region is exhausted (which is slightly more likely
to occur, as the relocated module region is shared with other uses
of the vmalloc area).
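
Concretely, the module_alloc() hunk below simply anchors the module
window to the (now variable) kernel base, i.e. roughly:

	/* sketch: the module area tracks the randomized kernel base */
	unsigned long modules_end   = kimage_vaddr;
	unsigned long modules_start = kimage_vaddr - MODULES_VSIZE;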

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 Documentation/arm64/booting.txt     | 16 +++-
 arch/arm64/Kconfig                  | 26 ++++++
 arch/arm64/Makefile                 |  4 +
 arch/arm64/include/asm/elf_relocs.h |  2 +
 arch/arm64/include/asm/memory.h     |  3 +
 arch/arm64/kernel/head.S            | 94 +++++++++++++++++++-
 arch/arm64/kernel/module.c          |  3 +-
 arch/arm64/kernel/setup.c           | 38 ++++++--
 arch/arm64/kernel/vmlinux.lds.S     |  9 ++
 9 files changed, 180 insertions(+), 15 deletions(-)

diff --git a/Documentation/arm64/booting.txt b/Documentation/arm64/booting.txt
index 67484067ce4f..0bd5ea83a54f 100644
--- a/Documentation/arm64/booting.txt
+++ b/Documentation/arm64/booting.txt
@@ -115,13 +115,25 @@ Header notes:
 			    accessible
 			1 - 2MB aligned base may be anywhere in physical
 			    memory
-  Bits 4-63:	Reserved.
+  Bit 4:	Virtual address space layout randomization (KASLR)
+			0 - kernel will execute from a fixed virtual offset
+			    that is decided at compile time, register x1 should
+			    be zero at kernel entry
+			1 - kernel will execute from a virtual offset that is
+			    randomized based on the contents of register x1 at
+			    kernel entry
+  Bits 5-63:	Reserved.
 
 - When image_size is zero, a bootloader should attempt to keep as much
   memory as possible free for use by the kernel immediately after the
   end of the kernel image. The amount of space required will vary
   depending on selected features, and is effectively unbound.
 
+- It is up to the bootloader to decide whether a KASLR capable kernel should
+  boot with randomization enabled. If this is the case, register x1 should
+  contain a strong random value. If the bootloader passes 'nokaslr' on the
+  kernel command line to disable randomization, it must also pass 0 in x1.
+
 The Image must be placed text_offset bytes from a 2MB aligned base
 address anywhere in usable system RAM and called there. The region
 between the 2 MB aligned base address and the start of the image has no
@@ -145,7 +157,7 @@ Before jumping into the kernel, the following conditions must be met:
 
 - Primary CPU general-purpose register settings
   x0 = physical address of device tree blob (dtb) in system RAM.
-  x1 = 0 (reserved for future use)
+  x1 = 0, unless bit 4 is set in the Image header
   x2 = 0 (reserved for future use)
   x3 = 0 (reserved for future use)
 
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 778df20bf623..7fa5b74ee80d 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -711,6 +711,32 @@ config ARM64_MODULE_PLTS
 	select ARM64_MODULE_CMODEL_LARGE
 	select HAVE_MOD_ARCH_SPECIFIC
 
+config RELOCATABLE
+	bool
+	help
+	  This builds the kernel as a Position Independent Executable (PIE),
+	  which retains all relocation metadata required to relocate the
+	  kernel binary at runtime to a different virtual address than the
+	  address it was linked at.
+	  Since AArch64 uses the RELA relocation format, this requires a
+	  relocation pass at runtime even if the kernel is loaded at the
+	  same address it was linked at.
+
+config RANDOMIZE_BASE
+	bool "Randomize the address of the kernel image"
+	select ARM64_MODULE_PLTS
+	select RELOCATABLE
+	help
+	  Randomizes the virtual address at which the kernel image is
+	  loaded, as a security feature that deters exploit attempts
+	  relying on knowledge of the location of kernel internals.
+
+	  It is the bootloader's job to provide entropy, by passing a
+	  random value in x1 at kernel entry.
+
+	  If unsure, say N.
+
+
 endmenu
 
 menu "Boot options"
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index db462980c6be..c3eaa03f9020 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -15,6 +15,10 @@ CPPFLAGS_vmlinux.lds = -DTEXT_OFFSET=$(TEXT_OFFSET)
 OBJCOPYFLAGS	:=-O binary -R .note -R .note.gnu.build-id -R .comment -S
 GZFLAGS		:=-9
 
+ifneq ($(CONFIG_RELOCATABLE),)
+LDFLAGS_vmlinux		+= -pie
+endif
+
 KBUILD_DEFCONFIG := defconfig
 
 # Check for binutils support for specific extensions
diff --git a/arch/arm64/include/asm/elf_relocs.h b/arch/arm64/include/asm/elf_relocs.h
index 3f6b93099011..e1316de840a5 100644
--- a/arch/arm64/include/asm/elf_relocs.h
+++ b/arch/arm64/include/asm/elf_relocs.h
@@ -69,5 +69,7 @@
 #define R_AARCH64_MOVW_PREL_G2_NC	292
 #define R_AARCH64_MOVW_PREL_G3		293
 
+#define R_AARCH64_RELATIVE		1027
+
 #endif /* __ASM_ELF_RELOCS_H */
 
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 758fb4a503ef..422a30a5f328 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -122,6 +122,9 @@ extern phys_addr_t		memstart_addr;
 /* PHYS_OFFSET - the physical address of the start of memory. */
 #define PHYS_OFFSET		({ memstart_addr; })
 
+/* the virtual base of the kernel image (minus TEXT_OFFSET) */
+extern u64			kimage_vaddr;
+
 /* the offset between the kernel virtual and physical mappings */
 extern u64			kimage_voffset;
 
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index d66aee595170..4bf6a5c9a24e 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -29,6 +29,7 @@
 #include <asm/asm-offsets.h>
 #include <asm/cache.h>
 #include <asm/cputype.h>
+#include <asm/elf_relocs.h>
 #include <asm/kernel-pgtable.h>
 #include <asm/memory.h>
 #include <asm/pgtable-hwdef.h>
@@ -61,9 +62,16 @@
 
 #define __HEAD_FLAG_PAGE_SIZE	((PAGE_SHIFT - 10) / 2)
 
+#ifdef CONFIG_RANDOMIZE_BASE
+#define __HEAD_FLAG_KASLR	1
+#else
+#define __HEAD_FLAG_KASLR	0
+#endif
+
 #define __HEAD_FLAGS		((__HEAD_FLAG_BE << 0) |	\
 				 (__HEAD_FLAG_PAGE_SIZE << 1) |	\
-				 (__HEAD_FLAG_PHYS_BASE << 3))
+				 (__HEAD_FLAG_PHYS_BASE << 3) |	\
+				 (__HEAD_FLAG_KASLR << 4))
 
 /*
  * Kernel startup entry point.
@@ -234,6 +242,7 @@ ENTRY(stext)
 	 */
 	ldr	x27, 0f				// address to jump to after
 						// MMU has been enabled
+	add	x27, x27, x23			// add KASLR displacement
 	adr_l	lr, __enable_mmu		// return (PIC) address
 	b	__cpu_setup			// initialise processor
 ENDPROC(stext)
@@ -245,6 +254,7 @@ ENDPROC(stext)
  */
 preserve_boot_args:
 	mov	x21, x0				// x21=FDT
+	mov	x22, x1				// x22=random seed
 
 	adr_l	x0, boot_args			// record the contents of
 	stp	x21, x1, [x0]			// x0 .. x3 at kernel entry
@@ -328,6 +338,40 @@ __create_page_tables:
 	adrp	x26, swapper_pg_dir
 	mov	x27, lr
 
+#ifdef CONFIG_RANDOMIZE_BASE
+	/*
+	 * Mask off the bits of the random value stored in x22 so it can serve
+	 * as a KASLR displacement value which will move the kernel image to a
+	 * random offset in the lower half of the VMALLOC area (VA_BITS - 2).
+	 * Even if we could randomize at page granularity for 16k and 64k
+	 * granule kernels, let's always preserve the 2 MB (21 bit) alignment
+	 * and not interfere with the ability to use ranges of contiguous PTEs.
+	 */
+	.set	RANDOM_WIDTH, VA_BITS - 2
+	.set	RANDOM_ALIGN, 21
+
+	mov	x10, ((1 << (RANDOM_WIDTH - RANDOM_ALIGN)) - 1) << RANDOM_ALIGN
+	and	x23, x22, x10
+
+	/*
+	 * The kernel Image should not extend across a 1GB/32MB/512MB alignment
+	 * boundary (for 4KB/16KB/64KB granule kernels, respectively). If this
+	 * happens, increase the KASLR displacement in x23 by the size of the
+	 * kernel image.
+	 */
+	ldr	w8, kernel_img_size
+	mov	x11, KIMAGE_VADDR & ((1 << SWAPPER_TABLE_SHIFT) - 1)
+	add	x11, x11, x23
+	add	x9, x8, x11
+	eor	x9, x9, x11
+	tbz	x9, SWAPPER_TABLE_SHIFT, 0f
+	add	x23, x23, x8
+	and	x23, x23, x10
+0:
+#else
+	mov	x23, xzr
+#endif
+
 	/*
 	 * Invalidate the idmap and swapper page tables to avoid potential
 	 * dirty cache lines being evicted.
@@ -405,6 +449,7 @@ __create_page_tables:
 	 */
 	mov	x0, x26				// swapper_pg_dir
 	ldr	x5, =KIMAGE_VADDR
+	add	x5, x5, x23			// add KASLR displacement
 	create_pgd_entry x0, x5, x3, x6
 	ldr	w6, kernel_img_size
 	add	x6, x6, x5
@@ -446,13 +491,52 @@ __mmap_switched:
 	bl	__pi_memset
 
 	dsb	ishst				// Make zero page visible to PTW
+
+#ifdef CONFIG_RELOCATABLE
+
+	/*
+	 * Iterate over each entry in the relocation table, and apply the
+	 * relocations in place.
+	 */
+	adr_l	x8, __dynsym_start		// start of symbol table
+	adr_l	x9, __reloc_start		// start of reloc table
+	adr_l	x10, __reloc_end		// end of reloc table
+
+0:	cmp	x9, x10
+	b.hs	2f
+	ldp	x11, x12, [x9], #24
+	ldr	x13, [x9, #-8]
+	cmp	w12, #R_AARCH64_RELATIVE
+	b.ne	1f
+	add	x13, x13, x23			// relocate
+	str	x13, [x11, x23]
+	b	0b
+
+1:	cmp	w12, #R_AARCH64_ABS64
+	b.ne	0b
+	add	x12, x12, x12, lsl #1		// symtab offset: 24x top word
+	add	x12, x8, x12, lsr #(32 - 3)	// ... shifted into bottom word
+	ldrsh	w14, [x12, #6]			// Elf64_Sym::st_shndx
+	ldr	x15, [x12, #8]			// Elf64_Sym::st_value
+	cmp	w14, #-0xf			// SHN_ABS (0xfff1) ?
+	add	x14, x15, x23			// relocate
+	csel	x15, x14, x15, ne
+	add	x15, x13, x15
+	str	x15, [x11, x23]
+	b	0b
+
+2:	adr_l	x8, kimage_vaddr		// make relocated kimage_vaddr
+	dc	cvac, x8			// value visible to secondaries
+	dsb	sy				// with MMU off
+#endif
+
 	adr_l	sp, initial_sp, x4
 	mov	x4, sp
 	and	x4, x4, #~(THREAD_SIZE - 1)
 	msr	sp_el0, x4			// Save thread_info
 	str_l	x21, __fdt_pointer, x5		// Save FDT pointer
 
-	ldr	x0, =KIMAGE_VADDR		// Save the offset between
+	ldr_l	x0, kimage_vaddr		// Save the offset between
 	sub	x24, x0, x24			// the kernel virtual and
 	str_l	x24, kimage_voffset, x0		// physical mappings
 
@@ -468,6 +552,10 @@ ENDPROC(__mmap_switched)
  * hotplug and needs to have the same protections as the text region
  */
 	.section ".text","ax"
+
+ENTRY(kimage_vaddr)
+	.quad		_text - TEXT_OFFSET
+
 /*
  * If we're fortunate enough to boot at EL2, ensure that the world is
  * sane before dropping to EL1.
@@ -628,7 +716,7 @@ ENTRY(secondary_startup)
 	adrp	x26, swapper_pg_dir
 	bl	__cpu_setup			// initialise processor
 
-	ldr	x8, =KIMAGE_VADDR
+	ldr	x8, kimage_vaddr
 	ldr	w9, 0f
 	sub	x27, x8, w9, sxtw		// address to jump to after enabling the MMU
 	b	__enable_mmu
diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index 3a298b0e21bb..d38662028200 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -34,7 +34,8 @@ void *module_alloc(unsigned long size)
 {
 	void *p;
 
-	p = __vmalloc_node_range(size, MODULE_ALIGN, MODULES_VADDR, MODULES_END,
+	p = __vmalloc_node_range(size, MODULE_ALIGN,
+				kimage_vaddr - MODULES_VSIZE, kimage_vaddr,
 				GFP_KERNEL, PAGE_KERNEL_EXEC, 0,
 				NUMA_NO_NODE, __builtin_return_address(0));
 
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index c67ba4453ec6..f8111894447c 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -288,16 +288,41 @@ static inline void __init relocate_initrd(void)
 }
 #endif
 
+static bool nokaslr;
+static int __init early_nokaslr(char *p)
+{
+	nokaslr = true;
+	return 0;
+}
+early_param("nokaslr", early_nokaslr);
+
+static void check_boot_args(void)
+{
+	if ((!IS_ENABLED(CONFIG_RANDOMIZE_BASE) && boot_args[1]) ||
+	    boot_args[2] || boot_args[3]) {
+		pr_err("WARNING: x1-x3 nonzero in violation of boot protocol:\n"
+			"\tx1: %016llx\n\tx2: %016llx\n\tx3: %016llx\n"
+			"This indicates a broken bootloader or old kernel\n",
+			boot_args[1], boot_args[2], boot_args[3]);
+	}
+	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && boot_args[1] && nokaslr) {
+		pr_err("WARNING: found KASLR entropy in x1 but 'nokaslr' was passed on the command line:\n"
+			"\tx1: %016llx\n"
+			"This indicates a broken bootloader\n",
+			boot_args[1]);
+	}
+}
+
 u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
 
 void __init setup_arch(char **cmdline_p)
 {
 	static struct vm_struct vmlinux_vm;
 
-	vmlinux_vm.addr		= (void *)KIMAGE_VADDR;
-	vmlinux_vm.size		= round_up((u64)_end - KIMAGE_VADDR,
+	vmlinux_vm.addr		= (void *)kimage_vaddr;
+	vmlinux_vm.size		= round_up((u64)_end - kimage_vaddr,
 					   SWAPPER_BLOCK_SIZE);
-	vmlinux_vm.phys_addr	= __pa(KIMAGE_VADDR);
+	vmlinux_vm.phys_addr	= __pa(kimage_vaddr);
 	vmlinux_vm.flags	= VM_MAP;
 	vmlinux_vm.caller	= setup_arch;
 
@@ -366,12 +391,7 @@ void __init setup_arch(char **cmdline_p)
 	conswitchp = &dummy_con;
 #endif
 #endif
-	if (boot_args[1] || boot_args[2] || boot_args[3]) {
-		pr_err("WARNING: x1-x3 nonzero in violation of boot protocol:\n"
-			"\tx1: %016llx\n\tx2: %016llx\n\tx3: %016llx\n"
-			"This indicates a broken bootloader or old kernel\n",
-			boot_args[1], boot_args[2], boot_args[3]);
-	}
+	check_boot_args();
 }
 
 static int __init arm64_device_init(void)
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index ced0dedcabcc..eddd234d7721 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -148,6 +148,15 @@ SECTIONS
 	.altinstr_replacement : {
 		*(.altinstr_replacement)
 	}
+	.rela : ALIGN(8) {
+		__reloc_start = .;
+		*(.rela .rela*)
+		__reloc_end = .;
+	}
+	.dynsym : ALIGN(8) {
+		__dynsym_start = .;
+		*(.dynsym)
+	}
 
 	. = ALIGN(PAGE_SIZE);
 	__init_end = .;
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [PATCH v3 18/21] efi: stub: implement efi_get_random_bytes() based on EFI_RNG_PROTOCOL
  2016-01-11 13:18 ` Ard Biesheuvel
  (?)
@ 2016-01-11 13:19   ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel
  Cc: stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall, Ard Biesheuvel

This exposes the firmware's implementation of EFI_RNG_PROTOCOL via a new
function efi_get_random_bytes().
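
A minimal usage sketch (the helper and the messages here are
illustrative assumptions, not part of this patch) could look like:

static efi_status_t get_kaslr_seed(efi_system_table_t *sys_table, u64 *seed)
{
	efi_status_t status;

	status = efi_get_random_bytes(sys_table, sizeof(*seed), (u8 *)seed);
	if (status == EFI_NOT_FOUND)
		pr_efi(sys_table, "EFI_RNG_PROTOCOL unavailable, no entropy\n");
	else if (status != EFI_SUCCESS)
		pr_efi_err(sys_table, "efi_get_random_bytes() failed\n");

	return status;
}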

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 drivers/firmware/efi/libstub/Makefile  |  2 +-
 drivers/firmware/efi/libstub/efistub.h |  3 ++
 drivers/firmware/efi/libstub/random.c  | 35 ++++++++++++++++++++
 include/linux/efi.h                    |  5 ++-
 4 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index c0ddd1b8dca3..9f0c813d739c 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -23,7 +23,7 @@ KBUILD_CFLAGS			:= $(cflags-y) -DDISABLE_BRANCH_PROFILING \
 GCOV_PROFILE			:= n
 KASAN_SANITIZE			:= n
 
-lib-y				:= efi-stub-helper.o
+lib-y				:= efi-stub-helper.o random.o
 
 # include the stub's generic dependencies from lib/ when building for ARM/arm64
 arm-deps := fdt_rw.c fdt_ro.c fdt_wip.c fdt.c fdt_empty_tree.c fdt_sw.c sort.c
diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
index 6b6548fda089..206b7252b9d1 100644
--- a/drivers/firmware/efi/libstub/efistub.h
+++ b/drivers/firmware/efi/libstub/efistub.h
@@ -43,4 +43,7 @@ void efi_get_virtmap(efi_memory_desc_t *memory_map, unsigned long map_size,
 		     unsigned long desc_size, efi_memory_desc_t *runtime_map,
 		     int *count);
 
+efi_status_t efi_get_random_bytes(efi_system_table_t *sys_table,
+				  unsigned long size, u8 *out);
+
 #endif
diff --git a/drivers/firmware/efi/libstub/random.c b/drivers/firmware/efi/libstub/random.c
new file mode 100644
index 000000000000..f539b1e31459
--- /dev/null
+++ b/drivers/firmware/efi/libstub/random.c
@@ -0,0 +1,35 @@
+/*
+ * Copyright (C) 2016 Linaro Ltd;  <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#include <linux/efi.h>
+#include <asm/efi.h>
+
+#include "efistub.h"
+
+struct efi_rng_protocol_t {
+	efi_status_t (*get_info)(struct efi_rng_protocol_t *,
+				 unsigned long *, efi_guid_t *);
+	efi_status_t (*get_rng)(struct efi_rng_protocol_t *,
+				efi_guid_t *, unsigned long, u8 *out);
+};
+
+efi_status_t efi_get_random_bytes(efi_system_table_t *sys_table,
+				  unsigned long size, u8 *out)
+{
+	efi_guid_t rng_proto = EFI_RNG_PROTOCOL_GUID;
+	efi_status_t status;
+	struct efi_rng_protocol_t *rng;
+
+	status = sys_table->boottime->locate_protocol(&rng_proto, NULL,
+						      (void **)&rng);
+	if (status != EFI_SUCCESS)
+		return status;
+
+	return rng->get_rng(rng, NULL, size, out);
+}
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 569b5a866bb1..13783fdc9bdd 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -299,7 +299,7 @@ typedef struct {
 	void *open_protocol_information;
 	void *protocols_per_handle;
 	void *locate_handle_buffer;
-	void *locate_protocol;
+	efi_status_t (*locate_protocol)(efi_guid_t *, void *, void **);
 	void *install_multiple_protocol_interfaces;
 	void *uninstall_multiple_protocol_interfaces;
 	void *calculate_crc32;
@@ -599,6 +599,9 @@ void efi_native_runtime_setup(void);
 #define EFI_PROPERTIES_TABLE_GUID \
     EFI_GUID(  0x880aaca3, 0x4adc, 0x4a04, 0x90, 0x79, 0xb7, 0x47, 0x34, 0x08, 0x25, 0xe5 )
 
+#define EFI_RNG_PROTOCOL_GUID \
+    EFI_GUID(  0x3152bca5, 0xeade, 0x433d, 0x86, 0x2e, 0xc0, 0x1c, 0xdc, 0x29, 0x1f, 0x44 )
+
 typedef struct {
 	efi_guid_t guid;
 	u64 table;
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [PATCH v3 19/21] efi: stub: add implementation of efi_random_alloc()
  2016-01-11 13:18 ` Ard Biesheuvel
  (?)
@ 2016-01-11 13:19   ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel
  Cc: stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall, Ard Biesheuvel, Matt Fleming

This implements efi_random_alloc(), which allocates a chunk of memory of
a certain size at a certain alignment, and uses the random_seed argument
it receives to randomize the offset of the allocation.

This is implemented by iterating over the UEFI memory map, counting the
number of suitable slots (aligned offsets) within each region, and picking
a random number between 0 and 'number of slots - 1' to select the slot.
This should guarantee that each possible offset is equally likely to be
chosen.
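
As a worked example of the weighting (numbers invented purely for
illustration): with two usable regions offering 6 and 10 suitably
aligned slots, max_weight is 16; the low 16 bits of the seed are scaled
into [0, max_weight) and the running subtraction picks the region and
slot:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	unsigned long weights[] = { 6, 10 };	/* slots per region */
	unsigned long max_weight = 6 + 10;
	uint16_t seed = 0xc000;			/* low 16 bits of random_seed */
	unsigned long target = (max_weight * seed) >> 16;	/* 12 */
	unsigned long i;

	for (i = 0; i < 2; i++) {
		if (target < weights[i]) {
			printf("region %lu, slot %lu\n", i, target);
			break;
		}
		target -= weights[i];
	}
	return 0;			/* prints "region 1, slot 6" */
}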

Suggested-by: Kees Cook <keescook@chromium.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 drivers/firmware/efi/libstub/efistub.h |  4 +
 drivers/firmware/efi/libstub/random.c  | 85 ++++++++++++++++++++
 2 files changed, 89 insertions(+)

diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
index 206b7252b9d1..7a38e29da53d 100644
--- a/drivers/firmware/efi/libstub/efistub.h
+++ b/drivers/firmware/efi/libstub/efistub.h
@@ -46,4 +46,8 @@ void efi_get_virtmap(efi_memory_desc_t *memory_map, unsigned long map_size,
 efi_status_t efi_get_random_bytes(efi_system_table_t *sys_table,
 				  unsigned long size, u8 *out);
 
+efi_status_t efi_random_alloc(efi_system_table_t *sys_table_arg,
+			      unsigned long size, unsigned long align_bits,
+			      unsigned long *addr, unsigned long random_seed);
+
 #endif
diff --git a/drivers/firmware/efi/libstub/random.c b/drivers/firmware/efi/libstub/random.c
index f539b1e31459..d4829824508c 100644
--- a/drivers/firmware/efi/libstub/random.c
+++ b/drivers/firmware/efi/libstub/random.c
@@ -33,3 +33,88 @@ efi_status_t efi_get_random_bytes(efi_system_table_t *sys_table,
 
 	return rng->get_rng(rng, NULL, size, out);
 }
+
+/*
+ * Return a weight for a memory entry depending on how many offsets it covers
+ * that are suitably aligned and supply enough room for the allocation.
+ */
+static unsigned long get_entry_weight(efi_memory_desc_t *md, unsigned long size,
+				      unsigned long align_bits)
+{
+	u64 start, end;
+
+	if (md->type != EFI_CONVENTIONAL_MEMORY)
+		return 0;
+
+	if (!(md->attribute & EFI_MEMORY_WB))
+		return 0;
+
+	start = round_up(md->phys_addr, 1 << align_bits);
+	end = round_down(md->phys_addr + md->num_pages * EFI_PAGE_SIZE - size,
+			 1 << align_bits);
+
+	if (start >= end)
+		return 0;
+
+	return (end - start) >> align_bits;
+}
+
+/*
+ * The UEFI memory descriptors have a virtual address field that is only used
+ * when installing the virtual mapping using SetVirtualAddressMap(). Since it
+ * is unused here, we can reuse it to keep track of each descriptor's weight.
+ */
+#define MD_WEIGHT(md)	((md)->virt_addr)
+
+efi_status_t efi_random_alloc(efi_system_table_t *sys_table_arg,
+			      unsigned long size, unsigned long align_bits,
+			      unsigned long *addr, unsigned long random_seed)
+{
+	unsigned long map_size, desc_size, max_weight = 0, target;
+	efi_memory_desc_t *memory_map;
+	efi_status_t status = EFI_NOT_FOUND;
+	int l;
+
+	status = efi_get_memory_map(sys_table_arg, &memory_map, &map_size,
+				    &desc_size, NULL, NULL);
+	if (status != EFI_SUCCESS)
+		return status;
+
+	/* assign each entry in the memory map a weight */
+	for (l = 0; l < map_size; l += desc_size) {
+		efi_memory_desc_t *md = (void *)memory_map + l;
+		unsigned long weight;
+
+		weight = get_entry_weight(md, size, align_bits);
+		MD_WEIGHT(md) = weight;
+		max_weight += weight;
+	}
+
+	/* find a random number between 0 and max_weight */
+	target = (max_weight * (u16)random_seed) >> 16;
+
+	/* find the entry whose accumulated weight covers the target */
+	for (l = 0; l < map_size; l += desc_size) {
+		efi_memory_desc_t *md = (void *)memory_map + l;
+
+		if (target < MD_WEIGHT(md)) {
+			unsigned long pages;
+
+			*addr = round_up(md->phys_addr, 1 << align_bits) +
+				(target << align_bits);
+			pages = round_up(size, EFI_PAGE_SIZE) / EFI_PAGE_SIZE;
+
+			status = efi_call_early(allocate_pages,
+						EFI_ALLOCATE_ADDRESS,
+						EFI_LOADER_DATA,
+						pages,
+						(efi_physical_addr_t *)addr);
+			break;
+		}
+		target -= MD_WEIGHT(md);
+	}
+
+	efi_call_early(free_pool, memory_map);
+
+	return status;
+}
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [PATCH v3 20/21] efi: stub: use high allocation for converted command line
  2016-01-11 13:18 ` Ard Biesheuvel
  (?)
@ 2016-01-11 13:19   ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel
  Cc: stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall, Ard Biesheuvel, Matt Fleming

The command line processing needs to be moved before the allocation of
the kernel, since that is where the 'nokaslr' option controlling the
allocation is detected. In preparation for this, move the converted
command line higher up in memory, to prevent it from interfering with
the kernel itself.

Since x86 needs the address to fit in 32 bits, use UINT_MAX as the upper
bound there. Otherwise, use ULONG_MAX (i.e., no limit).
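
So an architecture with a different constraint can simply provide its
own cap in <asm/efi.h>, in the same way as the x86 hunk below; the path
and value in this sketch are invented for illustration:

/* hypothetical arch/foo/include/asm/efi.h */
#define MAX_CMDLINE_ADDRESS	(1UL << 30)	/* keep the cmdline below 1 GB */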

Cc: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/x86/include/asm/efi.h                     |  2 ++
 drivers/firmware/efi/libstub/efi-stub-helper.c | 14 +++++++++++++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 0010c78c4998..08b1f2f6ea50 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -25,6 +25,8 @@
 #define EFI32_LOADER_SIGNATURE	"EL32"
 #define EFI64_LOADER_SIGNATURE	"EL64"
 
+#define MAX_CMDLINE_ADDRESS	UINT_MAX
+
 #ifdef CONFIG_X86_32
 
 
diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c
index f07d4a67fa76..2a7a3015d7e0 100644
--- a/drivers/firmware/efi/libstub/efi-stub-helper.c
+++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
@@ -649,6 +649,10 @@ static u8 *efi_utf16_to_utf8(u8 *dst, const u16 *src, int n)
 	return dst;
 }
 
+#ifndef MAX_CMDLINE_ADDRESS
+#define MAX_CMDLINE_ADDRESS	ULONG_MAX
+#endif
+
 /*
  * Convert the unicode UEFI command line to ASCII to pass to kernel.
  * Size of memory allocated return in *cmd_line_len.
@@ -684,7 +688,15 @@ char *efi_convert_cmdline(efi_system_table_t *sys_table_arg,
 
 	options_bytes++;	/* NUL termination */
 
-	status = efi_low_alloc(sys_table_arg, options_bytes, 0, &cmdline_addr);
+	/*
+	 * Allocate a buffer for the converted command line as high up
+	 * in memory as is feasible: x86 needs the command line allocation
+	 * to be below 4 GB, but non-x86 architectures may not have any
+	 * memory there. So prefer below 4 GB, and allocate anywhere if
+	 * that fails.
+	 */
+	status = efi_high_alloc(sys_table_arg, options_bytes, 0,
+				&cmdline_addr, MAX_CMDLINE_ADDRESS);
 	if (status != EFI_SUCCESS)
 		return NULL;
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread

* [PATCH v3 21/21] arm64: efi: invoke EFI_RNG_PROTOCOL to supply KASLR randomness
  2016-01-11 13:18 ` Ard Biesheuvel
  (?)
@ 2016-01-11 13:19   ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:19 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel
  Cc: stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall, Ard Biesheuvel

Since arm64 does not use a decompressor that could supply an execution
environment in which to gather randomness early on, the arm64 KASLR
kernel depends on the bootloader to supply some random bits in register
x1 at kernel entry.

On UEFI systems, we can use the EFI_RNG_PROTOCOL, if supplied, to obtain
some random bits. At the same time, use it to randomize the offset of the
kernel Image in physical memory.
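
Schematically, the stub side then becomes (a simplified sketch; the
size variable and the 2 MB alignment value are assumptions layered on
the efi_rnd layout and the efi_random_alloc() helper from earlier in
this series):

	/* fill { virt_seed, phys_seed } from the firmware RNG */
	status = efi_get_random_bytes(sys_table_arg, sizeof(efi_rnd),
				      (u8 *)&efi_rnd);

	/* phys_seed randomizes the 2 MB aligned physical slot ... */
	status = efi_random_alloc(sys_table_arg, kernel_size, 21,
				  reserve_addr, efi_rnd.phys_seed);

	/* ... while virt_seed is what efi-entry.S passes to the kernel in x1 */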

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/Kconfig                             |  5 ++
 arch/arm64/kernel/efi-entry.S                  |  7 +-
 drivers/firmware/efi/libstub/arm-stub.c        | 17 ++---
 drivers/firmware/efi/libstub/arm64-stub.c      | 67 +++++++++++++++-----
 drivers/firmware/efi/libstub/efi-stub-helper.c | 10 +++
 drivers/firmware/efi/libstub/efistub.h         |  2 +
 6 files changed, 82 insertions(+), 26 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7fa5b74ee80d..ba347302308b 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -734,6 +734,11 @@ config RANDOMIZE_BASE
 	  It is the bootloader's job to provide entropy, by passing a
 	  random value in x1 at kernel entry.
 
+	  When booting via the UEFI stub, it will invoke the firmware's
+	  EFI_RNG_PROTOCOL implementation (if available) to supply entropy
+	  to the kernel proper. In addition, it will randomise the
+	  physical location of the kernel Image.
+
 	  If unsure, say N.
 
 
diff --git a/arch/arm64/kernel/efi-entry.S b/arch/arm64/kernel/efi-entry.S
index f82036e02485..f41073dde7e0 100644
--- a/arch/arm64/kernel/efi-entry.S
+++ b/arch/arm64/kernel/efi-entry.S
@@ -110,7 +110,7 @@ ENTRY(entry)
 2:
 	/* Jump to kernel entry point */
 	mov	x0, x20
-	mov	x1, xzr
+	ldr	x1, efi_rnd
 	mov	x2, xzr
 	mov	x3, xzr
 	br	x21
@@ -119,6 +119,9 @@ efi_load_fail:
 	mov	x0, #EFI_LOAD_ERROR
 	ldp	x29, x30, [sp], #32
 	ret
+ENDPROC(entry)
+
+ENTRY(efi_rnd)
+	.quad	0, 0
 
 entry_end:
-ENDPROC(entry)
diff --git a/drivers/firmware/efi/libstub/arm-stub.c b/drivers/firmware/efi/libstub/arm-stub.c
index 950c87f5d279..c39e04a1e8aa 100644
--- a/drivers/firmware/efi/libstub/arm-stub.c
+++ b/drivers/firmware/efi/libstub/arm-stub.c
@@ -207,14 +207,6 @@ unsigned long efi_entry(void *handle, efi_system_table_t *sys_table,
 		pr_efi_err(sys_table, "Failed to find DRAM base\n");
 		goto fail;
 	}
-	status = handle_kernel_image(sys_table, image_addr, &image_size,
-				     &reserve_addr,
-				     &reserve_size,
-				     dram_base, image);
-	if (status != EFI_SUCCESS) {
-		pr_efi_err(sys_table, "Failed to relocate kernel\n");
-		goto fail;
-	}
 
 	/*
 	 * Get the command line from EFI, using the LOADED_IMAGE
@@ -231,6 +223,15 @@ unsigned long efi_entry(void *handle, efi_system_table_t *sys_table,
 	if (status != EFI_SUCCESS)
 		pr_efi_err(sys_table, "Failed to parse EFI cmdline options\n");
 
+	status = handle_kernel_image(sys_table, image_addr, &image_size,
+				     &reserve_addr,
+				     &reserve_size,
+				     dram_base, image);
+	if (status != EFI_SUCCESS) {
+		pr_efi_err(sys_table, "Failed to relocate kernel\n");
+		goto fail;
+	}
+
 	/*
 	 * Unauthenticated device tree data is a security hazard, so
 	 * ignore 'dtb=' unless UEFI Secure Boot is disabled.
diff --git a/drivers/firmware/efi/libstub/arm64-stub.c b/drivers/firmware/efi/libstub/arm64-stub.c
index 78dfbd34b6bf..96d43bed098f 100644
--- a/drivers/firmware/efi/libstub/arm64-stub.c
+++ b/drivers/firmware/efi/libstub/arm64-stub.c
@@ -13,6 +13,13 @@
 #include <asm/efi.h>
 #include <asm/sections.h>
 
+#include "efistub.h"
+
+extern struct {
+	u64	virt_seed;
+	u64	phys_seed;
+} efi_rnd;
+
 efi_status_t __init handle_kernel_image(efi_system_table_t *sys_table_arg,
 					unsigned long *image_addr,
 					unsigned long *image_size,
@@ -27,6 +34,22 @@ efi_status_t __init handle_kernel_image(efi_system_table_t *sys_table_arg,
 	void *old_image_addr = (void *)*image_addr;
 	unsigned long preferred_offset;
 
+	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) {
+		if (!__nokaslr) {
+			status = efi_get_random_bytes(sys_table_arg,
+						      sizeof(efi_rnd),
+						      (u8 *)&efi_rnd);
+			if (status == EFI_NOT_FOUND) {
+				pr_efi(sys_table_arg, "EFI_RNG_PROTOCOL unavailable, no randomness supplied\n");
+			} else if (status != EFI_SUCCESS) {
+				pr_efi_err(sys_table_arg, "efi_get_random_bytes() failed\n");
+				return status;
+			}
+		} else {
+			pr_efi(sys_table_arg, "KASLR disabled on kernel command line\n");
+		}
+	}
+
 	/*
 	 * The preferred offset of the kernel Image is TEXT_OFFSET bytes beyond
 	 * a 2 MB aligned base, which itself may be lower than dram_base, as
@@ -36,13 +59,22 @@ efi_status_t __init handle_kernel_image(efi_system_table_t *sys_table_arg,
 	if (preferred_offset < dram_base)
 		preferred_offset += SZ_2M;
 
-	/* Relocate the image, if required. */
 	kernel_size = _edata - _text;
-	if (*image_addr != preferred_offset) {
-		kernel_memsize = kernel_size + (_end - _edata);
+	kernel_memsize = kernel_size + (_end - _edata);
+
+	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && efi_rnd.phys_seed != 0) {
+		/*
+		 * If KASLR is enabled, and we have some randomness available,
+		 * locate the kernel at a randomized offset in physical memory.
+		 */
+		status = efi_random_alloc(sys_table_arg, kernel_size,
+					  ilog2(MIN_KIMG_ALIGN), reserve_addr,
+					  efi_rnd.phys_seed);
 
+		*image_addr = *reserve_addr + TEXT_OFFSET;
+	} else {
 		/*
-		 * First, try a straight allocation at the preferred offset.
+		 * Else, try a straight allocation at the preferred offset.
 		 * This will work around the issue where, if dram_base == 0x0,
 		 * efi_low_alloc() refuses to allocate at 0x0 (to prevent the
 		 * address of the allocation to be mistaken for a FAIL return
@@ -52,27 +84,30 @@ efi_status_t __init handle_kernel_image(efi_system_table_t *sys_table_arg,
 		 * Mustang), we can still place the kernel at the address
 		 * 'dram_base + TEXT_OFFSET'.
 		 */
+		if (*image_addr == preferred_offset)
+			return EFI_SUCCESS;
+
 		*image_addr = *reserve_addr = preferred_offset;
 		nr_pages = round_up(kernel_memsize, EFI_ALLOC_ALIGN) /
 			   EFI_PAGE_SIZE;
 		status = efi_call_early(allocate_pages, EFI_ALLOCATE_ADDRESS,
 					EFI_LOADER_DATA, nr_pages,
 					(efi_physical_addr_t *)reserve_addr);
-		if (status != EFI_SUCCESS) {
-			kernel_memsize += TEXT_OFFSET;
-			status = efi_low_alloc(sys_table_arg, kernel_memsize,
-					       SZ_2M, reserve_addr);
+	}
 
-			if (status != EFI_SUCCESS) {
-				pr_efi_err(sys_table_arg, "Failed to relocate kernel\n");
-				return status;
-			}
-			*image_addr = *reserve_addr + TEXT_OFFSET;
+	if (status != EFI_SUCCESS) {
+		kernel_memsize += TEXT_OFFSET;
+		status = efi_low_alloc(sys_table_arg, kernel_memsize,
+				       SZ_2M, reserve_addr);
+
+		if (status != EFI_SUCCESS) {
+			pr_efi_err(sys_table_arg, "Failed to relocate kernel\n");
+			return status;
 		}
-		memcpy((void *)*image_addr, old_image_addr, kernel_size);
-		*reserve_size = kernel_memsize;
+		*image_addr = *reserve_addr + TEXT_OFFSET;
 	}
-
+	memcpy((void *)*image_addr, old_image_addr, kernel_size);
+	*reserve_size = kernel_memsize;
 
 	return EFI_SUCCESS;
 }
diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c
index 2a7a3015d7e0..e8a3b8cd53cc 100644
--- a/drivers/firmware/efi/libstub/efi-stub-helper.c
+++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
@@ -32,6 +32,10 @@
 
 static unsigned long __chunk_size = EFI_READ_CHUNK_SIZE;
 
+#ifdef CONFIG_RANDOMIZE_BASE
+bool __nokaslr;
+#endif
+
 /*
  * Allow the platform to override the allocation granularity: this allows
  * systems that have the capability to run with a larger page size to deal
@@ -317,6 +321,12 @@ efi_status_t efi_parse_options(char *cmdline)
 {
 	char *str;
 
+#ifdef CONFIG_RANDOMIZE_BASE
+	str = strstr(cmdline, "nokaslr");
+	if (str && (str == cmdline || *(str - 1) == ' '))
+		__nokaslr = true;
+#endif
+
 	/*
 	 * If no EFI parameters were specified on the cmdline we've got
 	 * nothing to do.
diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
index 7a38e29da53d..250ed4737298 100644
--- a/drivers/firmware/efi/libstub/efistub.h
+++ b/drivers/firmware/efi/libstub/efistub.h
@@ -5,6 +5,8 @@
 /* error code which can't be mistaken for valid address */
 #define EFI_ERROR	(~0UL)
 
+extern bool __nokaslr;
+
 void efi_char16_printk(efi_system_table_t *, efi_char16_t *);
 
 efi_status_t efi_open_volume(efi_system_table_t *sys_table_arg, void *__image,
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 207+ messages in thread
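
The firmware service the stub leans on above is EFI_RNG_PROTOCOL from the UEFI
specification. As a rough orientation for readers who have not used it, the
sketch below shows how a stub can pull bytes out of that protocol via
LocateProtocol plus GetRNG. It is only an illustration of the interface, not
the efi_get_random_bytes() helper the diff calls; the struct layout follows the
UEFI spec, the EFI_RNG_PROTOCOL_GUID name and the sketch_* identifiers are
assumptions, and the direct method call at the end is fine on arm64 while other
architectures may need an arch-specific call wrapper.

#include <linux/efi.h>		/* efi_status_t, efi_guid_t, u8 */
#include <asm/efi.h>		/* efi_call_early() */

/* Hedged sketch only: mirrors the UEFI spec's EFI_RNG_PROTOCOL layout. */
struct efi_rng_protocol_sketch {
	efi_status_t (*get_info)(struct efi_rng_protocol_sketch *,
				 unsigned long *, efi_guid_t *);
	efi_status_t (*get_rng)(struct efi_rng_protocol_sketch *,
				efi_guid_t *, unsigned long, u8 *);
};

static efi_status_t sketch_get_random_bytes(unsigned long size, u8 *out)
{
	efi_guid_t rng_proto_guid = EFI_RNG_PROTOCOL_GUID;	/* assumed name */
	struct efi_rng_protocol_sketch *rng;
	efi_status_t status;

	/* Ask the firmware whether it implements the protocol at all. */
	status = efi_call_early(locate_protocol, &rng_proto_guid, NULL,
				(void **)&rng);
	if (status != EFI_SUCCESS)
		return status;			/* typically EFI_NOT_FOUND */

	/* A NULL algorithm GUID asks for the firmware's default RNG. */
	return rng->get_rng(rng, NULL, size, out);
}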

* Re: [PATCH v3 14/21] arm64: [re]define SWAPPER_TABLE_[SHIFT|SIZE] for use in asm code
  2016-01-11 13:19   ` Ard Biesheuvel
@ 2016-01-11 13:26     ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 13:26 UTC (permalink / raw)
  To: linux-arm-kernel, kernel-hardening, Will Deacon, Catalin Marinas,
	Mark Rutland, Leif Lindholm, Kees Cook, linux-kernel
  Cc: Stuart Yoder, Sharma Bhupesh, Arnd Bergmann, Marc Zyngier,
	Christoffer Dall, Ard Biesheuvel

Please disregard this patch; I accidentally sent out two versions of
14/21, and this is the wrong one.


On 11 January 2016 at 14:19, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> The current definition of SWAPPER_TABLE_SHIFT can only be used in
> asm code if the configured number of translation levels defines
> PUD_SHIFT and/or PMD_SHIFT natively (4KB and 16KB/64KB granule,
> respectively). Otherwise, it depends on the nopmd/nopud fixup
> headers, which can only be included in C code.
>
> So redefine SWAPPER_TABLE_SHIFT in a way that is independent of the
> number of configured translation levels. Define SWAPPER_TABLE_SIZE
> as well; we will need it later.
>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
>  arch/arm64/include/asm/kernel-pgtable.h | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
> index daa8a7b9917a..eaac46097359 100644
> --- a/arch/arm64/include/asm/kernel-pgtable.h
> +++ b/arch/arm64/include/asm/kernel-pgtable.h
> @@ -57,13 +57,14 @@
>  #if ARM64_SWAPPER_USES_SECTION_MAPS
>  #define SWAPPER_BLOCK_SHIFT    SECTION_SHIFT
>  #define SWAPPER_BLOCK_SIZE     SECTION_SIZE
> -#define SWAPPER_TABLE_SHIFT    PUD_SHIFT
>  #else
>  #define SWAPPER_BLOCK_SHIFT    PAGE_SHIFT
>  #define SWAPPER_BLOCK_SIZE     PAGE_SIZE
> -#define SWAPPER_TABLE_SHIFT    PMD_SHIFT
>  #endif
>
> +#define SWAPPER_TABLE_SHIFT    (SWAPPER_BLOCK_SHIFT + PAGE_SHIFT - 3)
> +#define SWAPPER_TABLE_SIZE     (1 << SWAPPER_TABLE_SHIFT)
> +
>  /* The size of the initial kernel direct mapping */
>  #define SWAPPER_INIT_MAP_SIZE  (_AC(1, UL) << SWAPPER_TABLE_SHIFT)
>
> --
> 2.5.0
>

^ permalink raw reply	[flat|nested] 207+ messages in thread
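
Whichever posting of 14/21 ends up being the one to review, the arithmetic
behind the quoted definition is easy to check: a translation table occupies
exactly one page of 8-byte entries, so each level resolves PAGE_SHIFT - 3
address bits, and adding that to SWAPPER_BLOCK_SHIFT lands on the shift of the
next level up. The snippet below is only an illustration (TABLE_SHIFT is a
made-up name for the example); it shows the formula reproducing the old
PUD_SHIFT/PMD_SHIFT values.

/* One page of 8-byte entries per table: each level covers PAGE_SHIFT - 3 bits. */
#define TABLE_SHIFT(block_shift, page_shift)	((block_shift) + (page_shift) - 3)

/*
 * Section maps: SWAPPER_BLOCK_SHIFT == SECTION_SHIFT == PMD_SHIFT, so the
 * result is PMD_SHIFT + (PAGE_SHIFT - 3) == PUD_SHIFT.
 *   4K pages:  21 + 12 - 3 == 30 == PUD_SHIFT
 */
_Static_assert(TABLE_SHIFT(21, 12) == 30, "4K pages, section maps: PUD_SHIFT");

/*
 * Page maps: SWAPPER_BLOCK_SHIFT == PAGE_SHIFT, so the result is
 * 2 * PAGE_SHIFT - 3 == PMD_SHIFT.
 *   16K pages: 14 + 14 - 3 == 25 == PMD_SHIFT
 *   64K pages: 16 + 16 - 3 == 29 == PMD_SHIFT
 */
_Static_assert(TABLE_SHIFT(14, 14) == 25, "16K pages, page maps: PMD_SHIFT");
_Static_assert(TABLE_SHIFT(16, 16) == 29, "64K pages, page maps: PMD_SHIFT");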

* Re: [PATCH v3 04/21] arm64: decouple early fixmap init from linear mapping
  2016-01-11 13:18   ` Ard Biesheuvel
@ 2016-01-11 16:09     ` Mark Rutland
  -1 siblings, 0 replies; 207+ messages in thread
From: Mark Rutland @ 2016-01-11 16:09 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	leif.lindholm, keescook, linux-kernel, stuart.yoder,
	bhupesh.sharma, arnd, marc.zyngier, christoffer.dall

On Mon, Jan 11, 2016 at 02:18:57PM +0100, Ard Biesheuvel wrote:
> Since the early fixmap page tables are populated using pages that are
> part of the static footprint of the kernel, they are covered by the
> initial kernel mapping, and we can refer to them without using __va/__pa
> translations, which are tied to the linear mapping.
> 
> Since the fixmap page tables are disjoint from the kernel mapping up
> to the top level pgd entry, we can refer to bm_pte[] directly, and there
> is no need to walk the page tables and perform __pa()/__va() translations
> at each step.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
>  arch/arm64/mm/mmu.c | 32 ++++++--------------
>  1 file changed, 9 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 7711554a94f4..75b5f0dc3bdc 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -570,38 +570,24 @@ void vmemmap_free(unsigned long start, unsigned long end)
>  #endif	/* CONFIG_SPARSEMEM_VMEMMAP */
>  
>  static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
> -#if CONFIG_PGTABLE_LEVELS > 2
>  static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
> -#endif
> -#if CONFIG_PGTABLE_LEVELS > 3
>  static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
> -#endif
>  
>  static inline pud_t * fixmap_pud(unsigned long addr)
>  {
> -	pgd_t *pgd = pgd_offset_k(addr);
> -
> -	BUG_ON(pgd_none(*pgd) || pgd_bad(*pgd));
> -
> -	return pud_offset(pgd, addr);
> +	return (CONFIG_PGTABLE_LEVELS > 3) ? &bm_pud[pud_index(addr)]
> +					   : (pud_t *)pgd_offset_k(addr);

If we move patch 6 earlier, we could use pud_offset_kimg here, and avoid
the cast, at the cost of passing the pgd into fixmap_pud.

Similarly for fixmap_pmd.

>  }
>  
> -static inline pmd_t * fixmap_pmd(unsigned long addr)
> +static inline pte_t * fixmap_pmd(unsigned long addr)
>  {
> -	pud_t *pud = fixmap_pud(addr);
> -
> -	BUG_ON(pud_none(*pud) || pud_bad(*pud));
> -
> -	return pmd_offset(pud, addr);
> +	return (CONFIG_PGTABLE_LEVELS > 2) ? &bm_pmd[pmd_index(addr)]
> +					   : (pmd_t *)pgd_offset_k(addr);
>  }

I assume the return type change was unintentional?

With STRICT_MM_TYPECHECKS:

arch/arm64/mm/mmu.c: In function 'fixmap_pmd':
arch/arm64/mm/mmu.c:604:9: warning: return from incompatible pointer type [-Wincompatible-pointer-types]
  return (CONFIG_PGTABLE_LEVELS > 2) ? &bm_pmd[pmd_index(addr)]
         ^
arch/arm64/mm/mmu.c: In function 'early_fixmap_init':
arch/arm64/mm/mmu.c:635:6: warning: assignment from incompatible pointer type [-Wincompatible-pointer-types]
  pmd = fixmap_pmd(addr);
      ^
arch/arm64/mm/mmu.c:645:11: warning: comparison of distinct pointer types lacks a cast
  if ((pmd != fixmap_pmd(fix_to_virt(FIX_BTMAP_BEGIN)))
           ^
arch/arm64/mm/mmu.c:646:14: warning: comparison of distinct pointer types lacks a cast
       || pmd != fixmap_pmd(fix_to_virt(FIX_BTMAP_END))) {
              ^

Side note: is there any reason we can't/shouldn't make
STRICT_MM_TYPECHECKS a common config option? Or simply have it on by
default for arm64?

Having built with and without typechecks I see that it doesn't bloat the
kernel Image size, though the binary isn't quite identical:

[mark@leverpostej:~/src/linux]% ls -al *.*checks
-rwxrwxr-x 1 mark mark   9288192 Jan 11 15:40 Image.checks
-rwxrwxr-x 1 mark mark   9288192 Jan 11 15:36 Image.nochecks
-rwxrwxr-x 1 mark mark 106782024 Jan 11 15:40 vmlinux.checks
-rwxrwxr-x 1 mark mark 106688928 Jan 11 15:35 vmlinux.nochecks

Things didn't quite line up between the two images, though I'm not sure
what the underlying difference was.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 207+ messages in thread
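
The reason these warnings show up only with STRICT_MM_TYPECHECKS is that the
option decides whether the page-table entry types are distinct at all. The
block below is a simplified rendering of that distinction, not a verbatim copy
of the arm64 pgtable type definitions of this era:

#ifdef STRICT_MM_TYPECHECKS
/*
 * Each level gets its own one-member struct, so pmd_t * and pte_t * are
 * incompatible pointer types and the bogus return type is caught.
 */
typedef struct { u64 pte; } pte_t;
typedef struct { u64 pmd; } pmd_t;
#define pte_val(x)	((x).pte)
#define pmd_val(x)	((x).pmd)
#else
/*
 * All levels collapse to the same bare integer type, so the compiler has
 * nothing to complain about and the mix-up goes unnoticed.
 */
typedef u64 pte_t;
typedef u64 pmd_t;
#define pte_val(x)	(x)
#define pmd_val(x)	(x)
#endif

With the plain-integer variants, pmd_t * and pte_t * are the same type, so the
fixmap_pmd() return type slip compiles silently; with the struct variants every
build would at least emit the warnings quoted above.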

* Re: [PATCH v3 04/21] arm64: decouple early fixmap init from linear mapping
  2016-01-11 16:09     ` Mark Rutland
@ 2016-01-11 16:15       ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 16:15 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, kernel-hardening, Will Deacon, Catalin Marinas,
	Leif Lindholm, Kees Cook, linux-kernel, Stuart Yoder,
	Sharma Bhupesh, Arnd Bergmann, Marc Zyngier, Christoffer Dall

On 11 January 2016 at 17:09, Mark Rutland <mark.rutland@arm.com> wrote:
> On Mon, Jan 11, 2016 at 02:18:57PM +0100, Ard Biesheuvel wrote:
>> Since the early fixmap page tables are populated using pages that are
>> part of the static footprint of the kernel, they are covered by the
>> initial kernel mapping, and we can refer to them without using __va/__pa
>> translations, which are tied to the linear mapping.
>>
>> Since the fixmap page tables are disjoint from the kernel mapping up
>> to the top level pgd entry, we can refer to bm_pte[] directly, and there
>> is no need to walk the page tables and perform __pa()/__va() translations
>> at each step.
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> ---
>>  arch/arm64/mm/mmu.c | 32 ++++++--------------
>>  1 file changed, 9 insertions(+), 23 deletions(-)
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 7711554a94f4..75b5f0dc3bdc 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -570,38 +570,24 @@ void vmemmap_free(unsigned long start, unsigned long end)
>>  #endif       /* CONFIG_SPARSEMEM_VMEMMAP */
>>
>>  static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
>> -#if CONFIG_PGTABLE_LEVELS > 2
>>  static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
>> -#endif
>> -#if CONFIG_PGTABLE_LEVELS > 3
>>  static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
>> -#endif
>>
>>  static inline pud_t * fixmap_pud(unsigned long addr)
>>  {
>> -     pgd_t *pgd = pgd_offset_k(addr);
>> -
>> -     BUG_ON(pgd_none(*pgd) || pgd_bad(*pgd));
>> -
>> -     return pud_offset(pgd, addr);
>> +     return (CONFIG_PGTABLE_LEVELS > 3) ? &bm_pud[pud_index(addr)]
>> +                                        : (pud_t *)pgd_offset_k(addr);
>
> If we move patch 6 earlier, we could use pud_offset_kimg here, and avoid
> the cast, at the cost of passing the pgd into fixmap_pud.
>
> Similarly for fixmap_pmd.
>

Is that necessarily an improvement? I know it hides the cast, but I
think having an explicit pgd_t* to pud_t* cast that so obviously
applies to CONFIG_PGTABLE_LEVELS < 4 only is fine as well.

>>  }
>>
>> -static inline pmd_t * fixmap_pmd(unsigned long addr)
>> +static inline pte_t * fixmap_pmd(unsigned long addr)
>>  {
>> -     pud_t *pud = fixmap_pud(addr);
>> -
>> -     BUG_ON(pud_none(*pud) || pud_bad(*pud));
>> -
>> -     return pmd_offset(pud, addr);
>> +     return (CONFIG_PGTABLE_LEVELS > 2) ? &bm_pmd[pmd_index(addr)]
>> +                                        : (pmd_t *)pgd_offset_k(addr);
>>  }
>
> I assume the return type change was unintentional?
>

Yes. Thanks for spotting that.

> With STRICT_MM_TYPECHECKS:
>
> arch/arm64/mm/mmu.c: In function 'fixmap_pmd':
> arch/arm64/mm/mmu.c:604:9: warning: return from incompatible pointer type [-Wincompatible-pointer-types]
>   return (CONFIG_PGTABLE_LEVELS > 2) ? &bm_pmd[pmd_index(addr)]
>          ^
> arch/arm64/mm/mmu.c: In function 'early_fixmap_init':
> arch/arm64/mm/mmu.c:635:6: warning: assignment from incompatible pointer type [-Wincompatible-pointer-types]
>   pmd = fixmap_pmd(addr);
>       ^
> arch/arm64/mm/mmu.c:645:11: warning: comparison of distinct pointer types lacks a cast
>   if ((pmd != fixmap_pmd(fix_to_virt(FIX_BTMAP_BEGIN)))
>            ^
> arch/arm64/mm/mmu.c:646:14: warning: comparison of distinct pointer types lacks a cast
>        || pmd != fixmap_pmd(fix_to_virt(FIX_BTMAP_END))) {
>               ^
>
> Side note: is there any reason we can't/shouldn't make
> STRICT_MM_TYPECHECKS a common config option? Or simply have it on by
> default for arm64?
>

I wouldn't mind at all.

> Having built with and without typechecks I see that it doesn't bloat the
> kernel Image size, though the binary isn't quite identical:
>
> [mark@leverpostej:~/src/linux]% ls -al *.*checks
> -rwxrwxr-x 1 mark mark   9288192 Jan 11 15:40 Image.checks
> -rwxrwxr-x 1 mark mark   9288192 Jan 11 15:36 Image.nochecks
> -rwxrwxr-x 1 mark mark 106782024 Jan 11 15:40 vmlinux.checks
> -rwxrwxr-x 1 mark mark 106688928 Jan 11 15:35 vmlinux.nochecks
>
> Things didn't quite line up between the two images, though I'm not sure
> what the underlying difference was.
>
> Thanks,
> Mark.

^ permalink raw reply	[flat|nested] 207+ messages in thread
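
The cast being debated works because, with fewer than four translation levels,
the pud level is folded into the pgd: a "pud entry" is the same word as the pgd
entry, merely viewed through a different type. The sketch below is a simplified
rendering of that folding in the spirit of include/asm-generic/pgtable-nopud.h
(the real header contains more than this, and pud_offset_folded is a name
invented for the example); it is only meant to show why
(pud_t *)pgd_offset_k(addr) is well-defined in the folded case.

/* Folded pud level, simplified: the pud is a thin wrapper around the pgd. */
typedef struct { pgd_t pgd; } pud_t;

static inline pud_t *pud_offset_folded(pgd_t *pgd, unsigned long address)
{
	/* One "pud" per pgd slot: same memory, different static type. */
	return (pud_t *)pgd;
}

Seen that way, hiding the cast behind pud_offset_kimg(), as suggested, is a
readability call rather than a correctness one.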

* [PATCH v3 04/21] arm64: decouple early fixmap init from linear mapping
@ 2016-01-11 16:15       ` Ard Biesheuvel
  0 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 16:15 UTC (permalink / raw)
  To: linux-arm-kernel

On 11 January 2016 at 17:09, Mark Rutland <mark.rutland@arm.com> wrote:
> On Mon, Jan 11, 2016 at 02:18:57PM +0100, Ard Biesheuvel wrote:
>> Since the early fixmap page tables are populated using pages that are
>> part of the static footprint of the kernel, they are covered by the
>> initial kernel mapping, and we can refer to them without using __va/__pa
>> translations, which are tied to the linear mapping.
>>
>> Since the fixmap page tables are disjoint from the kernel mapping up
>> to the top level pgd entry, we can refer to bm_pte[] directly, and there
>> is no need to walk the page tables and perform __pa()/__va() translations
>> at each step.
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> ---
>>  arch/arm64/mm/mmu.c | 32 ++++++--------------
>>  1 file changed, 9 insertions(+), 23 deletions(-)
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 7711554a94f4..75b5f0dc3bdc 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -570,38 +570,24 @@ void vmemmap_free(unsigned long start, unsigned long end)
>>  #endif       /* CONFIG_SPARSEMEM_VMEMMAP */
>>
>>  static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
>> -#if CONFIG_PGTABLE_LEVELS > 2
>>  static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
>> -#endif
>> -#if CONFIG_PGTABLE_LEVELS > 3
>>  static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
>> -#endif
>>
>>  static inline pud_t * fixmap_pud(unsigned long addr)
>>  {
>> -     pgd_t *pgd = pgd_offset_k(addr);
>> -
>> -     BUG_ON(pgd_none(*pgd) || pgd_bad(*pgd));
>> -
>> -     return pud_offset(pgd, addr);
>> +     return (CONFIG_PGTABLE_LEVELS > 3) ? &bm_pud[pud_index(addr)]
>> +                                        : (pud_t *)pgd_offset_k(addr);
>
> If we move patch 6 earlier, we could use pud_offset_kimg here, and avoid
> the cast, at the cost of passing the pgd into fixmap_pud.
>
> Similarly for fixmap_pmd.
>

Is that necessarily an improvement? I know it hides the cast, but I
think having an explicit pgd_t* to pud_t* cast that so obviously
applies to CONFIG_PGTABLE_LEVELS < 4 only is fine as well.

>>  }
>>
>> -static inline pmd_t * fixmap_pmd(unsigned long addr)
>> +static inline pte_t * fixmap_pmd(unsigned long addr)
>>  {
>> -     pud_t *pud = fixmap_pud(addr);
>> -
>> -     BUG_ON(pud_none(*pud) || pud_bad(*pud));
>> -
>> -     return pmd_offset(pud, addr);
>> +     return (CONFIG_PGTABLE_LEVELS > 2) ? &bm_pmd[pmd_index(addr)]
>> +                                        : (pmd_t *)pgd_offset_k(addr);
>>  }
>
> I assume the return type change was unintentional?
>

Yes. Thanks for spotting that.

> With STRICT_MM_TYPECHECKS:
>
> arch/arm64/mm/mmu.c: In function 'fixmap_pmd':
> arch/arm64/mm/mmu.c:604:9: warning: return from incompatible pointer type [-Wincompatible-pointer-types]
>   return (CONFIG_PGTABLE_LEVELS > 2) ? &bm_pmd[pmd_index(addr)]
>          ^
> arch/arm64/mm/mmu.c: In function 'early_fixmap_init':
> arch/arm64/mm/mmu.c:635:6: warning: assignment from incompatible pointer type [-Wincompatible-pointer-types]
>   pmd = fixmap_pmd(addr);
>       ^
> arch/arm64/mm/mmu.c:645:11: warning: comparison of distinct pointer types lacks a cast
>   if ((pmd != fixmap_pmd(fix_to_virt(FIX_BTMAP_BEGIN)))
>            ^
> arch/arm64/mm/mmu.c:646:14: warning: comparison of distinct pointer types lacks a cast
>        || pmd != fixmap_pmd(fix_to_virt(FIX_BTMAP_END))) {
>               ^
>
> Side note: is there any reason we can't/shouldn't make
> STRICT_MM_TYPECHECKS a common config option? Or simply have it on by
> default for arm64?
>

I wouldn't mind at all.

> Having built with and without typechecks I see that it doesn't bloat the
> kernel Image size, though the binary isn't quite identical:
>
> [mark at leverpostej:~/src/linux]% ls -al *.*checks
> -rwxrwxr-x 1 mark mark   9288192 Jan 11 15:40 Image.checks
> -rwxrwxr-x 1 mark mark   9288192 Jan 11 15:36 Image.nochecks
> -rwxrwxr-x 1 mark mark 106782024 Jan 11 15:40 vmlinux.checks
> -rwxrwxr-x 1 mark mark 106688928 Jan 11 15:35 vmlinux.nochecks
>
> Things didn't quite line up between the two images, though I'm not sure
> what the underlying difference was.
>
> Thanks,
> Mark.

^ permalink raw reply	[flat|nested] 207+ messages in thread

* [kernel-hardening] Re: [PATCH v3 04/21] arm64: decouple early fixmap init from linear mapping
@ 2016-01-11 16:15       ` Ard Biesheuvel
  0 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 16:15 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, kernel-hardening, Will Deacon, Catalin Marinas,
	Leif Lindholm, Kees Cook, linux-kernel, Stuart Yoder,
	Sharma Bhupesh, Arnd Bergmann, Marc Zyngier, Christoffer Dall

On 11 January 2016 at 17:09, Mark Rutland <mark.rutland@arm.com> wrote:
> On Mon, Jan 11, 2016 at 02:18:57PM +0100, Ard Biesheuvel wrote:
>> Since the early fixmap page tables are populated using pages that are
>> part of the static footprint of the kernel, they are covered by the
>> initial kernel mapping, and we can refer to them without using __va/__pa
>> translations, which are tied to the linear mapping.
>>
>> Since the fixmap page tables are disjoint from the kernel mapping up
>> to the top level pgd entry, we can refer to bm_pte[] directly, and there
>> is no need to walk the page tables and perform __pa()/__va() translations
>> at each step.
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> ---
>>  arch/arm64/mm/mmu.c | 32 ++++++--------------
>>  1 file changed, 9 insertions(+), 23 deletions(-)
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 7711554a94f4..75b5f0dc3bdc 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -570,38 +570,24 @@ void vmemmap_free(unsigned long start, unsigned long end)
>>  #endif       /* CONFIG_SPARSEMEM_VMEMMAP */
>>
>>  static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
>> -#if CONFIG_PGTABLE_LEVELS > 2
>>  static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
>> -#endif
>> -#if CONFIG_PGTABLE_LEVELS > 3
>>  static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
>> -#endif
>>
>>  static inline pud_t * fixmap_pud(unsigned long addr)
>>  {
>> -     pgd_t *pgd = pgd_offset_k(addr);
>> -
>> -     BUG_ON(pgd_none(*pgd) || pgd_bad(*pgd));
>> -
>> -     return pud_offset(pgd, addr);
>> +     return (CONFIG_PGTABLE_LEVELS > 3) ? &bm_pud[pud_index(addr)]
>> +                                        : (pud_t *)pgd_offset_k(addr);
>
> If we move patch 6 earlier, we could use pud_offset_kimg here, and avoid
> the cast, at the cost of passing the pgd into fixmap_pud.
>
> Similarly for fixmap_pmd.
>

Is that necessarily an improvement? I know it hides the cast, but I
think having an explicit pgd_t* to pud_t* cast that so obviously
applies to CONFIG_PGTABLE_LEVELS < 4 only is fine as well.

>>  }
>>
>> -static inline pmd_t * fixmap_pmd(unsigned long addr)
>> +static inline pte_t * fixmap_pmd(unsigned long addr)
>>  {
>> -     pud_t *pud = fixmap_pud(addr);
>> -
>> -     BUG_ON(pud_none(*pud) || pud_bad(*pud));
>> -
>> -     return pmd_offset(pud, addr);
>> +     return (CONFIG_PGTABLE_LEVELS > 2) ? &bm_pmd[pmd_index(addr)]
>> +                                        : (pmd_t *)pgd_offset_k(addr);
>>  }
>
> I assume the return type change was unintentional?
>

Yes. Thanks for spotting that.

> With STRICT_MM_TYPECHECKS:
>
> arch/arm64/mm/mmu.c: In function 'fixmap_pmd':
> arch/arm64/mm/mmu.c:604:9: warning: return from incompatible pointer type [-Wincompatible-pointer-types]
>   return (CONFIG_PGTABLE_LEVELS > 2) ? &bm_pmd[pmd_index(addr)]
>          ^
> arch/arm64/mm/mmu.c: In function 'early_fixmap_init':
> arch/arm64/mm/mmu.c:635:6: warning: assignment from incompatible pointer type [-Wincompatible-pointer-types]
>   pmd = fixmap_pmd(addr);
>       ^
> arch/arm64/mm/mmu.c:645:11: warning: comparison of distinct pointer types lacks a cast
>   if ((pmd != fixmap_pmd(fix_to_virt(FIX_BTMAP_BEGIN)))
>            ^
> arch/arm64/mm/mmu.c:646:14: warning: comparison of distinct pointer types lacks a cast
>        || pmd != fixmap_pmd(fix_to_virt(FIX_BTMAP_END))) {
>               ^
>
> Side note: is there any reason we can't/shouldn't make
> STRICT_MM_TYPECHECKS a common config option? Or simply have it on by
> default for arm64?
>

I wouldn't mind at all.
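
Making it a Kconfig option should be mostly mechanical; all the #define
really selects is the struct-vs-scalar type definitions, roughly like the
sketch below (pte_t case only, assuming a new CONFIG_STRICT_MM_TYPECHECKS
symbol; pmd/pud/pgd would follow the same pattern):

#ifdef CONFIG_STRICT_MM_TYPECHECKS
/* distinct struct types: mixing up levels becomes a compile error */
typedef struct { pteval_t pte; } pte_t;
#define pte_val(x)	((x).pte)
#define __pte(x)	((pte_t) { (x) })
#else
/* plain scalars: cheaper, but no type checking */
typedef pteval_t pte_t;
#define pte_val(x)	(x)
#define __pte(x)	(x)
#endif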

> Having built with and without typechecks I see that it doesn't bloat the
> kernel Image size, though the binary isn't quite identical:
>
> [mark@leverpostej:~/src/linux]% ls -al *.*checks
> -rwxrwxr-x 1 mark mark   9288192 Jan 11 15:40 Image.checks
> -rwxrwxr-x 1 mark mark   9288192 Jan 11 15:36 Image.nochecks
> -rwxrwxr-x 1 mark mark 106782024 Jan 11 15:40 vmlinux.checks
> -rwxrwxr-x 1 mark mark 106688928 Jan 11 15:35 vmlinux.nochecks
>
> Things didn't quite line up between the two images, though I'm not sure
> what the underlying difference was.
>
> Thanks,
> Mark.

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 06/21] arm64: pgtable: implement static [pte|pmd|pud]_offset variants
  2016-01-11 13:18   ` Ard Biesheuvel
  (?)
@ 2016-01-11 16:24     ` Mark Rutland
  -1 siblings, 0 replies; 207+ messages in thread
From: Mark Rutland @ 2016-01-11 16:24 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	leif.lindholm, keescook, linux-kernel, stuart.yoder,
	bhupesh.sharma, arnd, marc.zyngier, christoffer.dall

On Mon, Jan 11, 2016 at 02:18:59PM +0100, Ard Biesheuvel wrote:
> The page table accessors pte_offset(), pud_offset() and pmd_offset()
> rely on __va translations, so they can only be used after the linear
> mapping has been installed. For the early fixmap and kasan init routines,
> whose page tables are allocated statically in the kernel image, these
> functions will return bogus values. So implement pmd_offset_kimg() and
> pud_offset_kimg(), which can be used instead before any page tables have
> been allocated dynamically.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

This looks good to me. One possible suggestion below, but either way:

Reviewed-by: Mark Rutland <mark.rutland@arm.com>

> ---
>  arch/arm64/include/asm/pgtable.h | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 6129f6755081..7b4e16068c9f 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -449,6 +449,9 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
>  
>  #define pmd_page(pmd)		pfn_to_page(__phys_to_pfn(pmd_val(pmd) & PHYS_MASK))
>  
> +/* use ONLY for statically allocated translation tables */
> +#define pte_offset_kimg(dir,addr)	((pte_t *)__phys_to_kimg(pte_offset_phys((dir), (addr))))
> +

Given that we're probably only going to use this during one-off setup,
maybe it's worth something like:

#define IN_KERNEL_IMAGE(p) ({						\
	unsigned long __p = (unsigned long)p;				\
	KIMAGE_VADDR <= __p && __p < _end;				\
})

#define pte_offset_kimg(dir,addr) ({					\
	BUG_ON(!IN_KERNEL_IMAGE(dir));					\
	((pte_t *)__phys_to_kimg(pte_offset_phys((dir), (addr))));	\
})

That might be overkill, though, given all it does is turn one runtime
failure into another runtime failure.
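
Either way, the expected call pattern in the early init code would be
something like (sketch):

	pgd_t *pgd = pgd_offset_k(addr);
	pud_t *pud = pud_offset_kimg(pgd, addr);
	pmd_t *pmd = pmd_offset_kimg(pud, addr);
	pte_t *ptep = pte_offset_kimg(pmd, addr);

i.e. the whole walk stays within kernel-image addresses and never relies
on the linear mapping being up.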

Mark.

>  /*
>   * Conversion functions: convert a page and protection to a page entry,
>   * and a page entry and page directory to the page they refer to.
> @@ -492,6 +495,9 @@ static inline phys_addr_t pud_page_paddr(pud_t pud)
>  
>  #define pud_page(pud)		pfn_to_page(__phys_to_pfn(pud_val(pud) & PHYS_MASK))
>  
> +/* use ONLY for statically allocated translation tables */
> +#define pmd_offset_kimg(dir,addr)	((pmd_t *)__phys_to_kimg(pmd_offset_phys((dir), (addr))))
> +
>  #else
>  
>  #define pud_page_paddr(pud)	({ BUILD_BUG(); 0; })
> @@ -502,6 +508,8 @@ static inline phys_addr_t pud_page_paddr(pud_t pud)
>  #define pmd_set_fixmap_offset(pudp, addr)	((pmd_t *)pudp)
>  #define pmd_clear_fixmap()
>  
> +#define pmd_offset_kimg(dir,addr)	((pmd_t *)dir)
> +
>  #endif	/* CONFIG_PGTABLE_LEVELS > 2 */
>  
>  #if CONFIG_PGTABLE_LEVELS > 3
> @@ -540,6 +548,9 @@ static inline phys_addr_t pgd_page_paddr(pgd_t pgd)
>  
>  #define pgd_page(pgd)		pfn_to_page(__phys_to_pfn(pgd_val(pgd) & PHYS_MASK))
>  
> +/* use ONLY for statically allocated translation tables */
> +#define pud_offset_kimg(dir,addr)	((pud_t *)__phys_to_kimg(pud_offset_phys((dir), (addr))))
> +
>  #else
>  
>  #define pgd_page_paddr(pgd)	({ BUILD_BUG(); 0;})
> @@ -550,6 +561,8 @@ static inline phys_addr_t pgd_page_paddr(pgd_t pgd)
>  #define pud_set_fixmap_offset(pgdp, addr)	((pud_t *)pgdp)
>  #define pud_clear_fixmap()
>  
> +#define pud_offset_kimg(dir,addr)	((pud_t *)dir)
> +
>  #endif  /* CONFIG_PGTABLE_LEVELS > 3 */
>  
>  #define pgd_ERROR(pgd)		__pgd_error(__FILE__, __LINE__, pgd_val(pgd))
> -- 
> 2.5.0
> 

^ permalink raw reply	[flat|nested] 207+ messages in thread

* [PATCH v3 06/21] arm64: pgtable: implement static [pte|pmd|pud]_offset variants
@ 2016-01-11 16:24     ` Mark Rutland
  0 siblings, 0 replies; 207+ messages in thread
From: Mark Rutland @ 2016-01-11 16:24 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jan 11, 2016 at 02:18:59PM +0100, Ard Biesheuvel wrote:
> The page table accessors pte_offset(), pud_offset() and pmd_offset()
> rely on __va translations, so they can only be used after the linear
> mapping has been installed. For the early fixmap and kasan init routines,
> whose page tables are allocated statically in the kernel image, these
> functions will return bogus values. So implement pmd_offset_kimg() and
> pud_offset_kimg(), which can be used instead before any page tables have
> been allocated dynamically.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

This looks good to me. One possible suggestion below, but either way:

Reviewed-by: Mark Rutland <mark.rutland@arm.com>

> ---
>  arch/arm64/include/asm/pgtable.h | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 6129f6755081..7b4e16068c9f 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -449,6 +449,9 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
>  
>  #define pmd_page(pmd)		pfn_to_page(__phys_to_pfn(pmd_val(pmd) & PHYS_MASK))
>  
> +/* use ONLY for statically allocated translation tables */
> +#define pte_offset_kimg(dir,addr)	((pte_t *)__phys_to_kimg(pte_offset_phys((dir), (addr))))
> +

Given that we're probably only going to use this during one-off setup,
maybe it's worth something like:

#define IN_KERNEL_IMAGE(p) ({						\
	unsigned long __p = (unsigned long)p;				\
	KIMAGE_VADDR <= __p && __p < _end;				\
})

#define pte_offset_kimg(dir,addr) ({					\
	BUG_ON(!IN_KERNEL_IMAGE(dir));					\
	((pte_t *)__phys_to_kimg(pte_offset_phys((dir), (addr))));	\
})

That might be overkill, though, given all it does is turn one runtime
failure into another runtime failure.

Mark.

>  /*
>   * Conversion functions: convert a page and protection to a page entry,
>   * and a page entry and page directory to the page they refer to.
> @@ -492,6 +495,9 @@ static inline phys_addr_t pud_page_paddr(pud_t pud)
>  
>  #define pud_page(pud)		pfn_to_page(__phys_to_pfn(pud_val(pud) & PHYS_MASK))
>  
> +/* use ONLY for statically allocated translation tables */
> +#define pmd_offset_kimg(dir,addr)	((pmd_t *)__phys_to_kimg(pmd_offset_phys((dir), (addr))))
> +
>  #else
>  
>  #define pud_page_paddr(pud)	({ BUILD_BUG(); 0; })
> @@ -502,6 +508,8 @@ static inline phys_addr_t pud_page_paddr(pud_t pud)
>  #define pmd_set_fixmap_offset(pudp, addr)	((pmd_t *)pudp)
>  #define pmd_clear_fixmap()
>  
> +#define pmd_offset_kimg(dir,addr)	((pmd_t *)dir)
> +
>  #endif	/* CONFIG_PGTABLE_LEVELS > 2 */
>  
>  #if CONFIG_PGTABLE_LEVELS > 3
> @@ -540,6 +548,9 @@ static inline phys_addr_t pgd_page_paddr(pgd_t pgd)
>  
>  #define pgd_page(pgd)		pfn_to_page(__phys_to_pfn(pgd_val(pgd) & PHYS_MASK))
>  
> +/* use ONLY for statically allocated translation tables */
> +#define pud_offset_kimg(dir,addr)	((pud_t *)__phys_to_kimg(pud_offset_phys((dir), (addr))))
> +
>  #else
>  
>  #define pgd_page_paddr(pgd)	({ BUILD_BUG(); 0;})
> @@ -550,6 +561,8 @@ static inline phys_addr_t pgd_page_paddr(pgd_t pgd)
>  #define pud_set_fixmap_offset(pgdp, addr)	((pud_t *)pgdp)
>  #define pud_clear_fixmap()
>  
> +#define pud_offset_kimg(dir,addr)	((pud_t *)dir)
> +
>  #endif  /* CONFIG_PGTABLE_LEVELS > 3 */
>  
>  #define pgd_ERROR(pgd)		__pgd_error(__FILE__, __LINE__, pgd_val(pgd))
> -- 
> 2.5.0
> 

^ permalink raw reply	[flat|nested] 207+ messages in thread

* [kernel-hardening] Re: [PATCH v3 06/21] arm64: pgtable: implement static [pte|pmd|pud]_offset variants
@ 2016-01-11 16:24     ` Mark Rutland
  0 siblings, 0 replies; 207+ messages in thread
From: Mark Rutland @ 2016-01-11 16:24 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	leif.lindholm, keescook, linux-kernel, stuart.yoder,
	bhupesh.sharma, arnd, marc.zyngier, christoffer.dall

On Mon, Jan 11, 2016 at 02:18:59PM +0100, Ard Biesheuvel wrote:
> The page table accessors pte_offset(), pud_offset() and pmd_offset()
> rely on __va translations, so they can only be used after the linear
> mapping has been installed. For the early fixmap and kasan init routines,
> whose page tables are allocated statically in the kernel image, these
> functions will return bogus values. So implement pmd_offset_kimg() and
> pud_offset_kimg(), which can be used instead before any page tables have
> been allocated dynamically.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

This looks good to me. One possible suggestion below, but either way:

Reviewed-by: Mark Rutland <mark.rutland@arm.com>

> ---
>  arch/arm64/include/asm/pgtable.h | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 6129f6755081..7b4e16068c9f 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -449,6 +449,9 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
>  
>  #define pmd_page(pmd)		pfn_to_page(__phys_to_pfn(pmd_val(pmd) & PHYS_MASK))
>  
> +/* use ONLY for statically allocated translation tables */
> +#define pte_offset_kimg(dir,addr)	((pte_t *)__phys_to_kimg(pte_offset_phys((dir), (addr))))
> +

Given that we're probably only going to use this during one-off setup,
maybe it's worth something like:

#define IN_KERNEL_IMAGE(p) ({						\
	unsigned long __p = (unsigned long)p;				\
	KIMAGE_VADDR <= __p && __p < _end;				\
})

#define pte_offset_kimg(dir,addr) ({					\
	BUG_ON(!IN_KERNEL_IMAGE(dir));					\
	((pte_t *)__phys_to_kimg(pte_offset_phys((dir), (addr))));	\
})

That might be overkill, though, given all it does is turn one runtime
failure into another runtime failure.

Mark.

>  /*
>   * Conversion functions: convert a page and protection to a page entry,
>   * and a page entry and page directory to the page they refer to.
> @@ -492,6 +495,9 @@ static inline phys_addr_t pud_page_paddr(pud_t pud)
>  
>  #define pud_page(pud)		pfn_to_page(__phys_to_pfn(pud_val(pud) & PHYS_MASK))
>  
> +/* use ONLY for statically allocated translation tables */
> +#define pmd_offset_kimg(dir,addr)	((pmd_t *)__phys_to_kimg(pmd_offset_phys((dir), (addr))))
> +
>  #else
>  
>  #define pud_page_paddr(pud)	({ BUILD_BUG(); 0; })
> @@ -502,6 +508,8 @@ static inline phys_addr_t pud_page_paddr(pud_t pud)
>  #define pmd_set_fixmap_offset(pudp, addr)	((pmd_t *)pudp)
>  #define pmd_clear_fixmap()
>  
> +#define pmd_offset_kimg(dir,addr)	((pmd_t *)dir)
> +
>  #endif	/* CONFIG_PGTABLE_LEVELS > 2 */
>  
>  #if CONFIG_PGTABLE_LEVELS > 3
> @@ -540,6 +548,9 @@ static inline phys_addr_t pgd_page_paddr(pgd_t pgd)
>  
>  #define pgd_page(pgd)		pfn_to_page(__phys_to_pfn(pgd_val(pgd) & PHYS_MASK))
>  
> +/* use ONLY for statically allocated translation tables */
> +#define pud_offset_kimg(dir,addr)	((pud_t *)__phys_to_kimg(pud_offset_phys((dir), (addr))))
> +
>  #else
>  
>  #define pgd_page_paddr(pgd)	({ BUILD_BUG(); 0;})
> @@ -550,6 +561,8 @@ static inline phys_addr_t pgd_page_paddr(pgd_t pgd)
>  #define pud_set_fixmap_offset(pgdp, addr)	((pud_t *)pgdp)
>  #define pud_clear_fixmap()
>  
> +#define pud_offset_kimg(dir,addr)	((pud_t *)dir)
> +
>  #endif  /* CONFIG_PGTABLE_LEVELS > 3 */
>  
>  #define pgd_ERROR(pgd)		__pgd_error(__FILE__, __LINE__, pgd_val(pgd))
> -- 
> 2.5.0
> 

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 04/21] arm64: decouple early fixmap init from linear mapping
  2016-01-11 16:15       ` Ard Biesheuvel
  (?)
@ 2016-01-11 16:27         ` Mark Rutland
  -1 siblings, 0 replies; 207+ messages in thread
From: Mark Rutland @ 2016-01-11 16:27 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, kernel-hardening, Will Deacon, Catalin Marinas,
	Leif Lindholm, Kees Cook, linux-kernel, Stuart Yoder,
	Sharma Bhupesh, Arnd Bergmann, Marc Zyngier, Christoffer Dall

On Mon, Jan 11, 2016 at 05:15:13PM +0100, Ard Biesheuvel wrote:
> On 11 January 2016 at 17:09, Mark Rutland <mark.rutland@arm.com> wrote:
> > On Mon, Jan 11, 2016 at 02:18:57PM +0100, Ard Biesheuvel wrote:
> >> Since the early fixmap page tables are populated using pages that are
> >> part of the static footprint of the kernel, they are covered by the
> >> initial kernel mapping, and we can refer to them without using __va/__pa
> >> translations, which are tied to the linear mapping.
> >>
> >> Since the fixmap page tables are disjoint from the kernel mapping up
> >> to the top level pgd entry, we can refer to bm_pte[] directly, and there
> >> is no need to walk the page tables and perform __pa()/__va() translations
> >> at each step.
> >>
> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> >> ---
> >>  arch/arm64/mm/mmu.c | 32 ++++++--------------
> >>  1 file changed, 9 insertions(+), 23 deletions(-)
> >>
> >> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> >> index 7711554a94f4..75b5f0dc3bdc 100644
> >> --- a/arch/arm64/mm/mmu.c
> >> +++ b/arch/arm64/mm/mmu.c
> >> @@ -570,38 +570,24 @@ void vmemmap_free(unsigned long start, unsigned long end)
> >>  #endif       /* CONFIG_SPARSEMEM_VMEMMAP */
> >>
> >>  static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
> >> -#if CONFIG_PGTABLE_LEVELS > 2
> >>  static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
> >> -#endif
> >> -#if CONFIG_PGTABLE_LEVELS > 3
> >>  static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
> >> -#endif
> >>
> >>  static inline pud_t * fixmap_pud(unsigned long addr)
> >>  {
> >> -     pgd_t *pgd = pgd_offset_k(addr);
> >> -
> >> -     BUG_ON(pgd_none(*pgd) || pgd_bad(*pgd));
> >> -
> >> -     return pud_offset(pgd, addr);
> >> +     return (CONFIG_PGTABLE_LEVELS > 3) ? &bm_pud[pud_index(addr)]
> >> +                                        : (pud_t *)pgd_offset_k(addr);
> >
> > If we move patch 6 earlier, we could use pud_offset_kimg here, and avoid
> > the cast, at the cost of passing the pgd into fixmap_pud.
> >
> > Similarly for fixmap_pmd.
> >
> 
> Is that necessarily an improvement? I know it hides the cast, but I
> think having an explicit pgd_t* to pud_t* cast that so obviously
> applies to CONFIG_PGTABLE_LEVELS < 4 only is fine as well.

True; it's not a big thing either way.

> >>  }
> >>
> >> -static inline pmd_t * fixmap_pmd(unsigned long addr)
> >> +static inline pte_t * fixmap_pmd(unsigned long addr)
> >>  {
> >> -     pud_t *pud = fixmap_pud(addr);
> >> -
> >> -     BUG_ON(pud_none(*pud) || pud_bad(*pud));
> >> -
> >> -     return pmd_offset(pud, addr);
> >> +     return (CONFIG_PGTABLE_LEVELS > 2) ? &bm_pmd[pmd_index(addr)]
> >> +                                        : (pmd_t *)pgd_offset_k(addr);
> >>  }
> >
> > I assume the return type change was unintentional?
> >
> 
> Yes. Thanks for spotting that.

With that fixed:

Reviewed-by: Mark Rutland <mark.rutland@arm.com>

> > Side note: is there any reason we can't/shouldn't make
> > STRICT_MM_TYPECHECKS a common config option? Or simply have it on by
> > default for arm64?
> >
> 
> I wouldn't mind at all.

I'll dig into that a bit further then...

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 207+ messages in thread

* [PATCH v3 04/21] arm64: decouple early fixmap init from linear mapping
@ 2016-01-11 16:27         ` Mark Rutland
  0 siblings, 0 replies; 207+ messages in thread
From: Mark Rutland @ 2016-01-11 16:27 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jan 11, 2016 at 05:15:13PM +0100, Ard Biesheuvel wrote:
> On 11 January 2016 at 17:09, Mark Rutland <mark.rutland@arm.com> wrote:
> > On Mon, Jan 11, 2016 at 02:18:57PM +0100, Ard Biesheuvel wrote:
> >> Since the early fixmap page tables are populated using pages that are
> >> part of the static footprint of the kernel, they are covered by the
> >> initial kernel mapping, and we can refer to them without using __va/__pa
> >> translations, which are tied to the linear mapping.
> >>
> >> Since the fixmap page tables are disjoint from the kernel mapping up
> >> to the top level pgd entry, we can refer to bm_pte[] directly, and there
> >> is no need to walk the page tables and perform __pa()/__va() translations
> >> at each step.
> >>
> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> >> ---
> >>  arch/arm64/mm/mmu.c | 32 ++++++--------------
> >>  1 file changed, 9 insertions(+), 23 deletions(-)
> >>
> >> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> >> index 7711554a94f4..75b5f0dc3bdc 100644
> >> --- a/arch/arm64/mm/mmu.c
> >> +++ b/arch/arm64/mm/mmu.c
> >> @@ -570,38 +570,24 @@ void vmemmap_free(unsigned long start, unsigned long end)
> >>  #endif       /* CONFIG_SPARSEMEM_VMEMMAP */
> >>
> >>  static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
> >> -#if CONFIG_PGTABLE_LEVELS > 2
> >>  static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
> >> -#endif
> >> -#if CONFIG_PGTABLE_LEVELS > 3
> >>  static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
> >> -#endif
> >>
> >>  static inline pud_t * fixmap_pud(unsigned long addr)
> >>  {
> >> -     pgd_t *pgd = pgd_offset_k(addr);
> >> -
> >> -     BUG_ON(pgd_none(*pgd) || pgd_bad(*pgd));
> >> -
> >> -     return pud_offset(pgd, addr);
> >> +     return (CONFIG_PGTABLE_LEVELS > 3) ? &bm_pud[pud_index(addr)]
> >> +                                        : (pud_t *)pgd_offset_k(addr);
> >
> > If we move patch 6 earlier, we could use pud_offset_kimg here, and avoid
> > the cast, at the cost of passing the pgd into fixmap_pud.
> >
> > Similarly for fixmap_pmd.
> >
> 
> Is that necessarily an improvement? I know it hides the cast, but I
> think having an explicit pgd_t* to pud_t* cast that so obviously
> applies to CONFIG_PGTABLE_LEVELS < 4 only is fine as well.

True; it's not a big thing either way.

> >>  }
> >>
> >> -static inline pmd_t * fixmap_pmd(unsigned long addr)
> >> +static inline pte_t * fixmap_pmd(unsigned long addr)
> >>  {
> >> -     pud_t *pud = fixmap_pud(addr);
> >> -
> >> -     BUG_ON(pud_none(*pud) || pud_bad(*pud));
> >> -
> >> -     return pmd_offset(pud, addr);
> >> +     return (CONFIG_PGTABLE_LEVELS > 2) ? &bm_pmd[pmd_index(addr)]
> >> +                                        : (pmd_t *)pgd_offset_k(addr);
> >>  }
> >
> > I assume the return type change was unintentional?
> >
> 
> Yes. Thanks for spotting that.

With that fixed:

Reviewed-by: Mark Rutland <mark.rutland@arm.com>

> > Side note: is there any reason we can't/shouldn't make
> > STRICT_MM_TYPECHECKS a common config option? Or simply have it on by
> > default for arm64?
> >
> 
> I wouldn't mind at all.

I'll dig into that a bit further then...

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 207+ messages in thread

* [kernel-hardening] Re: [PATCH v3 04/21] arm64: decouple early fixmap init from linear mapping
@ 2016-01-11 16:27         ` Mark Rutland
  0 siblings, 0 replies; 207+ messages in thread
From: Mark Rutland @ 2016-01-11 16:27 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, kernel-hardening, Will Deacon, Catalin Marinas,
	Leif Lindholm, Kees Cook, linux-kernel, Stuart Yoder,
	Sharma Bhupesh, Arnd Bergmann, Marc Zyngier, Christoffer Dall

On Mon, Jan 11, 2016 at 05:15:13PM +0100, Ard Biesheuvel wrote:
> On 11 January 2016 at 17:09, Mark Rutland <mark.rutland@arm.com> wrote:
> > On Mon, Jan 11, 2016 at 02:18:57PM +0100, Ard Biesheuvel wrote:
> >> Since the early fixmap page tables are populated using pages that are
> >> part of the static footprint of the kernel, they are covered by the
> >> initial kernel mapping, and we can refer to them without using __va/__pa
> >> translations, which are tied to the linear mapping.
> >>
> >> Since the fixmap page tables are disjoint from the kernel mapping up
> >> to the top level pgd entry, we can refer to bm_pte[] directly, and there
> >> is no need to walk the page tables and perform __pa()/__va() translations
> >> at each step.
> >>
> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> >> ---
> >>  arch/arm64/mm/mmu.c | 32 ++++++--------------
> >>  1 file changed, 9 insertions(+), 23 deletions(-)
> >>
> >> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> >> index 7711554a94f4..75b5f0dc3bdc 100644
> >> --- a/arch/arm64/mm/mmu.c
> >> +++ b/arch/arm64/mm/mmu.c
> >> @@ -570,38 +570,24 @@ void vmemmap_free(unsigned long start, unsigned long end)
> >>  #endif       /* CONFIG_SPARSEMEM_VMEMMAP */
> >>
> >>  static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
> >> -#if CONFIG_PGTABLE_LEVELS > 2
> >>  static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
> >> -#endif
> >> -#if CONFIG_PGTABLE_LEVELS > 3
> >>  static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
> >> -#endif
> >>
> >>  static inline pud_t * fixmap_pud(unsigned long addr)
> >>  {
> >> -     pgd_t *pgd = pgd_offset_k(addr);
> >> -
> >> -     BUG_ON(pgd_none(*pgd) || pgd_bad(*pgd));
> >> -
> >> -     return pud_offset(pgd, addr);
> >> +     return (CONFIG_PGTABLE_LEVELS > 3) ? &bm_pud[pud_index(addr)]
> >> +                                        : (pud_t *)pgd_offset_k(addr);
> >
> > If we move patch 6 earlier, we could use pud_offset_kimg here, and avoid
> > the cast, at the cost of passing the pgd into fixmap_pud.
> >
> > Similarly for fixmap_pmd.
> >
> 
> Is that necessarily an improvement? I know it hides the cast, but I
> think having an explicit pgd_t* to pud_t* cast that so obviously
> applies to CONFIG_PGTABLE_LEVELS < 4 only is fine as well.

True; it's not a big thing either way.

> >>  }
> >>
> >> -static inline pmd_t * fixmap_pmd(unsigned long addr)
> >> +static inline pte_t * fixmap_pmd(unsigned long addr)
> >>  {
> >> -     pud_t *pud = fixmap_pud(addr);
> >> -
> >> -     BUG_ON(pud_none(*pud) || pud_bad(*pud));
> >> -
> >> -     return pmd_offset(pud, addr);
> >> +     return (CONFIG_PGTABLE_LEVELS > 2) ? &bm_pmd[pmd_index(addr)]
> >> +                                        : (pmd_t *)pgd_offset_k(addr);
> >>  }
> >
> > I assume the return type change was unintentional?
> >
> 
> Yes. Thanks for spotting that.

With that fixed:

Reviewed-by: Mark Rutland <mark.rutland@arm.com>

> > Side note: is there any reason we can't/shouldn't make
> > STRICT_MM_TYPECHECKS a common config option? Or simply have it on by
> > default for arm64?
> >
> 
> I wouldn't mind at all.

I'll dig into that a bit further then...

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 02/21] arm64: introduce KIMAGE_VADDR as the virtual base of the kernel region
  2016-01-11 13:18   ` Ard Biesheuvel
  (?)
@ 2016-01-11 16:31     ` Mark Rutland
  -1 siblings, 0 replies; 207+ messages in thread
From: Mark Rutland @ 2016-01-11 16:31 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	leif.lindholm, keescook, linux-kernel, stuart.yoder,
	bhupesh.sharma, arnd, marc.zyngier, christoffer.dall

On Mon, Jan 11, 2016 at 02:18:55PM +0100, Ard Biesheuvel wrote:
> This introduces the preprocessor symbol KIMAGE_VADDR which will serve as
> the symbolic virtual base of the kernel region, i.e., the kernel's virtual
> offset will be KIMAGE_VADDR + TEXT_OFFSET. For now, we define it as being
> equal to PAGE_OFFSET, but in the future, it will be moved below it once
> we move the kernel virtual mapping out of the linear mapping.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

Reviewed-by: Mark Rutland <mark.rutland@arm.com>
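
To spell out the semantics of the new __virt_to_phys() for anyone skimming
the quoted diff below: it is roughly equivalent to this sketch
(illustrative only), where linear-map addresses keep the old translation
and kernel-image addresses are translated relative to KIMAGE_VADDR:

static inline phys_addr_t __virt_to_phys_sketch(unsigned long x)
{
	if (x >= PAGE_OFFSET)			/* linear mapping */
		return x - PAGE_OFFSET + PHYS_OFFSET;
	return x - KIMAGE_VADDR + PHYS_OFFSET;	/* kernel image */
}

For now KIMAGE_VADDR == PAGE_OFFSET, so both branches are identical; the
distinction only starts to matter once the kernel mapping moves out of
the linear region.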

Mark.

> ---
>  arch/arm64/include/asm/memory.h | 10 ++++++++--
>  arch/arm64/kernel/head.S        |  2 +-
>  arch/arm64/kernel/vmlinux.lds.S |  4 ++--
>  3 files changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index 853953cd1f08..bea9631b34a8 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -51,7 +51,8 @@
>  #define VA_BITS			(CONFIG_ARM64_VA_BITS)
>  #define VA_START		(UL(0xffffffffffffffff) << VA_BITS)
>  #define PAGE_OFFSET		(UL(0xffffffffffffffff) << (VA_BITS - 1))
> -#define MODULES_END		(PAGE_OFFSET)
> +#define KIMAGE_VADDR		(PAGE_OFFSET)
> +#define MODULES_END		(KIMAGE_VADDR)
>  #define MODULES_VADDR		(MODULES_END - SZ_64M)
>  #define PCI_IO_END		(MODULES_VADDR - SZ_2M)
>  #define PCI_IO_START		(PCI_IO_END - PCI_IO_SIZE)
> @@ -75,8 +76,13 @@
>   * private definitions which should NOT be used outside memory.h
>   * files.  Use virt_to_phys/phys_to_virt/__pa/__va instead.
>   */
> -#define __virt_to_phys(x)	(((phys_addr_t)(x) - PAGE_OFFSET + PHYS_OFFSET))
> +#define __virt_to_phys(x) ({						\
> +	phys_addr_t __x = (phys_addr_t)(x);				\
> +	__x >= PAGE_OFFSET ? (__x - PAGE_OFFSET + PHYS_OFFSET) :	\
> +			     (__x - KIMAGE_VADDR + PHYS_OFFSET); })
> +
>  #define __phys_to_virt(x)	((unsigned long)((x) - PHYS_OFFSET + PAGE_OFFSET))
> +#define __phys_to_kimg(x)	((unsigned long)((x) - PHYS_OFFSET + KIMAGE_VADDR))
>  
>  /*
>   * Convert a page to/from a physical address
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 63335fa68426..350515276541 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -389,7 +389,7 @@ __create_page_tables:
>  	 * Map the kernel image (starting with PHYS_OFFSET).
>  	 */
>  	mov	x0, x26				// swapper_pg_dir
> -	mov	x5, #PAGE_OFFSET
> +	ldr	x5, =KIMAGE_VADDR
>  	create_pgd_entry x0, x5, x3, x6
>  	ldr	x6, =KERNEL_END			// __va(KERNEL_END)
>  	mov	x3, x24				// phys offset
> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> index 7de6c39858a5..ced0dedcabcc 100644
> --- a/arch/arm64/kernel/vmlinux.lds.S
> +++ b/arch/arm64/kernel/vmlinux.lds.S
> @@ -88,7 +88,7 @@ SECTIONS
>  		*(.discard.*)
>  	}
>  
> -	. = PAGE_OFFSET + TEXT_OFFSET;
> +	. = KIMAGE_VADDR + TEXT_OFFSET;
>  
>  	.head.text : {
>  		_text = .;
> @@ -185,4 +185,4 @@ ASSERT(__idmap_text_end - (__idmap_text_start & ~(SZ_4K - 1)) <= SZ_4K,
>  /*
>   * If padding is applied before .head.text, virt<->phys conversions will fail.
>   */
> -ASSERT(_text == (PAGE_OFFSET + TEXT_OFFSET), "HEAD is misaligned")
> +ASSERT(_text == (KIMAGE_VADDR + TEXT_OFFSET), "HEAD is misaligned")
> -- 
> 2.5.0
> 

^ permalink raw reply	[flat|nested] 207+ messages in thread

* [PATCH v3 02/21] arm64: introduce KIMAGE_VADDR as the virtual base of the kernel region
@ 2016-01-11 16:31     ` Mark Rutland
  0 siblings, 0 replies; 207+ messages in thread
From: Mark Rutland @ 2016-01-11 16:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jan 11, 2016 at 02:18:55PM +0100, Ard Biesheuvel wrote:
> This introduces the preprocessor symbol KIMAGE_VADDR which will serve as
> the symbolic virtual base of the kernel region, i.e., the kernel's virtual
> offset will be KIMAGE_VADDR + TEXT_OFFSET. For now, we define it as being
> equal to PAGE_OFFSET, but in the future, it will be moved below it once
> we move the kernel virtual mapping out of the linear mapping.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

Reviewed-by: Mark Rutland <mark.rutland@arm.com>

Mark.

> ---
>  arch/arm64/include/asm/memory.h | 10 ++++++++--
>  arch/arm64/kernel/head.S        |  2 +-
>  arch/arm64/kernel/vmlinux.lds.S |  4 ++--
>  3 files changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index 853953cd1f08..bea9631b34a8 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -51,7 +51,8 @@
>  #define VA_BITS			(CONFIG_ARM64_VA_BITS)
>  #define VA_START		(UL(0xffffffffffffffff) << VA_BITS)
>  #define PAGE_OFFSET		(UL(0xffffffffffffffff) << (VA_BITS - 1))
> -#define MODULES_END		(PAGE_OFFSET)
> +#define KIMAGE_VADDR		(PAGE_OFFSET)
> +#define MODULES_END		(KIMAGE_VADDR)
>  #define MODULES_VADDR		(MODULES_END - SZ_64M)
>  #define PCI_IO_END		(MODULES_VADDR - SZ_2M)
>  #define PCI_IO_START		(PCI_IO_END - PCI_IO_SIZE)
> @@ -75,8 +76,13 @@
>   * private definitions which should NOT be used outside memory.h
>   * files.  Use virt_to_phys/phys_to_virt/__pa/__va instead.
>   */
> -#define __virt_to_phys(x)	(((phys_addr_t)(x) - PAGE_OFFSET + PHYS_OFFSET))
> +#define __virt_to_phys(x) ({						\
> +	phys_addr_t __x = (phys_addr_t)(x);				\
> +	__x >= PAGE_OFFSET ? (__x - PAGE_OFFSET + PHYS_OFFSET) :	\
> +			     (__x - KIMAGE_VADDR + PHYS_OFFSET); })
> +
>  #define __phys_to_virt(x)	((unsigned long)((x) - PHYS_OFFSET + PAGE_OFFSET))
> +#define __phys_to_kimg(x)	((unsigned long)((x) - PHYS_OFFSET + KIMAGE_VADDR))
>  
>  /*
>   * Convert a page to/from a physical address
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 63335fa68426..350515276541 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -389,7 +389,7 @@ __create_page_tables:
>  	 * Map the kernel image (starting with PHYS_OFFSET).
>  	 */
>  	mov	x0, x26				// swapper_pg_dir
> -	mov	x5, #PAGE_OFFSET
> +	ldr	x5, =KIMAGE_VADDR
>  	create_pgd_entry x0, x5, x3, x6
>  	ldr	x6, =KERNEL_END			// __va(KERNEL_END)
>  	mov	x3, x24				// phys offset
> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> index 7de6c39858a5..ced0dedcabcc 100644
> --- a/arch/arm64/kernel/vmlinux.lds.S
> +++ b/arch/arm64/kernel/vmlinux.lds.S
> @@ -88,7 +88,7 @@ SECTIONS
>  		*(.discard.*)
>  	}
>  
> -	. = PAGE_OFFSET + TEXT_OFFSET;
> +	. = KIMAGE_VADDR + TEXT_OFFSET;
>  
>  	.head.text : {
>  		_text = .;
> @@ -185,4 +185,4 @@ ASSERT(__idmap_text_end - (__idmap_text_start & ~(SZ_4K - 1)) <= SZ_4K,
>  /*
>   * If padding is applied before .head.text, virt<->phys conversions will fail.
>   */
> -ASSERT(_text == (PAGE_OFFSET + TEXT_OFFSET), "HEAD is misaligned")
> +ASSERT(_text == (KIMAGE_VADDR + TEXT_OFFSET), "HEAD is misaligned")
> -- 
> 2.5.0
> 

^ permalink raw reply	[flat|nested] 207+ messages in thread

* [kernel-hardening] Re: [PATCH v3 02/21] arm64: introduce KIMAGE_VADDR as the virtual base of the kernel region
@ 2016-01-11 16:31     ` Mark Rutland
  0 siblings, 0 replies; 207+ messages in thread
From: Mark Rutland @ 2016-01-11 16:31 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	leif.lindholm, keescook, linux-kernel, stuart.yoder,
	bhupesh.sharma, arnd, marc.zyngier, christoffer.dall

On Mon, Jan 11, 2016 at 02:18:55PM +0100, Ard Biesheuvel wrote:
> This introduces the preprocessor symbol KIMAGE_VADDR which will serve as
> the symbolic virtual base of the kernel region, i.e., the kernel's virtual
> offset will be KIMAGE_VADDR + TEXT_OFFSET. For now, we define it as being
> equal to PAGE_OFFSET, but in the future, it will be moved below it once
> we move the kernel virtual mapping out of the linear mapping.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

Reviewed-by: Mark Rutland <mark.rutland@arm.com>

Mark.

> ---
>  arch/arm64/include/asm/memory.h | 10 ++++++++--
>  arch/arm64/kernel/head.S        |  2 +-
>  arch/arm64/kernel/vmlinux.lds.S |  4 ++--
>  3 files changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index 853953cd1f08..bea9631b34a8 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -51,7 +51,8 @@
>  #define VA_BITS			(CONFIG_ARM64_VA_BITS)
>  #define VA_START		(UL(0xffffffffffffffff) << VA_BITS)
>  #define PAGE_OFFSET		(UL(0xffffffffffffffff) << (VA_BITS - 1))
> -#define MODULES_END		(PAGE_OFFSET)
> +#define KIMAGE_VADDR		(PAGE_OFFSET)
> +#define MODULES_END		(KIMAGE_VADDR)
>  #define MODULES_VADDR		(MODULES_END - SZ_64M)
>  #define PCI_IO_END		(MODULES_VADDR - SZ_2M)
>  #define PCI_IO_START		(PCI_IO_END - PCI_IO_SIZE)
> @@ -75,8 +76,13 @@
>   * private definitions which should NOT be used outside memory.h
>   * files.  Use virt_to_phys/phys_to_virt/__pa/__va instead.
>   */
> -#define __virt_to_phys(x)	(((phys_addr_t)(x) - PAGE_OFFSET + PHYS_OFFSET))
> +#define __virt_to_phys(x) ({						\
> +	phys_addr_t __x = (phys_addr_t)(x);				\
> +	__x >= PAGE_OFFSET ? (__x - PAGE_OFFSET + PHYS_OFFSET) :	\
> +			     (__x - KIMAGE_VADDR + PHYS_OFFSET); })
> +
>  #define __phys_to_virt(x)	((unsigned long)((x) - PHYS_OFFSET + PAGE_OFFSET))
> +#define __phys_to_kimg(x)	((unsigned long)((x) - PHYS_OFFSET + KIMAGE_VADDR))
>  
>  /*
>   * Convert a page to/from a physical address
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 63335fa68426..350515276541 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -389,7 +389,7 @@ __create_page_tables:
>  	 * Map the kernel image (starting with PHYS_OFFSET).
>  	 */
>  	mov	x0, x26				// swapper_pg_dir
> -	mov	x5, #PAGE_OFFSET
> +	ldr	x5, =KIMAGE_VADDR
>  	create_pgd_entry x0, x5, x3, x6
>  	ldr	x6, =KERNEL_END			// __va(KERNEL_END)
>  	mov	x3, x24				// phys offset
> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> index 7de6c39858a5..ced0dedcabcc 100644
> --- a/arch/arm64/kernel/vmlinux.lds.S
> +++ b/arch/arm64/kernel/vmlinux.lds.S
> @@ -88,7 +88,7 @@ SECTIONS
>  		*(.discard.*)
>  	}
>  
> -	. = PAGE_OFFSET + TEXT_OFFSET;
> +	. = KIMAGE_VADDR + TEXT_OFFSET;
>  
>  	.head.text : {
>  		_text = .;
> @@ -185,4 +185,4 @@ ASSERT(__idmap_text_end - (__idmap_text_start & ~(SZ_4K - 1)) <= SZ_4K,
>  /*
>   * If padding is applied before .head.text, virt<->phys conversions will fail.
>   */
> -ASSERT(_text == (PAGE_OFFSET + TEXT_OFFSET), "HEAD is misaligned")
> +ASSERT(_text == (KIMAGE_VADDR + TEXT_OFFSET), "HEAD is misaligned")
> -- 
> 2.5.0
> 

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 04/21] arm64: decouple early fixmap init from linear mapping
  2016-01-11 16:27         ` Mark Rutland
  (?)
@ 2016-01-11 16:51           ` Mark Rutland
  -1 siblings, 0 replies; 207+ messages in thread
From: Mark Rutland @ 2016-01-11 16:51 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Kees Cook, Arnd Bergmann, kernel-hardening, Sharma Bhupesh,
	Catalin Marinas, Will Deacon, linux-kernel, Leif Lindholm,
	Stuart Yoder, Marc Zyngier, Christoffer Dall, linux-arm-kernel

On Mon, Jan 11, 2016 at 04:27:38PM +0000, Mark Rutland wrote:
> On Mon, Jan 11, 2016 at 05:15:13PM +0100, Ard Biesheuvel wrote:
> > On 11 January 2016 at 17:09, Mark Rutland <mark.rutland@arm.com> wrote:
> > > On Mon, Jan 11, 2016 at 02:18:57PM +0100, Ard Biesheuvel wrote:
> > >> Since the early fixmap page tables are populated using pages that are
> > >> part of the static footprint of the kernel, they are covered by the
> > >> initial kernel mapping, and we can refer to them without using __va/__pa
> > >> translations, which are tied to the linear mapping.
> > >>
> > >> Since the fixmap page tables are disjoint from the kernel mapping up
> > >> to the top level pgd entry, we can refer to bm_pte[] directly, and there
> > >> is no need to walk the page tables and perform __pa()/__va() translations
> > >> at each step.
> > >>
> > >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > >> ---
> > >>  arch/arm64/mm/mmu.c | 32 ++++++--------------
> > >>  1 file changed, 9 insertions(+), 23 deletions(-)
> > >>
> > >> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > >> index 7711554a94f4..75b5f0dc3bdc 100644
> > >> --- a/arch/arm64/mm/mmu.c
> > >> +++ b/arch/arm64/mm/mmu.c
> > >> @@ -570,38 +570,24 @@ void vmemmap_free(unsigned long start, unsigned long end)
> > >>  #endif       /* CONFIG_SPARSEMEM_VMEMMAP */
> > >>
> > >>  static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
> > >> -#if CONFIG_PGTABLE_LEVELS > 2
> > >>  static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
> > >> -#endif
> > >> -#if CONFIG_PGTABLE_LEVELS > 3
> > >>  static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
> > >> -#endif
> > >>
> > >>  static inline pud_t * fixmap_pud(unsigned long addr)
> > >>  {
> > >> -     pgd_t *pgd = pgd_offset_k(addr);
> > >> -
> > >> -     BUG_ON(pgd_none(*pgd) || pgd_bad(*pgd));
> > >> -
> > >> -     return pud_offset(pgd, addr);
> > >> +     return (CONFIG_PGTABLE_LEVELS > 3) ? &bm_pud[pud_index(addr)]
> > >> +                                        : (pud_t *)pgd_offset_k(addr);
> > >
> > > If we move patch 6 earlier, we could use pud_offset_kimg here, and avoid
> > > the cast, at the cost of passing the pgd into fixmap_pud.
> > >
> > > Similarly for fixmap_pmd.
> > >
> > 
> > Is that necessarily an improvement? I know it hides the cast, but I
> > think having an explicit pgd_t* to pud_t* cast that so obviously
> > applies to CONFIG_PGTABLE_LEVELS < 4 only is fine as well.
> 
> True; it's not a big thing either way.

Sorry, I'm going to change my mind on that again. I think using
p?d_offset_kimg is preferable. e.g.

static inline pud_t * fixmap_pud(unsigned long addr)
{
        pgd_t *pgd = pgd_offset_k(addr);

        BUG_ON(pgd_none(*pgd) || pgd_bad(*pgd));

        return pud_offset_kimg(pgd, addr);
}

static inline pmd_t * fixmap_pmd(unsigned long addr)
{
        pud_t *pud = fixmap_pud(addr);

        BUG_ON(pud_none(*pud) || pud_bad(*pud));

        return pmd_offset_kimg(pud, addr);
}

That avoids having to check CONFIG_PGTABLE_LEVELS and perform a cast,
avoids duplicating details about bm_{pud,pmd}, and keeps the existing structure
so it's easier to reason about the change. I was wrong about having to pass the
pgd or pud in, so callers don't need updating.

From my PoV that is preferable.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 207+ messages in thread

* [PATCH v3 04/21] arm64: decouple early fixmap init from linear mapping
@ 2016-01-11 16:51           ` Mark Rutland
  0 siblings, 0 replies; 207+ messages in thread
From: Mark Rutland @ 2016-01-11 16:51 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jan 11, 2016 at 04:27:38PM +0000, Mark Rutland wrote:
> On Mon, Jan 11, 2016 at 05:15:13PM +0100, Ard Biesheuvel wrote:
> > On 11 January 2016 at 17:09, Mark Rutland <mark.rutland@arm.com> wrote:
> > > On Mon, Jan 11, 2016 at 02:18:57PM +0100, Ard Biesheuvel wrote:
> > >> Since the early fixmap page tables are populated using pages that are
> > >> part of the static footprint of the kernel, they are covered by the
> > >> initial kernel mapping, and we can refer to them without using __va/__pa
> > >> translations, which are tied to the linear mapping.
> > >>
> > >> Since the fixmap page tables are disjoint from the kernel mapping up
> > >> to the top level pgd entry, we can refer to bm_pte[] directly, and there
> > >> is no need to walk the page tables and perform __pa()/__va() translations
> > >> at each step.
> > >>
> > >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > >> ---
> > >>  arch/arm64/mm/mmu.c | 32 ++++++--------------
> > >>  1 file changed, 9 insertions(+), 23 deletions(-)
> > >>
> > >> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > >> index 7711554a94f4..75b5f0dc3bdc 100644
> > >> --- a/arch/arm64/mm/mmu.c
> > >> +++ b/arch/arm64/mm/mmu.c
> > >> @@ -570,38 +570,24 @@ void vmemmap_free(unsigned long start, unsigned long end)
> > >>  #endif       /* CONFIG_SPARSEMEM_VMEMMAP */
> > >>
> > >>  static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
> > >> -#if CONFIG_PGTABLE_LEVELS > 2
> > >>  static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
> > >> -#endif
> > >> -#if CONFIG_PGTABLE_LEVELS > 3
> > >>  static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
> > >> -#endif
> > >>
> > >>  static inline pud_t * fixmap_pud(unsigned long addr)
> > >>  {
> > >> -     pgd_t *pgd = pgd_offset_k(addr);
> > >> -
> > >> -     BUG_ON(pgd_none(*pgd) || pgd_bad(*pgd));
> > >> -
> > >> -     return pud_offset(pgd, addr);
> > >> +     return (CONFIG_PGTABLE_LEVELS > 3) ? &bm_pud[pud_index(addr)]
> > >> +                                        : (pud_t *)pgd_offset_k(addr);
> > >
> > > If we move patch 6 earlier, we could use pud_offset_kimg here, and avoid
> > > the cast, at the cost of passing the pgd into fixmap_pud.
> > >
> > > Similarly for fixmap_pmd.
> > >
> > 
> > Is that necessarily an improvement? I know it hides the cast, but I
> > think having an explicit pgd_t* to pud_t* cast that so obviously
> > applies to CONFIG_PGTABLE_LEVELS < 4 only is fine as well.
> 
> True; it's not a big thing either way.

Sorry, I'm going to change my mind on that again. I think using
p?d_offset_kimg is preferable. e.g.

static inline pud_t * fixmap_pud(unsigned long addr)
{
        pgd_t *pgd = pgd_offset_k(addr);

        BUG_ON(pgd_none(*pgd) || pgd_bad(*pgd));

        return pud_offset_kimg(pgd, addr);
}

static inline pmd_t * fixmap_pmd(unsigned long addr)
{
        pud_t *pud = fixmap_pud(addr);

        BUG_ON(pud_none(*pud) || pud_bad(*pud));

        return pmd_offset_kimg(pud, addr);
}

That avoids having to check CONFIG_PGTABLE_LEVELS and perform a cast,
avoids duplicating details about bm_{pud,pmd}, and keeps the existing structure
so it's easier to reason about the change. I was wrong about having to pass the
pgd or pud in, so callers don't need updating.

From my PoV that is preferable.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 207+ messages in thread

* [kernel-hardening] Re: [PATCH v3 04/21] arm64: decouple early fixmap init from linear mapping
@ 2016-01-11 16:51           ` Mark Rutland
  0 siblings, 0 replies; 207+ messages in thread
From: Mark Rutland @ 2016-01-11 16:51 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Kees Cook, Arnd Bergmann, kernel-hardening, Sharma Bhupesh,
	Catalin Marinas, Will Deacon, linux-kernel, Leif Lindholm,
	Stuart Yoder, Marc Zyngier, Christoffer Dall, linux-arm-kernel

On Mon, Jan 11, 2016 at 04:27:38PM +0000, Mark Rutland wrote:
> On Mon, Jan 11, 2016 at 05:15:13PM +0100, Ard Biesheuvel wrote:
> > On 11 January 2016 at 17:09, Mark Rutland <mark.rutland@arm.com> wrote:
> > > On Mon, Jan 11, 2016 at 02:18:57PM +0100, Ard Biesheuvel wrote:
> > >> Since the early fixmap page tables are populated using pages that are
> > >> part of the static footprint of the kernel, they are covered by the
> > >> initial kernel mapping, and we can refer to them without using __va/__pa
> > >> translations, which are tied to the linear mapping.
> > >>
> > >> Since the fixmap page tables are disjoint from the kernel mapping up
> > >> to the top level pgd entry, we can refer to bm_pte[] directly, and there
> > >> is no need to walk the page tables and perform __pa()/__va() translations
> > >> at each step.
> > >>
> > >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > >> ---
> > >>  arch/arm64/mm/mmu.c | 32 ++++++--------------
> > >>  1 file changed, 9 insertions(+), 23 deletions(-)
> > >>
> > >> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > >> index 7711554a94f4..75b5f0dc3bdc 100644
> > >> --- a/arch/arm64/mm/mmu.c
> > >> +++ b/arch/arm64/mm/mmu.c
> > >> @@ -570,38 +570,24 @@ void vmemmap_free(unsigned long start, unsigned long end)
> > >>  #endif       /* CONFIG_SPARSEMEM_VMEMMAP */
> > >>
> > >>  static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
> > >> -#if CONFIG_PGTABLE_LEVELS > 2
> > >>  static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
> > >> -#endif
> > >> -#if CONFIG_PGTABLE_LEVELS > 3
> > >>  static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
> > >> -#endif
> > >>
> > >>  static inline pud_t * fixmap_pud(unsigned long addr)
> > >>  {
> > >> -     pgd_t *pgd = pgd_offset_k(addr);
> > >> -
> > >> -     BUG_ON(pgd_none(*pgd) || pgd_bad(*pgd));
> > >> -
> > >> -     return pud_offset(pgd, addr);
> > >> +     return (CONFIG_PGTABLE_LEVELS > 3) ? &bm_pud[pud_index(addr)]
> > >> +                                        : (pud_t *)pgd_offset_k(addr);
> > >
> > > If we move patch 6 earlier, we could use pud_offset_kimg here, and avoid
> > > the cast, at the cost of passing the pgd into fixmap_pud.
> > >
> > > Similarly for fixmap_pmd.
> > >
> > 
> > Is that necessarily an improvement? I know it hides the cast, but I
> > think having an explicit pgd_t* to pud_t* cast that so obviously
> > applies to CONFIG_PGTABLE_LEVELS < 4 only is fine as well.
> 
> True; it's not a big thing either way.

Sorry, I'm going to change my mind on that again. I think using
p?d_offset_kimg is preferable. e.g.

static inline pud_t * fixmap_pud(unsigned long addr)
{
        pgd_t *pgd = pgd_offset_k(addr);

        BUG_ON(pgd_none(*pgd) || pgd_bad(*pgd));

        return pud_offset_kimg(pgd, addr);
}

static inline pmd_t * fixmap_pmd(unsigned long addr)
{
        pud_t *pud = fixmap_pud(addr);

        BUG_ON(pud_none(*pud) || pud_bad(*pud));

        return pmd_offset_kimg(pud, addr);
}

That avoids having to check CONFIG_PGTABLE_LEVELS and perform a cast,
avoids duplicating details about bm_{pud,pmd}, and keeps the existing structure
so it's easier to reason about the change. I was wrong about having to pass the
pgd or pud in, so callers don't need updating.

From my PoV that is preferable.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 04/21] arm64: decouple early fixmap init from linear mapping
  2016-01-11 16:51           ` Mark Rutland
  (?)
@ 2016-01-11 17:08             ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 17:08 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Kees Cook, Arnd Bergmann, kernel-hardening, Sharma Bhupesh,
	Catalin Marinas, Will Deacon, linux-kernel, Leif Lindholm,
	Stuart Yoder, Marc Zyngier, Christoffer Dall, linux-arm-kernel

On 11 January 2016 at 17:51, Mark Rutland <mark.rutland@arm.com> wrote:
> On Mon, Jan 11, 2016 at 04:27:38PM +0000, Mark Rutland wrote:
>> On Mon, Jan 11, 2016 at 05:15:13PM +0100, Ard Biesheuvel wrote:
>> > On 11 January 2016 at 17:09, Mark Rutland <mark.rutland@arm.com> wrote:
>> > > On Mon, Jan 11, 2016 at 02:18:57PM +0100, Ard Biesheuvel wrote:
>> > >> Since the early fixmap page tables are populated using pages that are
>> > >> part of the static footprint of the kernel, they are covered by the
>> > >> initial kernel mapping, and we can refer to them without using __va/__pa
>> > >> translations, which are tied to the linear mapping.
>> > >>
>> > >> Since the fixmap page tables are disjoint from the kernel mapping up
>> > >> to the top level pgd entry, we can refer to bm_pte[] directly, and there
>> > >> is no need to walk the page tables and perform __pa()/__va() translations
>> > >> at each step.
>> > >>
>> > >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> > >> ---
>> > >>  arch/arm64/mm/mmu.c | 32 ++++++--------------
>> > >>  1 file changed, 9 insertions(+), 23 deletions(-)
>> > >>
>> > >> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> > >> index 7711554a94f4..75b5f0dc3bdc 100644
>> > >> --- a/arch/arm64/mm/mmu.c
>> > >> +++ b/arch/arm64/mm/mmu.c
>> > >> @@ -570,38 +570,24 @@ void vmemmap_free(unsigned long start, unsigned long end)
>> > >>  #endif       /* CONFIG_SPARSEMEM_VMEMMAP */
>> > >>
>> > >>  static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
>> > >> -#if CONFIG_PGTABLE_LEVELS > 2
>> > >>  static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
>> > >> -#endif
>> > >> -#if CONFIG_PGTABLE_LEVELS > 3
>> > >>  static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
>> > >> -#endif
>> > >>
>> > >>  static inline pud_t * fixmap_pud(unsigned long addr)
>> > >>  {
>> > >> -     pgd_t *pgd = pgd_offset_k(addr);
>> > >> -
>> > >> -     BUG_ON(pgd_none(*pgd) || pgd_bad(*pgd));
>> > >> -
>> > >> -     return pud_offset(pgd, addr);
>> > >> +     return (CONFIG_PGTABLE_LEVELS > 3) ? &bm_pud[pud_index(addr)]
>> > >> +                                        : (pud_t *)pgd_offset_k(addr);
>> > >
>> > > If we move patch 6 earlier, we could use pud_offset_kimg here, and avoid
>> > > the cast, at the cost of passing the pgd into fixmap_pud.
>> > >
>> > > Similarly for fixmap_pmd.
>> > >
>> >
>> > Is that necessarily an improvement? I know it hides the cast, but I
>> > think having an explicit pgd_t* to pud_t* cast that so obviously
>> > applies to CONFIG_PGTABLE_LEVELS < 4 only is fine as well.
>>
>> True; it's not a big thing either way.
>
> Sorry, I'm going to change my mind on that again. I think using
> p?d_offset_kimg is preferable. e.g.
>
> static inline pud_t * fixmap_pud(unsigned long addr)
> {
>         pgd_t *pgd = pgd_offset_k(addr);
>
>         BUG_ON(pgd_none(*pgd) || pgd_bad(*pgd));
>
>         return pud_offset_kimg(pgd, addr);
> }
>
> static inline pmd_t * fixmap_pmd(unsigned long addr)
> {
>         pud_t *pud = fixmap_pud(addr);
>
>         BUG_ON(pud_none(*pud) || pud_bad(*pud));
>
>         return pmd_offset_kimg(pud, addr);
> }
>
> That avoids having to check CONFIG_PGTABLE_LEVELS and perform a cast,
> avoids duplicating details about bm_{pud,pmd}, and keeps the existing structure
> so it's easier to reason about the change. I was wrong about having to pass the
> pgd or pud in, so callers don't need updating.
>
> From my PoV that is preferable.
>

OK. I think it looks better, indeed.

^ permalink raw reply	[flat|nested] 207+ messages in thread

* [PATCH v3 04/21] arm64: decouple early fixmap init from linear mapping
@ 2016-01-11 17:08             ` Ard Biesheuvel
  0 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 17:08 UTC (permalink / raw)
  To: linux-arm-kernel

On 11 January 2016 at 17:51, Mark Rutland <mark.rutland@arm.com> wrote:
> On Mon, Jan 11, 2016 at 04:27:38PM +0000, Mark Rutland wrote:
>> On Mon, Jan 11, 2016 at 05:15:13PM +0100, Ard Biesheuvel wrote:
>> > On 11 January 2016 at 17:09, Mark Rutland <mark.rutland@arm.com> wrote:
>> > > On Mon, Jan 11, 2016 at 02:18:57PM +0100, Ard Biesheuvel wrote:
>> > >> Since the early fixmap page tables are populated using pages that are
>> > >> part of the static footprint of the kernel, they are covered by the
>> > >> initial kernel mapping, and we can refer to them without using __va/__pa
>> > >> translations, which are tied to the linear mapping.
>> > >>
>> > >> Since the fixmap page tables are disjoint from the kernel mapping up
>> > >> to the top level pgd entry, we can refer to bm_pte[] directly, and there
>> > >> is no need to walk the page tables and perform __pa()/__va() translations
>> > >> at each step.
>> > >>
>> > >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> > >> ---
>> > >>  arch/arm64/mm/mmu.c | 32 ++++++--------------
>> > >>  1 file changed, 9 insertions(+), 23 deletions(-)
>> > >>
>> > >> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> > >> index 7711554a94f4..75b5f0dc3bdc 100644
>> > >> --- a/arch/arm64/mm/mmu.c
>> > >> +++ b/arch/arm64/mm/mmu.c
>> > >> @@ -570,38 +570,24 @@ void vmemmap_free(unsigned long start, unsigned long end)
>> > >>  #endif       /* CONFIG_SPARSEMEM_VMEMMAP */
>> > >>
>> > >>  static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
>> > >> -#if CONFIG_PGTABLE_LEVELS > 2
>> > >>  static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
>> > >> -#endif
>> > >> -#if CONFIG_PGTABLE_LEVELS > 3
>> > >>  static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
>> > >> -#endif
>> > >>
>> > >>  static inline pud_t * fixmap_pud(unsigned long addr)
>> > >>  {
>> > >> -     pgd_t *pgd = pgd_offset_k(addr);
>> > >> -
>> > >> -     BUG_ON(pgd_none(*pgd) || pgd_bad(*pgd));
>> > >> -
>> > >> -     return pud_offset(pgd, addr);
>> > >> +     return (CONFIG_PGTABLE_LEVELS > 3) ? &bm_pud[pud_index(addr)]
>> > >> +                                        : (pud_t *)pgd_offset_k(addr);
>> > >
>> > > If we move patch 6 earlier, we could use pud_offset_kimg here, and avoid
>> > > the cast, at the cost of passing the pgd into fixmap_pud.
>> > >
>> > > Similarly for fixmap_pmd.
>> > >
>> >
>> > Is that necessarily an improvement? I know it hides the cast, but I
>> > think having an explicit pgd_t* to pud_t* cast that so obviously
>> > applies to CONFIG_PGTABLE_LEVELS < 4 only is fine as well.
>>
>> True; it's not a big thing either way.
>
> Sorry, I'm going to change my mind on that again. I think using
> p?d_offset_kimg is preferable. e.g.
>
> static inline pud_t * fixmap_pud(unsigned long addr)
> {
>         pgd_t *pgd = pgd_offset_k(addr);
>
>         BUG_ON(pgd_none(*pgd) || pgd_bad(*pgd));
>
>         return pud_offset_kimg(pgd, addr);
> }
>
> static inline pmd_t * fixmap_pmd(unsigned long addr)
> {
>         pud_t *pud = fixmap_pud(addr);
>
>         BUG_ON(pud_none(*pud) || pud_bad(*pud));
>
>         return pmd_offset_kimg(pud, addr);
> }
>
> That avoids having to check CONFIG_PGTABLE_LEVELS and perform a cast,
> avoids duplicating details about bm_{pud,pmd}, and keeps the existing structure
> so it's easier to reason about the change. I was wrong about having to pass the
> pgd or pud in, so callers don't need updating.
>
> From my PoV that is preferable.
>

OK. I think it looks better, indeed.

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 04/21] arm64: decouple early fixmap init from linear mapping
  2016-01-11 17:08             ` Ard Biesheuvel
  (?)
@ 2016-01-11 17:15               ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 17:15 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Kees Cook, Arnd Bergmann, kernel-hardening, Sharma Bhupesh,
	Catalin Marinas, Will Deacon, linux-kernel, Leif Lindholm,
	Stuart Yoder, Marc Zyngier, Christoffer Dall, linux-arm-kernel

On 11 January 2016 at 18:08, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 11 January 2016 at 17:51, Mark Rutland <mark.rutland@arm.com> wrote:
>> On Mon, Jan 11, 2016 at 04:27:38PM +0000, Mark Rutland wrote:
>>> On Mon, Jan 11, 2016 at 05:15:13PM +0100, Ard Biesheuvel wrote:
>>> > On 11 January 2016 at 17:09, Mark Rutland <mark.rutland@arm.com> wrote:
>>> > > On Mon, Jan 11, 2016 at 02:18:57PM +0100, Ard Biesheuvel wrote:
>>> > >> Since the early fixmap page tables are populated using pages that are
>>> > >> part of the static footprint of the kernel, they are covered by the
>>> > >> initial kernel mapping, and we can refer to them without using __va/__pa
>>> > >> translations, which are tied to the linear mapping.
>>> > >>
>>> > >> Since the fixmap page tables are disjoint from the kernel mapping up
>>> > >> to the top level pgd entry, we can refer to bm_pte[] directly, and there
>>> > >> is no need to walk the page tables and perform __pa()/__va() translations
>>> > >> at each step.
>>> > >>
>>> > >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>> > >> ---
>>> > >>  arch/arm64/mm/mmu.c | 32 ++++++--------------
>>> > >>  1 file changed, 9 insertions(+), 23 deletions(-)
>>> > >>
>>> > >> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>> > >> index 7711554a94f4..75b5f0dc3bdc 100644
>>> > >> --- a/arch/arm64/mm/mmu.c
>>> > >> +++ b/arch/arm64/mm/mmu.c
>>> > >> @@ -570,38 +570,24 @@ void vmemmap_free(unsigned long start, unsigned long end)
>>> > >>  #endif       /* CONFIG_SPARSEMEM_VMEMMAP */
>>> > >>
>>> > >>  static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
>>> > >> -#if CONFIG_PGTABLE_LEVELS > 2
>>> > >>  static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
>>> > >> -#endif
>>> > >> -#if CONFIG_PGTABLE_LEVELS > 3
>>> > >>  static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
>>> > >> -#endif
>>> > >>
>>> > >>  static inline pud_t * fixmap_pud(unsigned long addr)
>>> > >>  {
>>> > >> -     pgd_t *pgd = pgd_offset_k(addr);
>>> > >> -
>>> > >> -     BUG_ON(pgd_none(*pgd) || pgd_bad(*pgd));
>>> > >> -
>>> > >> -     return pud_offset(pgd, addr);
>>> > >> +     return (CONFIG_PGTABLE_LEVELS > 3) ? &bm_pud[pud_index(addr)]
>>> > >> +                                        : (pud_t *)pgd_offset_k(addr);
>>> > >
>>> > > If we move patch 6 earlier, we could use pud_offset_kimg here, and avoid
>>> > > the cast, at the cost of passing the pgd into fixmap_pud.
>>> > >
>>> > > Similarly for fixmap_pmd.
>>> > >
>>> >
>>> > Is that necessarily an improvement? I know it hides the cast, but I
>>> > think having an explicit pgd_t* to pud_t* cast that so obviously
>>> > applies to CONFIG_PGTABLE_LEVELS < 4 only is fine as well.
>>>
>>> True; it's not a big thing either way.
>>
>> Sorry, I'm going to change my mind on that again. I think using
>> p?d_offset_kimg is preferable. e.g.
>>
>> static inline pud_t * fixmap_pud(unsigned long addr)
>> {
>>         pgd_t *pgd = pgd_offset_k(addr);
>>
>>         BUG_ON(pgd_none(*pgd) || pgd_bad(*pgd));
>>
>>         return pud_offset_kimg(pgd, addr);
>> }
>>
>> static inline pmd_t * fixmap_pmd(unsigned long addr)
>> {
>>         pud_t *pud = fixmap_pud(addr);
>>
>>         BUG_ON(pud_none(*pud) || pud_bad(*pud));
>>
>>         return pmd_offset_kimg(pud, addr);
>> }
>>
>> That avoids having to check CONFIG_PGTABLE_LEVELS and perform a cast,
>> avoids duplicating details about bm_{pud,pmd}, and keeps the existing structure
>> so it's easier to reason about the change. I was wrong about having to pass the
>> pgd or pud in, so callers don't need updating.
>>
>> From my PoV that is preferable.
>>
>
> OK. I think it looks better, indeed.

... however, this does mean we have to go through a __pa() translation
and back just to get to the address of bm_pud/bm_pmd
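
[A minimal sketch of the round trip being pointed out here, based on the
pud_offset_kimg() definition proposed in patch #6; fixmap_pud_sketch() is
only an illustrative name, not code from the series.]

static inline pud_t *fixmap_pud_sketch(unsigned long addr)
{
	/* the swapper pgd entry holds the *physical* address of bm_pud */
	pgd_t *pgd = pgd_offset_k(addr);

	BUG_ON(pgd_none(*pgd) || pgd_bad(*pgd));

	/*
	 * pud_offset_kimg() expands to
	 * __phys_to_kimg(pud_offset_phys(pgd, addr)): it reads the physical
	 * address back out of the pgd entry and converts it to a
	 * kernel-image virtual address, i.e. a phys/virt round trip that
	 * ends up at &bm_pud[pud_index(addr)] anyway.
	 */
	return pud_offset_kimg(pgd, addr);
}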

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 04/21] arm64: decouple early fixmap init from linear mapping
  2016-01-11 17:15               ` Ard Biesheuvel
  (?)
@ 2016-01-11 17:21                 ` Mark Rutland
  -1 siblings, 0 replies; 207+ messages in thread
From: Mark Rutland @ 2016-01-11 17:21 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Kees Cook, Arnd Bergmann, kernel-hardening, Sharma Bhupesh,
	Catalin Marinas, Will Deacon, linux-kernel, Leif Lindholm,
	Stuart Yoder, Marc Zyngier, Christoffer Dall, linux-arm-kernel

On Mon, Jan 11, 2016 at 06:15:56PM +0100, Ard Biesheuvel wrote:
> On 11 January 2016 at 18:08, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> > On 11 January 2016 at 17:51, Mark Rutland <mark.rutland@arm.com> wrote:
> >> Sorry, I'm going to change my mind on that again. I think using
> >> p?d_offset_kimg is preferable. e.g.
> >>
> >> static inline pud_t * fixmap_pud(unsigned long addr)
> >> {
> >>         pgd_t *pgd = pgd_offset_k(addr);
> >>
> >>         BUG_ON(pgd_none(*pgd) || pgd_bad(*pgd));
> >>
> >>         return pud_offset_kimg(pgd, addr);
> >> }
> >>
> >> static inline pmd_t * fixmap_pmd(unsigned long addr)
> >> {
> >>         pud_t *pud = fixmap_pud(addr);
> >>
> >>         BUG_ON(pud_none(*pud) || pud_bad(*pud));
> >>
> >>         return pmd_offset_kimg(pud, addr);
> >> }
> >>
> >> That avoids having to check CONFIG_PGTABLE_LEVELS and perform a cast,
> >> avoids duplicating details about bm_{pud,pmd}, and keeps the existing structure
> >> so it's easier to reason about the change. I was wrong about having to pass the
> >> pgd or pud in, so callers don't need updating.
> >>
> >> From my PoV that is preferable.
> >>
> >
> > OK. I think it looks better, indeed.
> 
> ... however, this does mean we have to go through a __pa() translation
> and back just to get to the address of bm_pud/bm_pmd

True, but we only do it in the case of a one-off init function, so I
don't think we'll notice the overhead.

Mark.

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 06/21] arm64: pgtable: implement static [pte|pmd|pud]_offset variants
  2016-01-11 16:24     ` Mark Rutland
  (?)
@ 2016-01-11 17:28       ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-11 17:28 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, kernel-hardening, Will Deacon, Catalin Marinas,
	Leif Lindholm, Kees Cook, linux-kernel, Stuart Yoder,
	Sharma Bhupesh, Arnd Bergmann, Marc Zyngier, Christoffer Dall

On 11 January 2016 at 17:24, Mark Rutland <mark.rutland@arm.com> wrote:
> On Mon, Jan 11, 2016 at 02:18:59PM +0100, Ard Biesheuvel wrote:
>> The page table accessors pte_offset(), pud_offset() and pmd_offset()
>> rely on __va translations, so they can only be used after the linear
>> mapping has been installed. For the early fixmap and kasan init routines,
>> whose page tables are allocated statically in the kernel image, these
>> functions will return bogus values. So implement pmd_offset_kimg() and
>> pud_offset_kimg(), which can be used instead before any page tables have
>> been allocated dynamically.
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>
> This looks good to me. One possible suggestion below, but either way:
>
> Reviewed-by: Mark Rutland <mark.rutland@arm.com>
>
>> ---
>>  arch/arm64/include/asm/pgtable.h | 13 +++++++++++++
>>  1 file changed, 13 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index 6129f6755081..7b4e16068c9f 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -449,6 +449,9 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
>>
>>  #define pmd_page(pmd)                pfn_to_page(__phys_to_pfn(pmd_val(pmd) & PHYS_MASK))
>>
>> +/* use ONLY for statically allocated translation tables */
>> +#define pte_offset_kimg(dir,addr)    ((pte_t *)__phys_to_kimg(pte_offset_phys((dir), (addr))))
>> +
>
> Given that we're probably only going to use this during one-off setup,
> maybe it's worth something like:
>
> #define IN_KERNEL_IMAGE(p) ({                                           \
>         unsigned long __p = (unsigned long)p;                           \
>         KIMAGE_VADDR <= __p && __p < _end;                              \
> })
>
> #define pte_offset_kimg(dir,addr) ({                                    \
>         BUG_ON(!IN_KERNEL_IMAGE(dir));                                  \
>         ((pte_t *)__phys_to_kimg(pte_offset_phys((dir), (addr))));      \
> })
>
> That might be overkill, though, given all it does is turn one runtime
> failure into another runtime failure.
>

Yes. I did consider implementing them out of line, with __init
annotations so you at least get complaints if you refer to them from
non-init code, but I don't see how we would ever need these anywhere
beyond fixmap and kasan anyway
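
[A rough sketch of the out-of-line alternative mentioned above; purely
hypothetical and not part of the series, with the function name and
placement only illustrative. The point is that the __init annotation makes
the build complain (section mismatch) if non-init code references the
helper.]

/* e.g. in arch/arm64/mm/mmu.c, instead of the macro from patch #6 */
pud_t * __init pud_offset_kimg(pgd_t *dir, unsigned long addr)
{
	/* same translation as the macro version */
	return (pud_t *)__phys_to_kimg(pud_offset_phys(dir, addr));
}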

>
>>  /*
>>   * Conversion functions: convert a page and protection to a page entry,
>>   * and a page entry and page directory to the page they refer to.
>> @@ -492,6 +495,9 @@ static inline phys_addr_t pud_page_paddr(pud_t pud)
>>
>>  #define pud_page(pud)                pfn_to_page(__phys_to_pfn(pud_val(pud) & PHYS_MASK))
>>
>> +/* use ONLY for statically allocated translation tables */
>> +#define pmd_offset_kimg(dir,addr)    ((pmd_t *)__phys_to_kimg(pmd_offset_phys((dir), (addr))))
>> +
>>  #else
>>
>>  #define pud_page_paddr(pud)  ({ BUILD_BUG(); 0; })
>> @@ -502,6 +508,8 @@ static inline phys_addr_t pud_page_paddr(pud_t pud)
>>  #define pmd_set_fixmap_offset(pudp, addr)    ((pmd_t *)pudp)
>>  #define pmd_clear_fixmap()
>>
>> +#define pmd_offset_kimg(dir,addr)    ((pmd_t *)dir)
>> +
>>  #endif       /* CONFIG_PGTABLE_LEVELS > 2 */
>>
>>  #if CONFIG_PGTABLE_LEVELS > 3
>> @@ -540,6 +548,9 @@ static inline phys_addr_t pgd_page_paddr(pgd_t pgd)
>>
>>  #define pgd_page(pgd)                pfn_to_page(__phys_to_pfn(pgd_val(pgd) & PHYS_MASK))
>>
>> +/* use ONLY for statically allocated translation tables */
>> +#define pud_offset_kimg(dir,addr)    ((pud_t *)__phys_to_kimg(pud_offset_phys((dir), (addr))))
>> +
>>  #else
>>
>>  #define pgd_page_paddr(pgd)  ({ BUILD_BUG(); 0;})
>> @@ -550,6 +561,8 @@ static inline phys_addr_t pgd_page_paddr(pgd_t pgd)
>>  #define pud_set_fixmap_offset(pgdp, addr)    ((pud_t *)pgdp)
>>  #define pud_clear_fixmap()
>>
>> +#define pud_offset_kimg(dir,addr)    ((pud_t *)dir)
>> +
>>  #endif  /* CONFIG_PGTABLE_LEVELS > 3 */
>>
>>  #define pgd_ERROR(pgd)               __pgd_error(__FILE__, __LINE__, pgd_val(pgd))
>> --
>> 2.5.0
>>

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 06/21] arm64: pgtable: implement static [pte|pmd|pud]_offset variants
  2016-01-11 17:28       ` Ard Biesheuvel
  (?)
@ 2016-01-11 17:31         ` Mark Rutland
  -1 siblings, 0 replies; 207+ messages in thread
From: Mark Rutland @ 2016-01-11 17:31 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, kernel-hardening, Will Deacon, Catalin Marinas,
	Leif Lindholm, Kees Cook, linux-kernel, Stuart Yoder,
	Sharma Bhupesh, Arnd Bergmann, Marc Zyngier, Christoffer Dall

On Mon, Jan 11, 2016 at 06:28:51PM +0100, Ard Biesheuvel wrote:
> On 11 January 2016 at 17:24, Mark Rutland <mark.rutland@arm.com> wrote:
> > On Mon, Jan 11, 2016 at 02:18:59PM +0100, Ard Biesheuvel wrote:
> >> The page table accessors pte_offset(), pud_offset() and pmd_offset()
> >> rely on __va translations, so they can only be used after the linear
> >> mapping has been installed. For the early fixmap and kasan init routines,
> >> whose page tables are allocated statically in the kernel image, these
> >> functions will return bogus values. So implement pmd_offset_kimg() and
> >> pud_offset_kimg(), which can be used instead before any page tables have
> >> been allocated dynamically.
> >>
> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> >
> > This looks good to me. One possible suggestion below, but either way:
> >
> > Reviewed-by: Mark Rutland <mark.rutland@arm.com>
> >
> >> ---
> >>  arch/arm64/include/asm/pgtable.h | 13 +++++++++++++
> >>  1 file changed, 13 insertions(+)
> >>
> >> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> >> index 6129f6755081..7b4e16068c9f 100644
> >> --- a/arch/arm64/include/asm/pgtable.h
> >> +++ b/arch/arm64/include/asm/pgtable.h
> >> @@ -449,6 +449,9 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
> >>
> >>  #define pmd_page(pmd)                pfn_to_page(__phys_to_pfn(pmd_val(pmd) & PHYS_MASK))
> >>
> >> +/* use ONLY for statically allocated translation tables */
> >> +#define pte_offset_kimg(dir,addr)    ((pte_t *)__phys_to_kimg(pte_offset_phys((dir), (addr))))
> >> +
> >
> > Given that we're probably only going to use this during one-off setup,
> > maybe it's worth something like:
> >
> > #define IN_KERNEL_IMAGE(p) ({                                           \
> >         unsigned long __p = (unsigned long)p;                           \
> >         KIMAGE_VADDR <= __p && __p < _end;                              \
> > })
> >
> > #define pte_offset_kimg(dir,addr) ({                                    \
> >         BUG_ON(!IN_KERNEL_IMAGE(dir));                                  \
> >         ((pte_t *)__phys_to_kimg(pte_offset_phys((dir), (addr))));      \
> > })
> >
> > That might be overkill, though, given all it does is turn one runtime
> > failure into another runtime failure.
> >
> 
> Yes. I did consider implementing them out of line, with __init
> annotations so you at least get complaints if you refer to them from
> non-init code, but I don't see how we would ever need these anywhere
> beyond fixmap and kasan anyway

Ok. Let's forget about that for now then. :)

Mark.

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 03/21] arm64: pgtable: add dummy pud_index() and pmd_index() definitions
  2016-01-11 13:18   ` Ard Biesheuvel
  (?)
@ 2016-01-11 17:40     ` Mark Rutland
  -1 siblings, 0 replies; 207+ messages in thread
From: Mark Rutland @ 2016-01-11 17:40 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	leif.lindholm, keescook, linux-kernel, stuart.yoder,
	bhupesh.sharma, arnd, marc.zyngier, christoffer.dall

On Mon, Jan 11, 2016 at 02:18:56PM +0100, Ard Biesheuvel wrote:
> Add definitions of pud_index() and pmd_index() for configurations with
> fewer than 4 resp. 3 translation levels. This makes it easier to keep
> the users (e.g., the fixmap init code) generic.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
>  arch/arm64/include/asm/pgtable.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index fe9bf47db5d3..6129f6755081 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -495,6 +495,7 @@ static inline phys_addr_t pud_page_paddr(pud_t pud)
>  #else
>  
>  #define pud_page_paddr(pud)	({ BUILD_BUG(); 0; })
> +#define pmd_index(addr)		({ BUILD_BUG(); 0; })
>  
>  /* Match pmd_offset folding in <asm/generic/pgtable-nopmd.h> */
>  #define pmd_set_fixmap(addr)		NULL
> @@ -542,6 +543,7 @@ static inline phys_addr_t pgd_page_paddr(pgd_t pgd)
>  #else
>  
>  #define pgd_page_paddr(pgd)	({ BUILD_BUG(); 0;})
> +#define pud_index(addr)		({ BUILD_BUG(); 0;})

I think we don't need these if we use p??_offset_kimg for the fixmap
initialisation.

Regardless, these look good conceptually, so if they're useful
elsewhere:

Reviewed-by: Mark Rutland <mark.rutland@arm.com>

Mark.
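
[Aside, to make the intent of the dummies concrete: a BUILD_BUG()-based
definition only breaks the build if a use of it survives optimization. In
a construct like the one from patch #4 quoted earlier in the thread, the
condition is a compile-time constant, so the dead branch, and the
BUILD_BUG() hidden inside pud_index() there, is simply discarded.
Illustrative fragment only:]

	/*
	 * With CONFIG_PGTABLE_LEVELS <= 3 the first branch is provably
	 * dead code, so the dummy pud_index() is never emitted and its
	 * BUILD_BUG() never fires.
	 */
	pud_t *pud = (CONFIG_PGTABLE_LEVELS > 3) ? &bm_pud[pud_index(addr)]
						 : (pud_t *)pgd_offset_k(addr);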

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 00/21] arm64: implement support for KASLR
  2016-01-11 13:18 ` Ard Biesheuvel
  (?)
@ 2016-01-11 22:07   ` Kees Cook
  -1 siblings, 0 replies; 207+ messages in thread
From: Kees Cook @ 2016-01-11 22:07 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, kernel-hardening, Will Deacon, Catalin Marinas,
	Mark Rutland, Leif Lindholm, LKML, stuart.yoder, bhupesh.sharma,
	Arnd Bergmann, Marc Zyngier, Christoffer Dall

On Mon, Jan 11, 2016 at 5:18 AM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> This series implements KASLR for arm64, by building the kernel as a PIE
> executable that can relocate itself at runtime, and moving it to a random
> offset in the vmalloc area. v2 and up also implement physical randomization,
> i.e., it allows the kernel to deal with being loaded at any physical offset
> (modulo the required alignment), and invokes the EFI_RNG_PROTOCOL from the
> UEFI stub to obtain random bits and perform the actual randomization of the
> physical load address.

I will continue cheering! :)

> Changes since v2:
> - Incorporated feedback from Marc Zyngier into the KVM patch (#5)
> - Dropped the pgdir section and the patch that memblock_reserve()'s the kernel
>   sections at a smaller granularity. This is no longer necessary with the pgdir
>   section gone. This also fixes an issue spotted by James Morse where the fixmap
>   page tables are not zeroed correctly; these have been moved back to the .bss
>   section.
> - Got rid of all ifdef'ery regarding the number of translation levels in the
>   changed .c files, by introducing new definitions in pgtable.h (#3, #6)
> - Fixed KAsan support, which was broken by all earlier versions.
> - Moved module region along with the virtually randomized kernel, so that module
>   addresses become unpredictable as well, and we only have to rely on veneers in
>   the PLTs when the module region is exhausted (which is somewhat more likely
>   since the module region is now shared with other uses of the vmalloc area)

Just to make sure I understand: this means that the offset between
kernel and modules remains static? It may still be useful to bump
modules as well, just so that leaking a module address doesn't
compromise the base kernel image address too. Don't block the series
for this, though. It's a minor nit. :)

-Kees

> - Added support for the 'nokaslr' command line option. This affects the
>   randomization performed by the stub, and results in a warning if passed while
>   the bootloader also presented a random seed for virtual KASLR in register x1.
> - The .text/.rodata sections of the kernel are no longer aliased in the linear
>   region with a writable mapping.
> - Added a separate image header flag for kernel images that may be loaded at any
>   2 MB aligned offset (+ TEXT_OFFSET)
> - The KASLR displacement is now corrected if it results in the kernel image
>   intersecting a PUD/PMD boundary (4k and 16k/64k granule kernels, respectively)
> - Split out UEFI stub random routines into separate patches.
> - Implemented a weight based EFI random allocation routine so that each suitable
>   offset in available memory is equally likely to be selected (as suggested by
>   Kees Cook)
> - Reused CONFIG_RELOCATABLE and CONFIG_RANDOMIZE_BASE instead of introducing
>   new Kconfig symbols to describe the same functionality.
> - Reimplemented mem= logic so memory is clipped from the top first.
>
> Changes since v1/RFC:
> - This series now implements fully independent virtual and physical address
>   randomization at load time. I have recycled some patches from this series:
>   http://thread.gmane.org/gmane.linux.ports.arm.kernel/455151, and updated the
>   final UEFI stub patch to randomize the physical address as well.
> - Added a patch to deal with the way KVM on arm64 makes assumptions about the
>   relation between kernel symbols and the linear mapping (on which the HYP
>   mapping is based), as these assumptions cease to be valid once we move the
>   kernel Image out of the linear mapping.
> - Updated the module PLT patch so it works on BE kernels as well.
> - Moved the constant Image header values to head.S, and updated the linker
>   script to provide the kernel size using R_AARCH64_ABS32 relocation rather
>   than a R_AARCH64_ABS64 relocation, since those are always resolved at build
>   time. This allows me to get rid of the post-build perl script to swab header
>   values on BE kernels.
> - Minor style tweaks.
>
> Notes:
> - These patches apply on top of Mark Rutland's pagetable rework series:
>   http://thread.gmane.org/gmane.linux.ports.arm.kernel/462438
> - The arm64 Image is uncompressed by default, and the Elf64_Rela format uses
>   24 bytes per relocation entry. This results in considerable bloat (i.e., a
>   couple of MBs worth of relocation data in an .init section). However, no
>   build time postprocessing is required; we rely fully on the toolchain to
>   produce the image.
> - We have to rely on the bootloader to supply some randomness in register x1
>   upon kernel entry. Since we have no decompressor, it is simply not feasible
>   to collect randomness in the head.S code path before mapping the kernel and
>   enabling the MMU.
> - The EFI_RNG_PROTOCOL that is invoked in patch #13 to supply randomness on
>   UEFI systems is not universally available. A QEMU/KVM firmware image that
>   implements a pseudo-random version is available here:
>   http://people.linaro.org/~ard.biesheuvel/QEMU_EFI.fd.aarch64-rng.bz2
>   (requires access to PMCCNTR_EL0 and support for AES instructions)
>   See below for instructions how to run the pseudo-random version on real
>   hardware.
> - Only mildly tested. Help appreciated.
>
> Code can be found here:
> git://git.linaro.org/people/ard.biesheuvel/linux-arm.git arm64-kaslr-v3
> https://git.linaro.org/people/ard.biesheuvel/linux-arm.git/shortlog/refs/heads/arm64-kaslr-v3
>
> Patch #1 updates the OF code to allow the minimum memblock physical address to
> be overridden by the arch.
>
> Patch #2 introduces KIMAGE_VADDR as the base of the kernel virtual region.
>
> Patch #3 introduces dummy pud_index() and pmd_index() macros that are intended
> to be optimized away if the configured number of translation levels does not
> actually use them.
>
> Patch #4 rewrites early_fixmap_init() so it does not rely on the linear mapping
> (i.e., the use of phys_to_virt() is avoided)
>
> Patch #5 updates KVM on arm64 so it can deal with kernel symbols whose addresses
> are not covered by the linear mapping.
>
> Patch #6 introduces pte_offset_kimg(), pmd_offset_kimg() and pud_offset_kimg()
> that allow statically allocated page tables (i.e., by fixmap and kasan) to be
> traversed before the linear mapping is installed.
>
> Patch #7 moves the kernel virtual mapping to the vmalloc area, along with the
> module region which is kept right below it, as before.
>
> Patch #8 adds support for PLTs in modules so that relative branches can be
> resolved via a PLT if the target is out of range. This is required for KASLR,
> since modules may be loaded far away from the core kernel.
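
[For readers not familiar with module PLTs: the veneer is a small stub the
module loader emits alongside the module, which loads the full target
address into a scratch register and branches to it. A loose sketch of one
such entry as a data structure follows; the exact encoding used by patch #8
may differ.]

struct plt_entry {
	__le32	mov0;	/* movn	x16, #0x....			*/
	__le32	mov1;	/* movk	x16, #0x...., lsl #16		*/
	__le32	mov2;	/* movk	x16, #0x...., lsl #32		*/
	__le32	br;	/* br	x16				*/
};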
>
> Patches #9 and #10 move arm64 to a new generic relative version of the extable
> implementation so that it no longer contains absolute addresses that require
> fixing up at relocation time, but uses relative offsets instead.
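
[Roughly, the idea behind a relative extable entry (field names are
illustrative; the exact definitions live in patches #9 and #10): each entry
stores 32-bit offsets relative to itself rather than absolute addresses, so
the table needs no R_AARCH64_ABS64 relocations when the image moves.]

struct exception_table_entry {
	int insn;	/* offset from this field to the faulting insn */
	int fixup;	/* offset from this field to the fixup code    */
};

static inline unsigned long ex_to_insn(const struct exception_table_entry *x)
{
	/* a relative offset stays valid wherever the image ends up */
	return (unsigned long)&x->insn + x->insn;
}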
>
> Patch #11 reverts some changes to the Image header population code so we no
> longer depend on the linker to populate the header fields. This is necessary
> since the R_AARCH64_ABS64 relocations that are emitted for these fields are not
> resolved at build time for PIE executables.
>
> Patch #12 updates the code in head.S that needs to execute before relocation to
> avoid the use of values that are subject to dynamic relocation. These values
> will not be populated in PIE executables.
>
> Patch #13 allows the kernel Image to be loaded anywhere in physical memory, by
> decoupling PHYS_OFFSET from the base of the kernel image.
>
> Patch #14 redefines SWAPPER_TABLE_SHIFT in a way that allows it to be used from
> assembler code regardless of the number of configured translation levels.
>
> Patch #15 (from Mark Rutland) moves the ELF relocation type #defines to a
> separate file so we can use it from head.S later
>
> Patch #16 updates scripts/sortextable.c so it accepts ET_DYN (relocatable)
> executables as well as ET_EXEC (static) executables.
>
> Patch #17 implements the core KASLR, by taking randomness supplied in register
> x1 and using it to move the kernel inside the vmalloc area.
>
> Patch #18 implements efi_get_random_bytes() based on the EFI_RNG_PROTOCOL
>
> Patch #19 implements efi_random_alloc()
>
> Patch #20 moves the allocation for the converted command line (UTF-16 to ASCII)
> away from the base of memory. This is necessary since for parsing
>
> Patch #21 implements the actual KASLR, by randomizing the kernel physical
> address, and passing entropy in x1 so that the kernel proper can relocate itself
> virtually.
>
> Ard Biesheuvel (20):
>   of/fdt: make memblock minimum physical address arch configurable
>   arm64: introduce KIMAGE_VADDR as the virtual base of the kernel region
>   arm64: pgtable: add dummy pud_index() and pmd_index() definitions
>   arm64: decouple early fixmap init from linear mapping
>   arm64: kvm: deal with kernel symbols outside of linear mapping
>   arm64: pgtable: implement static [pte|pmd|pud]_offset variants
>   arm64: move kernel image to base of vmalloc area
>   arm64: add support for module PLTs
>   extable: add support for relative extables to search and sort routines
>   arm64: switch to relative exception tables
>   arm64: avoid R_AARCH64_ABS64 relocations for Image header fields
>   arm64: avoid dynamic relocations in early boot code
>   arm64: allow kernel Image to be loaded anywhere in physical memory
>   arm64: redefine SWAPPER_TABLE_SHIFT for use in asm code
>   scripts/sortextable: add support for ET_DYN binaries
>   arm64: add support for a relocatable kernel and KASLR
>   efi: stub: implement efi_get_random_bytes() based on EFI_RNG_PROTOCOL
>   efi: stub: add implementation of efi_random_alloc()
>   efi: stub: use high allocation for converted command line
>   arm64: efi: invoke EFI_RNG_PROTOCOL to supply KASLR randomness
>
> Mark Rutland (1):
>   arm64: split elf relocs into a separate header.
>
>  Documentation/arm64/booting.txt                |  34 ++++-
>  arch/arm/include/asm/kvm_asm.h                 |   2 +
>  arch/arm/include/asm/kvm_mmu.h                 |   2 +
>  arch/arm/kvm/arm.c                             |   5 +-
>  arch/arm/kvm/mmu.c                             |   8 +-
>  arch/arm64/Kconfig                             |  40 +++++
>  arch/arm64/Makefile                            |  10 +-
>  arch/arm64/include/asm/assembler.h             |  30 +++-
>  arch/arm64/include/asm/boot.h                  |   6 +
>  arch/arm64/include/asm/elf.h                   |  54 +------
>  arch/arm64/include/asm/elf_relocs.h            |  75 ++++++++++
>  arch/arm64/include/asm/futex.h                 |  12 +-
>  arch/arm64/include/asm/kasan.h                 |  20 +--
>  arch/arm64/include/asm/kernel-pgtable.h        |  20 ++-
>  arch/arm64/include/asm/kvm_asm.h               |  19 ++-
>  arch/arm64/include/asm/kvm_host.h              |   8 +-
>  arch/arm64/include/asm/kvm_mmu.h               |   2 +
>  arch/arm64/include/asm/memory.h                |  38 +++--
>  arch/arm64/include/asm/module.h                |  11 ++
>  arch/arm64/include/asm/pgtable.h               |  22 ++-
>  arch/arm64/include/asm/uaccess.h               |  30 ++--
>  arch/arm64/include/asm/virt.h                  |   4 -
>  arch/arm64/include/asm/word-at-a-time.h        |   7 +-
>  arch/arm64/kernel/Makefile                     |   1 +
>  arch/arm64/kernel/armv8_deprecated.c           |   7 +-
>  arch/arm64/kernel/efi-entry.S                  |   9 +-
>  arch/arm64/kernel/head.S                       | 155 +++++++++++++++++---
>  arch/arm64/kernel/image.h                      |  37 ++---
>  arch/arm64/kernel/module-plts.c                | 137 +++++++++++++++++
>  arch/arm64/kernel/module.c                     |  15 +-
>  arch/arm64/kernel/module.lds                   |   4 +
>  arch/arm64/kernel/setup.c                      |  44 +++++-
>  arch/arm64/kernel/vmlinux.lds.S                |  13 +-
>  arch/arm64/kvm/debug.c                         |   1 +
>  arch/arm64/kvm/hyp.S                           |   6 +-
>  arch/arm64/mm/dump.c                           |  12 +-
>  arch/arm64/mm/extable.c                        |   2 +-
>  arch/arm64/mm/init.c                           |  91 ++++++++++--
>  arch/arm64/mm/kasan_init.c                     |  21 ++-
>  arch/arm64/mm/mmu.c                            |  95 +++++++-----
>  arch/x86/include/asm/efi.h                     |   2 +
>  drivers/firmware/efi/libstub/Makefile          |   2 +-
>  drivers/firmware/efi/libstub/arm-stub.c        |  17 ++-
>  drivers/firmware/efi/libstub/arm64-stub.c      |  67 +++++++--
>  drivers/firmware/efi/libstub/efi-stub-helper.c |  24 ++-
>  drivers/firmware/efi/libstub/efistub.h         |   9 ++
>  drivers/firmware/efi/libstub/random.c          | 120 +++++++++++++++
>  drivers/of/fdt.c                               |   5 +-
>  include/linux/efi.h                            |   5 +-
>  lib/extable.c                                  |  50 +++++--
>  scripts/sortextable.c                          |  10 +-
>  51 files changed, 1111 insertions(+), 309 deletions(-)
>  create mode 100644 arch/arm64/include/asm/elf_relocs.h
>  create mode 100644 arch/arm64/kernel/module-plts.c
>  create mode 100644 arch/arm64/kernel/module.lds
>  create mode 100644 drivers/firmware/efi/libstub/random.c
>
> EFI_RNG_PROTOCOL on real hardware
> =================================
>
> To test whether your UEFI implements the EFI_RNG_PROTOCOL, download the
> following executable and run it from the UEFI Shell:
> http://people.linaro.org/~ard.biesheuvel/RngTest.efi
>
> FS0:\> rngtest
> UEFI RNG Protocol Testing :
> ----------------------------
>  -- Locate UEFI RNG Protocol : [Fail - Status = Not Found]
>
> If your UEFI does not implement the EFI_RNG_PROTOCOL, you can download and
> install the pseudo-random version that uses the generic timer and PMCCNTR_EL0
> values and permutes them using a couple of rounds of AES.
> http://people.linaro.org/~ard.biesheuvel/RngDxe.efi
>
> NOTE: not for production!! This is a quick and dirty hack to test the KASLR
> code, and is not suitable for anything else.
>
> FS0:\> rngdxe
> FS0:\> rngtest
> UEFI RNG Protocol Testing :
> ----------------------------
>  -- Locate UEFI RNG Protocol : [Pass]
>  -- Call RNG->GetInfo() interface :
>      >> Supported RNG Algorithm (Count = 2) :
>           0) 44F0DE6E-4D8C-4045-A8C7-4DD168856B9E
>           1) E43176D7-B6E8-4827-B784-7FFDC4B68561
>  -- Call RNG->GetRNG() interface :
>      >> RNG with default algorithm : [Pass]
>      >> RNG with SP800-90-HMAC-256 : [Fail - Status = Unsupported]
>      >> RNG with SP800-90-Hash-256 : [Fail - Status = Unsupported]
>      >> RNG with SP800-90-CTR-256 : [Pass]
>      >> RNG with X9.31-3DES : [Fail - Status = Unsupported]
>      >> RNG with X9.31-AES : [Fail - Status = Unsupported]
>      >> RNG with RAW Entropy : [Pass]
>  -- Random Number Generation Test with default RNG Algorithm (20 Rounds):
>           01) - 27
>           02) - 61E8
>           03) - 496FD8
>           04) - DDD793BF
>           05) - B6C37C8E23
>           06) - 4D183C604A96
>           07) - 9363311DB61298
>           08) - 5715A7294F4E436E
>           09) - F0D4D7BAA0DD52318E
>           10) - C88C6EBCF4C0474D87C3
>           11) - B5594602B482A643932172
>           12) - CA7573F704B2089B726B9CF1
>           13) - A93E9451CB533DCFBA87B97C33
>           14) - 45AA7B83DB6044F7BBAB031F0D24
>           15) - 3DD7A4D61F34ADCB400B5976730DCF
>           16) - 4DD168D21FAB8F59708330D6A9BEB021
>           17) - 4BBB225E61C465F174254159467E65939F
>           18) - 030A156C9616337A20070941E702827DA8E1
>           19) - AB0FC11C9A4E225011382A9D164D9D55CA2B64
>           20) - 72B9B4735DC445E5DA6AF88DE965B7E87CB9A23C



-- 
Kees Cook
Chrome OS & Brillo Security


* Re: [PATCH v3 00/21] arm64: implement support for KASLR
  2016-01-11 22:07   ` Kees Cook
  (?)
@ 2016-01-12  7:17     ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-12  7:17 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-arm-kernel, kernel-hardening, Will Deacon, Catalin Marinas,
	Mark Rutland, Leif Lindholm, LKML, Stuart Yoder, Sharma Bhupesh,
	Arnd Bergmann, Marc Zyngier, Christoffer Dall

On 11 January 2016 at 23:07, Kees Cook <keescook@chromium.org> wrote:
> On Mon, Jan 11, 2016 at 5:18 AM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>> This series implements KASLR for arm64, by building the kernel as a PIE
>> executable that can relocate itself at runtime, and moving it to a random
>> offset in the vmalloc area. v2 and up also implement physical randomization,
>> i.e., it allows the kernel to deal with being loaded at any physical offset
>> (modulo the required alignment), and invokes the EFI_RNG_PROTOCOL from the
>> UEFI stub to obtain random bits and perform the actual randomization of the
>> physical load address.
>
> I will continue cheering! :)
>

:-)

>> Changes since v2:
>> - Incorporated feedback from Marc Zyngier into the KVM patch (#5)
>> - Dropped the pgdir section and the patch that memblock_reserve()'s the kernel
>>   sections at a smaller granularity. This is no longer necessary with the pgdir
>>   section gone. This also fixes an issue spotted by James Morse where the fixmap
>>   page tables are not zeroed correctly; these have been moved back to the .bss
>>   section.
>> - Got rid of all ifdef'ery regarding the number of translation levels in the
>>   changed .c files, by introducing new definitions in pgtable.h (#3, #6)
>> - Fixed KAsan support, which was broken by all earlier versions.
>> - Moved module region along with the virtually randomized kernel, so that module
>>   addresses become unpredictable as well, and we only have to rely on veneers in
>>   the PLTs when the module region is exhausted (which is somewhat more likely
>>   since the module region is now shared with other uses of the vmalloc area)
>
> Just to make sure I understand: this means that the offset between
> kernel and modules remains static? It may still be useful to bump
> modules as well, just so that leaking a module address doesn't
> compromise the base kernel image address too. Don't block the series
> for this, though. It's a minor nit. :)
>

Well, the module region could be any 128 MB memory region that also
covers the [_stext, _etext) interval. This would still allow all
modules to branch to all other modules and the core kernel without
resorting to indirect PLT jumps.
IOW, I think I can work around this quite easily.
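
A back-of-the-envelope version of that idea (purely illustrative, with made-up names):

  #include <stdint.h>

  /* Slide a 128 MB module window by a random amount, but keep it
   * covering [_stext, _etext) so that direct branches between modules
   * and the core kernel remain in range. */
  static uint64_t pick_module_base(uint64_t stext, uint64_t etext,
                                   uint64_t seed)
  {
          const uint64_t window = 128ull << 20;        /* 128 MB        */
          uint64_t slack = window - (etext - stext);   /* room to slide */

          /* the lowest usable base still covers _etext; slide it up by
           * anywhere from 0 to slack - 1 bytes */
          return (etext - window) + (seed % slack);
  }

Any base in that interval keeps both the kernel text and the whole module window within a single 128 MB branch range.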

>> - Added support for the 'nokaslr' command line option. This affects the
>>   randomization performed by the stub, and results in a warning if passed while
>>   the bootloader also presented a random seed for virtual KASLR in register x1.
>> - The .text/.rodata sections of the kernel are no longer aliased in the linear
>>   region with a writable mapping.
>> - Added a separate image header flag for kernel images that may be loaded at any
>>   2 MB aligned offset (+ TEXT_OFFSET)
>> - The KASLR displacement is now corrected if it results in the kernel image
>>   intersecting a PUD/PMD boundary (4k and 16k/64k granule kernels, respectively)
>> - Split out UEFI stub random routines into separate patches.
>> - Implemented a weight based EFI random allocation routine so that each suitable
>>   offset in available memory is equally likely to be selected (as suggested by
>>   Kees Cook)
>> - Reused CONFIG_RELOCATABLE and CONFIG_RANDOMIZE_BASE instead of introducing
>>   new Kconfig symbols to describe the same functionality.
>> - Reimplemented mem= logic so memory is clipped from the top first.
>>
>> Changes since v1/RFC:
>> - This series now implements fully independent virtual and physical address
>>   randomization at load time. I have recycled some patches from this series:
>>   http://thread.gmane.org/gmane.linux.ports.arm.kernel/455151, and updated the
>>   final UEFI stub patch to randomize the physical address as well.
>> - Added a patch to deal with the way KVM on arm64 makes assumptions about the
>>   relation between kernel symbols and the linear mapping (on which the HYP
>>   mapping is based), as these assumptions cease to be valid once we move the
>>   kernel Image out of the linear mapping.
>> - Updated the module PLT patch so it works on BE kernels as well.
>> - Moved the constant Image header values to head.S, and updated the linker
>>   script to provide the kernel size using R_AARCH64_ABS32 relocation rather
>>   than a R_AARCH64_ABS64 relocation, since those are always resolved at build
>>   time. This allows me to get rid of the post-build perl script to swab header
>>   values on BE kernels.
>> - Minor style tweaks.
>>
>> Notes:
>> - These patches apply on top of Mark Rutland's pagetable rework series:
>>   http://thread.gmane.org/gmane.linux.ports.arm.kernel/462438
>> - The arm64 Image is uncompressed by default, and the Elf64_Rela format uses
>>   24 bytes per relocation entry. This results in considerable bloat (i.e., a
>>   couple of MBs worth of relocation data in an .init section). However, no
>>   build-time postprocessing is required; we rely fully on the toolchain to
>>   produce the image.
>> - We have to rely on the bootloader to supply some randomness in register x1
>>   upon kernel entry. Since we have no decompressor, it is simply not feasible
>>   to collect randomness in the head.S code path before mapping the kernel and
>>   enabling the MMU.
>> - The EFI_RNG_PROTOCOL that is invoked in patch #13 to supply randomness on
>>   UEFI systems is not universally available. A QEMU/KVM firmware image that
>>   implements a pseudo-random version is available here:
>>   http://people.linaro.org/~ard.biesheuvel/QEMU_EFI.fd.aarch64-rng.bz2
>>   (requires access to PMCCNTR_EL0 and support for AES instructions)
>>   See below for instructions on how to run the pseudo-random version on real
>>   hardware.
>> - Only mildly tested. Help appreciated.
>>
>> Code can be found here:
>> git://git.linaro.org/people/ard.biesheuvel/linux-arm.git arm64-kaslr-v3
>> https://git.linaro.org/people/ard.biesheuvel/linux-arm.git/shortlog/refs/heads/arm64-kaslr-v3
>>
>> Patch #1 updates the OF code to allow the minimum memblock physical address to
>> be overridden by the arch.
>>
>> Patch #2 introduces KIMAGE_VADDR as the base of the kernel virtual region.
>>
>> Patch #3 introduces dummy pud_index() and pmd_index() macros that are intended
>> to be optimized away if the configured number of translation levels does not
>> actually use them.
>>
>> Patch #4 rewrites early_fixmap_init() so it does not rely on the linear mapping
>> (i.e., the use of phys_to_virt() is avoided)
>>
>> Patch #5 updates KVM on arm64 so it can deal with kernel symbols whose addresses
>> are not covered by the linear mapping.
>>
>> Patch #6 introduces pte_offset_kimg(), pmd_offset_kimg() and pud_offset_kimg()
>> that allow statically allocated page tables (i.e., by fixmap and kasan) to be
>> traversed before the linear mapping is installed.
>>
>> Patch #7 moves the kernel virtual mapping to the vmalloc area, along with the
>> module region which is kept right below it, as before.
>>
>> Patch #8 adds support for PLTs in modules so that relative branches can be
>> resolved via a PLT if the target is out of range. This is required for KASLR,
>> since modules may be loaded far away from the core kernel.
>>
>> Patches #9 and #10 move arm64 to a new generic relative version of the extable
>> implementation so that it no longer contains absolute addresses that require
>> fixing up at relocation time, but uses relative offsets instead.
>>
>> Patch #11 reverts some changes to the Image header population code so we no
>> longer depend on the linker to populate the header fields. This is necessary
>> since the R_AARCH64_ABS64 relocations that are emitted for these fields are not
>> resolved at build time for PIE executables.
>>
>> Patch #12 updates the code in head.S that needs to execute before relocation to
>> avoid the use of values that are subject to dynamic relocation. These values
>> will not be populated in PIE executables.
>>
>> Patch #13 allows the kernel Image to be loaded anywhere in physical memory, by
>> decoupling PHYS_OFFSET from the base of the kernel image.
>>
>> Patch #14 redefines SWAPPER_TABLE_SHIFT in a way that allows it to be used from
>> assembler code regardless of the number of configured translation levels.
>>
>> Patch #15 (from Mark Rutland) moves the ELF relocation type #defines to a
>> separate file so they can be used from head.S later.
>>
>> Patch #16 updates scripts/sortextable.c so it accepts ET_DYN (relocatable)
>> executables as well as ET_EXEC (static) executables.
>>
>> Patch #17 implements the core KASLR, by taking randomness supplied in register
>> x1 and using it to move the kernel inside the vmalloc area.
>>
>> Patch #18 implements efi_get_random_bytes() based on the EFI_RNG_PROTOCOL.
>>
>> Patch #19 implements efi_random_alloc().
>>
>> Patch #20 moves the allocation for the converted command line (UTF-16 to ASCII)
>> away from the base of memory. This is necessary since the command line needs to
>> be parsed (for the 'nokaslr' option) before the kernel itself is allocated, and
>> the converted copy should not end up where the kernel may be placed.
>>
>> Patch #21 implements the actual KASLR, by randomizing the kernel physical
>> address, and passing entropy in x1 so that the kernel proper can relocate itself
>> virtually.
>>
>> Ard Biesheuvel (20):
>>   of/fdt: make memblock minimum physical address arch configurable
>>   arm64: introduce KIMAGE_VADDR as the virtual base of the kernel region
>>   arm64: pgtable: add dummy pud_index() and pmd_index() definitions
>>   arm64: decouple early fixmap init from linear mapping
>>   arm64: kvm: deal with kernel symbols outside of linear mapping
>>   arm64: pgtable: implement static [pte|pmd|pud]_offset variants
>>   arm64: move kernel image to base of vmalloc area
>>   arm64: add support for module PLTs
>>   extable: add support for relative extables to search and sort routines
>>   arm64: switch to relative exception tables
>>   arm64: avoid R_AARCH64_ABS64 relocations for Image header fields
>>   arm64: avoid dynamic relocations in early boot code
>>   arm64: allow kernel Image to be loaded anywhere in physical memory
>>   arm64: redefine SWAPPER_TABLE_SHIFT for use in asm code
>>   scripts/sortextable: add support for ET_DYN binaries
>>   arm64: add support for a relocatable kernel and KASLR
>>   efi: stub: implement efi_get_random_bytes() based on EFI_RNG_PROTOCOL
>>   efi: stub: add implementation of efi_random_alloc()
>>   efi: stub: use high allocation for converted command line
>>   arm64: efi: invoke EFI_RNG_PROTOCOL to supply KASLR randomness
>>
>> Mark Rutland (1):
>>   arm64: split elf relocs into a separate header.
>>
>>  Documentation/arm64/booting.txt                |  34 ++++-
>>  arch/arm/include/asm/kvm_asm.h                 |   2 +
>>  arch/arm/include/asm/kvm_mmu.h                 |   2 +
>>  arch/arm/kvm/arm.c                             |   5 +-
>>  arch/arm/kvm/mmu.c                             |   8 +-
>>  arch/arm64/Kconfig                             |  40 +++++
>>  arch/arm64/Makefile                            |  10 +-
>>  arch/arm64/include/asm/assembler.h             |  30 +++-
>>  arch/arm64/include/asm/boot.h                  |   6 +
>>  arch/arm64/include/asm/elf.h                   |  54 +------
>>  arch/arm64/include/asm/elf_relocs.h            |  75 ++++++++++
>>  arch/arm64/include/asm/futex.h                 |  12 +-
>>  arch/arm64/include/asm/kasan.h                 |  20 +--
>>  arch/arm64/include/asm/kernel-pgtable.h        |  20 ++-
>>  arch/arm64/include/asm/kvm_asm.h               |  19 ++-
>>  arch/arm64/include/asm/kvm_host.h              |   8 +-
>>  arch/arm64/include/asm/kvm_mmu.h               |   2 +
>>  arch/arm64/include/asm/memory.h                |  38 +++--
>>  arch/arm64/include/asm/module.h                |  11 ++
>>  arch/arm64/include/asm/pgtable.h               |  22 ++-
>>  arch/arm64/include/asm/uaccess.h               |  30 ++--
>>  arch/arm64/include/asm/virt.h                  |   4 -
>>  arch/arm64/include/asm/word-at-a-time.h        |   7 +-
>>  arch/arm64/kernel/Makefile                     |   1 +
>>  arch/arm64/kernel/armv8_deprecated.c           |   7 +-
>>  arch/arm64/kernel/efi-entry.S                  |   9 +-
>>  arch/arm64/kernel/head.S                       | 155 +++++++++++++++++---
>>  arch/arm64/kernel/image.h                      |  37 ++---
>>  arch/arm64/kernel/module-plts.c                | 137 +++++++++++++++++
>>  arch/arm64/kernel/module.c                     |  15 +-
>>  arch/arm64/kernel/module.lds                   |   4 +
>>  arch/arm64/kernel/setup.c                      |  44 +++++-
>>  arch/arm64/kernel/vmlinux.lds.S                |  13 +-
>>  arch/arm64/kvm/debug.c                         |   1 +
>>  arch/arm64/kvm/hyp.S                           |   6 +-
>>  arch/arm64/mm/dump.c                           |  12 +-
>>  arch/arm64/mm/extable.c                        |   2 +-
>>  arch/arm64/mm/init.c                           |  91 ++++++++++--
>>  arch/arm64/mm/kasan_init.c                     |  21 ++-
>>  arch/arm64/mm/mmu.c                            |  95 +++++++-----
>>  arch/x86/include/asm/efi.h                     |   2 +
>>  drivers/firmware/efi/libstub/Makefile          |   2 +-
>>  drivers/firmware/efi/libstub/arm-stub.c        |  17 ++-
>>  drivers/firmware/efi/libstub/arm64-stub.c      |  67 +++++++--
>>  drivers/firmware/efi/libstub/efi-stub-helper.c |  24 ++-
>>  drivers/firmware/efi/libstub/efistub.h         |   9 ++
>>  drivers/firmware/efi/libstub/random.c          | 120 +++++++++++++++
>>  drivers/of/fdt.c                               |   5 +-
>>  include/linux/efi.h                            |   5 +-
>>  lib/extable.c                                  |  50 +++++--
>>  scripts/sortextable.c                          |  10 +-
>>  51 files changed, 1111 insertions(+), 309 deletions(-)
>>  create mode 100644 arch/arm64/include/asm/elf_relocs.h
>>  create mode 100644 arch/arm64/kernel/module-plts.c
>>  create mode 100644 arch/arm64/kernel/module.lds
>>  create mode 100644 drivers/firmware/efi/libstub/random.c
>>
>> EFI_RNG_PROTOCOL on real hardware
>> =================================
>>
>> To test whether your UEFI implements the EFI_RNG_PROTOCOL, download the
>> following executable and run it from the UEFI Shell:
>> http://people.linaro.org/~ard.biesheuvel/RngTest.efi
>>
>> FS0:\> rngtest
>> UEFI RNG Protocol Testing :
>> ----------------------------
>>  -- Locate UEFI RNG Protocol : [Fail - Status = Not Found]
>>
>> If your UEFI does not implement the EFI_RNG_PROTOCOL, you can download and
>> install the pseudo-random version that uses the generic timer and PMCCNTR_EL0
>> values and permutes them using a couple of rounds of AES.
>> http://people.linaro.org/~ard.biesheuvel/RngDxe.efi
>>
>> NOTE: not for production!! This is a quick and dirty hack to test the KASLR
>> code, and is not suitable for anything else.
>>
>> FS0:\> rngdxe
>> FS0:\> rngtest
>> UEFI RNG Protocol Testing :
>> ----------------------------
>>  -- Locate UEFI RNG Protocol : [Pass]
>>  -- Call RNG->GetInfo() interface :
>>      >> Supported RNG Algorithm (Count = 2) :
>>           0) 44F0DE6E-4D8C-4045-A8C7-4DD168856B9E
>>           1) E43176D7-B6E8-4827-B784-7FFDC4B68561
>>  -- Call RNG->GetRNG() interface :
>>      >> RNG with default algorithm : [Pass]
>>      >> RNG with SP800-90-HMAC-256 : [Fail - Status = Unsupported]
>>      >> RNG with SP800-90-Hash-256 : [Fail - Status = Unsupported]
>>      >> RNG with SP800-90-CTR-256 : [Pass]
>>      >> RNG with X9.31-3DES : [Fail - Status = Unsupported]
>>      >> RNG with X9.31-AES : [Fail - Status = Unsupported]
>>      >> RNG with RAW Entropy : [Pass]
>>  -- Random Number Generation Test with default RNG Algorithm (20 Rounds):
>>           01) - 27
>>           02) - 61E8
>>           03) - 496FD8
>>           04) - DDD793BF
>>           05) - B6C37C8E23
>>           06) - 4D183C604A96
>>           07) - 9363311DB61298
>>           08) - 5715A7294F4E436E
>>           09) - F0D4D7BAA0DD52318E
>>           10) - C88C6EBCF4C0474D87C3
>>           11) - B5594602B482A643932172
>>           12) - CA7573F704B2089B726B9CF1
>>           13) - A93E9451CB533DCFBA87B97C33
>>           14) - 45AA7B83DB6044F7BBAB031F0D24
>>           15) - 3DD7A4D61F34ADCB400B5976730DCF
>>           16) - 4DD168D21FAB8F59708330D6A9BEB021
>>           17) - 4BBB225E61C465F174254159467E65939F
>>           18) - 030A156C9616337A20070941E702827DA8E1
>>           19) - AB0FC11C9A4E225011382A9D164D9D55CA2B64
>>           20) - 72B9B4735DC445E5DA6AF88DE965B7E87CB9A23C
>
>
>
> --
> Kees Cook
> Chrome OS & Brillo Security


* [PATCH v3 00/21] arm64: implement support for KASLR
@ 2016-01-12  7:17     ` Ard Biesheuvel
  0 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-12  7:17 UTC (permalink / raw)
  To: linux-arm-kernel

On 11 January 2016 at 23:07, Kees Cook <keescook@chromium.org> wrote:
> On Mon, Jan 11, 2016 at 5:18 AM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>> This series implements KASLR for arm64, by building the kernel as a PIE
>> executable that can relocate itself at runtime, and moving it to a random
>> offset in the vmalloc area. v2 and up also implement physical randomization,
>> i.e., it allows the kernel to deal with being loaded at any physical offset
>> (modulo the required alignment), and invokes the EFI_RNG_PROTOCOL from the
>> UEFI stub to obtain random bits and perform the actual randomization of the
>> physical load address.
>
> I will continue cheering! :)
>

:-)

>> Changes since v2:
>> - Incorporated feedback from Marc Zyngier into the KVM patch (#5)
>> - Dropped the pgdir section and the patch that memblock_reserve()'s the kernel
>>   sections at a smaller granularity. This is no longer necessary with the pgdir
>>   section gone. This also fixes an issue spotted by James Morse where the fixmap
>>   page tables are not zeroed correctly; these have been moved back to the .bss
>>   section.
>> - Got rid of all ifdef'ery regarding the number of translation levels in the
>>   changed .c files, by introducing new definitions in pgtable.h (#3, #6)
>> - Fixed KAsan support, which was broken by all earlier versions.
>> - Moved module region along with the virtually randomized kernel, so that module
>>   addresses become unpredictable as well, and we only have to rely on veneers in
>>   the PLTs when the module region is exhausted (which is somewhat more likely
>>   since the module region is now shared with other uses of the vmalloc area)
>
> Just to make sure I understand: this means that the offset between
> kernel and modules remains static? It may still be useful to bump
> modules as well, just so that leaking a module address doesn't
> compromise the base kernel image address too. Don't block the series
> for this, though. It's a minor nit. :)
>

Well, the module region could be any 128 MB memory region that also
covers the [_stext, _etext) interval. This would still allow all
modules to branch to all other modules and the core kernel without
resorting to indirect PLT jumps.
IOW, I think I can work around this quite easily.
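Roughly along these lines (just a sketch of the idea, not code from this
series; the helper name and the way the seed is obtained are illustrative):

#include <linux/kernel.h>	/* round_down() */
#include <linux/sizes.h>	/* SZ_128M */
#include <linux/types.h>
#include <asm/page.h>

extern char _stext[], _etext[];

u64 module_alloc_base;

/*
 * Pick a random 128 MB window that still covers [_stext, _etext), so
 * that relative branches (range +/- 128 MB) between any module and the
 * core kernel text stay in range without needing a PLT veneer.
 */
static void choose_module_region(u64 seed)
{
	/* how far the window base may slide while still covering the text */
	u64 range = SZ_128M - ((u64)_etext - (u64)_stext);

	/* candidate bases span [_etext - 128M, _stext]; _etext is page aligned */
	module_alloc_base = (u64)_etext - SZ_128M +
			    round_down(seed % range, PAGE_SIZE);
}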

>> - Added support for the 'nokaslr' command line option. This affects the
>>   randomization performed by the stub, and results in a warning if passed while
>>   the bootloader also presented a random seed for virtual KASLR in register x1.
>> - The .text/.rodata sections of the kernel are no longer aliased in the linear
>>   region with a writable mapping.
>> - Added a separate image header flag for kernel images that may be loaded at any
>>   2 MB aligned offset (+ TEXT_OFFSET)
>> - The KASLR displacement is now corrected if it results in the kernel image
>>   intersecting a PUD/PMD boundary (4k and 16k/64k granule kernels, respectively)
>> - Split out UEFI stub random routines into separate patches.
>> - Implemented a weight based EFI random allocation routine so that each suitable
>>   offset in available memory is equally likely to be selected (as suggested by
>>   Kees Cook)
>> - Reused CONFIG_RELOCATABLE and CONFIG_RANDOMIZE_BASE instead of introducing
>>   new Kconfig symbols to describe the same functionality.
>> - Reimplemented mem= logic so memory is clipped from the top first.
>>
>> Changes since v1/RFC:
>> - This series now implements fully independent virtual and physical address
>>   randomization at load time. I have recycled some patches from this series:
>>   http://thread.gmane.org/gmane.linux.ports.arm.kernel/455151, and updated the
>>   final UEFI stub patch to randomize the physical address as well.
>> - Added a patch to deal with the way KVM on arm64 makes assumptions about the
>>   relation between kernel symbols and the linear mapping (on which the HYP
>>   mapping is based), as these assumptions cease to be valid once we move the
>>   kernel Image out of the linear mapping.
>> - Updated the module PLT patch so it works on BE kernels as well.
>> - Moved the constant Image header values to head.S, and updated the linker
>>   script to provide the kernel size using an R_AARCH64_ABS32 relocation rather
>>   than an R_AARCH64_ABS64 relocation, since the former is always resolved at
>>   build time. This allows me to get rid of the post-build perl script to swab
>>   header values on BE kernels.
>> - Minor style tweaks.
>>
>> Notes:
>> - These patches apply on top of Mark Rutland's pagetable rework series:
>>   http://thread.gmane.org/gmane.linux.ports.arm.kernel/462438
>> - The arm64 Image is uncompressed by default, and the Elf64_Rela format uses
>>   24 bytes per relocation entry. This results in considerable bloat (i.e., a
>>   couple of MBs worth of relocation data in an .init section). However, no
>>   build-time postprocessing is required; we rely fully on the toolchain to
>>   produce the image.
>> - We have to rely on the bootloader to supply some randomness in register x1
>>   upon kernel entry. Since we have no decompressor, it is simply not feasible
>>   to collect randomness in the head.S code path before mapping the kernel and
>>   enabling the MMU.
>> - The EFI_RNG_PROTOCOL that is invoked in patch #21 to supply randomness on
>>   UEFI systems is not universally available. A QEMU/KVM firmware image that
>>   implements a pseudo-random version is available here:
>>   http://people.linaro.org/~ard.biesheuvel/QEMU_EFI.fd.aarch64-rng.bz2
>>   (requires access to PMCCNTR_EL0 and support for AES instructions)
>>   See below for instructions on how to run the pseudo-random version on real
>>   hardware.
>> - Only mildly tested. Help appreciated.
>>
>> Code can be found here:
>> git://git.linaro.org/people/ard.biesheuvel/linux-arm.git arm64-kaslr-v3
>> https://git.linaro.org/people/ard.biesheuvel/linux-arm.git/shortlog/refs/heads/arm64-kaslr-v3
>>
>> Patch #1 updates the OF code to allow the minimum memblock physical address to
>> be overridden by the arch.
>>
>> Patch #2 introduces KIMAGE_VADDR as the base of the kernel virtual region.
>>
>> Patch #3 introduces dummy pud_index() and pmd_index() macros that are intended
>> to be optimized away if the configured number of translation levels does not
>> actually use them.
>>
>> Patch #4 rewrites early_fixmap_init() so it does not rely on the linear mapping
>> (i.e., the use of phys_to_virt() is avoided)
>>
>> Patch #5 updates KVM on arm64 so it can deal with kernel symbols whose addresses
>> are not covered by the linear mapping.
>>
>> Patch #6 introduces pte_offset_kimg(), pmd_offset_kimg() and pud_offset_kimg()
>> that allow statically allocated page tables (i.e., by fixmap and kasan) to be
>> traversed before the linear mapping is installed.
>>
>> Patch #7 moves the kernel virtual mapping to the vmalloc area, along with the
>> module region which is kept right below it, as before.
>>
>> Patch #8 adds support for PLTs in modules so that relative branches can be
>> resolved via a PLT if the target is out of range. This is required for KASLR,
>> since modules may be loaded far away from the core kernel.
>>
>> Patches #9 and #10 move arm64 to a new generic relative version of the extable
>> implementation so that it no longer contains absolute addresses that require
>> fixing up at relocation time, but uses relative offsets instead.
>>
>> Patch #11 reverts some changes to the Image header population code so we no
>> longer depend on the linker to populate the header fields. This is necessary
>> since the R_AARCH64_ABS64 relocations that are emitted for these fields are not
>> resolved at build time for PIE executables.
>>
>> Patch #12 updates the code in head.S that needs to execute before relocation to
>> avoid the use of values that are subject to dynamic relocation. These values
>> will not be populated in PIE executables.
>>
>> Patch #13 allows the kernel Image to be loaded anywhere in physical memory, by
>> decoupling PHYS_OFFSET from the base of the kernel image.
>>
>> Patch #14 redefines SWAPPER_TABLE_SHIFT in a way that allows it to be used from
>> assembler code regardless of the number of configured translation levels.
>>
>> Patch #15 (from Mark Rutland) moves the ELF relocation type #defines to a
>> separate file so we can use it from head.S later
>>
>> Patch #16 updates scripts/sortextable.c so it accepts ET_DYN (relocatable)
>> executables as well as ET_EXEC (static) executables.
>>
>> Patch #17 implements the core KASLR, by taking randomness supplied in register
>> x1 and using it to move the kernel inside the vmalloc area.
>>
>> Patch #18 implements efi_get_random_bytes() based on the EFI_RNG_PROTOCOL
>>
>> Patch #19 implements efi_random_alloc()
>>
>> Patch #20 moves the allocation for the converted command line (UTF-16 to ASCII)
>>  away from the base of memory. This is necessary since it needs to be converted
>>  and parsed (for 'nokaslr') before the kernel itself is allocated, and a low
>>  allocation could interfere with the kernel's own placement.
>>
>> Patch #21 implements the actual KASLR, by randomizing the kernel physical
>> address, and passing entropy in x1 so that the kernel proper can relocate itself
>> virtually.
>>
>> Ard Biesheuvel (20):
>>   of/fdt: make memblock minimum physical address arch configurable
>>   arm64: introduce KIMAGE_VADDR as the virtual base of the kernel region
>>   arm64: pgtable: add dummy pud_index() and pmd_index() definitions
>>   arm64: decouple early fixmap init from linear mapping
>>   arm64: kvm: deal with kernel symbols outside of linear mapping
>>   arm64: pgtable: implement static [pte|pmd|pud]_offset variants
>>   arm64: move kernel image to base of vmalloc area
>>   arm64: add support for module PLTs
>>   extable: add support for relative extables to search and sort routines
>>   arm64: switch to relative exception tables
>>   arm64: avoid R_AARCH64_ABS64 relocations for Image header fields
>>   arm64: avoid dynamic relocations in early boot code
>>   arm64: allow kernel Image to be loaded anywhere in physical memory
>>   arm64: redefine SWAPPER_TABLE_SHIFT for use in asm code
>>   scripts/sortextable: add support for ET_DYN binaries
>>   arm64: add support for a relocatable kernel and KASLR
>>   efi: stub: implement efi_get_random_bytes() based on EFI_RNG_PROTOCOL
>>   efi: stub: add implementation of efi_random_alloc()
>>   efi: stub: use high allocation for converted command line
>>   arm64: efi: invoke EFI_RNG_PROTOCOL to supply KASLR randomness
>>
>> Mark Rutland (1):
>>   arm64: split elf relocs into a separate header.
>>
>>  Documentation/arm64/booting.txt                |  34 ++++-
>>  arch/arm/include/asm/kvm_asm.h                 |   2 +
>>  arch/arm/include/asm/kvm_mmu.h                 |   2 +
>>  arch/arm/kvm/arm.c                             |   5 +-
>>  arch/arm/kvm/mmu.c                             |   8 +-
>>  arch/arm64/Kconfig                             |  40 +++++
>>  arch/arm64/Makefile                            |  10 +-
>>  arch/arm64/include/asm/assembler.h             |  30 +++-
>>  arch/arm64/include/asm/boot.h                  |   6 +
>>  arch/arm64/include/asm/elf.h                   |  54 +------
>>  arch/arm64/include/asm/elf_relocs.h            |  75 ++++++++++
>>  arch/arm64/include/asm/futex.h                 |  12 +-
>>  arch/arm64/include/asm/kasan.h                 |  20 +--
>>  arch/arm64/include/asm/kernel-pgtable.h        |  20 ++-
>>  arch/arm64/include/asm/kvm_asm.h               |  19 ++-
>>  arch/arm64/include/asm/kvm_host.h              |   8 +-
>>  arch/arm64/include/asm/kvm_mmu.h               |   2 +
>>  arch/arm64/include/asm/memory.h                |  38 +++--
>>  arch/arm64/include/asm/module.h                |  11 ++
>>  arch/arm64/include/asm/pgtable.h               |  22 ++-
>>  arch/arm64/include/asm/uaccess.h               |  30 ++--
>>  arch/arm64/include/asm/virt.h                  |   4 -
>>  arch/arm64/include/asm/word-at-a-time.h        |   7 +-
>>  arch/arm64/kernel/Makefile                     |   1 +
>>  arch/arm64/kernel/armv8_deprecated.c           |   7 +-
>>  arch/arm64/kernel/efi-entry.S                  |   9 +-
>>  arch/arm64/kernel/head.S                       | 155 +++++++++++++++++---
>>  arch/arm64/kernel/image.h                      |  37 ++---
>>  arch/arm64/kernel/module-plts.c                | 137 +++++++++++++++++
>>  arch/arm64/kernel/module.c                     |  15 +-
>>  arch/arm64/kernel/module.lds                   |   4 +
>>  arch/arm64/kernel/setup.c                      |  44 +++++-
>>  arch/arm64/kernel/vmlinux.lds.S                |  13 +-
>>  arch/arm64/kvm/debug.c                         |   1 +
>>  arch/arm64/kvm/hyp.S                           |   6 +-
>>  arch/arm64/mm/dump.c                           |  12 +-
>>  arch/arm64/mm/extable.c                        |   2 +-
>>  arch/arm64/mm/init.c                           |  91 ++++++++++--
>>  arch/arm64/mm/kasan_init.c                     |  21 ++-
>>  arch/arm64/mm/mmu.c                            |  95 +++++++-----
>>  arch/x86/include/asm/efi.h                     |   2 +
>>  drivers/firmware/efi/libstub/Makefile          |   2 +-
>>  drivers/firmware/efi/libstub/arm-stub.c        |  17 ++-
>>  drivers/firmware/efi/libstub/arm64-stub.c      |  67 +++++++--
>>  drivers/firmware/efi/libstub/efi-stub-helper.c |  24 ++-
>>  drivers/firmware/efi/libstub/efistub.h         |   9 ++
>>  drivers/firmware/efi/libstub/random.c          | 120 +++++++++++++++
>>  drivers/of/fdt.c                               |   5 +-
>>  include/linux/efi.h                            |   5 +-
>>  lib/extable.c                                  |  50 +++++--
>>  scripts/sortextable.c                          |  10 +-
>>  51 files changed, 1111 insertions(+), 309 deletions(-)
>>  create mode 100644 arch/arm64/include/asm/elf_relocs.h
>>  create mode 100644 arch/arm64/kernel/module-plts.c
>>  create mode 100644 arch/arm64/kernel/module.lds
>>  create mode 100644 drivers/firmware/efi/libstub/random.c
>>
>> EFI_RNG_PROTOCOL on real hardware
>> =================================
>>
>> To test whether your UEFI implements the EFI_RNG_PROTOCOL, download the
>> following executable and run it from the UEFI Shell:
>> http://people.linaro.org/~ard.biesheuvel/RngTest.efi
>>
>> FS0:\> rngtest
>> UEFI RNG Protocol Testing :
>> ----------------------------
>>  -- Locate UEFI RNG Protocol : [Fail - Status = Not Found]
>>
>> If your UEFI does not implement the EFI_RNG_PROTOCOL, you can download and
>> install the pseudo-random version that uses the generic timer and PMCCNTR_EL0
>> values and permutes them using a couple of rounds of AES.
>> http://people.linaro.org/~ard.biesheuvel/RngDxe.efi
>>
>> NOTE: not for production!! This is a quick and dirty hack to test the KASLR
>> code, and is not suitable for anything else.
>>
>> FS0:\> rngdxe
>> FS0:\> rngtest
>> UEFI RNG Protocol Testing :
>> ----------------------------
>>  -- Locate UEFI RNG Protocol : [Pass]
>>  -- Call RNG->GetInfo() interface :
>>      >> Supported RNG Algorithm (Count = 2) :
>>           0) 44F0DE6E-4D8C-4045-A8C7-4DD168856B9E
>>           1) E43176D7-B6E8-4827-B784-7FFDC4B68561
>>  -- Call RNG->GetRNG() interface :
>>      >> RNG with default algorithm : [Pass]
>>      >> RNG with SP800-90-HMAC-256 : [Fail - Status = Unsupported]
>>      >> RNG with SP800-90-Hash-256 : [Fail - Status = Unsupported]
>>      >> RNG with SP800-90-CTR-256 : [Pass]
>>      >> RNG with X9.31-3DES : [Fail - Status = Unsupported]
>>      >> RNG with X9.31-AES : [Fail - Status = Unsupported]
>>      >> RNG with RAW Entropy : [Pass]
>>  -- Random Number Generation Test with default RNG Algorithm (20 Rounds):
>>           01) - 27
>>           02) - 61E8
>>           03) - 496FD8
>>           04) - DDD793BF
>>           05) - B6C37C8E23
>>           06) - 4D183C604A96
>>           07) - 9363311DB61298
>>           08) - 5715A7294F4E436E
>>           09) - F0D4D7BAA0DD52318E
>>           10) - C88C6EBCF4C0474D87C3
>>           11) - B5594602B482A643932172
>>           12) - CA7573F704B2089B726B9CF1
>>           13) - A93E9451CB533DCFBA87B97C33
>>           14) - 45AA7B83DB6044F7BBAB031F0D24
>>           15) - 3DD7A4D61F34ADCB400B5976730DCF
>>           16) - 4DD168D21FAB8F59708330D6A9BEB021
>>           17) - 4BBB225E61C465F174254159467E65939F
>>           18) - 030A156C9616337A20070941E702827DA8E1
>>           19) - AB0FC11C9A4E225011382A9D164D9D55CA2B64
>>           20) - 72B9B4735DC445E5DA6AF88DE965B7E87CB9A23C
>
>
>
> --
> Kees Cook
> Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 05/21] arm64: kvm: deal with kernel symbols outside of linear mapping
  2016-01-11 13:18   ` Ard Biesheuvel
  (?)
@ 2016-01-12 12:36     ` Mark Rutland
  -1 siblings, 0 replies; 207+ messages in thread
From: Mark Rutland @ 2016-01-12 12:36 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	leif.lindholm, keescook, linux-kernel, stuart.yoder,
	bhupesh.sharma, arnd, marc.zyngier, christoffer.dall

On Mon, Jan 11, 2016 at 02:18:58PM +0100, Ard Biesheuvel wrote:
> KVM on arm64 uses a fixed offset between the linear mapping at EL1 and
> the HYP mapping at EL2. Before we can move the kernel virtual mapping
> out of the linear mapping, we have to make sure that references to kernel
> symbols that are accessed via the HYP mapping are translated to their
> linear equivalent.
> 
> To prevent inadvertent direct references from sneaking in later, change
> the type of all extern declarations of HYP kernel symbols to the opaque
> 'struct kvm_ksym', which does not decay to a pointer type like char arrays
> and function references. This is not bulletproof, but at least forces the
> user to take the address explicitly rather than referencing it directly.
>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

Cool feature!

> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index 5e377101f919..e3865845d3e1 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -105,24 +105,29 @@
>  #ifndef __ASSEMBLY__
>  struct kvm;
>  struct kvm_vcpu;
> +struct kvm_ksym;

So that one doesn't have to trawl git logs, it might be worth a comment
as to the purpose of struct kvm_ksym (and thus why we never need to
actually define it).

Either way:

Reviewed-by: Mark Rutland <mark.rutland@arm.com>

Mark.

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 05/21] arm64: kvm: deal with kernel symbols outside of linear mapping
  2016-01-12 12:36     ` Mark Rutland
  (?)
@ 2016-01-12 13:23       ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-12 13:23 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, kernel-hardening, Will Deacon, Catalin Marinas,
	Leif Lindholm, Kees Cook, linux-kernel, Stuart Yoder,
	Sharma Bhupesh, Arnd Bergmann, Marc Zyngier, Christoffer Dall

On 12 January 2016 at 13:36, Mark Rutland <mark.rutland@arm.com> wrote:
> On Mon, Jan 11, 2016 at 02:18:58PM +0100, Ard Biesheuvel wrote:
>> KVM on arm64 uses a fixed offset between the linear mapping at EL1 and
>> the HYP mapping at EL2. Before we can move the kernel virtual mapping
>> out of the linear mapping, we have to make sure that references to kernel
>> symbols that are accessed via the HYP mapping are translated to their
>> linear equivalent.
>>
>> To prevent inadvertent direct references from sneaking in later, change
>> the type of all extern declarations of HYP kernel symbols to the opaque
>> 'struct kvm_ksym', which does not decay to a pointer type like char arrays
>> and function references. This is not bulletproof, but at least forces the
>> user to take the address explicitly rather than referencing it directly.
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>
> Cool feature!
>
>> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
>> index 5e377101f919..e3865845d3e1 100644
>> --- a/arch/arm64/include/asm/kvm_asm.h
>> +++ b/arch/arm64/include/asm/kvm_asm.h
>> @@ -105,24 +105,29 @@
>>  #ifndef __ASSEMBLY__
>>  struct kvm;
>>  struct kvm_vcpu;
>> +struct kvm_ksym;
>
> So that one doesn't have to trawl git logs, it might be worth a comment
> as to the purpose of struct kvm_ksym (and thus why we never need to
> actually define it).
>

Yes, I can add something.
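Something along these lines, perhaps (a rough sketch only; the final
comment wording may differ):

/*
 * 'struct kvm_ksym' is deliberately left undefined: it only gives the
 * extern declarations of HYP-mapped kernel symbols a type that does not
 * decay to a pointer. Users have to take the address explicitly, e.g.
 * &__kvm_hyp_vector, which makes it harder to accidentally use the
 * kernel VA of a symbol where its linear alias is required.
 */
struct kvm_ksym;

extern struct kvm_ksym __kvm_hyp_vector;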

> Either way:
>
> Reviewed-by: Mark Rutland <mark.rutland@arm.com>
>

Thanks

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 03/21] arm64: pgtable: add dummy pud_index() and pmd_index() definitions
  2016-01-11 17:40     ` Mark Rutland
  (?)
@ 2016-01-12 17:25       ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-12 17:25 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, kernel-hardening, Will Deacon, Catalin Marinas,
	Leif Lindholm, Kees Cook, linux-kernel, Stuart Yoder,
	Sharma Bhupesh, Arnd Bergmann, Marc Zyngier, Christoffer Dall

On 11 January 2016 at 18:40, Mark Rutland <mark.rutland@arm.com> wrote:
> On Mon, Jan 11, 2016 at 02:18:56PM +0100, Ard Biesheuvel wrote:
>> Add definitions of pud_index() and pmd_index() for configurations with
>> fewer than 4 resp. 3 translation levels. This makes it easier to keep
>> the users (e.g., the fixmap init code) generic.
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> ---
>>  arch/arm64/include/asm/pgtable.h | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index fe9bf47db5d3..6129f6755081 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -495,6 +495,7 @@ static inline phys_addr_t pud_page_paddr(pud_t pud)
>>  #else
>>
>>  #define pud_page_paddr(pud)  ({ BUILD_BUG(); 0; })
>> +#define pmd_index(addr)              ({ BUILD_BUG(); 0; })
>>
>>  /* Match pmd_offset folding in <asm/generic/pgtable-nopmd.h> */
>>  #define pmd_set_fixmap(addr)         NULL
>> @@ -542,6 +543,7 @@ static inline phys_addr_t pgd_page_paddr(pgd_t pgd)
>>  #else
>>
>>  #define pgd_page_paddr(pgd)  ({ BUILD_BUG(); 0;})
>> +#define pud_index(addr)              ({ BUILD_BUG(); 0;})
>
> I think we don't need these if we use p??_offset_kimg for the fixmap
> initialisation.
>
> Regardless, these look good conceptually, so if they're useful
> elsewhere:
>
> Reviewed-by: Mark Rutland <mark.rutland@arm.com>
>

Thanks, but this can indeed be dropped after the proposed changes have
been made to the fixmap init code.

-- 
Ard.

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 07/21] arm64: move kernel image to base of vmalloc area
  2016-01-11 13:19   ` Ard Biesheuvel
  (?)
@ 2016-01-12 18:14     ` Mark Rutland
  -1 siblings, 0 replies; 207+ messages in thread
From: Mark Rutland @ 2016-01-12 18:14 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	leif.lindholm, keescook, linux-kernel, stuart.yoder,
	bhupesh.sharma, arnd, marc.zyngier, christoffer.dall

On Mon, Jan 11, 2016 at 02:19:00PM +0100, Ard Biesheuvel wrote:
> This moves the module area to right before the vmalloc area, and
> moves the kernel image to the base of the vmalloc area. This is
> an intermediate step towards implementing kASLR, where the kernel
> image can be located anywhere in the vmalloc area.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
>  arch/arm64/include/asm/kasan.h          | 20 ++++---
>  arch/arm64/include/asm/kernel-pgtable.h |  5 +-
>  arch/arm64/include/asm/memory.h         | 18 ++++--
>  arch/arm64/include/asm/pgtable.h        |  7 ---
>  arch/arm64/kernel/setup.c               | 12 ++++
>  arch/arm64/mm/dump.c                    | 12 ++--
>  arch/arm64/mm/init.c                    | 20 +++----
>  arch/arm64/mm/kasan_init.c              | 21 +++++--
>  arch/arm64/mm/mmu.c                     | 62 ++++++++++++++------
>  9 files changed, 118 insertions(+), 59 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kasan.h b/arch/arm64/include/asm/kasan.h
> index de0d21211c34..2c583dbf4746 100644
> --- a/arch/arm64/include/asm/kasan.h
> +++ b/arch/arm64/include/asm/kasan.h
> @@ -1,20 +1,16 @@
>  #ifndef __ASM_KASAN_H
>  #define __ASM_KASAN_H
>  
> -#ifndef __ASSEMBLY__
> -
>  #ifdef CONFIG_KASAN
>  
>  #include <linux/linkage.h>
> -#include <asm/memory.h>
> -#include <asm/pgtable-types.h>
>  
>  /*
>   * KASAN_SHADOW_START: beginning of the kernel virtual addresses.
>   * KASAN_SHADOW_END: KASAN_SHADOW_START + 1/8 of kernel virtual addresses.
>   */
> -#define KASAN_SHADOW_START      (VA_START)
> -#define KASAN_SHADOW_END        (KASAN_SHADOW_START + (1UL << (VA_BITS - 3)))
> +#define KASAN_SHADOW_START	(VA_START)
> +#define KASAN_SHADOW_END	(KASAN_SHADOW_START + (_AC(1, UL) << (VA_BITS - 3)))
>  
>  /*
>   * This value is used to map an address to the corresponding shadow
> @@ -26,16 +22,22 @@
>   * should satisfy the following equation:
>   *      KASAN_SHADOW_OFFSET = KASAN_SHADOW_END - (1ULL << 61)
>   */
> -#define KASAN_SHADOW_OFFSET     (KASAN_SHADOW_END - (1ULL << (64 - 3)))
> +#define KASAN_SHADOW_OFFSET	(KASAN_SHADOW_END - (_AC(1, ULL) << (64 - 3)))
> +

I couldn't immediately spot where KASAN_SHADOW_* were used in assembly.
I guess there's some other definition built atop them that I've
missed.

Where should I be looking?

> +#ifndef __ASSEMBLY__
> +#include <asm/pgtable-types.h>
>  
>  void kasan_init(void);
>  void kasan_copy_shadow(pgd_t *pgdir);
>  asmlinkage void kasan_early_init(void);
> +#endif
>  
>  #else
> +
> +#ifndef __ASSEMBLY__
>  static inline void kasan_init(void) { }
>  static inline void kasan_copy_shadow(pgd_t *pgdir) { }
>  #endif
>  
> -#endif
> -#endif
> +#endif /* CONFIG_KASAN */
> +#endif /* __ASM_KASAN_H */
> diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
> index a459714ee29e..daa8a7b9917a 100644
> --- a/arch/arm64/include/asm/kernel-pgtable.h
> +++ b/arch/arm64/include/asm/kernel-pgtable.h
> @@ -70,8 +70,9 @@
>  /*
>   * Initial memory map attributes.
>   */
> -#define SWAPPER_PTE_FLAGS	(PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
> -#define SWAPPER_PMD_FLAGS	(PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
> +#define SWAPPER_PTE_FLAGS	(PTE_TYPE_PAGE | PTE_AF | PTE_SHARED | PTE_UXN)
> +#define SWAPPER_PMD_FLAGS	(PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S | \
> +				 PMD_SECT_UXN)

This will only affect the tables created in head.S. Before we start
userspace we'll have switched over to a new set of tables using
PAGE_KERNEL (including UXN).

Given that, this doesn't look necessary for the vmalloc area changes. Am
I missing something?

>  #if ARM64_SWAPPER_USES_SECTION_MAPS
>  #define SWAPPER_MM_MMUFLAGS	(PMD_ATTRINDX(MT_NORMAL) | SWAPPER_PMD_FLAGS)
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index bea9631b34a8..e45d3141ad98 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -51,14 +51,24 @@
>  #define VA_BITS			(CONFIG_ARM64_VA_BITS)
>  #define VA_START		(UL(0xffffffffffffffff) << VA_BITS)
>  #define PAGE_OFFSET		(UL(0xffffffffffffffff) << (VA_BITS - 1))
> -#define KIMAGE_VADDR		(PAGE_OFFSET)
> -#define MODULES_END		(KIMAGE_VADDR)
> -#define MODULES_VADDR		(MODULES_END - SZ_64M)
> -#define PCI_IO_END		(MODULES_VADDR - SZ_2M)
> +#define PCI_IO_END		(PAGE_OFFSET - SZ_2M)
>  #define PCI_IO_START		(PCI_IO_END - PCI_IO_SIZE)
>  #define FIXADDR_TOP		(PCI_IO_START - SZ_2M)
>  #define TASK_SIZE_64		(UL(1) << VA_BITS)
>  
> +#ifndef CONFIG_KASAN
> +#define MODULES_VADDR		(VA_START)
> +#else
> +#include <asm/kasan.h>
> +#define MODULES_VADDR		(KASAN_SHADOW_END)
> +#endif
> +
> +#define MODULES_VSIZE		(SZ_64M)
> +#define MODULES_END		(MODULES_VADDR + MODULES_VSIZE)
> +
> +#define KIMAGE_VADDR		(MODULES_END)
> +#define VMALLOC_START		(MODULES_END)
> +
>  #ifdef CONFIG_COMPAT
>  #define TASK_SIZE_32		UL(0x100000000)
>  #define TASK_SIZE		(test_thread_flag(TIF_32BIT) ? \
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 7b4e16068c9f..a910a44d7ab3 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -42,13 +42,6 @@
>   */
>  #define VMEMMAP_SIZE		ALIGN((1UL << (VA_BITS - PAGE_SHIFT)) * sizeof(struct page), PUD_SIZE)
>  
> -#ifndef CONFIG_KASAN
> -#define VMALLOC_START		(VA_START)
> -#else
> -#include <asm/kasan.h>
> -#define VMALLOC_START		(KASAN_SHADOW_END + SZ_64K)
> -#endif
> -
>  #define VMALLOC_END		(PAGE_OFFSET - PUD_SIZE - VMEMMAP_SIZE - SZ_64K)

It's a shame VMALLOC_START and VMALLOC_END are now in different headers.
It would be nice if we could keep them together.

As VMEMMAP_SIZE depends on sizeof(struct page), it's not just a simple
move. We could either place that in the !__ASSEMBLY__ portion of
memory.h, or we could add S_PAGE to asm-offsets.
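
FWIW, the asm-offsets route would only need something like the below in
arch/arm64/kernel/asm-offsets.c (untested, and S_PAGE is just a
placeholder name):

	DEFINE(S_PAGE,		sizeof(struct page));

so that VMEMMAP_SIZE could be expressed without sizeof(struct page) and
kept next to VMALLOC_END.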

If that's too painful now, we can leave that for subsequent cleanup;
there's other stuff in that area I'd like to unify at some point (e.g.
the mem_init and dump.c section boundary descriptions).

>  
>  #define vmemmap			((struct page *)(VMALLOC_END + SZ_64K))
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index cfed56f0ad26..c67ba4453ec6 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -53,6 +53,7 @@
>  #include <asm/cpufeature.h>
>  #include <asm/cpu_ops.h>
>  #include <asm/kasan.h>
> +#include <asm/kernel-pgtable.h>
>  #include <asm/sections.h>
>  #include <asm/setup.h>
>  #include <asm/smp_plat.h>
> @@ -291,6 +292,17 @@ u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
>  
>  void __init setup_arch(char **cmdline_p)
>  {
> +	static struct vm_struct vmlinux_vm;
> +
> +	vmlinux_vm.addr		= (void *)KIMAGE_VADDR;
> +	vmlinux_vm.size		= round_up((u64)_end - KIMAGE_VADDR,
> +					   SWAPPER_BLOCK_SIZE);

With the fine grained tables we should only need to round up to
PAGE_SIZE (though _end is implicitly page-aligned anyway). Given that,
is the SWAPPER_BLOCK_SIZE rounding necessary? 
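
i.e. I'd expect something like the following to be sufficient (untested):

	vmlinux_vm.size	= round_up((u64)_end - KIMAGE_VADDR, PAGE_SIZE);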

> +	vmlinux_vm.phys_addr	= __pa(KIMAGE_VADDR);
> +	vmlinux_vm.flags	= VM_MAP;

I was going to say we should set VM_KASAN as well, per its description in
include/linux/vmalloc.h, though given its uses it's not clear whether it
will ever matter here.
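
i.e. something like (untested):

	vmlinux_vm.flags	= VM_MAP | VM_KASAN;

though I haven't checked whether anything would actually look at VM_KASAN
for an early static vm_struct like this.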

> +	vmlinux_vm.caller	= setup_arch;
> +
> +	vm_area_add_early(&vmlinux_vm);

Do we need to register the kernel VA range quite this early, or could we
do this around paging_init/map_kernel time?

> +
>  	pr_info("Boot CPU: AArch64 Processor [%08x]\n", read_cpuid_id());
>  
>  	sprintf(init_utsname()->machine, ELF_PLATFORM);
> diff --git a/arch/arm64/mm/dump.c b/arch/arm64/mm/dump.c
> index 5a22a119a74c..e83ffb00560c 100644
> --- a/arch/arm64/mm/dump.c
> +++ b/arch/arm64/mm/dump.c
> @@ -35,7 +35,9 @@ struct addr_marker {
>  };
>  
>  enum address_markers_idx {
> -	VMALLOC_START_NR = 0,
> +	MODULES_START_NR = 0,
> +	MODULES_END_NR,
> +	VMALLOC_START_NR,
>  	VMALLOC_END_NR,
>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>  	VMEMMAP_START_NR,
> @@ -45,12 +47,12 @@ enum address_markers_idx {
>  	FIXADDR_END_NR,
>  	PCI_START_NR,
>  	PCI_END_NR,
> -	MODULES_START_NR,
> -	MODUELS_END_NR,
>  	KERNEL_SPACE_NR,
>  };
>  
>  static struct addr_marker address_markers[] = {
> +	{ MODULES_VADDR,	"Modules start" },
> +	{ MODULES_END,		"Modules end" },
>  	{ VMALLOC_START,	"vmalloc() Area" },
>  	{ VMALLOC_END,		"vmalloc() End" },
>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
> @@ -61,9 +63,7 @@ static struct addr_marker address_markers[] = {
>  	{ FIXADDR_TOP,		"Fixmap end" },
>  	{ PCI_IO_START,		"PCI I/O start" },
>  	{ PCI_IO_END,		"PCI I/O end" },
> -	{ MODULES_VADDR,	"Modules start" },
> -	{ MODULES_END,		"Modules end" },
> -	{ PAGE_OFFSET,		"Kernel Mapping" },
> +	{ PAGE_OFFSET,		"Linear Mapping" },
>  	{ -1,			NULL },
>  };
>  
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index f3b061e67bfe..baa923bda651 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -302,22 +302,26 @@ void __init mem_init(void)
>  #ifdef CONFIG_KASAN
>  		  "    kasan   : 0x%16lx - 0x%16lx   (%6ld GB)\n"
>  #endif
> +		  "    modules : 0x%16lx - 0x%16lx   (%6ld MB)\n"
>  		  "    vmalloc : 0x%16lx - 0x%16lx   (%6ld GB)\n"
> +		  "      .init : 0x%p" " - 0x%p" "   (%6ld KB)\n"
> +		  "      .text : 0x%p" " - 0x%p" "   (%6ld KB)\n"
> +		  "      .data : 0x%p" " - 0x%p" "   (%6ld KB)\n"
>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>  		  "    vmemmap : 0x%16lx - 0x%16lx   (%6ld GB maximum)\n"
>  		  "              0x%16lx - 0x%16lx   (%6ld MB actual)\n"
>  #endif
>  		  "    fixed   : 0x%16lx - 0x%16lx   (%6ld KB)\n"
>  		  "    PCI I/O : 0x%16lx - 0x%16lx   (%6ld MB)\n"
> -		  "    modules : 0x%16lx - 0x%16lx   (%6ld MB)\n"
> -		  "    memory  : 0x%16lx - 0x%16lx   (%6ld MB)\n"
> -		  "      .init : 0x%p" " - 0x%p" "   (%6ld KB)\n"
> -		  "      .text : 0x%p" " - 0x%p" "   (%6ld KB)\n"
> -		  "      .data : 0x%p" " - 0x%p" "   (%6ld KB)\n",
> +		  "    memory  : 0x%16lx - 0x%16lx   (%6ld MB)\n",
>  #ifdef CONFIG_KASAN
>  		  MLG(KASAN_SHADOW_START, KASAN_SHADOW_END),
>  #endif
> +		  MLM(MODULES_VADDR, MODULES_END),
>  		  MLG(VMALLOC_START, VMALLOC_END),
> +		  MLK_ROUNDUP(__init_begin, __init_end),
> +		  MLK_ROUNDUP(_text, _etext),
> +		  MLK_ROUNDUP(_sdata, _edata),
>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>  		  MLG((unsigned long)vmemmap,
>  		      (unsigned long)vmemmap + VMEMMAP_SIZE),
> @@ -326,11 +330,7 @@ void __init mem_init(void)
>  #endif
>  		  MLK(FIXADDR_START, FIXADDR_TOP),
>  		  MLM(PCI_IO_START, PCI_IO_END),
> -		  MLM(MODULES_VADDR, MODULES_END),
> -		  MLM(PAGE_OFFSET, (unsigned long)high_memory),
> -		  MLK_ROUNDUP(__init_begin, __init_end),
> -		  MLK_ROUNDUP(_text, _etext),
> -		  MLK_ROUNDUP(_sdata, _edata));
> +		  MLM(PAGE_OFFSET, (unsigned long)high_memory));
>  
>  #undef MLK
>  #undef MLM
> diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
> index 0ca411fc5ea3..acdd1ac166ec 100644
> --- a/arch/arm64/mm/kasan_init.c
> +++ b/arch/arm64/mm/kasan_init.c
> @@ -17,9 +17,11 @@
>  #include <linux/start_kernel.h>
>  
>  #include <asm/mmu_context.h>
> +#include <asm/kernel-pgtable.h>
>  #include <asm/page.h>
>  #include <asm/pgalloc.h>
>  #include <asm/pgtable.h>
> +#include <asm/sections.h>
>  #include <asm/tlbflush.h>
>  
>  static pgd_t tmp_pg_dir[PTRS_PER_PGD] __initdata __aligned(PGD_SIZE);
> @@ -33,7 +35,7 @@ static void __init kasan_early_pte_populate(pmd_t *pmd, unsigned long addr,
>  	if (pmd_none(*pmd))
>  		pmd_populate_kernel(&init_mm, pmd, kasan_zero_pte);
>  
> -	pte = pte_offset_kernel(pmd, addr);
> +	pte = pte_offset_kimg(pmd, addr);
>  	do {
>  		next = addr + PAGE_SIZE;
>  		set_pte(pte, pfn_pte(virt_to_pfn(kasan_zero_page),
> @@ -51,7 +53,7 @@ static void __init kasan_early_pmd_populate(pud_t *pud,
>  	if (pud_none(*pud))
>  		pud_populate(&init_mm, pud, kasan_zero_pmd);
>  
> -	pmd = pmd_offset(pud, addr);
> +	pmd = pmd_offset_kimg(pud, addr);
>  	do {
>  		next = pmd_addr_end(addr, end);
>  		kasan_early_pte_populate(pmd, addr, next);
> @@ -68,7 +70,7 @@ static void __init kasan_early_pud_populate(pgd_t *pgd,
>  	if (pgd_none(*pgd))
>  		pgd_populate(&init_mm, pgd, kasan_zero_pud);
>  
> -	pud = pud_offset(pgd, addr);
> +	pud = pud_offset_kimg(pgd, addr);
>  	do {
>  		next = pud_addr_end(addr, end);
>  		kasan_early_pmd_populate(pud, addr, next);
> @@ -126,8 +128,14 @@ static void __init clear_pgds(unsigned long start,
>  
>  void __init kasan_init(void)
>  {
> +	u64 kimg_shadow_start, kimg_shadow_end;
>  	struct memblock_region *reg;
>  
> +	kimg_shadow_start = round_down((u64)kasan_mem_to_shadow(_text),
> +				       SWAPPER_BLOCK_SIZE);
> +	kimg_shadow_end = round_up((u64)kasan_mem_to_shadow(_end),
> +				   SWAPPER_BLOCK_SIZE);

This rounding looks suspect to me, given it's applied to the shadow
addresses rather than the kimage addresses. Since the shadow address is
(kaddr >> 3) + KASAN_SHADOW_OFFSET, that's roughly equivalent to
kasan_mem_to_shadow(round_up(_end, 8 * SWAPPER_BLOCK_SIZE)).

I don't think we need any rounding for the kimage addresses. The image
end is page-granular (and the fine-grained mapping will reflect that).
Any accesses between _end and round_up(_end, SWAPPER_BLOCK_SIZE) would be
bugs (and would most likely fault) regardless of KASAN.

Or am I just being thick here?

> +
>  	/*
>  	 * We are going to perform proper setup of shadow memory.
>  	 * At first we should unmap early shadow (clear_pgds() call bellow).
> @@ -141,8 +149,13 @@ void __init kasan_init(void)
>  
>  	clear_pgds(KASAN_SHADOW_START, KASAN_SHADOW_END);
>  
> +	vmemmap_populate(kimg_shadow_start, kimg_shadow_end,
> +			 pfn_to_nid(virt_to_pfn(kimg_shadow_start)));

That virt_to_pfn doesn't look right -- kimg_shadow_start is neither a
linear address nor an image address. As pfn_to_nid is hard-coded to 0
for !NUMA this happens to be ok for us for the moment.

I think we should follow the x86 KASAN code and use NUMA_NO_NODE for
this for now.
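
i.e. something like:

	vmemmap_populate(kimg_shadow_start, kimg_shadow_end, NUMA_NO_NODE);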

> +
>  	kasan_populate_zero_shadow((void *)KASAN_SHADOW_START,
> -			kasan_mem_to_shadow((void *)MODULES_VADDR));
> +				   (void *)kimg_shadow_start);
> +	kasan_populate_zero_shadow((void *)kimg_shadow_end,
> +				   kasan_mem_to_shadow((void *)PAGE_OFFSET));
>  
>  	for_each_memblock(memory, reg) {
>  		void *start = (void *)__phys_to_virt(reg->base);
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 75b5f0dc3bdc..0b28f1469f9b 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -53,6 +53,10 @@ u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
>  unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)] __page_aligned_bss;
>  EXPORT_SYMBOL(empty_zero_page);
>  
> +static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
> +static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
> +static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
> +
>  pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>  			      unsigned long size, pgprot_t vma_prot)
>  {
> @@ -349,14 +353,14 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
>  {
>  
>  	unsigned long kernel_start = __pa(_stext);
> -	unsigned long kernel_end = __pa(_end);
> +	unsigned long kernel_end = __pa(_etext);
>  
>  	/*
> -	 * The kernel itself is mapped at page granularity. Map all other
> -	 * memory, making sure we don't overwrite the existing kernel mappings.
> +	 * Take care not to create a writable alias for the
> +	 * read-only text and rodata sections of the kernel image.
>  	 */
>  
> -	/* No overlap with the kernel. */
> +	/* No overlap with the kernel text */
>  	if (end < kernel_start || start >= kernel_end) {
>  		__create_pgd_mapping(pgd, start, __phys_to_virt(start),
>  				     end - start, PAGE_KERNEL,
> @@ -365,7 +369,7 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
>  	}
>  
>  	/*
> -	 * This block overlaps the kernel mapping. Map the portion(s) which
> +	 * This block overlaps the kernel text mapping. Map the portion(s) which
>  	 * don't overlap.
>  	 */
>  	if (start < kernel_start)
> @@ -438,12 +442,29 @@ static void __init map_kernel(pgd_t *pgd)
>  	map_kernel_chunk(pgd, __init_begin, __init_end, PAGE_KERNEL_EXEC);
>  	map_kernel_chunk(pgd, _data, _end, PAGE_KERNEL);
>  
> -	/*
> -	 * The fixmap falls in a separate pgd to the kernel, and doesn't live
> -	 * in the carveout for the swapper_pg_dir. We can simply re-use the
> -	 * existing dir for the fixmap.
> -	 */
> -	set_pgd(pgd_offset_raw(pgd, FIXADDR_START), *pgd_offset_k(FIXADDR_START));
> +	if (pgd_index(FIXADDR_START) != pgd_index((u64)_end)) {

To match the style of early_fixmap_init, and given we already mapped the
kernel image, this could be:

	if (pgd_none(*pgd_offset_raw(pgd, FIXADDR_START))) {

Which also serves as a run-time check that the pgd entry really was
clear.

Other than that, this looks good to me!

Thanks,
Mark.

> +		/*
> +		 * The fixmap falls in a separate pgd to the kernel, and doesn't
> +		 * live in the carveout for the swapper_pg_dir. We can simply
> +		 * re-use the existing dir for the fixmap.
> +		 */
> +		set_pgd(pgd_offset_raw(pgd, FIXADDR_START),
> +			*pgd_offset_k(FIXADDR_START));
> +	} else if (CONFIG_PGTABLE_LEVELS > 3) {
> +		/*
> +		 * The fixmap shares its top level pgd entry with the kernel
> +		 * mapping. This can really only occur when we are running
> +		 * with 16k/4 levels, so we can simply reuse the pud level
> +		 * entry instead.
> +		 */
> +		BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES));
> +
> +		set_pud(pud_set_fixmap_offset(pgd, FIXADDR_START),
> +			__pud(__pa(bm_pmd) | PUD_TYPE_TABLE));
> +		pud_clear_fixmap();
> +	} else {
> +		BUG();
> +	}
>  
>  	kasan_copy_shadow(pgd);
>  }
> @@ -569,10 +590,6 @@ void vmemmap_free(unsigned long start, unsigned long end)
>  }
>  #endif	/* CONFIG_SPARSEMEM_VMEMMAP */
>  
> -static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
> -static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
> -static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
> -
>  static inline pud_t * fixmap_pud(unsigned long addr)
>  {
>  	return (CONFIG_PGTABLE_LEVELS > 3) ? &bm_pud[pud_index(addr)]
> @@ -598,8 +615,19 @@ void __init early_fixmap_init(void)
>  	unsigned long addr = FIXADDR_START;
>  
>  	pgd = pgd_offset_k(addr);
> -	pgd_populate(&init_mm, pgd, bm_pud);
> -	pud = fixmap_pud(addr);
> +	if (CONFIG_PGTABLE_LEVELS > 3 && !pgd_none(*pgd)) {
> +		/*
> +		 * We only end up here if the kernel mapping and the fixmap
> +		 * share the top level pgd entry, which should only happen on
> +		 * 16k/4 levels configurations.
> +		 */
> +		BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES));
> +		pud = pud_offset_kimg(pgd, addr);
> +		memblock_free(__pa(bm_pud), sizeof(bm_pud));
> +	} else {
> +		pgd_populate(&init_mm, pgd, bm_pud);
> +		pud = fixmap_pud(addr);
> +	}
>  	pud_populate(&init_mm, pud, bm_pmd);
>  	pmd = fixmap_pmd(addr);
>  	pmd_populate_kernel(&init_mm, pmd, bm_pte);
> -- 
> 2.5.0
> 

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 07/21] arm64: move kernel image to base of vmalloc area
  2016-01-12 18:14     ` Mark Rutland
  (?)
@ 2016-01-13  8:39       ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-13  8:39 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, kernel-hardening, Will Deacon, Catalin Marinas,
	Leif Lindholm, Kees Cook, linux-kernel, Stuart Yoder,
	Sharma Bhupesh, Arnd Bergmann, Marc Zyngier, Christoffer Dall

On 12 January 2016 at 19:14, Mark Rutland <mark.rutland@arm.com> wrote:
> On Mon, Jan 11, 2016 at 02:19:00PM +0100, Ard Biesheuvel wrote:
>> This moves the module area to right before the vmalloc area, and
>> moves the kernel image to the base of the vmalloc area. This is
>> an intermediate step towards implementing kASLR, where the kernel
>> image can be located anywhere in the vmalloc area.
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> ---
>>  arch/arm64/include/asm/kasan.h          | 20 ++++---
>>  arch/arm64/include/asm/kernel-pgtable.h |  5 +-
>>  arch/arm64/include/asm/memory.h         | 18 ++++--
>>  arch/arm64/include/asm/pgtable.h        |  7 ---
>>  arch/arm64/kernel/setup.c               | 12 ++++
>>  arch/arm64/mm/dump.c                    | 12 ++--
>>  arch/arm64/mm/init.c                    | 20 +++----
>>  arch/arm64/mm/kasan_init.c              | 21 +++++--
>>  arch/arm64/mm/mmu.c                     | 62 ++++++++++++++------
>>  9 files changed, 118 insertions(+), 59 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kasan.h b/arch/arm64/include/asm/kasan.h
>> index de0d21211c34..2c583dbf4746 100644
>> --- a/arch/arm64/include/asm/kasan.h
>> +++ b/arch/arm64/include/asm/kasan.h
>> @@ -1,20 +1,16 @@
>>  #ifndef __ASM_KASAN_H
>>  #define __ASM_KASAN_H
>>
>> -#ifndef __ASSEMBLY__
>> -
>>  #ifdef CONFIG_KASAN
>>
>>  #include <linux/linkage.h>
>> -#include <asm/memory.h>
>> -#include <asm/pgtable-types.h>
>>
>>  /*
>>   * KASAN_SHADOW_START: beginning of the kernel virtual addresses.
>>   * KASAN_SHADOW_END: KASAN_SHADOW_START + 1/8 of kernel virtual addresses.
>>   */
>> -#define KASAN_SHADOW_START      (VA_START)
>> -#define KASAN_SHADOW_END        (KASAN_SHADOW_START + (1UL << (VA_BITS - 3)))
>> +#define KASAN_SHADOW_START   (VA_START)
>> +#define KASAN_SHADOW_END     (KASAN_SHADOW_START + (_AC(1, UL) << (VA_BITS - 3)))
>>
>>  /*
>>   * This value is used to map an address to the corresponding shadow
>> @@ -26,16 +22,22 @@
>>   * should satisfy the following equation:
>>   *      KASAN_SHADOW_OFFSET = KASAN_SHADOW_END - (1ULL << 61)
>>   */
>> -#define KASAN_SHADOW_OFFSET     (KASAN_SHADOW_END - (1ULL << (64 - 3)))
>> +#define KASAN_SHADOW_OFFSET  (KASAN_SHADOW_END - (_AC(1, ULL) << (64 - 3)))
>> +
>
> I couldn't immediately spot where KASAN_SHADOW_* were used in assembly.
> I guess there's some other definition built atop of them that I've
> missed.
>
> Where should I be looking?
>

Well, the problem is that KIMAGE_VADDR will be defined in terms of
KASAN_SHADOW_END if KASAN is enabled. But since KASAN always uses the
first 1/8 of that VA space, I am going to rework this so that the
non-KASAN constants never depend on the actual values but only on
CONFIG_KASAN.
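
Roughly along these lines (sketch only, untested, and the names may still
change in v4):

	#ifdef CONFIG_KASAN
	#define KASAN_SHADOW_SIZE	(UL(1) << (VA_BITS - 3))
	#else
	#define KASAN_SHADOW_SIZE	(0)
	#endif

	#define MODULES_VADDR		(VA_START + KASAN_SHADOW_SIZE)

so that memory.h no longer needs the value of KASAN_SHADOW_END itself.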

>> +#ifndef __ASSEMBLY__
>> +#include <asm/pgtable-types.h>
>>
>>  void kasan_init(void);
>>  void kasan_copy_shadow(pgd_t *pgdir);
>>  asmlinkage void kasan_early_init(void);
>> +#endif
>>
>>  #else
>> +
>> +#ifndef __ASSEMBLY__
>>  static inline void kasan_init(void) { }
>>  static inline void kasan_copy_shadow(pgd_t *pgdir) { }
>>  #endif
>>
>> -#endif
>> -#endif
>> +#endif /* CONFIG_KASAN */
>> +#endif /* __ASM_KASAN_H */
>> diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
>> index a459714ee29e..daa8a7b9917a 100644
>> --- a/arch/arm64/include/asm/kernel-pgtable.h
>> +++ b/arch/arm64/include/asm/kernel-pgtable.h
>> @@ -70,8 +70,9 @@
>>  /*
>>   * Initial memory map attributes.
>>   */
>> -#define SWAPPER_PTE_FLAGS    (PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
>> -#define SWAPPER_PMD_FLAGS    (PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
>> +#define SWAPPER_PTE_FLAGS    (PTE_TYPE_PAGE | PTE_AF | PTE_SHARED | PTE_UXN)
>> +#define SWAPPER_PMD_FLAGS    (PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S | \
>> +                              PMD_SECT_UXN)
>
> This will only affect the tables created in head.S. Before we start
> userspace we'll have switched over to a new set of tables using
> PAGE_KERNEL (including UXN).
>
> Given that, this doesn't look necessary for the vmalloc area changes. Am
> I missing something?
>

No, this was carried over from an older version of the series, when
the kernel mapping, after having been moved below PAGE_OFFSET, would
not be overridden by the memblock-based linear mapping routines, and
so would be missing the UXN bit. But with your changes, this can indeed
be dropped.

>>  #if ARM64_SWAPPER_USES_SECTION_MAPS
>>  #define SWAPPER_MM_MMUFLAGS  (PMD_ATTRINDX(MT_NORMAL) | SWAPPER_PMD_FLAGS)
>> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
>> index bea9631b34a8..e45d3141ad98 100644
>> --- a/arch/arm64/include/asm/memory.h
>> +++ b/arch/arm64/include/asm/memory.h
>> @@ -51,14 +51,24 @@
>>  #define VA_BITS                      (CONFIG_ARM64_VA_BITS)
>>  #define VA_START             (UL(0xffffffffffffffff) << VA_BITS)
>>  #define PAGE_OFFSET          (UL(0xffffffffffffffff) << (VA_BITS - 1))
>> -#define KIMAGE_VADDR         (PAGE_OFFSET)
>> -#define MODULES_END          (KIMAGE_VADDR)
>> -#define MODULES_VADDR                (MODULES_END - SZ_64M)
>> -#define PCI_IO_END           (MODULES_VADDR - SZ_2M)
>> +#define PCI_IO_END           (PAGE_OFFSET - SZ_2M)
>>  #define PCI_IO_START         (PCI_IO_END - PCI_IO_SIZE)
>>  #define FIXADDR_TOP          (PCI_IO_START - SZ_2M)
>>  #define TASK_SIZE_64         (UL(1) << VA_BITS)
>>
>> +#ifndef CONFIG_KASAN
>> +#define MODULES_VADDR                (VA_START)
>> +#else
>> +#include <asm/kasan.h>
>> +#define MODULES_VADDR                (KASAN_SHADOW_END)
>> +#endif
>> +
>> +#define MODULES_VSIZE                (SZ_64M)
>> +#define MODULES_END          (MODULES_VADDR + MODULES_VSIZE)
>> +
>> +#define KIMAGE_VADDR         (MODULES_END)
>> +#define VMALLOC_START                (MODULES_END)
>> +
>>  #ifdef CONFIG_COMPAT
>>  #define TASK_SIZE_32         UL(0x100000000)
>>  #define TASK_SIZE            (test_thread_flag(TIF_32BIT) ? \
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index 7b4e16068c9f..a910a44d7ab3 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -42,13 +42,6 @@
>>   */
>>  #define VMEMMAP_SIZE         ALIGN((1UL << (VA_BITS - PAGE_SHIFT)) * sizeof(struct page), PUD_SIZE)
>>
>> -#ifndef CONFIG_KASAN
>> -#define VMALLOC_START                (VA_START)
>> -#else
>> -#include <asm/kasan.h>
>> -#define VMALLOC_START                (KASAN_SHADOW_END + SZ_64K)
>> -#endif
>> -
>>  #define VMALLOC_END          (PAGE_OFFSET - PUD_SIZE - VMEMMAP_SIZE - SZ_64K)
>
> It's a shame VMALLOC_START and VMALLOC_END are now in different headers.
> It would be nice if we could keep them together.
>
> As VMEMMAP_SIZE depends on sizeof(struct page), it's not just a simple
> move. We could either place that in the !__ASSEMBLY__ portion of
> memory.h, or we could add S_PAGE to asm-offsets.
>
> If that's too painful now, we can leave that for subsequent cleanup;
> there's other stuff in that area I'd like to unify at some point (e.g.
> the mem_init and dump.c section boundary descriptions).
>

No, I think I can probably do a bit better than this. I will address it in v4.
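
(For reference, the minimal form of your first suggestion, i.e. simply
moving the existing definitions unchanged into the !__ASSEMBLY__ portion
of memory.h, would look like:

        #define VMEMMAP_SIZE    ALIGN((1UL << (VA_BITS - PAGE_SHIFT)) * sizeof(struct page), PUD_SIZE)
        #define VMALLOC_START   (MODULES_END)
        #define VMALLOC_END     (PAGE_OFFSET - PUD_SIZE - VMEMMAP_SIZE - SZ_64K)

I am only including it here as the baseline to improve upon.)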

>>
>>  #define vmemmap                      ((struct page *)(VMALLOC_END + SZ_64K))
>> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
>> index cfed56f0ad26..c67ba4453ec6 100644
>> --- a/arch/arm64/kernel/setup.c
>> +++ b/arch/arm64/kernel/setup.c
>> @@ -53,6 +53,7 @@
>>  #include <asm/cpufeature.h>
>>  #include <asm/cpu_ops.h>
>>  #include <asm/kasan.h>
>> +#include <asm/kernel-pgtable.h>
>>  #include <asm/sections.h>
>>  #include <asm/setup.h>
>>  #include <asm/smp_plat.h>
>> @@ -291,6 +292,17 @@ u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
>>
>>  void __init setup_arch(char **cmdline_p)
>>  {
>> +     static struct vm_struct vmlinux_vm;
>> +
>> +     vmlinux_vm.addr         = (void *)KIMAGE_VADDR;
>> +     vmlinux_vm.size         = round_up((u64)_end - KIMAGE_VADDR,
>> +                                        SWAPPER_BLOCK_SIZE);
>
> With the fine grained tables we should only need to round up to
> PAGE_SIZE (though _end is implicitly page-aligned anyway). Given that,
> is the SWAPPER_BLOCK_SIZE rounding necessary?
>

No, probably not.
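
i.e. the round_up() can simply go away:

        vmlinux_vm.size         = (u64)_end - KIMAGE_VADDR;

given that, as you say, _end is page-aligned by the linker script anyway.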

>> +     vmlinux_vm.phys_addr    = __pa(KIMAGE_VADDR);
>> +     vmlinux_vm.flags        = VM_MAP;
>
> I was going to say we should set VM_KASAN also per its description in
> include/vmalloc.h, though per its uses its not clear if it will ever
> matter.
>

No, we shouldn't. Even if we are never going to unmap this vma,
setting the flag will result in the shadow area being freed using
vfree(), even though it was not allocated via vmalloc(), which is
likely to cause trouble.

>> +     vmlinux_vm.caller       = setup_arch;
>> +
>> +     vm_area_add_early(&vmlinux_vm);
>
> Do we need to register the kernel VA range quite this early, or could we
> do this around paging_init/map_kernel time?
>

No. Locally, I moved it into map_kernel_chunk, so that we have
separate areas for _text, _init and _data, and we can unmap the _init
entirely rather than only stripping the exec bit. I haven't quite
figured out how to get rid of the vma, but perhaps it makes sense
to keep it reserved, so that modules don't end up there later (which
is possible with the module region randomization I have implemented
for v4), since I don't know how well things like kallsyms etc. cope
with that.
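
Roughly the shape I mean (an illustrative sketch, not the actual code I
have locally; the exact signature may still change for v4):

        static void __init map_kernel_chunk(pgd_t *pgd, void *va_start,
                                            void *va_end, pgprot_t prot,
                                            struct vm_struct *vma)
        {
                phys_addr_t pa_start = __pa(va_start);
                unsigned long size = va_end - va_start;

                __create_pgd_mapping(pgd, pa_start, (unsigned long)va_start,
                                     size, prot, early_pgtable_alloc);

                /* keep each chunk reserved in the vmalloc space */
                vma->addr       = va_start;
                vma->phys_addr  = pa_start;
                vma->size       = size;
                vma->flags      = VM_MAP;
                vma->caller     = map_kernel_chunk;

                vm_area_add_early(vma);
        }

with map_kernel() keeping a static struct vm_struct for each of the
text, init and data regions.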

>> +
>>       pr_info("Boot CPU: AArch64 Processor [%08x]\n", read_cpuid_id());
>>
>>       sprintf(init_utsname()->machine, ELF_PLATFORM);
>> diff --git a/arch/arm64/mm/dump.c b/arch/arm64/mm/dump.c
>> index 5a22a119a74c..e83ffb00560c 100644
>> --- a/arch/arm64/mm/dump.c
>> +++ b/arch/arm64/mm/dump.c
>> @@ -35,7 +35,9 @@ struct addr_marker {
>>  };
>>
>>  enum address_markers_idx {
>> -     VMALLOC_START_NR = 0,
>> +     MODULES_START_NR = 0,
>> +     MODULES_END_NR,
>> +     VMALLOC_START_NR,
>>       VMALLOC_END_NR,
>>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>>       VMEMMAP_START_NR,
>> @@ -45,12 +47,12 @@ enum address_markers_idx {
>>       FIXADDR_END_NR,
>>       PCI_START_NR,
>>       PCI_END_NR,
>> -     MODULES_START_NR,
>> -     MODUELS_END_NR,
>>       KERNEL_SPACE_NR,
>>  };
>>
>>  static struct addr_marker address_markers[] = {
>> +     { MODULES_VADDR,        "Modules start" },
>> +     { MODULES_END,          "Modules end" },
>>       { VMALLOC_START,        "vmalloc() Area" },
>>       { VMALLOC_END,          "vmalloc() End" },
>>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>> @@ -61,9 +63,7 @@ static struct addr_marker address_markers[] = {
>>       { FIXADDR_TOP,          "Fixmap end" },
>>       { PCI_IO_START,         "PCI I/O start" },
>>       { PCI_IO_END,           "PCI I/O end" },
>> -     { MODULES_VADDR,        "Modules start" },
>> -     { MODULES_END,          "Modules end" },
>> -     { PAGE_OFFSET,          "Kernel Mapping" },
>> +     { PAGE_OFFSET,          "Linear Mapping" },
>>       { -1,                   NULL },
>>  };
>>
>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index f3b061e67bfe..baa923bda651 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -302,22 +302,26 @@ void __init mem_init(void)
>>  #ifdef CONFIG_KASAN
>>                 "    kasan   : 0x%16lx - 0x%16lx   (%6ld GB)\n"
>>  #endif
>> +               "    modules : 0x%16lx - 0x%16lx   (%6ld MB)\n"
>>                 "    vmalloc : 0x%16lx - 0x%16lx   (%6ld GB)\n"
>> +               "      .init : 0x%p" " - 0x%p" "   (%6ld KB)\n"
>> +               "      .text : 0x%p" " - 0x%p" "   (%6ld KB)\n"
>> +               "      .data : 0x%p" " - 0x%p" "   (%6ld KB)\n"
>>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>>                 "    vmemmap : 0x%16lx - 0x%16lx   (%6ld GB maximum)\n"
>>                 "              0x%16lx - 0x%16lx   (%6ld MB actual)\n"
>>  #endif
>>                 "    fixed   : 0x%16lx - 0x%16lx   (%6ld KB)\n"
>>                 "    PCI I/O : 0x%16lx - 0x%16lx   (%6ld MB)\n"
>> -               "    modules : 0x%16lx - 0x%16lx   (%6ld MB)\n"
>> -               "    memory  : 0x%16lx - 0x%16lx   (%6ld MB)\n"
>> -               "      .init : 0x%p" " - 0x%p" "   (%6ld KB)\n"
>> -               "      .text : 0x%p" " - 0x%p" "   (%6ld KB)\n"
>> -               "      .data : 0x%p" " - 0x%p" "   (%6ld KB)\n",
>> +               "    memory  : 0x%16lx - 0x%16lx   (%6ld MB)\n",
>>  #ifdef CONFIG_KASAN
>>                 MLG(KASAN_SHADOW_START, KASAN_SHADOW_END),
>>  #endif
>> +               MLM(MODULES_VADDR, MODULES_END),
>>                 MLG(VMALLOC_START, VMALLOC_END),
>> +               MLK_ROUNDUP(__init_begin, __init_end),
>> +               MLK_ROUNDUP(_text, _etext),
>> +               MLK_ROUNDUP(_sdata, _edata),
>>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>>                 MLG((unsigned long)vmemmap,
>>                     (unsigned long)vmemmap + VMEMMAP_SIZE),
>> @@ -326,11 +330,7 @@ void __init mem_init(void)
>>  #endif
>>                 MLK(FIXADDR_START, FIXADDR_TOP),
>>                 MLM(PCI_IO_START, PCI_IO_END),
>> -               MLM(MODULES_VADDR, MODULES_END),
>> -               MLM(PAGE_OFFSET, (unsigned long)high_memory),
>> -               MLK_ROUNDUP(__init_begin, __init_end),
>> -               MLK_ROUNDUP(_text, _etext),
>> -               MLK_ROUNDUP(_sdata, _edata));
>> +               MLM(PAGE_OFFSET, (unsigned long)high_memory));
>>
>>  #undef MLK
>>  #undef MLM
>> diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
>> index 0ca411fc5ea3..acdd1ac166ec 100644
>> --- a/arch/arm64/mm/kasan_init.c
>> +++ b/arch/arm64/mm/kasan_init.c
>> @@ -17,9 +17,11 @@
>>  #include <linux/start_kernel.h>
>>
>>  #include <asm/mmu_context.h>
>> +#include <asm/kernel-pgtable.h>
>>  #include <asm/page.h>
>>  #include <asm/pgalloc.h>
>>  #include <asm/pgtable.h>
>> +#include <asm/sections.h>
>>  #include <asm/tlbflush.h>
>>
>>  static pgd_t tmp_pg_dir[PTRS_PER_PGD] __initdata __aligned(PGD_SIZE);
>> @@ -33,7 +35,7 @@ static void __init kasan_early_pte_populate(pmd_t *pmd, unsigned long addr,
>>       if (pmd_none(*pmd))
>>               pmd_populate_kernel(&init_mm, pmd, kasan_zero_pte);
>>
>> -     pte = pte_offset_kernel(pmd, addr);
>> +     pte = pte_offset_kimg(pmd, addr);
>>       do {
>>               next = addr + PAGE_SIZE;
>>               set_pte(pte, pfn_pte(virt_to_pfn(kasan_zero_page),
>> @@ -51,7 +53,7 @@ static void __init kasan_early_pmd_populate(pud_t *pud,
>>       if (pud_none(*pud))
>>               pud_populate(&init_mm, pud, kasan_zero_pmd);
>>
>> -     pmd = pmd_offset(pud, addr);
>> +     pmd = pmd_offset_kimg(pud, addr);
>>       do {
>>               next = pmd_addr_end(addr, end);
>>               kasan_early_pte_populate(pmd, addr, next);
>> @@ -68,7 +70,7 @@ static void __init kasan_early_pud_populate(pgd_t *pgd,
>>       if (pgd_none(*pgd))
>>               pgd_populate(&init_mm, pgd, kasan_zero_pud);
>>
>> -     pud = pud_offset(pgd, addr);
>> +     pud = pud_offset_kimg(pgd, addr);
>>       do {
>>               next = pud_addr_end(addr, end);
>>               kasan_early_pmd_populate(pud, addr, next);
>> @@ -126,8 +128,14 @@ static void __init clear_pgds(unsigned long start,
>>
>>  void __init kasan_init(void)
>>  {
>> +     u64 kimg_shadow_start, kimg_shadow_end;
>>       struct memblock_region *reg;
>>
>> +     kimg_shadow_start = round_down((u64)kasan_mem_to_shadow(_text),
>> +                                    SWAPPER_BLOCK_SIZE);
>> +     kimg_shadow_end = round_up((u64)kasan_mem_to_shadow(_end),
>> +                                SWAPPER_BLOCK_SIZE);
>
> This rounding looks suspect to me, given it's applied to the shadow
> addresses rather than the kimage addresses. That's roughly equivalent to
> kasan_mem_to_shadow(round_up(_end, 8 * SWAPPER_BLOCK_SIZE).
>
> I don't think we need any rounding for the kimage addresses. The image
> end is page-granular (and the fine-grained mapping will reflect that).
> Any accesses between _end and round_up(_end, SWAPPER_BLOCK_SIZE) would be
> bugs (and would most likely fault) regardless of KASAN.
>
> Or am I just being thick here?
>

Well, the problem here is that vmemmap_populate() is used as a
surrogate vmalloc() since that is not available yet, and
vmemmap_populate() allocates in SWAPPER_BLOCK_SIZE granularity.
If I remove the rounding, I get false positive KASAN errors which I
have not quite diagnosed yet, but which are probably due to the fact that
the rounding performed by vmemmap_populate() goes in the wrong
direction.

I do wonder what that means for memblocks that are not multiples of 16
MB, though (below)
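
To put numbers on that (assuming a 4k granule with section maps, so
SWAPPER_BLOCK_SIZE == 2 MB):

        1 byte of shadow covers 8 bytes of kernel VA
        => 2 MB of shadow covers 8 * 2 MB == 16 MB of image addresses
        => round_up(kasan_mem_to_shadow(_end), SWAPPER_BLOCK_SIZE)
           ~= kasan_mem_to_shadow(round_up(_end, 8 * SWAPPER_BLOCK_SIZE))

which is where the 16 MB figure comes from.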

>> +
>>       /*
>>        * We are going to perform proper setup of shadow memory.
>>        * At first we should unmap early shadow (clear_pgds() call bellow).
>> @@ -141,8 +149,13 @@ void __init kasan_init(void)
>>
>>       clear_pgds(KASAN_SHADOW_START, KASAN_SHADOW_END);
>>
>> +     vmemmap_populate(kimg_shadow_start, kimg_shadow_end,
>> +                      pfn_to_nid(virt_to_pfn(kimg_shadow_start)));
>
> That virt_to_pfn doesn't look right -- kimg_shadow_start is neither a
> linear address nor an image address. As pfn_to_nid is hard-coded to 0
> for !NUMA this happens to be ok for us for the moment.
>
> I think we should follow the x86 KASAN code and use NUMA_NO_NODE for
> this for now.
>

Ack
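
i.e. something like

        vmemmap_populate(kimg_shadow_start, kimg_shadow_end, NUMA_NO_NODE);

with NUMA_NO_NODE coming from <linux/numa.h>.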

>> +
>>       kasan_populate_zero_shadow((void *)KASAN_SHADOW_START,
>> -                     kasan_mem_to_shadow((void *)MODULES_VADDR));
>> +                                (void *)kimg_shadow_start);
>> +     kasan_populate_zero_shadow((void *)kimg_shadow_end,
>> +                                kasan_mem_to_shadow((void *)PAGE_OFFSET));
>>
>>       for_each_memblock(memory, reg) {
>>               void *start = (void *)__phys_to_virt(reg->base);
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 75b5f0dc3bdc..0b28f1469f9b 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -53,6 +53,10 @@ u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
>>  unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)] __page_aligned_bss;
>>  EXPORT_SYMBOL(empty_zero_page);
>>
>> +static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
>> +static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
>> +static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
>> +
>>  pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>>                             unsigned long size, pgprot_t vma_prot)
>>  {
>> @@ -349,14 +353,14 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
>>  {
>>
>>       unsigned long kernel_start = __pa(_stext);
>> -     unsigned long kernel_end = __pa(_end);
>> +     unsigned long kernel_end = __pa(_etext);
>>
>>       /*
>> -      * The kernel itself is mapped at page granularity. Map all other
>> -      * memory, making sure we don't overwrite the existing kernel mappings.
>> +      * Take care not to create a writable alias for the
>> +      * read-only text and rodata sections of the kernel image.
>>        */
>>
>> -     /* No overlap with the kernel. */
>> +     /* No overlap with the kernel text */
>>       if (end < kernel_start || start >= kernel_end) {
>>               __create_pgd_mapping(pgd, start, __phys_to_virt(start),
>>                                    end - start, PAGE_KERNEL,
>> @@ -365,7 +369,7 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
>>       }
>>
>>       /*
>> -      * This block overlaps the kernel mapping. Map the portion(s) which
>> +      * This block overlaps the kernel text mapping. Map the portion(s) which
>>        * don't overlap.
>>        */
>>       if (start < kernel_start)
>> @@ -438,12 +442,29 @@ static void __init map_kernel(pgd_t *pgd)
>>       map_kernel_chunk(pgd, __init_begin, __init_end, PAGE_KERNEL_EXEC);
>>       map_kernel_chunk(pgd, _data, _end, PAGE_KERNEL);
>>
>> -     /*
>> -      * The fixmap falls in a separate pgd to the kernel, and doesn't live
>> -      * in the carveout for the swapper_pg_dir. We can simply re-use the
>> -      * existing dir for the fixmap.
>> -      */
>> -     set_pgd(pgd_offset_raw(pgd, FIXADDR_START), *pgd_offset_k(FIXADDR_START));
>> +     if (pgd_index(FIXADDR_START) != pgd_index((u64)_end)) {
>
> To match the style of early_fixmap_init, and given we already mapped the
> kernel image, this could be:
>
>         if (pgd_none(pgd_offset_raw(pgd, FIXADDR_START))) {
>
> Which also serves as a run-time check that the pgd entry really was
> clear.
>

Yes, that looks better. I will steal that :-)

> Other than that, this looks good to me!
>

Thanks!

>> +             /*
>> +              * The fixmap falls in a separate pgd to the kernel, and doesn't
>> +              * live in the carveout for the swapper_pg_dir. We can simply
>> +              * re-use the existing dir for the fixmap.
>> +              */
>> +             set_pgd(pgd_offset_raw(pgd, FIXADDR_START),
>> +                     *pgd_offset_k(FIXADDR_START));
>> +     } else if (CONFIG_PGTABLE_LEVELS > 3) {
>> +             /*
>> +              * The fixmap shares its top level pgd entry with the kernel
>> +              * mapping. This can really only occur when we are running
>> +              * with 16k/4 levels, so we can simply reuse the pud level
>> +              * entry instead.
>> +              */
>> +             BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES));
>> +
>> +             set_pud(pud_set_fixmap_offset(pgd, FIXADDR_START),
>> +                     __pud(__pa(bm_pmd) | PUD_TYPE_TABLE));
>> +             pud_clear_fixmap();
>> +     } else {
>> +             BUG();
>> +     }
>>
>>       kasan_copy_shadow(pgd);
>>  }
>> @@ -569,10 +590,6 @@ void vmemmap_free(unsigned long start, unsigned long end)
>>  }
>>  #endif       /* CONFIG_SPARSEMEM_VMEMMAP */
>>
>> -static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
>> -static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
>> -static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
>> -
>>  static inline pud_t * fixmap_pud(unsigned long addr)
>>  {
>>       return (CONFIG_PGTABLE_LEVELS > 3) ? &bm_pud[pud_index(addr)]
>> @@ -598,8 +615,19 @@ void __init early_fixmap_init(void)
>>       unsigned long addr = FIXADDR_START;
>>
>>       pgd = pgd_offset_k(addr);
>> -     pgd_populate(&init_mm, pgd, bm_pud);
>> -     pud = fixmap_pud(addr);
>> +     if (CONFIG_PGTABLE_LEVELS > 3 && !pgd_none(*pgd)) {
>> +             /*
>> +              * We only end up here if the kernel mapping and the fixmap
>> +              * share the top level pgd entry, which should only happen on
>> +              * 16k/4 levels configurations.
>> +              */
>> +             BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES));
>> +             pud = pud_offset_kimg(pgd, addr);
>> +             memblock_free(__pa(bm_pud), sizeof(bm_pud));
>> +     } else {
>> +             pgd_populate(&init_mm, pgd, bm_pud);
>> +             pud = fixmap_pud(addr);
>> +     }
>>       pud_populate(&init_mm, pud, bm_pmd);
>>       pmd = fixmap_pmd(addr);
>>       pmd_populate_kernel(&init_mm, pmd, bm_pte);
>> --
>> 2.5.0
>>

^ permalink raw reply	[flat|nested] 207+ messages in thread

* [PATCH v3 07/21] arm64: move kernel image to base of vmalloc area
@ 2016-01-13  8:39       ` Ard Biesheuvel
  0 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-13  8:39 UTC (permalink / raw)
  To: linux-arm-kernel

On 12 January 2016 at 19:14, Mark Rutland <mark.rutland@arm.com> wrote:
> On Mon, Jan 11, 2016 at 02:19:00PM +0100, Ard Biesheuvel wrote:
>> This moves the module area to right before the vmalloc area, and
>> moves the kernel image to the base of the vmalloc area. This is
>> an intermediate step towards implementing kASLR, where the kernel
>> image can be located anywhere in the vmalloc area.
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> ---
>>  arch/arm64/include/asm/kasan.h          | 20 ++++---
>>  arch/arm64/include/asm/kernel-pgtable.h |  5 +-
>>  arch/arm64/include/asm/memory.h         | 18 ++++--
>>  arch/arm64/include/asm/pgtable.h        |  7 ---
>>  arch/arm64/kernel/setup.c               | 12 ++++
>>  arch/arm64/mm/dump.c                    | 12 ++--
>>  arch/arm64/mm/init.c                    | 20 +++----
>>  arch/arm64/mm/kasan_init.c              | 21 +++++--
>>  arch/arm64/mm/mmu.c                     | 62 ++++++++++++++------
>>  9 files changed, 118 insertions(+), 59 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kasan.h b/arch/arm64/include/asm/kasan.h
>> index de0d21211c34..2c583dbf4746 100644
>> --- a/arch/arm64/include/asm/kasan.h
>> +++ b/arch/arm64/include/asm/kasan.h
>> @@ -1,20 +1,16 @@
>>  #ifndef __ASM_KASAN_H
>>  #define __ASM_KASAN_H
>>
>> -#ifndef __ASSEMBLY__
>> -
>>  #ifdef CONFIG_KASAN
>>
>>  #include <linux/linkage.h>
>> -#include <asm/memory.h>
>> -#include <asm/pgtable-types.h>
>>
>>  /*
>>   * KASAN_SHADOW_START: beginning of the kernel virtual addresses.
>>   * KASAN_SHADOW_END: KASAN_SHADOW_START + 1/8 of kernel virtual addresses.
>>   */
>> -#define KASAN_SHADOW_START      (VA_START)
>> -#define KASAN_SHADOW_END        (KASAN_SHADOW_START + (1UL << (VA_BITS - 3)))
>> +#define KASAN_SHADOW_START   (VA_START)
>> +#define KASAN_SHADOW_END     (KASAN_SHADOW_START + (_AC(1, UL) << (VA_BITS - 3)))
>>
>>  /*
>>   * This value is used to map an address to the corresponding shadow
>> @@ -26,16 +22,22 @@
>>   * should satisfy the following equation:
>>   *      KASAN_SHADOW_OFFSET = KASAN_SHADOW_END - (1ULL << 61)
>>   */
>> -#define KASAN_SHADOW_OFFSET     (KASAN_SHADOW_END - (1ULL << (64 - 3)))
>> +#define KASAN_SHADOW_OFFSET  (KASAN_SHADOW_END - (_AC(1, ULL) << (64 - 3)))
>> +
>
> I couldn't immediately spot where KASAN_SHADOW_* were used in assembly.
> I guess there's some other definition built atop of them that I've
> missed.
>
> Where should I be looking?
>

Well, the problem is that KIMAGE_VADDR will be defined in terms of
KASAN_SHADOW_END if KASAN is enabled. But since KASAN always uses the
first 1/8 of that VA space, I am going to rework this so that the
non-KASAN constants never depend on the actual values but only on
CONFIG_KASAN

>> +#ifndef __ASSEMBLY__
>> +#include <asm/pgtable-types.h>
>>
>>  void kasan_init(void);
>>  void kasan_copy_shadow(pgd_t *pgdir);
>>  asmlinkage void kasan_early_init(void);
>> +#endif
>>
>>  #else
>> +
>> +#ifndef __ASSEMBLY__
>>  static inline void kasan_init(void) { }
>>  static inline void kasan_copy_shadow(pgd_t *pgdir) { }
>>  #endif
>>
>> -#endif
>> -#endif
>> +#endif /* CONFIG_KASAN */
>> +#endif /* __ASM_KASAN_H */
>> diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
>> index a459714ee29e..daa8a7b9917a 100644
>> --- a/arch/arm64/include/asm/kernel-pgtable.h
>> +++ b/arch/arm64/include/asm/kernel-pgtable.h
>> @@ -70,8 +70,9 @@
>>  /*
>>   * Initial memory map attributes.
>>   */
>> -#define SWAPPER_PTE_FLAGS    (PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
>> -#define SWAPPER_PMD_FLAGS    (PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
>> +#define SWAPPER_PTE_FLAGS    (PTE_TYPE_PAGE | PTE_AF | PTE_SHARED | PTE_UXN)
>> +#define SWAPPER_PMD_FLAGS    (PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S | \
>> +                              PMD_SECT_UXN)
>
> This will only affect the tables created in head.S. Before we start
> userspace we'll have switched over to a new set of tables using
> PAGE_KERNEL (including UXN).
>
> Given that, this doesn't look necessary for the vmalloc area changes. Am
> I missing something?
>

No, this was carried over from an older version of the series, when
the kernel mapping, after having been moved below PAGE_OFFSET, would
not be overridden by the memblock based linear mapping routines, and
so would missing the UXN bit. But with your changes, this can indeed
be dropped.

>>  #if ARM64_SWAPPER_USES_SECTION_MAPS
>>  #define SWAPPER_MM_MMUFLAGS  (PMD_ATTRINDX(MT_NORMAL) | SWAPPER_PMD_FLAGS)
>> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
>> index bea9631b34a8..e45d3141ad98 100644
>> --- a/arch/arm64/include/asm/memory.h
>> +++ b/arch/arm64/include/asm/memory.h
>> @@ -51,14 +51,24 @@
>>  #define VA_BITS                      (CONFIG_ARM64_VA_BITS)
>>  #define VA_START             (UL(0xffffffffffffffff) << VA_BITS)
>>  #define PAGE_OFFSET          (UL(0xffffffffffffffff) << (VA_BITS - 1))
>> -#define KIMAGE_VADDR         (PAGE_OFFSET)
>> -#define MODULES_END          (KIMAGE_VADDR)
>> -#define MODULES_VADDR                (MODULES_END - SZ_64M)
>> -#define PCI_IO_END           (MODULES_VADDR - SZ_2M)
>> +#define PCI_IO_END           (PAGE_OFFSET - SZ_2M)
>>  #define PCI_IO_START         (PCI_IO_END - PCI_IO_SIZE)
>>  #define FIXADDR_TOP          (PCI_IO_START - SZ_2M)
>>  #define TASK_SIZE_64         (UL(1) << VA_BITS)
>>
>> +#ifndef CONFIG_KASAN
>> +#define MODULES_VADDR                (VA_START)
>> +#else
>> +#include <asm/kasan.h>
>> +#define MODULES_VADDR                (KASAN_SHADOW_END)
>> +#endif
>> +
>> +#define MODULES_VSIZE                (SZ_64M)
>> +#define MODULES_END          (MODULES_VADDR + MODULES_VSIZE)
>> +
>> +#define KIMAGE_VADDR         (MODULES_END)
>> +#define VMALLOC_START                (MODULES_END)
>> +
>>  #ifdef CONFIG_COMPAT
>>  #define TASK_SIZE_32         UL(0x100000000)
>>  #define TASK_SIZE            (test_thread_flag(TIF_32BIT) ? \
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index 7b4e16068c9f..a910a44d7ab3 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -42,13 +42,6 @@
>>   */
>>  #define VMEMMAP_SIZE         ALIGN((1UL << (VA_BITS - PAGE_SHIFT)) * sizeof(struct page), PUD_SIZE)
>>
>> -#ifndef CONFIG_KASAN
>> -#define VMALLOC_START                (VA_START)
>> -#else
>> -#include <asm/kasan.h>
>> -#define VMALLOC_START                (KASAN_SHADOW_END + SZ_64K)
>> -#endif
>> -
>>  #define VMALLOC_END          (PAGE_OFFSET - PUD_SIZE - VMEMMAP_SIZE - SZ_64K)
>
> It's a shame VMALLOC_START and VMALLOC_END are now in different headers.
> It would be nice if we could keep them together.
>
> As VMEMMAP_SIZE depends on sizeof(struct page), it's not just a simple
> move. We could either place that in the !__ASSEMBLY__ portion of
> memory.h, or we could add S_PAGE to asm-offsets.
>
> If that's too painful now, we can leave that for subsequent cleanup;
> there's other stuff in that area I'd like to unify at some point (e.g.
> the mem_init and dump.c section boundary descriptions).
>

No, I think I can probably do a bit better than this. I will address it in v4

>>
>>  #define vmemmap                      ((struct page *)(VMALLOC_END + SZ_64K))
>> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
>> index cfed56f0ad26..c67ba4453ec6 100644
>> --- a/arch/arm64/kernel/setup.c
>> +++ b/arch/arm64/kernel/setup.c
>> @@ -53,6 +53,7 @@
>>  #include <asm/cpufeature.h>
>>  #include <asm/cpu_ops.h>
>>  #include <asm/kasan.h>
>> +#include <asm/kernel-pgtable.h>
>>  #include <asm/sections.h>
>>  #include <asm/setup.h>
>>  #include <asm/smp_plat.h>
>> @@ -291,6 +292,17 @@ u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
>>
>>  void __init setup_arch(char **cmdline_p)
>>  {
>> +     static struct vm_struct vmlinux_vm;
>> +
>> +     vmlinux_vm.addr         = (void *)KIMAGE_VADDR;
>> +     vmlinux_vm.size         = round_up((u64)_end - KIMAGE_VADDR,
>> +                                        SWAPPER_BLOCK_SIZE);
>
> With the fine grained tables we should only need to round up to
> PAGE_SIZE (though _end is implicitly page-aligned anyway). Given that,
> is the SWAPPER_BLOCK_SIZE rounding necessary?
>

No, probably not.

>> +     vmlinux_vm.phys_addr    = __pa(KIMAGE_VADDR);
>> +     vmlinux_vm.flags        = VM_MAP;
>
> I was going to say we should set VM_KASAN also per its description in
> include/vmalloc.h, though per its uses its not clear if it will ever
> matter.
>

No, we shouldn't. Even if we are never going to unmap this vma,
setting the flag will result in the shadow area being freed using
vfree(), while it was not allocated via vmalloc() so that is likely to
cause trouble.

>> +     vmlinux_vm.caller       = setup_arch;
>> +
>> +     vm_area_add_early(&vmlinux_vm);
>
> Do we need to register the kernel VA range quite this early, or could we
> do this around paging_init/map_kernel time?
>

No. Locally, I moved it into map_kernel_chunk, so that we have
separate areas for _text, _init and _data, and we can unmap the _init
entirely rather than only stripping the exec bit. I haven't quite
figured out how to get rid of the vma area, but perhaps it make sense
to keep it reserved, so that modules don't end up there later (which
is possible with the module region randomization I have implemented
for v4) since I don't know how well things like kallsyms etc cope with
that.

>> +
>>       pr_info("Boot CPU: AArch64 Processor [%08x]\n", read_cpuid_id());
>>
>>       sprintf(init_utsname()->machine, ELF_PLATFORM);
>> diff --git a/arch/arm64/mm/dump.c b/arch/arm64/mm/dump.c
>> index 5a22a119a74c..e83ffb00560c 100644
>> --- a/arch/arm64/mm/dump.c
>> +++ b/arch/arm64/mm/dump.c
>> @@ -35,7 +35,9 @@ struct addr_marker {
>>  };
>>
>>  enum address_markers_idx {
>> -     VMALLOC_START_NR = 0,
>> +     MODULES_START_NR = 0,
>> +     MODULES_END_NR,
>> +     VMALLOC_START_NR,
>>       VMALLOC_END_NR,
>>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>>       VMEMMAP_START_NR,
>> @@ -45,12 +47,12 @@ enum address_markers_idx {
>>       FIXADDR_END_NR,
>>       PCI_START_NR,
>>       PCI_END_NR,
>> -     MODULES_START_NR,
>> -     MODUELS_END_NR,
>>       KERNEL_SPACE_NR,
>>  };
>>
>>  static struct addr_marker address_markers[] = {
>> +     { MODULES_VADDR,        "Modules start" },
>> +     { MODULES_END,          "Modules end" },
>>       { VMALLOC_START,        "vmalloc() Area" },
>>       { VMALLOC_END,          "vmalloc() End" },
>>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>> @@ -61,9 +63,7 @@ static struct addr_marker address_markers[] = {
>>       { FIXADDR_TOP,          "Fixmap end" },
>>       { PCI_IO_START,         "PCI I/O start" },
>>       { PCI_IO_END,           "PCI I/O end" },
>> -     { MODULES_VADDR,        "Modules start" },
>> -     { MODULES_END,          "Modules end" },
>> -     { PAGE_OFFSET,          "Kernel Mapping" },
>> +     { PAGE_OFFSET,          "Linear Mapping" },
>>       { -1,                   NULL },
>>  };
>>
>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index f3b061e67bfe..baa923bda651 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -302,22 +302,26 @@ void __init mem_init(void)
>>  #ifdef CONFIG_KASAN
>>                 "    kasan   : 0x%16lx - 0x%16lx   (%6ld GB)\n"
>>  #endif
>> +               "    modules : 0x%16lx - 0x%16lx   (%6ld MB)\n"
>>                 "    vmalloc : 0x%16lx - 0x%16lx   (%6ld GB)\n"
>> +               "      .init : 0x%p" " - 0x%p" "   (%6ld KB)\n"
>> +               "      .text : 0x%p" " - 0x%p" "   (%6ld KB)\n"
>> +               "      .data : 0x%p" " - 0x%p" "   (%6ld KB)\n"
>>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>>                 "    vmemmap : 0x%16lx - 0x%16lx   (%6ld GB maximum)\n"
>>                 "              0x%16lx - 0x%16lx   (%6ld MB actual)\n"
>>  #endif
>>                 "    fixed   : 0x%16lx - 0x%16lx   (%6ld KB)\n"
>>                 "    PCI I/O : 0x%16lx - 0x%16lx   (%6ld MB)\n"
>> -               "    modules : 0x%16lx - 0x%16lx   (%6ld MB)\n"
>> -               "    memory  : 0x%16lx - 0x%16lx   (%6ld MB)\n"
>> -               "      .init : 0x%p" " - 0x%p" "   (%6ld KB)\n"
>> -               "      .text : 0x%p" " - 0x%p" "   (%6ld KB)\n"
>> -               "      .data : 0x%p" " - 0x%p" "   (%6ld KB)\n",
>> +               "    memory  : 0x%16lx - 0x%16lx   (%6ld MB)\n",
>>  #ifdef CONFIG_KASAN
>>                 MLG(KASAN_SHADOW_START, KASAN_SHADOW_END),
>>  #endif
>> +               MLM(MODULES_VADDR, MODULES_END),
>>                 MLG(VMALLOC_START, VMALLOC_END),
>> +               MLK_ROUNDUP(__init_begin, __init_end),
>> +               MLK_ROUNDUP(_text, _etext),
>> +               MLK_ROUNDUP(_sdata, _edata),
>>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>>                 MLG((unsigned long)vmemmap,
>>                     (unsigned long)vmemmap + VMEMMAP_SIZE),
>> @@ -326,11 +330,7 @@ void __init mem_init(void)
>>  #endif
>>                 MLK(FIXADDR_START, FIXADDR_TOP),
>>                 MLM(PCI_IO_START, PCI_IO_END),
>> -               MLM(MODULES_VADDR, MODULES_END),
>> -               MLM(PAGE_OFFSET, (unsigned long)high_memory),
>> -               MLK_ROUNDUP(__init_begin, __init_end),
>> -               MLK_ROUNDUP(_text, _etext),
>> -               MLK_ROUNDUP(_sdata, _edata));
>> +               MLM(PAGE_OFFSET, (unsigned long)high_memory));
>>
>>  #undef MLK
>>  #undef MLM
>> diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
>> index 0ca411fc5ea3..acdd1ac166ec 100644
>> --- a/arch/arm64/mm/kasan_init.c
>> +++ b/arch/arm64/mm/kasan_init.c
>> @@ -17,9 +17,11 @@
>>  #include <linux/start_kernel.h>
>>
>>  #include <asm/mmu_context.h>
>> +#include <asm/kernel-pgtable.h>
>>  #include <asm/page.h>
>>  #include <asm/pgalloc.h>
>>  #include <asm/pgtable.h>
>> +#include <asm/sections.h>
>>  #include <asm/tlbflush.h>
>>
>>  static pgd_t tmp_pg_dir[PTRS_PER_PGD] __initdata __aligned(PGD_SIZE);
>> @@ -33,7 +35,7 @@ static void __init kasan_early_pte_populate(pmd_t *pmd, unsigned long addr,
>>       if (pmd_none(*pmd))
>>               pmd_populate_kernel(&init_mm, pmd, kasan_zero_pte);
>>
>> -     pte = pte_offset_kernel(pmd, addr);
>> +     pte = pte_offset_kimg(pmd, addr);
>>       do {
>>               next = addr + PAGE_SIZE;
>>               set_pte(pte, pfn_pte(virt_to_pfn(kasan_zero_page),
>> @@ -51,7 +53,7 @@ static void __init kasan_early_pmd_populate(pud_t *pud,
>>       if (pud_none(*pud))
>>               pud_populate(&init_mm, pud, kasan_zero_pmd);
>>
>> -     pmd = pmd_offset(pud, addr);
>> +     pmd = pmd_offset_kimg(pud, addr);
>>       do {
>>               next = pmd_addr_end(addr, end);
>>               kasan_early_pte_populate(pmd, addr, next);
>> @@ -68,7 +70,7 @@ static void __init kasan_early_pud_populate(pgd_t *pgd,
>>       if (pgd_none(*pgd))
>>               pgd_populate(&init_mm, pgd, kasan_zero_pud);
>>
>> -     pud = pud_offset(pgd, addr);
>> +     pud = pud_offset_kimg(pgd, addr);
>>       do {
>>               next = pud_addr_end(addr, end);
>>               kasan_early_pmd_populate(pud, addr, next);
>> @@ -126,8 +128,14 @@ static void __init clear_pgds(unsigned long start,
>>
>>  void __init kasan_init(void)
>>  {
>> +     u64 kimg_shadow_start, kimg_shadow_end;
>>       struct memblock_region *reg;
>>
>> +     kimg_shadow_start = round_down((u64)kasan_mem_to_shadow(_text),
>> +                                    SWAPPER_BLOCK_SIZE);
>> +     kimg_shadow_end = round_up((u64)kasan_mem_to_shadow(_end),
>> +                                SWAPPER_BLOCK_SIZE);
>
> This rounding looks suspect to me, given it's applied to the shadow
> addresses rather than the kimage addresses. That's roughly equivalent to
> kasan_mem_to_shadow(round_up(_end, 8 * SWAPPER_BLOCK_SIZE).
>
> I don't think we need any rounding for the kimage addresses. The image
> end is page-granular (and the fine-grained mapping will reflect that).
> Any accesses between _end and roud_up(_end, SWAPPER_BLOCK_SIZE) would be
> bugs (and would most likely fault) regardless of KASAN.
>
> Or am I just being thick here?
>

Well, the problem here is that vmemmap_populate() is used as a
surrogate vmalloc() since that is not available yet, and
vmemmap_populate() allocates in SWAPPER_BLOCK_SIZE granularity.
If I remove the rounding, I get false positive kasan errors which I
have not quite diagnosed yet, but are probably due to the fact that
the rounding performed by vmemmap_populate() goes in the wrong
direction.

I do wonder what that means for memblocks that are not multiples of 16
MB, though (below)

>> +
>>       /*
>>        * We are going to perform proper setup of shadow memory.
>>        * At first we should unmap early shadow (clear_pgds() call bellow).
>> @@ -141,8 +149,13 @@ void __init kasan_init(void)
>>
>>       clear_pgds(KASAN_SHADOW_START, KASAN_SHADOW_END);
>>
>> +     vmemmap_populate(kimg_shadow_start, kimg_shadow_end,
>> +                      pfn_to_nid(virt_to_pfn(kimg_shadow_start)));
>
> That virt_to_pfn doesn't look right -- kimg_shadow_start is neither a
> linear address nor an image address. As pfn_to_nid is hard-coded to 0
> for !NUMA this happens to be ok for us for the moment.
>
> I think we should follow the x86 KASAN code and use NUMA_NO_NODE for
> this for now.
>

Ack

>> +
>>       kasan_populate_zero_shadow((void *)KASAN_SHADOW_START,
>> -                     kasan_mem_to_shadow((void *)MODULES_VADDR));
>> +                                (void *)kimg_shadow_start);
>> +     kasan_populate_zero_shadow((void *)kimg_shadow_end,
>> +                                kasan_mem_to_shadow((void *)PAGE_OFFSET));
>>
>>       for_each_memblock(memory, reg) {
>>               void *start = (void *)__phys_to_virt(reg->base);
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 75b5f0dc3bdc..0b28f1469f9b 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -53,6 +53,10 @@ u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
>>  unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)] __page_aligned_bss;
>>  EXPORT_SYMBOL(empty_zero_page);
>>
>> +static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
>> +static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
>> +static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
>> +
>>  pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>>                             unsigned long size, pgprot_t vma_prot)
>>  {
>> @@ -349,14 +353,14 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
>>  {
>>
>>       unsigned long kernel_start = __pa(_stext);
>> -     unsigned long kernel_end = __pa(_end);
>> +     unsigned long kernel_end = __pa(_etext);
>>
>>       /*
>> -      * The kernel itself is mapped at page granularity. Map all other
>> -      * memory, making sure we don't overwrite the existing kernel mappings.
>> +      * Take care not to create a writable alias for the
>> +      * read-only text and rodata sections of the kernel image.
>>        */
>>
>> -     /* No overlap with the kernel. */
>> +     /* No overlap with the kernel text */
>>       if (end < kernel_start || start >= kernel_end) {
>>               __create_pgd_mapping(pgd, start, __phys_to_virt(start),
>>                                    end - start, PAGE_KERNEL,
>> @@ -365,7 +369,7 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
>>       }
>>
>>       /*
>> -      * This block overlaps the kernel mapping. Map the portion(s) which
>> +      * This block overlaps the kernel text mapping. Map the portion(s) which
>>        * don't overlap.
>>        */
>>       if (start < kernel_start)
>> @@ -438,12 +442,29 @@ static void __init map_kernel(pgd_t *pgd)
>>       map_kernel_chunk(pgd, __init_begin, __init_end, PAGE_KERNEL_EXEC);
>>       map_kernel_chunk(pgd, _data, _end, PAGE_KERNEL);
>>
>> -     /*
>> -      * The fixmap falls in a separate pgd to the kernel, and doesn't live
>> -      * in the carveout for the swapper_pg_dir. We can simply re-use the
>> -      * existing dir for the fixmap.
>> -      */
>> -     set_pgd(pgd_offset_raw(pgd, FIXADDR_START), *pgd_offset_k(FIXADDR_START));
>> +     if (pgd_index(FIXADDR_START) != pgd_index((u64)_end)) {
>
> To match the style of early_fixmap_init, and given we already mapped the
> kernel image, this could be:
>
>         if (pgd_none(pgd_offset_raw(pgd, FIXADDR_START))) {
>
> Which also serves as a run-time check that the pgd entry really was
> clear.
>

Yes, that looks better. I will steal that :-)

> Other than that, this looks good to me!
>

Thanks!

>> +             /*
>> +              * The fixmap falls in a separate pgd to the kernel, and doesn't
>> +              * live in the carveout for the swapper_pg_dir. We can simply
>> +              * re-use the existing dir for the fixmap.
>> +              */
>> +             set_pgd(pgd_offset_raw(pgd, FIXADDR_START),
>> +                     *pgd_offset_k(FIXADDR_START));
>> +     } else if (CONFIG_PGTABLE_LEVELS > 3) {
>> +             /*
>> +              * The fixmap shares its top level pgd entry with the kernel
>> +              * mapping. This can really only occur when we are running
>> +              * with 16k/4 levels, so we can simply reuse the pud level
>> +              * entry instead.
>> +              */
>> +             BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES));
>> +
>> +             set_pud(pud_set_fixmap_offset(pgd, FIXADDR_START),
>> +                     __pud(__pa(bm_pmd) | PUD_TYPE_TABLE));
>> +             pud_clear_fixmap();
>> +     } else {
>> +             BUG();
>> +     }
>>
>>       kasan_copy_shadow(pgd);
>>  }
>> @@ -569,10 +590,6 @@ void vmemmap_free(unsigned long start, unsigned long end)
>>  }
>>  #endif       /* CONFIG_SPARSEMEM_VMEMMAP */
>>
>> -static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
>> -static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
>> -static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
>> -
>>  static inline pud_t * fixmap_pud(unsigned long addr)
>>  {
>>       return (CONFIG_PGTABLE_LEVELS > 3) ? &bm_pud[pud_index(addr)]
>> @@ -598,8 +615,19 @@ void __init early_fixmap_init(void)
>>       unsigned long addr = FIXADDR_START;
>>
>>       pgd = pgd_offset_k(addr);
>> -     pgd_populate(&init_mm, pgd, bm_pud);
>> -     pud = fixmap_pud(addr);
>> +     if (CONFIG_PGTABLE_LEVELS > 3 && !pgd_none(*pgd)) {
>> +             /*
>> +              * We only end up here if the kernel mapping and the fixmap
>> +              * share the top level pgd entry, which should only happen on
>> +              * 16k/4 levels configurations.
>> +              */
>> +             BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES));
>> +             pud = pud_offset_kimg(pgd, addr);
>> +             memblock_free(__pa(bm_pud), sizeof(bm_pud));
>> +     } else {
>> +             pgd_populate(&init_mm, pgd, bm_pud);
>> +             pud = fixmap_pud(addr);
>> +     }
>>       pud_populate(&init_mm, pud, bm_pmd);
>>       pmd = fixmap_pmd(addr);
>>       pmd_populate_kernel(&init_mm, pmd, bm_pte);
>> --
>> 2.5.0
>>

^ permalink raw reply	[flat|nested] 207+ messages in thread

* [kernel-hardening] Re: [PATCH v3 07/21] arm64: move kernel image to base of vmalloc area
@ 2016-01-13  8:39       ` Ard Biesheuvel
  0 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-13  8:39 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, kernel-hardening, Will Deacon, Catalin Marinas,
	Leif Lindholm, Kees Cook, linux-kernel, Stuart Yoder,
	Sharma Bhupesh, Arnd Bergmann, Marc Zyngier, Christoffer Dall

On 12 January 2016 at 19:14, Mark Rutland <mark.rutland@arm.com> wrote:
> On Mon, Jan 11, 2016 at 02:19:00PM +0100, Ard Biesheuvel wrote:
>> This moves the module area to right before the vmalloc area, and
>> moves the kernel image to the base of the vmalloc area. This is
>> an intermediate step towards implementing kASLR, where the kernel
>> image can be located anywhere in the vmalloc area.
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> ---
>>  arch/arm64/include/asm/kasan.h          | 20 ++++---
>>  arch/arm64/include/asm/kernel-pgtable.h |  5 +-
>>  arch/arm64/include/asm/memory.h         | 18 ++++--
>>  arch/arm64/include/asm/pgtable.h        |  7 ---
>>  arch/arm64/kernel/setup.c               | 12 ++++
>>  arch/arm64/mm/dump.c                    | 12 ++--
>>  arch/arm64/mm/init.c                    | 20 +++----
>>  arch/arm64/mm/kasan_init.c              | 21 +++++--
>>  arch/arm64/mm/mmu.c                     | 62 ++++++++++++++------
>>  9 files changed, 118 insertions(+), 59 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kasan.h b/arch/arm64/include/asm/kasan.h
>> index de0d21211c34..2c583dbf4746 100644
>> --- a/arch/arm64/include/asm/kasan.h
>> +++ b/arch/arm64/include/asm/kasan.h
>> @@ -1,20 +1,16 @@
>>  #ifndef __ASM_KASAN_H
>>  #define __ASM_KASAN_H
>>
>> -#ifndef __ASSEMBLY__
>> -
>>  #ifdef CONFIG_KASAN
>>
>>  #include <linux/linkage.h>
>> -#include <asm/memory.h>
>> -#include <asm/pgtable-types.h>
>>
>>  /*
>>   * KASAN_SHADOW_START: beginning of the kernel virtual addresses.
>>   * KASAN_SHADOW_END: KASAN_SHADOW_START + 1/8 of kernel virtual addresses.
>>   */
>> -#define KASAN_SHADOW_START      (VA_START)
>> -#define KASAN_SHADOW_END        (KASAN_SHADOW_START + (1UL << (VA_BITS - 3)))
>> +#define KASAN_SHADOW_START   (VA_START)
>> +#define KASAN_SHADOW_END     (KASAN_SHADOW_START + (_AC(1, UL) << (VA_BITS - 3)))
>>
>>  /*
>>   * This value is used to map an address to the corresponding shadow
>> @@ -26,16 +22,22 @@
>>   * should satisfy the following equation:
>>   *      KASAN_SHADOW_OFFSET = KASAN_SHADOW_END - (1ULL << 61)
>>   */
>> -#define KASAN_SHADOW_OFFSET     (KASAN_SHADOW_END - (1ULL << (64 - 3)))
>> +#define KASAN_SHADOW_OFFSET  (KASAN_SHADOW_END - (_AC(1, ULL) << (64 - 3)))
>> +
>
> I couldn't immediately spot where KASAN_SHADOW_* were used in assembly.
> I guess there's some other definition built atop of them that I've
> missed.
>
> Where should I be looking?
>

Well, the problem is that KIMAGE_VADDR will be defined in terms of
KASAN_SHADOW_END if KASAN is enabled. But since KASAN always uses the
first 1/8 of that VA space, I am going to rework this so that the
non-KASAN constants never depend on the actual values but only on
CONFIG_KASAN

>> +#ifndef __ASSEMBLY__
>> +#include <asm/pgtable-types.h>
>>
>>  void kasan_init(void);
>>  void kasan_copy_shadow(pgd_t *pgdir);
>>  asmlinkage void kasan_early_init(void);
>> +#endif
>>
>>  #else
>> +
>> +#ifndef __ASSEMBLY__
>>  static inline void kasan_init(void) { }
>>  static inline void kasan_copy_shadow(pgd_t *pgdir) { }
>>  #endif
>>
>> -#endif
>> -#endif
>> +#endif /* CONFIG_KASAN */
>> +#endif /* __ASM_KASAN_H */
>> diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
>> index a459714ee29e..daa8a7b9917a 100644
>> --- a/arch/arm64/include/asm/kernel-pgtable.h
>> +++ b/arch/arm64/include/asm/kernel-pgtable.h
>> @@ -70,8 +70,9 @@
>>  /*
>>   * Initial memory map attributes.
>>   */
>> -#define SWAPPER_PTE_FLAGS    (PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
>> -#define SWAPPER_PMD_FLAGS    (PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
>> +#define SWAPPER_PTE_FLAGS    (PTE_TYPE_PAGE | PTE_AF | PTE_SHARED | PTE_UXN)
>> +#define SWAPPER_PMD_FLAGS    (PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S | \
>> +                              PMD_SECT_UXN)
>
> This will only affect the tables created in head.S. Before we start
> userspace we'll have switched over to a new set of tables using
> PAGE_KERNEL (including UXN).
>
> Given that, this doesn't look necessary for the vmalloc area changes. Am
> I missing something?
>

No, this was carried over from an older version of the series, when
the kernel mapping, after having been moved below PAGE_OFFSET, would
not be overridden by the memblock based linear mapping routines, and
so would missing the UXN bit. But with your changes, this can indeed
be dropped.

>>  #if ARM64_SWAPPER_USES_SECTION_MAPS
>>  #define SWAPPER_MM_MMUFLAGS  (PMD_ATTRINDX(MT_NORMAL) | SWAPPER_PMD_FLAGS)
>> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
>> index bea9631b34a8..e45d3141ad98 100644
>> --- a/arch/arm64/include/asm/memory.h
>> +++ b/arch/arm64/include/asm/memory.h
>> @@ -51,14 +51,24 @@
>>  #define VA_BITS                      (CONFIG_ARM64_VA_BITS)
>>  #define VA_START             (UL(0xffffffffffffffff) << VA_BITS)
>>  #define PAGE_OFFSET          (UL(0xffffffffffffffff) << (VA_BITS - 1))
>> -#define KIMAGE_VADDR         (PAGE_OFFSET)
>> -#define MODULES_END          (KIMAGE_VADDR)
>> -#define MODULES_VADDR                (MODULES_END - SZ_64M)
>> -#define PCI_IO_END           (MODULES_VADDR - SZ_2M)
>> +#define PCI_IO_END           (PAGE_OFFSET - SZ_2M)
>>  #define PCI_IO_START         (PCI_IO_END - PCI_IO_SIZE)
>>  #define FIXADDR_TOP          (PCI_IO_START - SZ_2M)
>>  #define TASK_SIZE_64         (UL(1) << VA_BITS)
>>
>> +#ifndef CONFIG_KASAN
>> +#define MODULES_VADDR                (VA_START)
>> +#else
>> +#include <asm/kasan.h>
>> +#define MODULES_VADDR                (KASAN_SHADOW_END)
>> +#endif
>> +
>> +#define MODULES_VSIZE                (SZ_64M)
>> +#define MODULES_END          (MODULES_VADDR + MODULES_VSIZE)
>> +
>> +#define KIMAGE_VADDR         (MODULES_END)
>> +#define VMALLOC_START                (MODULES_END)
>> +
>>  #ifdef CONFIG_COMPAT
>>  #define TASK_SIZE_32         UL(0x100000000)
>>  #define TASK_SIZE            (test_thread_flag(TIF_32BIT) ? \
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index 7b4e16068c9f..a910a44d7ab3 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -42,13 +42,6 @@
>>   */
>>  #define VMEMMAP_SIZE         ALIGN((1UL << (VA_BITS - PAGE_SHIFT)) * sizeof(struct page), PUD_SIZE)
>>
>> -#ifndef CONFIG_KASAN
>> -#define VMALLOC_START                (VA_START)
>> -#else
>> -#include <asm/kasan.h>
>> -#define VMALLOC_START                (KASAN_SHADOW_END + SZ_64K)
>> -#endif
>> -
>>  #define VMALLOC_END          (PAGE_OFFSET - PUD_SIZE - VMEMMAP_SIZE - SZ_64K)
>
> It's a shame VMALLOC_START and VMALLOC_END are now in different headers.
> It would be nice if we could keep them together.
>
> As VMEMMAP_SIZE depends on sizeof(struct page), it's not just a simple
> move. We could either place that in the !__ASSEMBLY__ portion of
> memory.h, or we could add S_PAGE to asm-offsets.
>
> If that's too painful now, we can leave that for subsequent cleanup;
> there's other stuff in that area I'd like to unify at some point (e.g.
> the mem_init and dump.c section boundary descriptions).
>

No, I think I can probably do a bit better than this. I will address it in v4

>>
>>  #define vmemmap                      ((struct page *)(VMALLOC_END + SZ_64K))
>> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
>> index cfed56f0ad26..c67ba4453ec6 100644
>> --- a/arch/arm64/kernel/setup.c
>> +++ b/arch/arm64/kernel/setup.c
>> @@ -53,6 +53,7 @@
>>  #include <asm/cpufeature.h>
>>  #include <asm/cpu_ops.h>
>>  #include <asm/kasan.h>
>> +#include <asm/kernel-pgtable.h>
>>  #include <asm/sections.h>
>>  #include <asm/setup.h>
>>  #include <asm/smp_plat.h>
>> @@ -291,6 +292,17 @@ u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
>>
>>  void __init setup_arch(char **cmdline_p)
>>  {
>> +     static struct vm_struct vmlinux_vm;
>> +
>> +     vmlinux_vm.addr         = (void *)KIMAGE_VADDR;
>> +     vmlinux_vm.size         = round_up((u64)_end - KIMAGE_VADDR,
>> +                                        SWAPPER_BLOCK_SIZE);
>
> With the fine grained tables we should only need to round up to
> PAGE_SIZE (though _end is implicitly page-aligned anyway). Given that,
> is the SWAPPER_BLOCK_SIZE rounding necessary?
>

No, probably not.

>> +     vmlinux_vm.phys_addr    = __pa(KIMAGE_VADDR);
>> +     vmlinux_vm.flags        = VM_MAP;
>
> I was going to say we should set VM_KASAN also per its description in
> include/vmalloc.h, though per its uses its not clear if it will ever
> matter.
>

No, we shouldn't. Even if we are never going to unmap this vma,
setting the flag will result in the shadow area being freed using
vfree(), while it was not allocated via vmalloc() so that is likely to
cause trouble.

>> +     vmlinux_vm.caller       = setup_arch;
>> +
>> +     vm_area_add_early(&vmlinux_vm);
>
> Do we need to register the kernel VA range quite this early, or could we
> do this around paging_init/map_kernel time?
>

No. Locally, I moved it into map_kernel_chunk, so that we have
separate areas for _text, _init and _data, and we can unmap the _init
entirely rather than only stripping the exec bit. I haven't quite
figured out how to get rid of the vma area, but perhaps it make sense
to keep it reserved, so that modules don't end up there later (which
is possible with the module region randomization I have implemented
for v4) since I don't know how well things like kallsyms etc cope with
that.

>> +
>>       pr_info("Boot CPU: AArch64 Processor [%08x]\n", read_cpuid_id());
>>
>>       sprintf(init_utsname()->machine, ELF_PLATFORM);
>> diff --git a/arch/arm64/mm/dump.c b/arch/arm64/mm/dump.c
>> index 5a22a119a74c..e83ffb00560c 100644
>> --- a/arch/arm64/mm/dump.c
>> +++ b/arch/arm64/mm/dump.c
>> @@ -35,7 +35,9 @@ struct addr_marker {
>>  };
>>
>>  enum address_markers_idx {
>> -     VMALLOC_START_NR = 0,
>> +     MODULES_START_NR = 0,
>> +     MODULES_END_NR,
>> +     VMALLOC_START_NR,
>>       VMALLOC_END_NR,
>>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>>       VMEMMAP_START_NR,
>> @@ -45,12 +47,12 @@ enum address_markers_idx {
>>       FIXADDR_END_NR,
>>       PCI_START_NR,
>>       PCI_END_NR,
>> -     MODULES_START_NR,
>> -     MODUELS_END_NR,
>>       KERNEL_SPACE_NR,
>>  };
>>
>>  static struct addr_marker address_markers[] = {
>> +     { MODULES_VADDR,        "Modules start" },
>> +     { MODULES_END,          "Modules end" },
>>       { VMALLOC_START,        "vmalloc() Area" },
>>       { VMALLOC_END,          "vmalloc() End" },
>>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>> @@ -61,9 +63,7 @@ static struct addr_marker address_markers[] = {
>>       { FIXADDR_TOP,          "Fixmap end" },
>>       { PCI_IO_START,         "PCI I/O start" },
>>       { PCI_IO_END,           "PCI I/O end" },
>> -     { MODULES_VADDR,        "Modules start" },
>> -     { MODULES_END,          "Modules end" },
>> -     { PAGE_OFFSET,          "Kernel Mapping" },
>> +     { PAGE_OFFSET,          "Linear Mapping" },
>>       { -1,                   NULL },
>>  };
>>
>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index f3b061e67bfe..baa923bda651 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -302,22 +302,26 @@ void __init mem_init(void)
>>  #ifdef CONFIG_KASAN
>>                 "    kasan   : 0x%16lx - 0x%16lx   (%6ld GB)\n"
>>  #endif
>> +               "    modules : 0x%16lx - 0x%16lx   (%6ld MB)\n"
>>                 "    vmalloc : 0x%16lx - 0x%16lx   (%6ld GB)\n"
>> +               "      .init : 0x%p" " - 0x%p" "   (%6ld KB)\n"
>> +               "      .text : 0x%p" " - 0x%p" "   (%6ld KB)\n"
>> +               "      .data : 0x%p" " - 0x%p" "   (%6ld KB)\n"
>>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>>                 "    vmemmap : 0x%16lx - 0x%16lx   (%6ld GB maximum)\n"
>>                 "              0x%16lx - 0x%16lx   (%6ld MB actual)\n"
>>  #endif
>>                 "    fixed   : 0x%16lx - 0x%16lx   (%6ld KB)\n"
>>                 "    PCI I/O : 0x%16lx - 0x%16lx   (%6ld MB)\n"
>> -               "    modules : 0x%16lx - 0x%16lx   (%6ld MB)\n"
>> -               "    memory  : 0x%16lx - 0x%16lx   (%6ld MB)\n"
>> -               "      .init : 0x%p" " - 0x%p" "   (%6ld KB)\n"
>> -               "      .text : 0x%p" " - 0x%p" "   (%6ld KB)\n"
>> -               "      .data : 0x%p" " - 0x%p" "   (%6ld KB)\n",
>> +               "    memory  : 0x%16lx - 0x%16lx   (%6ld MB)\n",
>>  #ifdef CONFIG_KASAN
>>                 MLG(KASAN_SHADOW_START, KASAN_SHADOW_END),
>>  #endif
>> +               MLM(MODULES_VADDR, MODULES_END),
>>                 MLG(VMALLOC_START, VMALLOC_END),
>> +               MLK_ROUNDUP(__init_begin, __init_end),
>> +               MLK_ROUNDUP(_text, _etext),
>> +               MLK_ROUNDUP(_sdata, _edata),
>>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>>                 MLG((unsigned long)vmemmap,
>>                     (unsigned long)vmemmap + VMEMMAP_SIZE),
>> @@ -326,11 +330,7 @@ void __init mem_init(void)
>>  #endif
>>                 MLK(FIXADDR_START, FIXADDR_TOP),
>>                 MLM(PCI_IO_START, PCI_IO_END),
>> -               MLM(MODULES_VADDR, MODULES_END),
>> -               MLM(PAGE_OFFSET, (unsigned long)high_memory),
>> -               MLK_ROUNDUP(__init_begin, __init_end),
>> -               MLK_ROUNDUP(_text, _etext),
>> -               MLK_ROUNDUP(_sdata, _edata));
>> +               MLM(PAGE_OFFSET, (unsigned long)high_memory));
>>
>>  #undef MLK
>>  #undef MLM
>> diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
>> index 0ca411fc5ea3..acdd1ac166ec 100644
>> --- a/arch/arm64/mm/kasan_init.c
>> +++ b/arch/arm64/mm/kasan_init.c
>> @@ -17,9 +17,11 @@
>>  #include <linux/start_kernel.h>
>>
>>  #include <asm/mmu_context.h>
>> +#include <asm/kernel-pgtable.h>
>>  #include <asm/page.h>
>>  #include <asm/pgalloc.h>
>>  #include <asm/pgtable.h>
>> +#include <asm/sections.h>
>>  #include <asm/tlbflush.h>
>>
>>  static pgd_t tmp_pg_dir[PTRS_PER_PGD] __initdata __aligned(PGD_SIZE);
>> @@ -33,7 +35,7 @@ static void __init kasan_early_pte_populate(pmd_t *pmd, unsigned long addr,
>>       if (pmd_none(*pmd))
>>               pmd_populate_kernel(&init_mm, pmd, kasan_zero_pte);
>>
>> -     pte = pte_offset_kernel(pmd, addr);
>> +     pte = pte_offset_kimg(pmd, addr);
>>       do {
>>               next = addr + PAGE_SIZE;
>>               set_pte(pte, pfn_pte(virt_to_pfn(kasan_zero_page),
>> @@ -51,7 +53,7 @@ static void __init kasan_early_pmd_populate(pud_t *pud,
>>       if (pud_none(*pud))
>>               pud_populate(&init_mm, pud, kasan_zero_pmd);
>>
>> -     pmd = pmd_offset(pud, addr);
>> +     pmd = pmd_offset_kimg(pud, addr);
>>       do {
>>               next = pmd_addr_end(addr, end);
>>               kasan_early_pte_populate(pmd, addr, next);
>> @@ -68,7 +70,7 @@ static void __init kasan_early_pud_populate(pgd_t *pgd,
>>       if (pgd_none(*pgd))
>>               pgd_populate(&init_mm, pgd, kasan_zero_pud);
>>
>> -     pud = pud_offset(pgd, addr);
>> +     pud = pud_offset_kimg(pgd, addr);
>>       do {
>>               next = pud_addr_end(addr, end);
>>               kasan_early_pmd_populate(pud, addr, next);
>> @@ -126,8 +128,14 @@ static void __init clear_pgds(unsigned long start,
>>
>>  void __init kasan_init(void)
>>  {
>> +     u64 kimg_shadow_start, kimg_shadow_end;
>>       struct memblock_region *reg;
>>
>> +     kimg_shadow_start = round_down((u64)kasan_mem_to_shadow(_text),
>> +                                    SWAPPER_BLOCK_SIZE);
>> +     kimg_shadow_end = round_up((u64)kasan_mem_to_shadow(_end),
>> +                                SWAPPER_BLOCK_SIZE);
>
> This rounding looks suspect to me, given it's applied to the shadow
> addresses rather than the kimage addresses. That's roughly equivalent to
> kasan_mem_to_shadow(round_up(_end, 8 * SWAPPER_BLOCK_SIZE)).
>
> I don't think we need any rounding for the kimage addresses. The image
> end is page-granular (and the fine-grained mapping will reflect that).
> Any accesses between _end and round_up(_end, SWAPPER_BLOCK_SIZE) would be
> bugs (and would most likely fault) regardless of KASAN.
>
> Or am I just being thick here?
>

Well, the problem here is that vmemmap_populate() is used as a
surrogate vmalloc() since that is not available yet, and
vmemmap_populate() allocates in SWAPPER_BLOCK_SIZE granularity.
If I remove the rounding, I get false positive kasan errors which I
have not quite diagnosed yet, but are probably due to the fact that
the rounding performed by vmemmap_populate() goes in the wrong
direction.

I do wonder what that means for memblocks that are not multiples of 16
MB, though (below)
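
To put numbers on that, here is a small hosted-C sketch of the rounding in
question (the shadow offset and image bounds are made-up values, and 4k pages
are assumed so that SWAPPER_BLOCK_SIZE is 2 MB; each 2 MB shadow block then
covers 16 MB of image VA, which is where the 16 MB figure comes from):

/*
 * Standalone sketch of the rounding above; KASAN_SHADOW_OFFSET and the
 * image bounds are arbitrary example values, not the real constants.
 */
#include <stdint.h>
#include <stdio.h>

#define SWAPPER_BLOCK_SIZE	(2ULL << 20)		/* one 2 MB PMD block */
#define KASAN_SHADOW_OFFSET	0xdfffff9000000000ULL	/* example only */

static uint64_t kasan_mem_to_shadow(uint64_t addr)
{
	return (addr >> 3) + KASAN_SHADOW_OFFSET;	/* 1 shadow byte per 8 bytes */
}

static uint64_t rnd_down(uint64_t x, uint64_t a) { return x & ~(a - 1); }
static uint64_t rnd_up(uint64_t x, uint64_t a)   { return (x + a - 1) & ~(a - 1); }

int main(void)
{
	uint64_t text = 0xffffff8008080000ULL;	/* hypothetical _text */
	uint64_t end  = 0xffffff8009143000ULL;	/* hypothetical _end  */

	/*
	 * Rounding the shadow range (not the image range) to the block size
	 * covers exactly the 2 MB blocks that vmemmap_populate() will map.
	 */
	uint64_t s = rnd_down(kasan_mem_to_shadow(text), SWAPPER_BLOCK_SIZE);
	uint64_t e = rnd_up(kasan_mem_to_shadow(end), SWAPPER_BLOCK_SIZE);

	printf("kimg shadow: [0x%016llx, 0x%016llx)\n",
	       (unsigned long long)s, (unsigned long long)e);
	return 0;
}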

>> +
>>       /*
>>        * We are going to perform proper setup of shadow memory.
>>        * At first we should unmap early shadow (clear_pgds() call bellow).
>> @@ -141,8 +149,13 @@ void __init kasan_init(void)
>>
>>       clear_pgds(KASAN_SHADOW_START, KASAN_SHADOW_END);
>>
>> +     vmemmap_populate(kimg_shadow_start, kimg_shadow_end,
>> +                      pfn_to_nid(virt_to_pfn(kimg_shadow_start)));
>
> That virt_to_pfn doesn't look right -- kimg_shadow_start is neither a
> linear address nor an image address. As pfn_to_nid is hard-coded to 0
> for !NUMA this happens to be ok for us for the moment.
>
> I think we should follow the x86 KASAN code and use NUMA_NO_NODE for
> this for now.
>

Ack
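
i.e. something along these lines for v4 (just a sketch of the adjusted call,
not the actual hunk):

	/* shadow for the kernel image; no meaningful NUMA node this early */
	vmemmap_populate(kimg_shadow_start, kimg_shadow_end, NUMA_NO_NODE);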

>> +
>>       kasan_populate_zero_shadow((void *)KASAN_SHADOW_START,
>> -                     kasan_mem_to_shadow((void *)MODULES_VADDR));
>> +                                (void *)kimg_shadow_start);
>> +     kasan_populate_zero_shadow((void *)kimg_shadow_end,
>> +                                kasan_mem_to_shadow((void *)PAGE_OFFSET));
>>
>>       for_each_memblock(memory, reg) {
>>               void *start = (void *)__phys_to_virt(reg->base);
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 75b5f0dc3bdc..0b28f1469f9b 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -53,6 +53,10 @@ u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
>>  unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)] __page_aligned_bss;
>>  EXPORT_SYMBOL(empty_zero_page);
>>
>> +static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
>> +static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
>> +static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
>> +
>>  pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>>                             unsigned long size, pgprot_t vma_prot)
>>  {
>> @@ -349,14 +353,14 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
>>  {
>>
>>       unsigned long kernel_start = __pa(_stext);
>> -     unsigned long kernel_end = __pa(_end);
>> +     unsigned long kernel_end = __pa(_etext);
>>
>>       /*
>> -      * The kernel itself is mapped at page granularity. Map all other
>> -      * memory, making sure we don't overwrite the existing kernel mappings.
>> +      * Take care not to create a writable alias for the
>> +      * read-only text and rodata sections of the kernel image.
>>        */
>>
>> -     /* No overlap with the kernel. */
>> +     /* No overlap with the kernel text */
>>       if (end < kernel_start || start >= kernel_end) {
>>               __create_pgd_mapping(pgd, start, __phys_to_virt(start),
>>                                    end - start, PAGE_KERNEL,
>> @@ -365,7 +369,7 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
>>       }
>>
>>       /*
>> -      * This block overlaps the kernel mapping. Map the portion(s) which
>> +      * This block overlaps the kernel text mapping. Map the portion(s) which
>>        * don't overlap.
>>        */
>>       if (start < kernel_start)
>> @@ -438,12 +442,29 @@ static void __init map_kernel(pgd_t *pgd)
>>       map_kernel_chunk(pgd, __init_begin, __init_end, PAGE_KERNEL_EXEC);
>>       map_kernel_chunk(pgd, _data, _end, PAGE_KERNEL);
>>
>> -     /*
>> -      * The fixmap falls in a separate pgd to the kernel, and doesn't live
>> -      * in the carveout for the swapper_pg_dir. We can simply re-use the
>> -      * existing dir for the fixmap.
>> -      */
>> -     set_pgd(pgd_offset_raw(pgd, FIXADDR_START), *pgd_offset_k(FIXADDR_START));
>> +     if (pgd_index(FIXADDR_START) != pgd_index((u64)_end)) {
>
> To match the style of early_fixmap_init, and given we already mapped the
> kernel image, this could be:
>
>         if (pgd_none(pgd_offset_raw(pgd, FIXADDR_START))) {
>
> Which also serves as a run-time check that the pgd entry really was
> clear.
>

Yes, that looks better. I will steal that :-)

> Other than that, this looks good to me!
>

Thanks!

>> +             /*
>> +              * The fixmap falls in a separate pgd to the kernel, and doesn't
>> +              * live in the carveout for the swapper_pg_dir. We can simply
>> +              * re-use the existing dir for the fixmap.
>> +              */
>> +             set_pgd(pgd_offset_raw(pgd, FIXADDR_START),
>> +                     *pgd_offset_k(FIXADDR_START));
>> +     } else if (CONFIG_PGTABLE_LEVELS > 3) {
>> +             /*
>> +              * The fixmap shares its top level pgd entry with the kernel
>> +              * mapping. This can really only occur when we are running
>> +              * with 16k/4 levels, so we can simply reuse the pud level
>> +              * entry instead.
>> +              */
>> +             BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES));
>> +
>> +             set_pud(pud_set_fixmap_offset(pgd, FIXADDR_START),
>> +                     __pud(__pa(bm_pmd) | PUD_TYPE_TABLE));
>> +             pud_clear_fixmap();
>> +     } else {
>> +             BUG();
>> +     }
>>
>>       kasan_copy_shadow(pgd);
>>  }
>> @@ -569,10 +590,6 @@ void vmemmap_free(unsigned long start, unsigned long end)
>>  }
>>  #endif       /* CONFIG_SPARSEMEM_VMEMMAP */
>>
>> -static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
>> -static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
>> -static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
>> -
>>  static inline pud_t * fixmap_pud(unsigned long addr)
>>  {
>>       return (CONFIG_PGTABLE_LEVELS > 3) ? &bm_pud[pud_index(addr)]
>> @@ -598,8 +615,19 @@ void __init early_fixmap_init(void)
>>       unsigned long addr = FIXADDR_START;
>>
>>       pgd = pgd_offset_k(addr);
>> -     pgd_populate(&init_mm, pgd, bm_pud);
>> -     pud = fixmap_pud(addr);
>> +     if (CONFIG_PGTABLE_LEVELS > 3 && !pgd_none(*pgd)) {
>> +             /*
>> +              * We only end up here if the kernel mapping and the fixmap
>> +              * share the top level pgd entry, which should only happen on
>> +              * 16k/4 levels configurations.
>> +              */
>> +             BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES));
>> +             pud = pud_offset_kimg(pgd, addr);
>> +             memblock_free(__pa(bm_pud), sizeof(bm_pud));
>> +     } else {
>> +             pgd_populate(&init_mm, pgd, bm_pud);
>> +             pud = fixmap_pud(addr);
>> +     }
>>       pud_populate(&init_mm, pud, bm_pmd);
>>       pmd = fixmap_pmd(addr);
>>       pmd_populate_kernel(&init_mm, pmd, bm_pte);
>> --
>> 2.5.0
>>

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 07/21] arm64: move kernel image to base of vmalloc area
  2016-01-13  8:39       ` Ard Biesheuvel
  (?)
@ 2016-01-13  9:58         ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-13  9:58 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, kernel-hardening, Will Deacon, Catalin Marinas,
	Leif Lindholm, Kees Cook, linux-kernel, Stuart Yoder,
	Sharma Bhupesh, Arnd Bergmann, Marc Zyngier, Christoffer Dall

On 13 January 2016 at 09:39, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 12 January 2016 at 19:14, Mark Rutland <mark.rutland@arm.com> wrote:
>> On Mon, Jan 11, 2016 at 02:19:00PM +0100, Ard Biesheuvel wrote:
>>> This moves the module area to right before the vmalloc area, and
>>> moves the kernel image to the base of the vmalloc area. This is
>>> an intermediate step towards implementing kASLR, where the kernel
>>> image can be located anywhere in the vmalloc area.
>>>
>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>> ---
>>>  arch/arm64/include/asm/kasan.h          | 20 ++++---
>>>  arch/arm64/include/asm/kernel-pgtable.h |  5 +-
>>>  arch/arm64/include/asm/memory.h         | 18 ++++--
>>>  arch/arm64/include/asm/pgtable.h        |  7 ---
>>>  arch/arm64/kernel/setup.c               | 12 ++++
>>>  arch/arm64/mm/dump.c                    | 12 ++--
>>>  arch/arm64/mm/init.c                    | 20 +++----
>>>  arch/arm64/mm/kasan_init.c              | 21 +++++--
>>>  arch/arm64/mm/mmu.c                     | 62 ++++++++++++++------
>>>  9 files changed, 118 insertions(+), 59 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/kasan.h b/arch/arm64/include/asm/kasan.h
>>> index de0d21211c34..2c583dbf4746 100644
>>> --- a/arch/arm64/include/asm/kasan.h
>>> +++ b/arch/arm64/include/asm/kasan.h
>>> @@ -1,20 +1,16 @@
>>>  #ifndef __ASM_KASAN_H
>>>  #define __ASM_KASAN_H
>>>
>>> -#ifndef __ASSEMBLY__
>>> -
>>>  #ifdef CONFIG_KASAN
>>>
>>>  #include <linux/linkage.h>
>>> -#include <asm/memory.h>
>>> -#include <asm/pgtable-types.h>
>>>
>>>  /*
>>>   * KASAN_SHADOW_START: beginning of the kernel virtual addresses.
>>>   * KASAN_SHADOW_END: KASAN_SHADOW_START + 1/8 of kernel virtual addresses.
>>>   */
>>> -#define KASAN_SHADOW_START      (VA_START)
>>> -#define KASAN_SHADOW_END        (KASAN_SHADOW_START + (1UL << (VA_BITS - 3)))
>>> +#define KASAN_SHADOW_START   (VA_START)
>>> +#define KASAN_SHADOW_END     (KASAN_SHADOW_START + (_AC(1, UL) << (VA_BITS - 3)))
>>>
>>>  /*
>>>   * This value is used to map an address to the corresponding shadow
>>> @@ -26,16 +22,22 @@
>>>   * should satisfy the following equation:
>>>   *      KASAN_SHADOW_OFFSET = KASAN_SHADOW_END - (1ULL << 61)
>>>   */
>>> -#define KASAN_SHADOW_OFFSET     (KASAN_SHADOW_END - (1ULL << (64 - 3)))
>>> +#define KASAN_SHADOW_OFFSET  (KASAN_SHADOW_END - (_AC(1, ULL) << (64 - 3)))
>>> +
>>
>> I couldn't immediately spot where KASAN_SHADOW_* were used in assembly.
>> I guess there's some other definition built atop of them that I've
>> missed.
>>
>> Where should I be looking?
>>
>
> Well, the problem is that KIMAGE_VADDR will be defined in terms of
> KASAN_SHADOW_END if KASAN is enabled. But since KASAN always uses the
> first 1/8 of that VA space, I am going to rework this so that the
> non-KASAN constants never depend on the actual values but only on
> CONFIG_KASAN
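
A rough sketch of what that decoupling could look like (names and exact
placement are illustrative only):

#ifdef CONFIG_KASAN
/* KASAN shadow always claims the first 1/8 of the kernel VA space */
#define KASAN_SHADOW_SIZE	(UL(1) << (VA_BITS - 3))
#else
#define KASAN_SHADOW_SIZE	(0)
#endif

#define MODULES_VADDR		(VA_START + KASAN_SHADOW_SIZE)
#define MODULES_END		(MODULES_VADDR + MODULES_VSIZE)
#define KIMAGE_VADDR		(MODULES_END)

That way memory.h no longer needs to pull in asm/kasan.h just to get at
KASAN_SHADOW_END.
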
>
>>> +#ifndef __ASSEMBLY__
>>> +#include <asm/pgtable-types.h>
>>>
>>>  void kasan_init(void);
>>>  void kasan_copy_shadow(pgd_t *pgdir);
>>>  asmlinkage void kasan_early_init(void);
>>> +#endif
>>>
>>>  #else
>>> +
>>> +#ifndef __ASSEMBLY__
>>>  static inline void kasan_init(void) { }
>>>  static inline void kasan_copy_shadow(pgd_t *pgdir) { }
>>>  #endif
>>>
>>> -#endif
>>> -#endif
>>> +#endif /* CONFIG_KASAN */
>>> +#endif /* __ASM_KASAN_H */
>>> diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
>>> index a459714ee29e..daa8a7b9917a 100644
>>> --- a/arch/arm64/include/asm/kernel-pgtable.h
>>> +++ b/arch/arm64/include/asm/kernel-pgtable.h
>>> @@ -70,8 +70,9 @@
>>>  /*
>>>   * Initial memory map attributes.
>>>   */
>>> -#define SWAPPER_PTE_FLAGS    (PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
>>> -#define SWAPPER_PMD_FLAGS    (PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
>>> +#define SWAPPER_PTE_FLAGS    (PTE_TYPE_PAGE | PTE_AF | PTE_SHARED | PTE_UXN)
>>> +#define SWAPPER_PMD_FLAGS    (PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S | \
>>> +                              PMD_SECT_UXN)
>>
>> This will only affect the tables created in head.S. Before we start
>> userspace we'll have switched over to a new set of tables using
>> PAGE_KERNEL (including UXN).
>>
>> Given that, this doesn't look necessary for the vmalloc area changes. Am
>> I missing something?
>>
>
> No, this was carried over from an older version of the series, when
> the kernel mapping, after having been moved below PAGE_OFFSET, would
> not be overridden by the memblock based linear mapping routines, and
> so would be missing the UXN bit. But with your changes, this can indeed
> be dropped.
>
>>>  #if ARM64_SWAPPER_USES_SECTION_MAPS
>>>  #define SWAPPER_MM_MMUFLAGS  (PMD_ATTRINDX(MT_NORMAL) | SWAPPER_PMD_FLAGS)
>>> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
>>> index bea9631b34a8..e45d3141ad98 100644
>>> --- a/arch/arm64/include/asm/memory.h
>>> +++ b/arch/arm64/include/asm/memory.h
>>> @@ -51,14 +51,24 @@
>>>  #define VA_BITS                      (CONFIG_ARM64_VA_BITS)
>>>  #define VA_START             (UL(0xffffffffffffffff) << VA_BITS)
>>>  #define PAGE_OFFSET          (UL(0xffffffffffffffff) << (VA_BITS - 1))
>>> -#define KIMAGE_VADDR         (PAGE_OFFSET)
>>> -#define MODULES_END          (KIMAGE_VADDR)
>>> -#define MODULES_VADDR                (MODULES_END - SZ_64M)
>>> -#define PCI_IO_END           (MODULES_VADDR - SZ_2M)
>>> +#define PCI_IO_END           (PAGE_OFFSET - SZ_2M)
>>>  #define PCI_IO_START         (PCI_IO_END - PCI_IO_SIZE)
>>>  #define FIXADDR_TOP          (PCI_IO_START - SZ_2M)
>>>  #define TASK_SIZE_64         (UL(1) << VA_BITS)
>>>
>>> +#ifndef CONFIG_KASAN
>>> +#define MODULES_VADDR                (VA_START)
>>> +#else
>>> +#include <asm/kasan.h>
>>> +#define MODULES_VADDR                (KASAN_SHADOW_END)
>>> +#endif
>>> +
>>> +#define MODULES_VSIZE                (SZ_64M)
>>> +#define MODULES_END          (MODULES_VADDR + MODULES_VSIZE)
>>> +
>>> +#define KIMAGE_VADDR         (MODULES_END)
>>> +#define VMALLOC_START                (MODULES_END)
>>> +
>>>  #ifdef CONFIG_COMPAT
>>>  #define TASK_SIZE_32         UL(0x100000000)
>>>  #define TASK_SIZE            (test_thread_flag(TIF_32BIT) ? \
>>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>>> index 7b4e16068c9f..a910a44d7ab3 100644
>>> --- a/arch/arm64/include/asm/pgtable.h
>>> +++ b/arch/arm64/include/asm/pgtable.h
>>> @@ -42,13 +42,6 @@
>>>   */
>>>  #define VMEMMAP_SIZE         ALIGN((1UL << (VA_BITS - PAGE_SHIFT)) * sizeof(struct page), PUD_SIZE)
>>>
>>> -#ifndef CONFIG_KASAN
>>> -#define VMALLOC_START                (VA_START)
>>> -#else
>>> -#include <asm/kasan.h>
>>> -#define VMALLOC_START                (KASAN_SHADOW_END + SZ_64K)
>>> -#endif
>>> -
>>>  #define VMALLOC_END          (PAGE_OFFSET - PUD_SIZE - VMEMMAP_SIZE - SZ_64K)
>>
>> It's a shame VMALLOC_START and VMALLOC_END are now in different headers.
>> It would be nice if we could keep them together.
>>
>> As VMEMMAP_SIZE depends on sizeof(struct page), it's not just a simple
>> move. We could either place that in the !__ASSEMBLY__ portion of
>> memory.h, or we could add S_PAGE to asm-offsets.
>>
>> If that's too painful now, we can leave that for subsequent cleanup;
>> there's other stuff in that area I'd like to unify at some point (e.g.
>> the mem_init and dump.c section boundary descriptions).
>>
>
> No, I think I can probably do a bit better than this. I will address it in v4
>
>>>
>>>  #define vmemmap                      ((struct page *)(VMALLOC_END + SZ_64K))
>>> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
>>> index cfed56f0ad26..c67ba4453ec6 100644
>>> --- a/arch/arm64/kernel/setup.c
>>> +++ b/arch/arm64/kernel/setup.c
>>> @@ -53,6 +53,7 @@
>>>  #include <asm/cpufeature.h>
>>>  #include <asm/cpu_ops.h>
>>>  #include <asm/kasan.h>
>>> +#include <asm/kernel-pgtable.h>
>>>  #include <asm/sections.h>
>>>  #include <asm/setup.h>
>>>  #include <asm/smp_plat.h>
>>> @@ -291,6 +292,17 @@ u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
>>>
>>>  void __init setup_arch(char **cmdline_p)
>>>  {
>>> +     static struct vm_struct vmlinux_vm;
>>> +
>>> +     vmlinux_vm.addr         = (void *)KIMAGE_VADDR;
>>> +     vmlinux_vm.size         = round_up((u64)_end - KIMAGE_VADDR,
>>> +                                        SWAPPER_BLOCK_SIZE);
>>
>> With the fine grained tables we should only need to round up to
>> PAGE_SIZE (though _end is implicitly page-aligned anyway). Given that,
>> is the SWAPPER_BLOCK_SIZE rounding necessary?
>>
>
> No, probably not.
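
In other words, a sketch of the simpler form:

	/* PAGE_SIZE granularity is enough; _end is page-aligned anyway */
	vmlinux_vm.size = round_up((u64)_end - KIMAGE_VADDR, PAGE_SIZE);
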
>
>>> +     vmlinux_vm.phys_addr    = __pa(KIMAGE_VADDR);
>>> +     vmlinux_vm.flags        = VM_MAP;
>>
>> I was going to say we should set VM_KASAN also per its description in
>> include/vmalloc.h, though per its uses it's not clear if it will ever
>> matter.
>>
>
> No, we shouldn't. Even if we are never going to unmap this vma,
> setting the flag will result in the shadow area being freed using
> vfree(), while it was not allocated via vmalloc() so that is likely to
> cause trouble.
>
>>> +     vmlinux_vm.caller       = setup_arch;
>>> +
>>> +     vm_area_add_early(&vmlinux_vm);
>>
>> Do we need to register the kernel VA range quite this early, or could we
>> do this around paging_init/map_kernel time?
>>
>
> No. Locally, I moved it into map_kernel_chunk, so that we have
> separate areas for _text, _init and _data, and we can unmap the _init
> entirely rather than only stripping the exec bit. I haven't quite
> figured out how to get rid of the vma area, but perhaps it makes sense
> to keep it reserved, so that modules don't end up there later (which
> is possible with the module region randomization I have implemented
> for v4), since I don't know how well things like kallsyms etc. cope with
> that.
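
For illustration, registering one reserved area per mapped chunk could look
something like the sketch below (the names and exact placement are made up,
not the v4 code):

static struct vm_struct vmlinux_text, vmlinux_init, vmlinux_data;

static void __init add_kernel_chunk_vma(struct vm_struct *vma,
					void *va_start, void *va_end)
{
	vma->addr	= va_start;
	vma->phys_addr	= __pa(va_start);
	vma->size	= va_end - va_start;
	vma->flags	= VM_MAP;
	vma->caller	= add_kernel_chunk_vma;

	vm_area_add_early(vma);
}

static void __init register_kernel_vmas(void)
{
	/* one area per mapped chunk, mirroring the map_kernel_chunk() calls */
	add_kernel_chunk_vma(&vmlinux_text, _stext, _etext);
	add_kernel_chunk_vma(&vmlinux_init, __init_begin, __init_end);
	add_kernel_chunk_vma(&vmlinux_data, _data, _end);
}

Keeping the vm_struct objects static is what keeps those ranges reserved, so
later vmalloc or module allocations cannot land on top of them.
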
>
>>> +
>>>       pr_info("Boot CPU: AArch64 Processor [%08x]\n", read_cpuid_id());
>>>
>>>       sprintf(init_utsname()->machine, ELF_PLATFORM);
>>> diff --git a/arch/arm64/mm/dump.c b/arch/arm64/mm/dump.c
>>> index 5a22a119a74c..e83ffb00560c 100644
>>> --- a/arch/arm64/mm/dump.c
>>> +++ b/arch/arm64/mm/dump.c
>>> @@ -35,7 +35,9 @@ struct addr_marker {
>>>  };
>>>
>>>  enum address_markers_idx {
>>> -     VMALLOC_START_NR = 0,
>>> +     MODULES_START_NR = 0,
>>> +     MODULES_END_NR,
>>> +     VMALLOC_START_NR,
>>>       VMALLOC_END_NR,
>>>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>>>       VMEMMAP_START_NR,
>>> @@ -45,12 +47,12 @@ enum address_markers_idx {
>>>       FIXADDR_END_NR,
>>>       PCI_START_NR,
>>>       PCI_END_NR,
>>> -     MODULES_START_NR,
>>> -     MODUELS_END_NR,
>>>       KERNEL_SPACE_NR,
>>>  };
>>>
>>>  static struct addr_marker address_markers[] = {
>>> +     { MODULES_VADDR,        "Modules start" },
>>> +     { MODULES_END,          "Modules end" },
>>>       { VMALLOC_START,        "vmalloc() Area" },
>>>       { VMALLOC_END,          "vmalloc() End" },
>>>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>>> @@ -61,9 +63,7 @@ static struct addr_marker address_markers[] = {
>>>       { FIXADDR_TOP,          "Fixmap end" },
>>>       { PCI_IO_START,         "PCI I/O start" },
>>>       { PCI_IO_END,           "PCI I/O end" },
>>> -     { MODULES_VADDR,        "Modules start" },
>>> -     { MODULES_END,          "Modules end" },
>>> -     { PAGE_OFFSET,          "Kernel Mapping" },
>>> +     { PAGE_OFFSET,          "Linear Mapping" },
>>>       { -1,                   NULL },
>>>  };
>>>
>>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>>> index f3b061e67bfe..baa923bda651 100644
>>> --- a/arch/arm64/mm/init.c
>>> +++ b/arch/arm64/mm/init.c
>>> @@ -302,22 +302,26 @@ void __init mem_init(void)
>>>  #ifdef CONFIG_KASAN
>>>                 "    kasan   : 0x%16lx - 0x%16lx   (%6ld GB)\n"
>>>  #endif
>>> +               "    modules : 0x%16lx - 0x%16lx   (%6ld MB)\n"
>>>                 "    vmalloc : 0x%16lx - 0x%16lx   (%6ld GB)\n"
>>> +               "      .init : 0x%p" " - 0x%p" "   (%6ld KB)\n"
>>> +               "      .text : 0x%p" " - 0x%p" "   (%6ld KB)\n"
>>> +               "      .data : 0x%p" " - 0x%p" "   (%6ld KB)\n"
>>>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>>>                 "    vmemmap : 0x%16lx - 0x%16lx   (%6ld GB maximum)\n"
>>>                 "              0x%16lx - 0x%16lx   (%6ld MB actual)\n"
>>>  #endif
>>>                 "    fixed   : 0x%16lx - 0x%16lx   (%6ld KB)\n"
>>>                 "    PCI I/O : 0x%16lx - 0x%16lx   (%6ld MB)\n"
>>> -               "    modules : 0x%16lx - 0x%16lx   (%6ld MB)\n"
>>> -               "    memory  : 0x%16lx - 0x%16lx   (%6ld MB)\n"
>>> -               "      .init : 0x%p" " - 0x%p" "   (%6ld KB)\n"
>>> -               "      .text : 0x%p" " - 0x%p" "   (%6ld KB)\n"
>>> -               "      .data : 0x%p" " - 0x%p" "   (%6ld KB)\n",
>>> +               "    memory  : 0x%16lx - 0x%16lx   (%6ld MB)\n",
>>>  #ifdef CONFIG_KASAN
>>>                 MLG(KASAN_SHADOW_START, KASAN_SHADOW_END),
>>>  #endif
>>> +               MLM(MODULES_VADDR, MODULES_END),
>>>                 MLG(VMALLOC_START, VMALLOC_END),
>>> +               MLK_ROUNDUP(__init_begin, __init_end),
>>> +               MLK_ROUNDUP(_text, _etext),
>>> +               MLK_ROUNDUP(_sdata, _edata),
>>>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>>>                 MLG((unsigned long)vmemmap,
>>>                     (unsigned long)vmemmap + VMEMMAP_SIZE),
>>> @@ -326,11 +330,7 @@ void __init mem_init(void)
>>>  #endif
>>>                 MLK(FIXADDR_START, FIXADDR_TOP),
>>>                 MLM(PCI_IO_START, PCI_IO_END),
>>> -               MLM(MODULES_VADDR, MODULES_END),
>>> -               MLM(PAGE_OFFSET, (unsigned long)high_memory),
>>> -               MLK_ROUNDUP(__init_begin, __init_end),
>>> -               MLK_ROUNDUP(_text, _etext),
>>> -               MLK_ROUNDUP(_sdata, _edata));
>>> +               MLM(PAGE_OFFSET, (unsigned long)high_memory));
>>>
>>>  #undef MLK
>>>  #undef MLM
>>> diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
>>> index 0ca411fc5ea3..acdd1ac166ec 100644
>>> --- a/arch/arm64/mm/kasan_init.c
>>> +++ b/arch/arm64/mm/kasan_init.c
>>> @@ -17,9 +17,11 @@
>>>  #include <linux/start_kernel.h>
>>>
>>>  #include <asm/mmu_context.h>
>>> +#include <asm/kernel-pgtable.h>
>>>  #include <asm/page.h>
>>>  #include <asm/pgalloc.h>
>>>  #include <asm/pgtable.h>
>>> +#include <asm/sections.h>
>>>  #include <asm/tlbflush.h>
>>>
>>>  static pgd_t tmp_pg_dir[PTRS_PER_PGD] __initdata __aligned(PGD_SIZE);
>>> @@ -33,7 +35,7 @@ static void __init kasan_early_pte_populate(pmd_t *pmd, unsigned long addr,
>>>       if (pmd_none(*pmd))
>>>               pmd_populate_kernel(&init_mm, pmd, kasan_zero_pte);
>>>
>>> -     pte = pte_offset_kernel(pmd, addr);
>>> +     pte = pte_offset_kimg(pmd, addr);
>>>       do {
>>>               next = addr + PAGE_SIZE;
>>>               set_pte(pte, pfn_pte(virt_to_pfn(kasan_zero_page),
>>> @@ -51,7 +53,7 @@ static void __init kasan_early_pmd_populate(pud_t *pud,
>>>       if (pud_none(*pud))
>>>               pud_populate(&init_mm, pud, kasan_zero_pmd);
>>>
>>> -     pmd = pmd_offset(pud, addr);
>>> +     pmd = pmd_offset_kimg(pud, addr);
>>>       do {
>>>               next = pmd_addr_end(addr, end);
>>>               kasan_early_pte_populate(pmd, addr, next);
>>> @@ -68,7 +70,7 @@ static void __init kasan_early_pud_populate(pgd_t *pgd,
>>>       if (pgd_none(*pgd))
>>>               pgd_populate(&init_mm, pgd, kasan_zero_pud);
>>>
>>> -     pud = pud_offset(pgd, addr);
>>> +     pud = pud_offset_kimg(pgd, addr);
>>>       do {
>>>               next = pud_addr_end(addr, end);
>>>               kasan_early_pmd_populate(pud, addr, next);
>>> @@ -126,8 +128,14 @@ static void __init clear_pgds(unsigned long start,
>>>
>>>  void __init kasan_init(void)
>>>  {
>>> +     u64 kimg_shadow_start, kimg_shadow_end;
>>>       struct memblock_region *reg;
>>>
>>> +     kimg_shadow_start = round_down((u64)kasan_mem_to_shadow(_text),
>>> +                                    SWAPPER_BLOCK_SIZE);
>>> +     kimg_shadow_end = round_up((u64)kasan_mem_to_shadow(_end),
>>> +                                SWAPPER_BLOCK_SIZE);
>>
>> This rounding looks suspect to me, given it's applied to the shadow
>> addresses rather than the kimage addresses. That's roughly equivalent to
>> kasan_mem_to_shadow(round_up(_end, 8 * SWAPPER_BLOCK_SIZE)).
>>
>> I don't think we need any rounding for the kimage addresses. The image
>> end is page-granular (and the fine-grained mapping will reflect that).
>> Any accesses between _end and round_up(_end, SWAPPER_BLOCK_SIZE) would be
>> bugs (and would most likely fault) regardless of KASAN.
>>
>> Or am I just being thick here?
>>
>
> Well, the problem here is that vmemmap_populate() is used as a
> surrogate vmalloc() since that is not available yet, and
> vmemmap_populate() allocates in SWAPPER_BLOCK_SIZE granularity.
> If I remove the rounding, I get false positive kasan errors which I
> have not quite diagnosed yet, but are probably due to the fact that
> the rounding performed by vmemmap_populate() goes in the wrong
> direction.
>
> I do wonder what that means for memblocks that are not multiples of 16
> MB, though (below)
>
>>> +
>>>       /*
>>>        * We are going to perform proper setup of shadow memory.
>>>        * At first we should unmap early shadow (clear_pgds() call bellow).
>>> @@ -141,8 +149,13 @@ void __init kasan_init(void)
>>>
>>>       clear_pgds(KASAN_SHADOW_START, KASAN_SHADOW_END);
>>>
>>> +     vmemmap_populate(kimg_shadow_start, kimg_shadow_end,
>>> +                      pfn_to_nid(virt_to_pfn(kimg_shadow_start)));
>>
>> That virt_to_pfn doesn't look right -- kimg_shadow_start is neither a
>> linear address nor an image address. As pfn_to_nid is hard-coded to 0
>> for !NUMA this happens to be ok for us for the moment.
>>
>> I think we should follow the x86 KASAN code and use NUMA_NO_NODE for
>> this for now.
>>
>
> Ack
>
>>> +
>>>       kasan_populate_zero_shadow((void *)KASAN_SHADOW_START,
>>> -                     kasan_mem_to_shadow((void *)MODULES_VADDR));
>>> +                                (void *)kimg_shadow_start);
>>> +     kasan_populate_zero_shadow((void *)kimg_shadow_end,
>>> +                                kasan_mem_to_shadow((void *)PAGE_OFFSET));
>>>
>>>       for_each_memblock(memory, reg) {
>>>               void *start = (void *)__phys_to_virt(reg->base);
>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>> index 75b5f0dc3bdc..0b28f1469f9b 100644
>>> --- a/arch/arm64/mm/mmu.c
>>> +++ b/arch/arm64/mm/mmu.c
>>> @@ -53,6 +53,10 @@ u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
>>>  unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)] __page_aligned_bss;
>>>  EXPORT_SYMBOL(empty_zero_page);
>>>
>>> +static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
>>> +static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
>>> +static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
>>> +
>>>  pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>>>                             unsigned long size, pgprot_t vma_prot)
>>>  {
>>> @@ -349,14 +353,14 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
>>>  {
>>>
>>>       unsigned long kernel_start = __pa(_stext);
>>> -     unsigned long kernel_end = __pa(_end);
>>> +     unsigned long kernel_end = __pa(_etext);
>>>
>>>       /*
>>> -      * The kernel itself is mapped at page granularity. Map all other
>>> -      * memory, making sure we don't overwrite the existing kernel mappings.
>>> +      * Take care not to create a writable alias for the
>>> +      * read-only text and rodata sections of the kernel image.
>>>        */
>>>
>>> -     /* No overlap with the kernel. */
>>> +     /* No overlap with the kernel text */
>>>       if (end < kernel_start || start >= kernel_end) {
>>>               __create_pgd_mapping(pgd, start, __phys_to_virt(start),
>>>                                    end - start, PAGE_KERNEL,
>>> @@ -365,7 +369,7 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end
>>>       }
>>>
>>>       /*
>>> -      * This block overlaps the kernel mapping. Map the portion(s) which
>>> +      * This block overlaps the kernel text mapping. Map the portion(s) which
>>>        * don't overlap.
>>>        */
>>>       if (start < kernel_start)
>>> @@ -438,12 +442,29 @@ static void __init map_kernel(pgd_t *pgd)
>>>       map_kernel_chunk(pgd, __init_begin, __init_end, PAGE_KERNEL_EXEC);
>>>       map_kernel_chunk(pgd, _data, _end, PAGE_KERNEL);
>>>
>>> -     /*
>>> -      * The fixmap falls in a separate pgd to the kernel, and doesn't live
>>> -      * in the carveout for the swapper_pg_dir. We can simply re-use the
>>> -      * existing dir for the fixmap.
>>> -      */
>>> -     set_pgd(pgd_offset_raw(pgd, FIXADDR_START), *pgd_offset_k(FIXADDR_START));
>>> +     if (pgd_index(FIXADDR_START) != pgd_index((u64)_end)) {
>>
>> To match the style of early_fixmap_init, and given we already mapped the
>> kernel image, this could be:
>>
>>         if (pgd_none(pgd_offset_raw(pgd, FIXADDR_START))) {
>>
>> Which also serves as a run-time check that the pgd entry really was
>> clear.
>>
>
> Yes, that looks better. I will steal that :-)
>

OK, that doesn't work. pgd_none() is hardcoded to 'false' when running
with fewer than 4 pgtable levels, and so we always hit the BUG() here.
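
With 3 levels the pud is folded into the pgd, and the stubs in
include/asm-generic/pgtable-nopud.h look roughly like the following, so the
suggested check can never take the first branch on such a configuration:

static inline int pgd_none(pgd_t pgd)		{ return 0; }
static inline int pgd_bad(pgd_t pgd)		{ return 0; }
static inline int pgd_present(pgd_t pgd)	{ return 1; }
static inline void pgd_clear(pgd_t *pgd)	{ }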


>> Other than that, this looks good to me!
>>
>
> Thanks!
>
>>> +             /*
>>> +              * The fixmap falls in a separate pgd to the kernel, and doesn't
>>> +              * live in the carveout for the swapper_pg_dir. We can simply
>>> +              * re-use the existing dir for the fixmap.
>>> +              */
>>> +             set_pgd(pgd_offset_raw(pgd, FIXADDR_START),
>>> +                     *pgd_offset_k(FIXADDR_START));
>>> +     } else if (CONFIG_PGTABLE_LEVELS > 3) {
>>> +             /*
>>> +              * The fixmap shares its top level pgd entry with the kernel
>>> +              * mapping. This can really only occur when we are running
>>> +              * with 16k/4 levels, so we can simply reuse the pud level
>>> +              * entry instead.
>>> +              */
>>> +             BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES));
>>> +
>>> +             set_pud(pud_set_fixmap_offset(pgd, FIXADDR_START),
>>> +                     __pud(__pa(bm_pmd) | PUD_TYPE_TABLE));
>>> +             pud_clear_fixmap();
>>> +     } else {
>>> +             BUG();
>>> +     }
>>>
>>>       kasan_copy_shadow(pgd);
>>>  }
>>> @@ -569,10 +590,6 @@ void vmemmap_free(unsigned long start, unsigned long end)
>>>  }
>>>  #endif       /* CONFIG_SPARSEMEM_VMEMMAP */
>>>
>>> -static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
>>> -static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss;
>>> -static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss;
>>> -
>>>  static inline pud_t * fixmap_pud(unsigned long addr)
>>>  {
>>>       return (CONFIG_PGTABLE_LEVELS > 3) ? &bm_pud[pud_index(addr)]
>>> @@ -598,8 +615,19 @@ void __init early_fixmap_init(void)
>>>       unsigned long addr = FIXADDR_START;
>>>
>>>       pgd = pgd_offset_k(addr);
>>> -     pgd_populate(&init_mm, pgd, bm_pud);
>>> -     pud = fixmap_pud(addr);
>>> +     if (CONFIG_PGTABLE_LEVELS > 3 && !pgd_none(*pgd)) {
>>> +             /*
>>> +              * We only end up here if the kernel mapping and the fixmap
>>> +              * share the top level pgd entry, which should only happen on
>>> +              * 16k/4 levels configurations.
>>> +              */
>>> +             BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES));
>>> +             pud = pud_offset_kimg(pgd, addr);
>>> +             memblock_free(__pa(bm_pud), sizeof(bm_pud));
>>> +     } else {
>>> +             pgd_populate(&init_mm, pgd, bm_pud);
>>> +             pud = fixmap_pud(addr);
>>> +     }
>>>       pud_populate(&init_mm, pud, bm_pmd);
>>>       pmd = fixmap_pmd(addr);
>>>       pmd_populate_kernel(&init_mm, pmd, bm_pte);
>>> --
>>> 2.5.0
>>>

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 07/21] arm64: move kernel image to base of vmalloc area
  2016-01-13  9:58         ` Ard Biesheuvel
@ 2016-01-13 11:11           ` Mark Rutland
  -1 siblings, 0 replies; 207+ messages in thread
From: Mark Rutland @ 2016-01-13 11:11 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, kernel-hardening, Will Deacon, Catalin Marinas,
	Leif Lindholm, Kees Cook, linux-kernel, Stuart Yoder,
	Sharma Bhupesh, Arnd Bergmann, Marc Zyngier, Christoffer Dall

On Wed, Jan 13, 2016 at 10:58:55AM +0100, Ard Biesheuvel wrote:
> On 13 January 2016 at 09:39, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> > On 12 January 2016 at 19:14, Mark Rutland <mark.rutland@arm.com> wrote:
> >> On Mon, Jan 11, 2016 at 02:19:00PM +0100, Ard Biesheuvel wrote:
> >>> @@ -438,12 +442,29 @@ static void __init map_kernel(pgd_t *pgd)
> >>>       map_kernel_chunk(pgd, __init_begin, __init_end, PAGE_KERNEL_EXEC);
> >>>       map_kernel_chunk(pgd, _data, _end, PAGE_KERNEL);
> >>>
> >>> -     /*
> >>> -      * The fixmap falls in a separate pgd to the kernel, and doesn't live
> >>> -      * in the carveout for the swapper_pg_dir. We can simply re-use the
> >>> -      * existing dir for the fixmap.
> >>> -      */
> >>> -     set_pgd(pgd_offset_raw(pgd, FIXADDR_START), *pgd_offset_k(FIXADDR_START));
> >>> +     if (pgd_index(FIXADDR_START) != pgd_index((u64)_end)) {
> >>
> >> To match the style of early_fixmap_init, and given we already mapped the
> >> kernel image, this could be:
> >>
> >>         if (pgd_none(pgd_offset_raw(pgd, FIXADDR_START))) {
> >>
> >> Which also serves as a run-time check that the pgd entry really was
> >> clear.
> >>
> >
> > Yes, that looks better. I will steal that :-)
> >
> 
> OK, that doesn't work. pgd_none() is hardcoded to 'false' when running
> with fewer than 4 pgtable levels, and so we always hit the BUG() here.

Ah, sorry.

We could also add a CONFIG_PGTABLE_LEVELS > 3 check, as with
fixmap_init, perhaps?

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 07/21] arm64: move kernel image to base of vmalloc area
  2016-01-13 11:11           ` Mark Rutland
@ 2016-01-13 11:14             ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-13 11:14 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, kernel-hardening, Will Deacon, Catalin Marinas,
	Leif Lindholm, Kees Cook, linux-kernel, Stuart Yoder,
	Sharma Bhupesh, Arnd Bergmann, Marc Zyngier, Christoffer Dall

On 13 January 2016 at 12:11, Mark Rutland <mark.rutland@arm.com> wrote:
> On Wed, Jan 13, 2016 at 10:58:55AM +0100, Ard Biesheuvel wrote:
>> On 13 January 2016 at 09:39, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> > On 12 January 2016 at 19:14, Mark Rutland <mark.rutland@arm.com> wrote:
>> >> On Mon, Jan 11, 2016 at 02:19:00PM +0100, Ard Biesheuvel wrote:
>> >>> @@ -438,12 +442,29 @@ static void __init map_kernel(pgd_t *pgd)
>> >>>       map_kernel_chunk(pgd, __init_begin, __init_end, PAGE_KERNEL_EXEC);
>> >>>       map_kernel_chunk(pgd, _data, _end, PAGE_KERNEL);
>> >>>
>> >>> -     /*
>> >>> -      * The fixmap falls in a separate pgd to the kernel, and doesn't live
>> >>> -      * in the carveout for the swapper_pg_dir. We can simply re-use the
>> >>> -      * existing dir for the fixmap.
>> >>> -      */
>> >>> -     set_pgd(pgd_offset_raw(pgd, FIXADDR_START), *pgd_offset_k(FIXADDR_START));
>> >>> +     if (pgd_index(FIXADDR_START) != pgd_index((u64)_end)) {
>> >>
>> >> To match the style of early_fixmap_init, and given we already mapped the
>> >> kernel image, this could be:
>> >>
>> >>         if (pgd_none(pgd_offset_raw(pgd, FIXADDR_START))) {
>> >>
>> >> Which also serves as a run-time check that the pgd entry really was
>> >> clear.
>> >>
>> >
>> > Yes, that looks better. I will steal that :-)
>> >
>>
>> OK, that doesn't work. pgd_none() is hardcoded to 'false' when running
>> with fewer than 4 pgtable levels, and so we always hit the BUG() here.
>
> Ah, sorry.
>
> We could also add a CONFIG_PGTABLE_LEVELS > 3 check, as with
> fixmap_init, perhaps?
>

I'm using this now:

if (!pgd_val(*pgd_offset_raw(pgd, FIXADDR_START))) {

which I think is appropriate, since we don't expect to share any top
level entry, folded or not.
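
Spelled out against the v3 hunk, the whole branch would then read roughly
as follows (sketch only, modulo whatever else changes in v4):

	if (!pgd_val(*pgd_offset_raw(pgd, FIXADDR_START))) {
		/*
		 * The fixmap's slot in the new pgd is still empty, so simply
		 * copy over the live entry that early_fixmap_init() already
		 * set up in the swapper dir.
		 */
		set_pgd(pgd_offset_raw(pgd, FIXADDR_START),
			*pgd_offset_k(FIXADDR_START));
	} else if (CONFIG_PGTABLE_LEVELS > 3) {
		/*
		 * The fixmap shares its top level pgd entry with the kernel
		 * mapping (16k/4 levels), so hook in at the pud level instead.
		 */
		BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES));
		set_pud(pud_set_fixmap_offset(pgd, FIXADDR_START),
			__pud(__pa(bm_pmd) | PUD_TYPE_TABLE));
		pud_clear_fixmap();
	} else {
		BUG();
	}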

-- 
Ard.

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 07/21] arm64: move kernel image to base of vmalloc area
  2016-01-13  8:39       ` Ard Biesheuvel
@ 2016-01-13 13:51         ` Mark Rutland
  -1 siblings, 0 replies; 207+ messages in thread
From: Mark Rutland @ 2016-01-13 13:51 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, kernel-hardening, Will Deacon, Catalin Marinas,
	Leif Lindholm, Kees Cook, linux-kernel, Stuart Yoder,
	Sharma Bhupesh, Arnd Bergmann, Marc Zyngier, Christoffer Dall

On Wed, Jan 13, 2016 at 09:39:41AM +0100, Ard Biesheuvel wrote:
> On 12 January 2016 at 19:14, Mark Rutland <mark.rutland@arm.com> wrote:
> > On Mon, Jan 11, 2016 at 02:19:00PM +0100, Ard Biesheuvel wrote:
> >> This moves the module area to right before the vmalloc area, and
> >> moves the kernel image to the base of the vmalloc area. This is
> >> an intermediate step towards implementing kASLR, where the kernel
> >> image can be located anywhere in the vmalloc area.
> >>
> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> >> ---
> >>  arch/arm64/include/asm/kasan.h          | 20 ++++---
> >>  arch/arm64/include/asm/kernel-pgtable.h |  5 +-
> >>  arch/arm64/include/asm/memory.h         | 18 ++++--
> >>  arch/arm64/include/asm/pgtable.h        |  7 ---
> >>  arch/arm64/kernel/setup.c               | 12 ++++
> >>  arch/arm64/mm/dump.c                    | 12 ++--
> >>  arch/arm64/mm/init.c                    | 20 +++----
> >>  arch/arm64/mm/kasan_init.c              | 21 +++++--
> >>  arch/arm64/mm/mmu.c                     | 62 ++++++++++++++------
> >>  9 files changed, 118 insertions(+), 59 deletions(-)
> >>
> >> diff --git a/arch/arm64/include/asm/kasan.h b/arch/arm64/include/asm/kasan.h
> >> index de0d21211c34..2c583dbf4746 100644
> >> --- a/arch/arm64/include/asm/kasan.h
> >> +++ b/arch/arm64/include/asm/kasan.h
> >> @@ -1,20 +1,16 @@
> >>  #ifndef __ASM_KASAN_H
> >>  #define __ASM_KASAN_H
> >>
> >> -#ifndef __ASSEMBLY__
> >> -
> >>  #ifdef CONFIG_KASAN
> >>
> >>  #include <linux/linkage.h>
> >> -#include <asm/memory.h>
> >> -#include <asm/pgtable-types.h>
> >>
> >>  /*
> >>   * KASAN_SHADOW_START: beginning of the kernel virtual addresses.
> >>   * KASAN_SHADOW_END: KASAN_SHADOW_START + 1/8 of kernel virtual addresses.
> >>   */
> >> -#define KASAN_SHADOW_START      (VA_START)
> >> -#define KASAN_SHADOW_END        (KASAN_SHADOW_START + (1UL << (VA_BITS - 3)))
> >> +#define KASAN_SHADOW_START   (VA_START)
> >> +#define KASAN_SHADOW_END     (KASAN_SHADOW_START + (_AC(1, UL) << (VA_BITS - 3)))
> >>
> >>  /*
> >>   * This value is used to map an address to the corresponding shadow
> >> @@ -26,16 +22,22 @@
> >>   * should satisfy the following equation:
> >>   *      KASAN_SHADOW_OFFSET = KASAN_SHADOW_END - (1ULL << 61)
> >>   */
> >> -#define KASAN_SHADOW_OFFSET     (KASAN_SHADOW_END - (1ULL << (64 - 3)))
> >> +#define KASAN_SHADOW_OFFSET  (KASAN_SHADOW_END - (_AC(1, ULL) << (64 - 3)))
> >> +
> >
> > I couldn't immediately spot where KASAN_SHADOW_* were used in assembly.
> > I guess there's some other definition built atop of them that I've
> > missed.
> >
> > Where should I be looking?
> >
> 
> Well, the problem is that KIMAGE_VADDR will be defined in terms of
> KASAN_SHADOW_END if KASAN is enabled.

Ah. I'd somehow managed to overlook that. Thanks for pointing that out!

> But since KASAN always uses the first 1/8 of that VA space, I am going
> to rework this so that the non-KASAN constants never depend on the
> actual values but only on CONFIG_KASAN

Personally I'd prefer that they were defined directly in terms of each
other if possible (as that makes the definitions obviously consistent by
construction).

So if it's not too much of a pain to keep them that way it would be
nice to do so.
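
One way to get both properties is to fold the shadow size into a single
constant that is zero without KASAN, e.g. (sketch only; KASAN_SHADOW_SIZE
is a made-up helper here, not something v3 defines):

	/* memory.h */
	#ifdef CONFIG_KASAN
	#define KASAN_SHADOW_SIZE	(UL(1) << (VA_BITS - 3))
	#else
	#define KASAN_SHADOW_SIZE	(0)
	#endif

	#define MODULES_VADDR		(VA_START + KASAN_SHADOW_SIZE)
	#define MODULES_VSIZE		(SZ_64M)
	#define MODULES_END		(MODULES_VADDR + MODULES_VSIZE)
	#define KIMAGE_VADDR		(MODULES_END)

	/* kasan.h, in terms of the same constant */
	#define KASAN_SHADOW_START	(VA_START)
	#define KASAN_SHADOW_END	(KASAN_SHADOW_START + KASAN_SHADOW_SIZE)

That keeps the layout consistent by construction, while the non-KASAN
headers only ever look at CONFIG_KASAN rather than at the kasan
definitions themselves.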

[...]

> >> +     vmlinux_vm.flags        = VM_MAP;
> >
> > I was going to say we should set VM_KASAN also per its description in
> > include/vmalloc.h, though per its uses it's not clear if it will ever
> > matter.
> >
> 
> No, we shouldn't. Even if we are never going to unmap this vma,
> setting the flag will result in the shadow area being freed using
> vfree(), while it was not allocated via vmalloc() so that is likely to
> cause trouble.

Ok.

> >> +     vm_area_add_early(&vmlinux_vm);
> >
> > Do we need to register the kernel VA range quite this early, or could we
> > do this around paging_init/map_kernel time?
> >
> 
> No. Locally, I moved it into map_kernel_chunk, so that we have
> separate areas for _text, _init and _data, and we can unmap the _init
> entirely rather than only stripping the exec bit. I haven't quite
> figured out how to get rid of the vma area, but perhaps it makes sense
> to keep it reserved, so that modules don't end up there later (which
> is possible with the module region randomization I have implemented
> for v4) since I don't know how well things like kallsyms etc cope with
> that.

Keeping that reserved sounds reasonable to me.
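
For the record, the per-chunk registration Ard describes could look
something like this (sketch only; the final v4 shape may differ, and the
vm_struct instances would be static objects passed in from map_kernel()):

	static void __init map_kernel_chunk(pgd_t *pgd, void *va_start,
					    void *va_end, pgprot_t prot,
					    struct vm_struct *vma)
	{
		phys_addr_t pa_start = __pa(va_start);
		unsigned long size = va_end - va_start;

		BUG_ON(!PAGE_ALIGNED(pa_start));
		BUG_ON(!PAGE_ALIGNED(size));

		__create_pgd_mapping(pgd, pa_start, (unsigned long)va_start,
				     size, prot, early_pgtable_alloc);

		/* one vm area per chunk, so .init can be unmapped wholesale */
		vma->addr	= va_start;
		vma->phys_addr	= pa_start;
		vma->size	= size;
		vma->flags	= VM_MAP;
		vma->caller	= map_kernel_chunk;

		vm_area_add_early(vma);
	}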

[...]

> >>  void __init kasan_init(void)
> >>  {
> >> +     u64 kimg_shadow_start, kimg_shadow_end;
> >>       struct memblock_region *reg;
> >>
> >> +     kimg_shadow_start = round_down((u64)kasan_mem_to_shadow(_text),
> >> +                                    SWAPPER_BLOCK_SIZE);
> >> +     kimg_shadow_end = round_up((u64)kasan_mem_to_shadow(_end),
> >> +                                SWAPPER_BLOCK_SIZE);
> >
> > This rounding looks suspect to me, given it's applied to the shadow
> > addresses rather than the kimage addresses. That's roughly equivalent to
> > kasan_mem_to_shadow(round_up(_end, 8 * SWAPPER_BLOCK_SIZE)).
> >
> > I don't think we need any rounding for the kimage addresses. The image
> > end is page-granular (and the fine-grained mapping will reflect that).
> > Any accesses between _end and round_up(_end, SWAPPER_BLOCK_SIZE) would be
> > bugs (and would most likely fault) regardless of KASAN.
> >
> > Or am I just being thick here?
> >
> 
> Well, the problem here is that vmemmap_populate() is used as a
> surrogate vmalloc() since that is not available yet, and
> vmemmap_populate() allocates in SWAPPER_BLOCK_SIZE granularity.
> If I remove the rounding, I get false positive kasan errors which I
> have not quite diagnosed yet, but are probably due to the fact that
> the rounding performed by vmemmap_populate() goes in the wrong
> direction.

Ah. :(

I'll also take a peek.

> I do wonder what that means for memblocks that are not multiples of 16
> MB, though (below)

Indeed.

On a related note, something I've been thinking about is PA layout
fuzzing using VMs.

It sounds like being able to test memory layouts would be useful for
cases like the above, and I suspect there are plenty of other edge cases
that we aren't yet aware of due to typical physical memory layouts being
fairly simple.

It doesn't seem to be possible to force a particular physical memory
layout (and particular kernel, dtb, etc addresses) for QEMU or KVM
tool. I started looking into adding support to KVM tool, but there's a
fair amount of refactoring needed first.

Another option might be a special EFI application that carves up memory
in a deliberate fashion to ensure particular fragmentation cases (e.g. a
bank that's SWAPPER_BLOCK_SIZE - PAGE_SIZE in length).
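
Concretely, a few Boot Services calls would be enough to pin allocations at
chosen physical addresses and force exactly that shape. A minimal
EDK2-flavoured sketch (the base address, allocation type and the 2M/4K
sizes are illustrative assumptions, not anything from this series):

	/* leave a 4 KB hole just below a 2 MB boundary, so the usable
	 * region underneath ends up 2 MB - 4 KB long */
	EFI_PHYSICAL_ADDRESS addr = 0x80000000ULL + 0x200000 - 0x1000;
	EFI_STATUS status;

	status = gBS->AllocatePages(AllocateAddress, EfiReservedMemoryType,
				    1, &addr);
	if (EFI_ERROR(status))
		Print(L"AllocatePages failed: %r\n", status);

(assuming the firmware accepts EfiReservedMemoryType here; any type the
kernel won't treat as usable RAM would do for the experiment)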

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 207+ messages in thread

* [PATCH v3 07/21] arm64: move kernel image to base of vmalloc area
@ 2016-01-13 13:51         ` Mark Rutland
  0 siblings, 0 replies; 207+ messages in thread
From: Mark Rutland @ 2016-01-13 13:51 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 13, 2016 at 09:39:41AM +0100, Ard Biesheuvel wrote:
> On 12 January 2016 at 19:14, Mark Rutland <mark.rutland@arm.com> wrote:
> > On Mon, Jan 11, 2016 at 02:19:00PM +0100, Ard Biesheuvel wrote:
> >> This moves the module area to right before the vmalloc area, and
> >> moves the kernel image to the base of the vmalloc area. This is
> >> an intermediate step towards implementing kASLR, where the kernel
> >> image can be located anywhere in the vmalloc area.
> >>
> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> >> ---
> >>  arch/arm64/include/asm/kasan.h          | 20 ++++---
> >>  arch/arm64/include/asm/kernel-pgtable.h |  5 +-
> >>  arch/arm64/include/asm/memory.h         | 18 ++++--
> >>  arch/arm64/include/asm/pgtable.h        |  7 ---
> >>  arch/arm64/kernel/setup.c               | 12 ++++
> >>  arch/arm64/mm/dump.c                    | 12 ++--
> >>  arch/arm64/mm/init.c                    | 20 +++----
> >>  arch/arm64/mm/kasan_init.c              | 21 +++++--
> >>  arch/arm64/mm/mmu.c                     | 62 ++++++++++++++------
> >>  9 files changed, 118 insertions(+), 59 deletions(-)
> >>
> >> diff --git a/arch/arm64/include/asm/kasan.h b/arch/arm64/include/asm/kasan.h
> >> index de0d21211c34..2c583dbf4746 100644
> >> --- a/arch/arm64/include/asm/kasan.h
> >> +++ b/arch/arm64/include/asm/kasan.h
> >> @@ -1,20 +1,16 @@
> >>  #ifndef __ASM_KASAN_H
> >>  #define __ASM_KASAN_H
> >>
> >> -#ifndef __ASSEMBLY__
> >> -
> >>  #ifdef CONFIG_KASAN
> >>
> >>  #include <linux/linkage.h>
> >> -#include <asm/memory.h>
> >> -#include <asm/pgtable-types.h>
> >>
> >>  /*
> >>   * KASAN_SHADOW_START: beginning of the kernel virtual addresses.
> >>   * KASAN_SHADOW_END: KASAN_SHADOW_START + 1/8 of kernel virtual addresses.
> >>   */
> >> -#define KASAN_SHADOW_START      (VA_START)
> >> -#define KASAN_SHADOW_END        (KASAN_SHADOW_START + (1UL << (VA_BITS - 3)))
> >> +#define KASAN_SHADOW_START   (VA_START)
> >> +#define KASAN_SHADOW_END     (KASAN_SHADOW_START + (_AC(1, UL) << (VA_BITS - 3)))
> >>
> >>  /*
> >>   * This value is used to map an address to the corresponding shadow
> >> @@ -26,16 +22,22 @@
> >>   * should satisfy the following equation:
> >>   *      KASAN_SHADOW_OFFSET = KASAN_SHADOW_END - (1ULL << 61)
> >>   */
> >> -#define KASAN_SHADOW_OFFSET     (KASAN_SHADOW_END - (1ULL << (64 - 3)))
> >> +#define KASAN_SHADOW_OFFSET  (KASAN_SHADOW_END - (_AC(1, ULL) << (64 - 3)))
> >> +
> >
> > I couldn't immediately spot where KASAN_SHADOW_* were used in assembly.
> > I guess there's some other definition built atop of them that I've
> > missed.
> >
> > Where should I be looking?
> >
> 
> Well, the problem is that KIMAGE_VADDR will be defined in terms of
> KASAN_SHADOW_END if KASAN is enabled.

Ah. I'd somehow managed to overlook that. Thanks for pointing that out!

> But since KASAN always uses the first 1/8 of that VA space, I am going
> to rework this so that the non-KASAN constants never depend on the
> actual values but only on CONFIG_KASAN

Personally I'd prefer that they were obviously defined in terms of each
other if possible (as this means that the definitions are obviously
consistent by construction).

So if it's not too much of a pain to keep them that way it would be
nice to do so.

[...]

> >> +     vmlinux_vm.flags        = VM_MAP;
> >
> > I was going to say we should set VM_KASAN also per its description in
> > include/vmalloc.h, though per its uses its not clear if it will ever
> > matter.
> >
> 
> No, we shouldn't. Even if we are never going to unmap this vma,
> setting the flag will result in the shadow area being freed using
> vfree(), while it was not allocated via vmalloc() so that is likely to
> cause trouble.

Ok.

> >> +     vm_area_add_early(&vmlinux_vm);
> >
> > Do we need to register the kernel VA range quite this early, or could we
> > do this around paging_init/map_kernel time?
> >
> 
> No. Locally, I moved it into map_kernel_chunk, so that we have
> separate areas for _text, _init and _data, and we can unmap the _init
> entirely rather than only stripping the exec bit. I haven't quite
> figured out how to get rid of the vma area, but perhaps it make sense
> to keep it reserved, so that modules don't end up there later (which
> is possible with the module region randomization I have implemented
> for v4) since I don't know how well things like kallsyms etc cope with
> that.

Keeping that reserved sounds reasonable to me.

[...]

> >>  void __init kasan_init(void)
> >>  {
> >> +     u64 kimg_shadow_start, kimg_shadow_end;
> >>       struct memblock_region *reg;
> >>
> >> +     kimg_shadow_start = round_down((u64)kasan_mem_to_shadow(_text),
> >> +                                    SWAPPER_BLOCK_SIZE);
> >> +     kimg_shadow_end = round_up((u64)kasan_mem_to_shadow(_end),
> >> +                                SWAPPER_BLOCK_SIZE);
> >
> > This rounding looks suspect to me, given it's applied to the shadow
> > addresses rather than the kimage addresses. That's roughly equivalent to
> > kasan_mem_to_shadow(round_up(_end, 8 * SWAPPER_BLOCK_SIZE).
> >
> > I don't think we need any rounding for the kimage addresses. The image
> > end is page-granular (and the fine-grained mapping will reflect that).
> > Any accesses between _end and roud_up(_end, SWAPPER_BLOCK_SIZE) would be
> > bugs (and would most likely fault) regardless of KASAN.
> >
> > Or am I just being thick here?
> >
> 
> Well, the problem here is that vmemmap_populate() is used as a
> surrogate vmalloc() since that is not available yet, and
> vmemmap_populate() allocates in SWAPPER_BLOCK_SIZE granularity.
> If I remove the rounding, I get false positive kasan errors which I
> have not quite diagnosed yet, but are probably due to the fact that
> the rounding performed by vmemmap_populate() goes in the wrong
> direction.

Ah. :(

I'll also take a peek.

> I do wonder what that means for memblocks that are not multiples of 16
> MB, though (below)

Indeed.

On a related note, something I've been thinking about is PA layout
fuzzing using VMs.

It sounds like being able to test memory layouts would be useful for
cases like the above, and I suspect there are plenty of other edge cases
that we aren't yet aware of due to typical physical memory layouts being
fairly simple.

It doesn't seem to be possible to force a particular physical memory
layout (and particular kernel, dtb, etc addresses) for QEMU or KVM
tool. I started looking into adding support to KVM tool, but there's a
fair amount of refactoring needed first.

Another option might be a special EFI application that carves up memory
in a deliberate fashion to ensure particular fragmentation cases (e.g. a
bank that's SWAPPER_BLOCK_SIZE - PAGE_SIZE in length).

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 207+ messages in thread

* [kernel-hardening] Re: [PATCH v3 07/21] arm64: move kernel image to base of vmalloc area
@ 2016-01-13 13:51         ` Mark Rutland
  0 siblings, 0 replies; 207+ messages in thread
From: Mark Rutland @ 2016-01-13 13:51 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, kernel-hardening, Will Deacon, Catalin Marinas,
	Leif Lindholm, Kees Cook, linux-kernel, Stuart Yoder,
	Sharma Bhupesh, Arnd Bergmann, Marc Zyngier, Christoffer Dall

On Wed, Jan 13, 2016 at 09:39:41AM +0100, Ard Biesheuvel wrote:
> On 12 January 2016 at 19:14, Mark Rutland <mark.rutland@arm.com> wrote:
> > On Mon, Jan 11, 2016 at 02:19:00PM +0100, Ard Biesheuvel wrote:
> >> This moves the module area to right before the vmalloc area, and
> >> moves the kernel image to the base of the vmalloc area. This is
> >> an intermediate step towards implementing kASLR, where the kernel
> >> image can be located anywhere in the vmalloc area.
> >>
> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> >> ---
> >>  arch/arm64/include/asm/kasan.h          | 20 ++++---
> >>  arch/arm64/include/asm/kernel-pgtable.h |  5 +-
> >>  arch/arm64/include/asm/memory.h         | 18 ++++--
> >>  arch/arm64/include/asm/pgtable.h        |  7 ---
> >>  arch/arm64/kernel/setup.c               | 12 ++++
> >>  arch/arm64/mm/dump.c                    | 12 ++--
> >>  arch/arm64/mm/init.c                    | 20 +++----
> >>  arch/arm64/mm/kasan_init.c              | 21 +++++--
> >>  arch/arm64/mm/mmu.c                     | 62 ++++++++++++++------
> >>  9 files changed, 118 insertions(+), 59 deletions(-)
> >>
> >> diff --git a/arch/arm64/include/asm/kasan.h b/arch/arm64/include/asm/kasan.h
> >> index de0d21211c34..2c583dbf4746 100644
> >> --- a/arch/arm64/include/asm/kasan.h
> >> +++ b/arch/arm64/include/asm/kasan.h
> >> @@ -1,20 +1,16 @@
> >>  #ifndef __ASM_KASAN_H
> >>  #define __ASM_KASAN_H
> >>
> >> -#ifndef __ASSEMBLY__
> >> -
> >>  #ifdef CONFIG_KASAN
> >>
> >>  #include <linux/linkage.h>
> >> -#include <asm/memory.h>
> >> -#include <asm/pgtable-types.h>
> >>
> >>  /*
> >>   * KASAN_SHADOW_START: beginning of the kernel virtual addresses.
> >>   * KASAN_SHADOW_END: KASAN_SHADOW_START + 1/8 of kernel virtual addresses.
> >>   */
> >> -#define KASAN_SHADOW_START      (VA_START)
> >> -#define KASAN_SHADOW_END        (KASAN_SHADOW_START + (1UL << (VA_BITS - 3)))
> >> +#define KASAN_SHADOW_START   (VA_START)
> >> +#define KASAN_SHADOW_END     (KASAN_SHADOW_START + (_AC(1, UL) << (VA_BITS - 3)))
> >>
> >>  /*
> >>   * This value is used to map an address to the corresponding shadow
> >> @@ -26,16 +22,22 @@
> >>   * should satisfy the following equation:
> >>   *      KASAN_SHADOW_OFFSET = KASAN_SHADOW_END - (1ULL << 61)
> >>   */
> >> -#define KASAN_SHADOW_OFFSET     (KASAN_SHADOW_END - (1ULL << (64 - 3)))
> >> +#define KASAN_SHADOW_OFFSET  (KASAN_SHADOW_END - (_AC(1, ULL) << (64 - 3)))
> >> +
> >
> > I couldn't immediately spot where KASAN_SHADOW_* were used in assembly.
> > I guess there's some other definition built atop of them that I've
> > missed.
> >
> > Where should I be looking?
> >
> 
> Well, the problem is that KIMAGE_VADDR will be defined in terms of
> KASAN_SHADOW_END if KASAN is enabled.

Ah. I'd somehow managed to overlook that. Thanks for pointing that out!

> But since KASAN always uses the first 1/8 of that VA space, I am going
> to rework this so that the non-KASAN constants never depend on the
> actual values but only on CONFIG_KASAN

Personally I'd prefer that they were obviously defined in terms of each
other if possible (as this means that the definitions are obviously
consistent by construction).

So if it's not too much of a pain to keep them that way it would be
nice to do so.

[...]

> >> +     vmlinux_vm.flags        = VM_MAP;
> >
> > I was going to say we should set VM_KASAN also per its description in
> > include/vmalloc.h, though per its uses its not clear if it will ever
> > matter.
> >
> 
> No, we shouldn't. Even if we are never going to unmap this vma,
> setting the flag will result in the shadow area being freed using
> vfree(), while it was not allocated via vmalloc() so that is likely to
> cause trouble.

Ok.

> >> +     vm_area_add_early(&vmlinux_vm);
> >
> > Do we need to register the kernel VA range quite this early, or could we
> > do this around paging_init/map_kernel time?
> >
> 
> No. Locally, I moved it into map_kernel_chunk, so that we have
> separate areas for _text, _init and _data, and we can unmap the _init
> entirely rather than only stripping the exec bit. I haven't quite
> figured out how to get rid of the vma area, but perhaps it make sense
> to keep it reserved, so that modules don't end up there later (which
> is possible with the module region randomization I have implemented
> for v4) since I don't know how well things like kallsyms etc cope with
> that.

Keeping that reserved sounds reasonable to me.
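
Something like the following is what I'm picturing, purely as a sketch --
the helper name, the call site and the exact vm_struct usage here are
invented for illustration, not taken from your local v4 changes:

/* sketch only: one early vm_struct per kernel chunk (names made up) */
#include <linux/vmalloc.h>
#include <asm/sections.h>

static struct vm_struct vm_text, vm_init, vm_data;

static void __init add_kernel_chunk(struct vm_struct *vm,
                                    void *start, void *end)
{
        vm->addr  = start;
        vm->size  = end - start;
        vm->flags = VM_MAP;
        /* keeps the range out of vmalloc and the (randomized) module area */
        vm_area_add_early(vm);
}

static void __init register_kernel_chunks(void)
{
        add_kernel_chunk(&vm_text, _text, __init_begin);
        add_kernel_chunk(&vm_init, __init_begin, __init_end);
        add_kernel_chunk(&vm_data, __init_end, _end);
}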

[...]

> >>  void __init kasan_init(void)
> >>  {
> >> +     u64 kimg_shadow_start, kimg_shadow_end;
> >>       struct memblock_region *reg;
> >>
> >> +     kimg_shadow_start = round_down((u64)kasan_mem_to_shadow(_text),
> >> +                                    SWAPPER_BLOCK_SIZE);
> >> +     kimg_shadow_end = round_up((u64)kasan_mem_to_shadow(_end),
> >> +                                SWAPPER_BLOCK_SIZE);
> >
> > This rounding looks suspect to me, given it's applied to the shadow
> > addresses rather than the kimage addresses. That's roughly equivalent to
> > kasan_mem_to_shadow(round_up(_end, 8 * SWAPPER_BLOCK_SIZE)).
> >
> > I don't think we need any rounding for the kimage addresses. The image
> > end is page-granular (and the fine-grained mapping will reflect that).
> > Any accesses between _end and round_up(_end, SWAPPER_BLOCK_SIZE) would be
> > bugs (and would most likely fault) regardless of KASAN.
> >
> > Or am I just being thick here?
> >
> 
> Well, the problem here is that vmemmap_populate() is used as a
> surrogate vmalloc() since that is not available yet, and
> vmemmap_populate() allocates in SWAPPER_BLOCK_SIZE granularity.
> If I remove the rounding, I get false positive kasan errors which I
> have not quite diagnosed yet, but are probably due to the fact that
> the rounding performed by vmemmap_populate() goes in the wrong
> direction.

Ah. :(

I'll also take a peek.
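
To spell out the arithmetic behind the "roughly equivalent" claim above
(this assumes KASAN_SHADOW_OFFSET is itself a multiple of
SWAPPER_BLOCK_SIZE, which I haven't double-checked for every config):

/*
 * kasan_mem_to_shadow(x) == (x >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET
 * with KASAN_SHADOW_SCALE_SHIFT == 3, so if the offset is block-aligned:
 *
 *   round_down(kasan_mem_to_shadow(x), SWAPPER_BLOCK_SIZE)
 *     == kasan_mem_to_shadow(round_down(x, 8 * SWAPPER_BLOCK_SIZE))
 *
 * and likewise for round_up; i.e. rounding the *shadow* start/end by one
 * block is the same as rounding the kernel image addresses by eight blocks.
 */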

> I do wonder what that means for memblocks that are not multiples of 16
> MB, though (below)

Indeed.

On a related note, something I've been thinking about is PA layout
fuzzing using VMs.

It sounds like being able to test memory layouts would be useful for
cases like the above, and I suspect there are plenty of other edge cases
that we aren't yet aware of due to typical physical memory layouts being
fairly simple.

It doesn't seem to be possible to force a particular physical memory
layout (and particular kernel, dtb, etc addresses) for QEMU or KVM
tool. I started looking into adding support to KVM tool, but there's a
fair amount of refactoring needed first.

Another option might be a special EFI application that carves up memory
in a deliberate fashion to ensure particular fragmentation cases (e.g. a
bank that's SWAPPER_BLOCK_SIZE - PAGE_SIZE in length).

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 07/21] arm64: move kernel image to base of vmalloc area
  2016-01-13 13:51         ` Mark Rutland
  (?)
@ 2016-01-13 15:50           ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-13 15:50 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, kernel-hardening, Will Deacon, Catalin Marinas,
	Leif Lindholm, Kees Cook, linux-kernel, Stuart Yoder,
	Sharma Bhupesh, Arnd Bergmann, Marc Zyngier, Christoffer Dall

On 13 January 2016 at 14:51, Mark Rutland <mark.rutland@arm.com> wrote:
> On Wed, Jan 13, 2016 at 09:39:41AM +0100, Ard Biesheuvel wrote:
>> On 12 January 2016 at 19:14, Mark Rutland <mark.rutland@arm.com> wrote:
>> > On Mon, Jan 11, 2016 at 02:19:00PM +0100, Ard Biesheuvel wrote:
>> >> This moves the module area to right before the vmalloc area, and
>> >> moves the kernel image to the base of the vmalloc area. This is
>> >> an intermediate step towards implementing kASLR, where the kernel
>> >> image can be located anywhere in the vmalloc area.
>> >>
>> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> >> ---
>> >>  arch/arm64/include/asm/kasan.h          | 20 ++++---
>> >>  arch/arm64/include/asm/kernel-pgtable.h |  5 +-
>> >>  arch/arm64/include/asm/memory.h         | 18 ++++--
>> >>  arch/arm64/include/asm/pgtable.h        |  7 ---
>> >>  arch/arm64/kernel/setup.c               | 12 ++++
>> >>  arch/arm64/mm/dump.c                    | 12 ++--
>> >>  arch/arm64/mm/init.c                    | 20 +++----
>> >>  arch/arm64/mm/kasan_init.c              | 21 +++++--
>> >>  arch/arm64/mm/mmu.c                     | 62 ++++++++++++++------
>> >>  9 files changed, 118 insertions(+), 59 deletions(-)
>> >>
>> >> diff --git a/arch/arm64/include/asm/kasan.h b/arch/arm64/include/asm/kasan.h
>> >> index de0d21211c34..2c583dbf4746 100644
>> >> --- a/arch/arm64/include/asm/kasan.h
>> >> +++ b/arch/arm64/include/asm/kasan.h
>> >> @@ -1,20 +1,16 @@
>> >>  #ifndef __ASM_KASAN_H
>> >>  #define __ASM_KASAN_H
>> >>
>> >> -#ifndef __ASSEMBLY__
>> >> -
>> >>  #ifdef CONFIG_KASAN
>> >>
>> >>  #include <linux/linkage.h>
>> >> -#include <asm/memory.h>
>> >> -#include <asm/pgtable-types.h>
>> >>
>> >>  /*
>> >>   * KASAN_SHADOW_START: beginning of the kernel virtual addresses.
>> >>   * KASAN_SHADOW_END: KASAN_SHADOW_START + 1/8 of kernel virtual addresses.
>> >>   */
>> >> -#define KASAN_SHADOW_START      (VA_START)
>> >> -#define KASAN_SHADOW_END        (KASAN_SHADOW_START + (1UL << (VA_BITS - 3)))
>> >> +#define KASAN_SHADOW_START   (VA_START)
>> >> +#define KASAN_SHADOW_END     (KASAN_SHADOW_START + (_AC(1, UL) << (VA_BITS - 3)))
>> >>
>> >>  /*
>> >>   * This value is used to map an address to the corresponding shadow
>> >> @@ -26,16 +22,22 @@
>> >>   * should satisfy the following equation:
>> >>   *      KASAN_SHADOW_OFFSET = KASAN_SHADOW_END - (1ULL << 61)
>> >>   */
>> >> -#define KASAN_SHADOW_OFFSET     (KASAN_SHADOW_END - (1ULL << (64 - 3)))
>> >> +#define KASAN_SHADOW_OFFSET  (KASAN_SHADOW_END - (_AC(1, ULL) << (64 - 3)))
>> >> +
>> >
>> > I couldn't immediately spot where KASAN_SHADOW_* were used in assembly.
>> > I guess there's some other definition built atop of them that I've
>> > missed.
>> >
>> > Where should I be looking?
>> >
>>
>> Well, the problem is that KIMAGE_VADDR will be defined in terms of
>> KASAN_SHADOW_END if KASAN is enabled.
>
> Ah. I'd somehow managed to overlook that. Thanks for pointing that out!
>
>> But since KASAN always uses the first 1/8 of that VA space, I am going
>> to rework this so that the non-KASAN constants never depend on the
>> actual values but only on CONFIG_KASAN
>
> Personally I'd prefer that they were obviously defined in terms of each
> other if possible (as this means that the definitions are obviously
> consistent by construction).
>
> So if it's not too much of a pain to keep them that way it would be
> nice to do so.
>
> [...]
>

I am leaning towards adding this to asm/memory.h

#ifdef CONFIG_KASAN
#define KASAN_SHADOW_SIZE (UL(1) << (VA_BITS - 3))
#else
#define KASAN_SHADOW_SIZE (0)
#endif

and remove the #ifdef CONFIG_KASAN block from asm/pgtable.h. Then
asm/kasan.h, which already includes asm/memory.h, can use it as region
size, and none of the reshuffling I had to do before is necessary.
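
Roughly like this (sketch only; where exactly the dependent constant ends
up, and whether it is KIMAGE_VADDR or the module area base, is still up in
the air):

/* asm/memory.h */
#ifdef CONFIG_KASAN
#define KASAN_SHADOW_SIZE       (UL(1) << (VA_BITS - 3))
#else
#define KASAN_SHADOW_SIZE       (0)
#endif
/* whatever sits above the shadow only needs to know its size */
#define KIMAGE_VADDR            (VA_START + KASAN_SHADOW_SIZE)  /* name assumed */

/* asm/kasan.h (CONFIG_KASAN only), derived from the same constant */
#define KASAN_SHADOW_START      (VA_START)
#define KASAN_SHADOW_END        (KASAN_SHADOW_START + KASAN_SHADOW_SIZE)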

>> >> +     vmlinux_vm.flags        = VM_MAP;
>> >
>> > I was going to say we should set VM_KASAN also per its description in
>> > include/vmalloc.h, though per its uses it's not clear if it will ever
>> > matter.
>> >
>>
>> No, we shouldn't. Even if we are never going to unmap this vma,
>> setting the flag will result in the shadow area being freed using
>> vfree(), while it was not allocated via vmalloc() so that is likely to
>> cause trouble.
>
> Ok.
>
>> >> +     vm_area_add_early(&vmlinux_vm);
>> >
>> > Do we need to register the kernel VA range quite this early, or could we
>> > do this around paging_init/map_kernel time?
>> >
>>
>> No. Locally, I moved it into map_kernel_chunk, so that we have
>> separate areas for _text, _init and _data, and we can unmap the _init
>> entirely rather than only stripping the exec bit. I haven't quite
>> figured out how to get rid of the vma area, but perhaps it makes sense
>> to keep it reserved, so that modules don't end up there later (which
>> is possible with the module region randomization I have implemented
>> for v4) since I don't know how well things like kallsyms etc cope with
>> that.
>
> Keeping that reserved sounds reasonable to me.
>
> [...]
>
>> >>  void __init kasan_init(void)
>> >>  {
>> >> +     u64 kimg_shadow_start, kimg_shadow_end;
>> >>       struct memblock_region *reg;
>> >>
>> >> +     kimg_shadow_start = round_down((u64)kasan_mem_to_shadow(_text),
>> >> +                                    SWAPPER_BLOCK_SIZE);
>> >> +     kimg_shadow_end = round_up((u64)kasan_mem_to_shadow(_end),
>> >> +                                SWAPPER_BLOCK_SIZE);
>> >
>> > This rounding looks suspect to me, given it's applied to the shadow
>> > addresses rather than the kimage addresses. That's roughly equivalent to
>> > kasan_mem_to_shadow(round_up(_end, 8 * SWAPPER_BLOCK_SIZE)).
>> >
>> > I don't think we need any rounding for the kimage addresses. The image
>> > end is page-granular (and the fine-grained mapping will reflect that).
>> > Any accesses between _end and round_up(_end, SWAPPER_BLOCK_SIZE) would be
>> > bugs (and would most likely fault) regardless of KASAN.
>> >
>> > Or am I just being thick here?
>> >
>>
>> Well, the problem here is that vmemmap_populate() is used as a
>> surrogate vmalloc() since that is not available yet, and
>> vmemmap_populate() allocates in SWAPPER_BLOCK_SIZE granularity.
>> If I remove the rounding, I get false positive kasan errors which I
>> have not quite diagnosed yet, but are probably due to the fact that
>> the rounding performed by vmemmap_populate() goes in the wrong
>> direction.
>
> Ah. :(
>
> I'll also take a peek.
>

Yes, please.

>> I do wonder what that means for memblocks that are not multiples of 16
>> MB, though (below)
>
> Indeed.
>
> On a related note, something I've been thinking about is PA layout
> fuzzing using VMs.
>
> It sounds like being able to test memory layouts would be useful for
> cases like the above, and I suspect there are plenty of other edge cases
> that we aren't yet aware of due to typical physical memory layouts being
> fairly simple.
>
> It doesn't seem to be possible to force a particular physical memory
> layout (and particular kernel, dtb, etc addresses) for QEMU or KVM
> tool. I started looking into adding support to KVM tool, but there's a
> fair amount of refactoring needed first.
>
> Another option might be a special EFI application that carves up memory
> in a deliberate fashion to ensure particular fragmentation cases (e.g. a
> bank that's SWAPPER_BLOCK_SIZE - PAGE_SIZE in length).
>

I use mem= for this, in fact, and boot most of my machines and VMs
with some value slightly below the actual available DRAM that is not a
multiple of 2M
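
(As a concrete, made-up example: on a 4 GiB guest, booting with something
like mem=4091M puts the end of memory a little below the top of DRAM at an
offset that is not 2M aligned.)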

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 07/21] arm64: move kernel image to base of vmalloc area
  2016-01-13 15:50           ` Ard Biesheuvel
  (?)
@ 2016-01-13 16:26             ` Mark Rutland
  -1 siblings, 0 replies; 207+ messages in thread
From: Mark Rutland @ 2016-01-13 16:26 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, kernel-hardening, Will Deacon, Catalin Marinas,
	Leif Lindholm, Kees Cook, linux-kernel, Stuart Yoder,
	Sharma Bhupesh, Arnd Bergmann, Marc Zyngier, Christoffer Dall

On Wed, Jan 13, 2016 at 04:50:24PM +0100, Ard Biesheuvel wrote:
> On 13 January 2016 at 14:51, Mark Rutland <mark.rutland@arm.com> wrote:
> > On Wed, Jan 13, 2016 at 09:39:41AM +0100, Ard Biesheuvel wrote:
> >> But since KASAN always uses the first 1/8 of that VA space, I am going
> >> to rework this so that the non-KASAN constants never depend on the
> >> actual values but only on CONFIG_KASAN
> >
> > Personally I'd prefer that they were obviously defined in terms of each
> > other if possible (as this means that the definitions are obviously
> > consistent by construction).
> >
> > So if it's not too much of a pain to keep them that way it would be
> > nice to do so.
> >
> > [...]
> >
> 
> I am leaning towards adding this to asm/memory.h
> 
> #ifdef CONFIG_KASAN
> #define KASAN_SHADOW_SIZE (UL(1) << (VA_BITS - 3))
> #else
> #define KASAN_SHADOW_SIZE (0)
> #endif
> 
> and remove the #ifdef CONFIG_KASAN block from asm/pgtable.h. Then
> asm/kasan.h, which already includes asm/memory.h, can use it as region
> size, and none of the reshuffling I had to do before is necessary.

FWIW, that looks good to me.

[...]

> >> I do wonder what that means for memblocks that are not multiples of 16
> >> MB, though (below)
> >
> > Indeed.
> >
> > On a related note, something I've been thinking about is PA layout
> > fuzzing using VMs.
> >
> > It sounds like being able to test memory layouts would be useful for
> > cases like the above, and I suspect there are plenty of other edge cases
> > that we aren't yet aware of due to typical physical memory layouts being
> > fairly simple.
> >
> > It doesn't seem to be possible to force a particular physical memory
> > layout (and particular kernel, dtb, etc addresses) for QEMU or KVM
> > tool. I started looking into adding support to KVM tool, but there's a
> > fair amount of refactoring needed first.
> >
> > Another option might be a special EFI application that carves up memory
> > in a deliberate fashion to ensure particular fragmentation cases (e.g. a
> > bank that's SWAPPER_BLOCK_SIZE - PAGE_SIZE in length).
> >
> 
> I use mem= for this, in fact, and boot most of my machines and VMs
> with some value slightly below the actual available DRAM that is not a
> multiple of 2M

Sure. I do some testing with mem= to find some bugs. The problem is that
you can only vary the end address (prior to your patches), and don't get
much variation on the portions in the middle.

It's difficult to test for bugs with not-quite-adjacent regions, or
where particular sizes, alignment, or addresses are important.

For example, having memory that extends right to the end of the
kernel-supported PA range (even with gaps in the middle) is an edge case
we don't currently test. I have a suspicion that KASAN's shadow
lookahead (and allocation to account for this) wouldn't be quite right,
and I want to be able to properly test and verify that.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 11/21] arm64: avoid R_AARCH64_ABS64 relocations for Image header fields
  2016-01-11 13:19   ` Ard Biesheuvel
  (?)
@ 2016-01-13 18:12     ` Mark Rutland
  -1 siblings, 0 replies; 207+ messages in thread
From: Mark Rutland @ 2016-01-13 18:12 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	leif.lindholm, keescook, linux-kernel, stuart.yoder,
	bhupesh.sharma, arnd, marc.zyngier, christoffer.dall

On Mon, Jan 11, 2016 at 02:19:04PM +0100, Ard Biesheuvel wrote:
> Unfortunately, the current way of using the linker to emit build time
> constants into the Image header will no longer work once we switch to
> the use of PIE executables. The reason is that such constants are emitted
> into the binary using R_AARCH64_ABS64 relocations, which we will resolve
> at runtime, not at build time, and the places targeted by those
> relocations will contain zeroes before that.
> 
> So move back to assembly time constants or R_AARCH64_ABS32 relocations
> (which, interestingly enough, do get resolved at build time)

To me it seems very odd that ABS64 and ABS32 are treated differently,
and it makes me somewhat uncomfortable because it feels like a bug.

Do we know whether the inconsistency between ABS64 and ABS32 was
deliberate?

I couldn't spot anything declaring a difference in the AArch64 ELF
spec, and I'm not sure where else to look.

Thanks,
Mark.

> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
>  arch/arm64/include/asm/assembler.h | 15 ++++++++
>  arch/arm64/kernel/head.S           | 17 +++++++--
>  arch/arm64/kernel/image.h          | 37 ++++++--------------
>  3 files changed, 40 insertions(+), 29 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
> index d8bfcc1ce923..e211af783a3d 100644
> --- a/arch/arm64/include/asm/assembler.h
> +++ b/arch/arm64/include/asm/assembler.h
> @@ -222,4 +222,19 @@ lr	.req	x30		// link register
>  	.size	__pi_##x, . - x;	\
>  	ENDPROC(x)
>  
> +	.macro	le16, val
> +	.byte	\val & 0xff
> +	.byte	(\val >> 8) & 0xff
> +	.endm
> +
> +	.macro	le32, val
> +	le16	\val
> +	le16	(\val >> 16)
> +	.endm
> +
> +	.macro	le64, val
> +	le32	\val
> +	le32	(\val >> 32)
> +	.endm
> +
>  #endif	/* __ASM_ASSEMBLER_H */
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 350515276541..211f75e673f4 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -51,6 +51,17 @@
>  #define KERNEL_START	_text
>  #define KERNEL_END	_end
>  
> +#ifdef CONFIG_CPU_BIG_ENDIAN
> +#define __HEAD_FLAG_BE	1
> +#else
> +#define __HEAD_FLAG_BE	0
> +#endif
> +
> +#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
> +
> +#define __HEAD_FLAGS	((__HEAD_FLAG_BE << 0) |	\
> +			 (__HEAD_FLAG_PAGE_SIZE << 1))
> +
>  /*
>   * Kernel startup entry point.
>   * ---------------------------
> @@ -83,9 +94,9 @@ efi_head:
>  	b	stext				// branch to kernel start, magic
>  	.long	0				// reserved
>  #endif
> -	.quad	_kernel_offset_le		// Image load offset from start of RAM, little-endian
> -	.quad	_kernel_size_le			// Effective size of kernel image, little-endian
> -	.quad	_kernel_flags_le		// Informative flags, little-endian
> +	le64	TEXT_OFFSET			// Image load offset from start of RAM, little-endian
> +	.long	_kernel_size_le, 0		// Effective size of kernel image, little-endian
> +	le64	__HEAD_FLAGS			// Informative flags, little-endian
>  	.quad	0				// reserved
>  	.quad	0				// reserved
>  	.quad	0				// reserved
> diff --git a/arch/arm64/kernel/image.h b/arch/arm64/kernel/image.h
> index bc2abb8b1599..bb6b0e69d0a4 100644
> --- a/arch/arm64/kernel/image.h
> +++ b/arch/arm64/kernel/image.h
> @@ -26,41 +26,26 @@
>   * There aren't any ELF relocations we can use to endian-swap values known only
>   * at link time (e.g. the subtraction of two symbol addresses), so we must get
>   * the linker to endian-swap certain values before emitting them.
> + * Note that this will not work for 64-bit values: these are resolved using
> + * R_AARCH64_ABS64 relocations, which are fixed up at runtime rather than at
> + * build time when building the PIE executable (for KASLR).
>   */
>  #ifdef CONFIG_CPU_BIG_ENDIAN
> -#define DATA_LE64(data)					\
> -	((((data) & 0x00000000000000ff) << 56) |	\
> -	 (((data) & 0x000000000000ff00) << 40) |	\
> -	 (((data) & 0x0000000000ff0000) << 24) |	\
> -	 (((data) & 0x00000000ff000000) << 8)  |	\
> -	 (((data) & 0x000000ff00000000) >> 8)  |	\
> -	 (((data) & 0x0000ff0000000000) >> 24) |	\
> -	 (((data) & 0x00ff000000000000) >> 40) |	\
> -	 (((data) & 0xff00000000000000) >> 56))
> +#define DATA_LE32(data)				\
> +	((((data) & 0x000000ff) << 24) |	\
> +	 (((data) & 0x0000ff00) << 8)  |	\
> +	 (((data) & 0x00ff0000) >> 8)  |	\
> +	 (((data) & 0xff000000) >> 24))
>  #else
> -#define DATA_LE64(data) ((data) & 0xffffffffffffffff)
> +#define DATA_LE32(data) ((data) & 0xffffffff)
>  #endif
>  
> -#ifdef CONFIG_CPU_BIG_ENDIAN
> -#define __HEAD_FLAG_BE	1
> -#else
> -#define __HEAD_FLAG_BE	0
> -#endif
> -
> -#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
> -
> -#define __HEAD_FLAGS	((__HEAD_FLAG_BE << 0) |	\
> -			 (__HEAD_FLAG_PAGE_SIZE << 1))
> -
>  /*
>   * These will output as part of the Image header, which should be little-endian
> - * regardless of the endianness of the kernel. While constant values could be
> - * endian swapped in head.S, all are done here for consistency.
> + * regardless of the endianness of the kernel.
>   */
>  #define HEAD_SYMBOLS						\
> -	_kernel_size_le		= DATA_LE64(_end - _text);	\
> -	_kernel_offset_le	= DATA_LE64(TEXT_OFFSET);	\
> -	_kernel_flags_le	= DATA_LE64(__HEAD_FLAGS);
> +	_kernel_size_le		= DATA_LE32(_end - _text);
>  
>  #ifdef CONFIG_EFI
>  
> -- 
> 2.5.0
> 

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 11/21] arm64: avoid R_AARCH64_ABS64 relocations for Image header fields
  2016-01-13 18:12     ` Mark Rutland
  (?)
@ 2016-01-13 18:48       ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-13 18:48 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, kernel-hardening, Will Deacon, Catalin Marinas,
	Leif Lindholm, Kees Cook, linux-kernel, Stuart Yoder,
	Sharma Bhupesh, Arnd Bergmann, Marc Zyngier, Christoffer Dall

On 13 January 2016 at 19:12, Mark Rutland <mark.rutland@arm.com> wrote:
> On Mon, Jan 11, 2016 at 02:19:04PM +0100, Ard Biesheuvel wrote:
>> Unfortunately, the current way of using the linker to emit build time
>> constants into the Image header will no longer work once we switch to
>> the use of PIE executables. The reason is that such constants are emitted
>> into the binary using R_AARCH64_ABS64 relocations, which we will resolve
>> at runtime, not at build time, and the places targeted by those
>> relocations will contain zeroes before that.
>>
>> So move back to assembly time constants or R_AARCH64_ABS32 relocations
>> (which, interestingly enough, do get resolved at build time)
>
> To me it seems very odd that ABS64 and ABS32 are treated differently,
> and it makes me somewhat uncomfortable because it feels like a bug.
>
> Do we know whether the inconsistency between ABS64 and ABS32 was
> deliberate?
>
> I couldn't spot anything declaring a difference in the AArch64 ELF
> spec, and I'm not sure where else to look.
>

My assumption is that PIE only defers resolving R_AARCH64_ABS64
relocations, since those are the only ones that can be used to refer to
memory addresses.

>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> ---
>>  arch/arm64/include/asm/assembler.h | 15 ++++++++
>>  arch/arm64/kernel/head.S           | 17 +++++++--
>>  arch/arm64/kernel/image.h          | 37 ++++++--------------
>>  3 files changed, 40 insertions(+), 29 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
>> index d8bfcc1ce923..e211af783a3d 100644
>> --- a/arch/arm64/include/asm/assembler.h
>> +++ b/arch/arm64/include/asm/assembler.h
>> @@ -222,4 +222,19 @@ lr       .req    x30             // link register
>>       .size   __pi_##x, . - x;        \
>>       ENDPROC(x)
>>
>> +     .macro  le16, val
>> +     .byte   \val & 0xff
>> +     .byte   (\val >> 8) & 0xff
>> +     .endm
>> +
>> +     .macro  le32, val
>> +     le16    \val
>> +     le16    (\val >> 16)
>> +     .endm
>> +
>> +     .macro  le64, val
>> +     le32    \val
>> +     le32    (\val >> 32)
>> +     .endm
>> +
>>  #endif       /* __ASM_ASSEMBLER_H */
>> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
>> index 350515276541..211f75e673f4 100644
>> --- a/arch/arm64/kernel/head.S
>> +++ b/arch/arm64/kernel/head.S
>> @@ -51,6 +51,17 @@
>>  #define KERNEL_START _text
>>  #define KERNEL_END   _end
>>
>> +#ifdef CONFIG_CPU_BIG_ENDIAN
>> +#define __HEAD_FLAG_BE       1
>> +#else
>> +#define __HEAD_FLAG_BE       0
>> +#endif
>> +
>> +#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
>> +
>> +#define __HEAD_FLAGS ((__HEAD_FLAG_BE << 0) |        \
>> +                      (__HEAD_FLAG_PAGE_SIZE << 1))
>> +
>>  /*
>>   * Kernel startup entry point.
>>   * ---------------------------
>> @@ -83,9 +94,9 @@ efi_head:
>>       b       stext                           // branch to kernel start, magic
>>       .long   0                               // reserved
>>  #endif
>> -     .quad   _kernel_offset_le               // Image load offset from start of RAM, little-endian
>> -     .quad   _kernel_size_le                 // Effective size of kernel image, little-endian
>> -     .quad   _kernel_flags_le                // Informative flags, little-endian
>> +     le64    TEXT_OFFSET                     // Image load offset from start of RAM, little-endian
>> +     .long   _kernel_size_le, 0              // Effective size of kernel image, little-endian
>> +     le64    __HEAD_FLAGS                    // Informative flags, little-endian
>>       .quad   0                               // reserved
>>       .quad   0                               // reserved
>>       .quad   0                               // reserved
>> diff --git a/arch/arm64/kernel/image.h b/arch/arm64/kernel/image.h
>> index bc2abb8b1599..bb6b0e69d0a4 100644
>> --- a/arch/arm64/kernel/image.h
>> +++ b/arch/arm64/kernel/image.h
>> @@ -26,41 +26,26 @@
>>   * There aren't any ELF relocations we can use to endian-swap values known only
>>   * at link time (e.g. the subtraction of two symbol addresses), so we must get
>>   * the linker to endian-swap certain values before emitting them.
>> + * Note that this will not work for 64-bit values: these are resolved using
>> + * R_AARCH64_ABS64 relocations, which are fixed up at runtime rather than at
>> + * build time when building the PIE executable (for KASLR).
>>   */
>>  #ifdef CONFIG_CPU_BIG_ENDIAN
>> -#define DATA_LE64(data)                                      \
>> -     ((((data) & 0x00000000000000ff) << 56) |        \
>> -      (((data) & 0x000000000000ff00) << 40) |        \
>> -      (((data) & 0x0000000000ff0000) << 24) |        \
>> -      (((data) & 0x00000000ff000000) << 8)  |        \
>> -      (((data) & 0x000000ff00000000) >> 8)  |        \
>> -      (((data) & 0x0000ff0000000000) >> 24) |        \
>> -      (((data) & 0x00ff000000000000) >> 40) |        \
>> -      (((data) & 0xff00000000000000) >> 56))
>> +#define DATA_LE32(data)                              \
>> +     ((((data) & 0x000000ff) << 24) |        \
>> +      (((data) & 0x0000ff00) << 8)  |        \
>> +      (((data) & 0x00ff0000) >> 8)  |        \
>> +      (((data) & 0xff000000) >> 24))
>>  #else
>> -#define DATA_LE64(data) ((data) & 0xffffffffffffffff)
>> +#define DATA_LE32(data) ((data) & 0xffffffff)
>>  #endif
>>
>> -#ifdef CONFIG_CPU_BIG_ENDIAN
>> -#define __HEAD_FLAG_BE       1
>> -#else
>> -#define __HEAD_FLAG_BE       0
>> -#endif
>> -
>> -#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
>> -
>> -#define __HEAD_FLAGS ((__HEAD_FLAG_BE << 0) |        \
>> -                      (__HEAD_FLAG_PAGE_SIZE << 1))
>> -
>>  /*
>>   * These will output as part of the Image header, which should be little-endian
>> - * regardless of the endianness of the kernel. While constant values could be
>> - * endian swapped in head.S, all are done here for consistency.
>> + * regardless of the endianness of the kernel.
>>   */
>>  #define HEAD_SYMBOLS                                         \
>> -     _kernel_size_le         = DATA_LE64(_end - _text);      \
>> -     _kernel_offset_le       = DATA_LE64(TEXT_OFFSET);       \
>> -     _kernel_flags_le        = DATA_LE64(__HEAD_FLAGS);
>> +     _kernel_size_le         = DATA_LE32(_end - _text);
>>
>>  #ifdef CONFIG_EFI
>>
>> --
>> 2.5.0
>>

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 11/21] arm64: avoid R_AARCH64_ABS64 relocations for Image header fields
  2016-01-13 18:48       ` Ard Biesheuvel
  (?)
@ 2016-01-14  8:51         ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-14  8:51 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, kernel-hardening, Will Deacon, Catalin Marinas,
	Leif Lindholm, Kees Cook, linux-kernel, Stuart Yoder,
	Sharma Bhupesh, Arnd Bergmann, Marc Zyngier, Christoffer Dall

On 13 January 2016 at 19:48, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 13 January 2016 at 19:12, Mark Rutland <mark.rutland@arm.com> wrote:
>> On Mon, Jan 11, 2016 at 02:19:04PM +0100, Ard Biesheuvel wrote:
>>> Unfortunately, the current way of using the linker to emit build time
>>> constants into the Image header will no longer work once we switch to
>>> the use of PIE executables. The reason is that such constants are emitted
>>> into the binary using R_AARCH64_ABS64 relocations, which we will resolve
>>> at runtime, not at build time, and the places targeted by those
>>> relocations will contain zeroes before that.
>>>
>>> So move back to assembly time constants or R_AARCH64_ABS32 relocations
>>> (which, interestingly enough, do get resolved at build time)
>>
>> To me it seems very odd that ABS64 and ABS32 are treated differently,
>> and it makes me somewhat uncomfortable because it feels like a bug.
>>
>> Do we know whether the inconsistency between ABS64 and ABS32 was
>> deliberate?
>>
>> I couldn't spot anything declaring a difference in the AArch64 ELF
>> spec, and I'm not sure where else to look.
>>
>
> My assumption is that PIE only defers resolving R_AARCH64_ABS64
> relocations since those are the only ones that can be used to refer to
> memory addresses
>

OK, digging into the binutils source code, it turns out that, indeed,
ABSnn relocations where nn equals the ELFnn memory size are treated
differently, but only if they have default visibility. This is simply
a result of the fact that the code path is shared between shared libraries
and PIE executables, since PIE executables are fully linked. It also
means that we can simply work around it by emitting the linker symbols
as hidden.


>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>> ---
>>>  arch/arm64/include/asm/assembler.h | 15 ++++++++
>>>  arch/arm64/kernel/head.S           | 17 +++++++--
>>>  arch/arm64/kernel/image.h          | 37 ++++++--------------
>>>  3 files changed, 40 insertions(+), 29 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
>>> index d8bfcc1ce923..e211af783a3d 100644
>>> --- a/arch/arm64/include/asm/assembler.h
>>> +++ b/arch/arm64/include/asm/assembler.h
>>> @@ -222,4 +222,19 @@ lr       .req    x30             // link register
>>>       .size   __pi_##x, . - x;        \
>>>       ENDPROC(x)
>>>
>>> +     .macro  le16, val
>>> +     .byte   \val & 0xff
>>> +     .byte   (\val >> 8) & 0xff
>>> +     .endm
>>> +
>>> +     .macro  le32, val
>>> +     le16    \val
>>> +     le16    (\val >> 16)
>>> +     .endm
>>> +
>>> +     .macro  le64, val
>>> +     le32    \val
>>> +     le32    (\val >> 32)
>>> +     .endm
>>> +
>>>  #endif       /* __ASM_ASSEMBLER_H */
>>> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
>>> index 350515276541..211f75e673f4 100644
>>> --- a/arch/arm64/kernel/head.S
>>> +++ b/arch/arm64/kernel/head.S
>>> @@ -51,6 +51,17 @@
>>>  #define KERNEL_START _text
>>>  #define KERNEL_END   _end
>>>
>>> +#ifdef CONFIG_CPU_BIG_ENDIAN
>>> +#define __HEAD_FLAG_BE       1
>>> +#else
>>> +#define __HEAD_FLAG_BE       0
>>> +#endif
>>> +
>>> +#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
>>> +
>>> +#define __HEAD_FLAGS ((__HEAD_FLAG_BE << 0) |        \
>>> +                      (__HEAD_FLAG_PAGE_SIZE << 1))
>>> +
>>>  /*
>>>   * Kernel startup entry point.
>>>   * ---------------------------
>>> @@ -83,9 +94,9 @@ efi_head:
>>>       b       stext                           // branch to kernel start, magic
>>>       .long   0                               // reserved
>>>  #endif
>>> -     .quad   _kernel_offset_le               // Image load offset from start of RAM, little-endian
>>> -     .quad   _kernel_size_le                 // Effective size of kernel image, little-endian
>>> -     .quad   _kernel_flags_le                // Informative flags, little-endian
>>> +     le64    TEXT_OFFSET                     // Image load offset from start of RAM, little-endian
>>> +     .long   _kernel_size_le, 0              // Effective size of kernel image, little-endian
>>> +     le64    __HEAD_FLAGS                    // Informative flags, little-endian
>>>       .quad   0                               // reserved
>>>       .quad   0                               // reserved
>>>       .quad   0                               // reserved
>>> diff --git a/arch/arm64/kernel/image.h b/arch/arm64/kernel/image.h
>>> index bc2abb8b1599..bb6b0e69d0a4 100644
>>> --- a/arch/arm64/kernel/image.h
>>> +++ b/arch/arm64/kernel/image.h
>>> @@ -26,41 +26,26 @@
>>>   * There aren't any ELF relocations we can use to endian-swap values known only
>>>   * at link time (e.g. the subtraction of two symbol addresses), so we must get
>>>   * the linker to endian-swap certain values before emitting them.
>>> + * Note that this will not work for 64-bit values: these are resolved using
>>> + * R_AARCH64_ABS64 relocations, which are fixed up at runtime rather than at
>>> + * build time when building the PIE executable (for KASLR).
>>>   */
>>>  #ifdef CONFIG_CPU_BIG_ENDIAN
>>> -#define DATA_LE64(data)                                      \
>>> -     ((((data) & 0x00000000000000ff) << 56) |        \
>>> -      (((data) & 0x000000000000ff00) << 40) |        \
>>> -      (((data) & 0x0000000000ff0000) << 24) |        \
>>> -      (((data) & 0x00000000ff000000) << 8)  |        \
>>> -      (((data) & 0x000000ff00000000) >> 8)  |        \
>>> -      (((data) & 0x0000ff0000000000) >> 24) |        \
>>> -      (((data) & 0x00ff000000000000) >> 40) |        \
>>> -      (((data) & 0xff00000000000000) >> 56))
>>> +#define DATA_LE32(data)                              \
>>> +     ((((data) & 0x000000ff) << 24) |        \
>>> +      (((data) & 0x0000ff00) << 8)  |        \
>>> +      (((data) & 0x00ff0000) >> 8)  |        \
>>> +      (((data) & 0xff000000) >> 24))
>>>  #else
>>> -#define DATA_LE64(data) ((data) & 0xffffffffffffffff)
>>> +#define DATA_LE32(data) ((data) & 0xffffffff)
>>>  #endif
>>>
>>> -#ifdef CONFIG_CPU_BIG_ENDIAN
>>> -#define __HEAD_FLAG_BE       1
>>> -#else
>>> -#define __HEAD_FLAG_BE       0
>>> -#endif
>>> -
>>> -#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
>>> -
>>> -#define __HEAD_FLAGS ((__HEAD_FLAG_BE << 0) |        \
>>> -                      (__HEAD_FLAG_PAGE_SIZE << 1))
>>> -
>>>  /*
>>>   * These will output as part of the Image header, which should be little-endian
>>> - * regardless of the endianness of the kernel. While constant values could be
>>> - * endian swapped in head.S, all are done here for consistency.
>>> + * regardless of the endianness of the kernel.
>>>   */
>>>  #define HEAD_SYMBOLS                                         \
>>> -     _kernel_size_le         = DATA_LE64(_end - _text);      \
>>> -     _kernel_offset_le       = DATA_LE64(TEXT_OFFSET);       \
>>> -     _kernel_flags_le        = DATA_LE64(__HEAD_FLAGS);
>>> +     _kernel_size_le         = DATA_LE32(_end - _text);
>>>
>>>  #ifdef CONFIG_EFI
>>>
>>> --
>>> 2.5.0
>>>

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 11/21] arm64: avoid R_AARCH64_ABS64 relocations for Image header fields
  2016-01-14  8:51         ` Ard Biesheuvel
  (?)
@ 2016-01-14  9:05           ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-14  9:05 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, kernel-hardening, Will Deacon, Catalin Marinas,
	Leif Lindholm, Kees Cook, linux-kernel, Stuart Yoder,
	Sharma Bhupesh, Arnd Bergmann, Marc Zyngier, Christoffer Dall

On 14 January 2016 at 09:51, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 13 January 2016 at 19:48, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> On 13 January 2016 at 19:12, Mark Rutland <mark.rutland@arm.com> wrote:
>>> On Mon, Jan 11, 2016 at 02:19:04PM +0100, Ard Biesheuvel wrote:
>>>> Unfortunately, the current way of using the linker to emit build time
>>>> constants into the Image header will no longer work once we switch to
>>>> the use of PIE executables. The reason is that such constants are emitted
>>>> into the binary using R_AARCH64_ABS64 relocations, which we will resolve
>>>> at runtime, not at build time, and the places targeted by those
>>>> relocations will contain zeroes before that.
>>>>
>>>> So move back to assembly time constants or R_AARCH64_ABS32 relocations
>>>> (which, interestingly enough, do get resolved at build time)
>>>
>>> To me it seems very odd that ABS64 and ABS32 are treated differently,
>>> and it makes me somewhat uncomfortable because it feels like a bug.
>>>
>>> Do we know whether the inconsistency between ABS64 and ABS32 was
>>> deliberate?
>>>
>>> I couldn't spot anything declaring a difference in the AArch64 ELF
>>> spec, and I'm not sure where else to look.
>>>
>>
>> My assumption is that PIE only defers resolving R_AARCH64_ABS64
>> relocations since those are the only ones that can be used to refer to
>> memory addresses
>>
>
> OK, digging into the binutils source code, it turns out that, indeed,
> ABSnn relocations where nn equals the ELFnn memory size are treated
> differently, but only if they have default visibility. This is simply
> a result of the fact that the code path is shared between shared libraries
> and PIE executables, since PIE executables are fully linked. It also
> means that we can simply work around it by emitting the linker symbols
> as hidden.
>

... and the bad news is that, while emitting the symbols as hidden
turns them from R_AARCH64_ABS64 into R_AARCH64_RELATIVE relocations,
it does not actually force the value to be emitted at build time.

So I am going to stick with the patch, but elaborate in a comment
about why R_AARCH64_ABSnn relocations are treated differently if nn
equals the pointer size. (Look at elfNN_aarch64_final_link_relocate()
in binutils if you are keen to look at the code yourself.)
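
For reference, since the patch stands, here is a standalone arithmetic check
of the DATA_LE32() definition it keeps (plain userspace C, not kernel code;
the sample value is arbitrary):

#include <stdio.h>
#include <stdint.h>

/* DATA_LE32() as in the image.h hunk below (CONFIG_CPU_BIG_ENDIAN variant) */
#define DATA_LE32(data)				\
	((((data) & 0x000000ff) << 24) |	\
	 (((data) & 0x0000ff00) << 8)  |	\
	 (((data) & 0x00ff0000) >> 8)  |	\
	 (((data) & 0xff000000) >> 24))

int main(void)
{
	uint32_t size = 0x00a1b2c3;	/* arbitrary stand-in for _end - _text */

	/*
	 * 0x00a1b2c3 -> 0xc3b2a100: stored in big-endian memory order that
	 * is the byte sequence c3 b2 a1 00, i.e. the little-endian encoding
	 * of the original value, which is what the Image header wants.
	 */
	printf("%#010x -> %#010x\n", (unsigned)size, (unsigned)DATA_LE32(size));
	return 0;
}

On little-endian kernels the macro is an identity mask, so the header field
comes out little-endian either way.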

-- 
Ard.



>
>>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>>> ---
>>>>  arch/arm64/include/asm/assembler.h | 15 ++++++++
>>>>  arch/arm64/kernel/head.S           | 17 +++++++--
>>>>  arch/arm64/kernel/image.h          | 37 ++++++--------------
>>>>  3 files changed, 40 insertions(+), 29 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
>>>> index d8bfcc1ce923..e211af783a3d 100644
>>>> --- a/arch/arm64/include/asm/assembler.h
>>>> +++ b/arch/arm64/include/asm/assembler.h
>>>> @@ -222,4 +222,19 @@ lr       .req    x30             // link register
>>>>       .size   __pi_##x, . - x;        \
>>>>       ENDPROC(x)
>>>>
>>>> +     .macro  le16, val
>>>> +     .byte   \val & 0xff
>>>> +     .byte   (\val >> 8) & 0xff
>>>> +     .endm
>>>> +
>>>> +     .macro  le32, val
>>>> +     le16    \val
>>>> +     le16    (\val >> 16)
>>>> +     .endm
>>>> +
>>>> +     .macro  le64, val
>>>> +     le32    \val
>>>> +     le32    (\val >> 32)
>>>> +     .endm
>>>> +
>>>>  #endif       /* __ASM_ASSEMBLER_H */
>>>> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
>>>> index 350515276541..211f75e673f4 100644
>>>> --- a/arch/arm64/kernel/head.S
>>>> +++ b/arch/arm64/kernel/head.S
>>>> @@ -51,6 +51,17 @@
>>>>  #define KERNEL_START _text
>>>>  #define KERNEL_END   _end
>>>>
>>>> +#ifdef CONFIG_CPU_BIG_ENDIAN
>>>> +#define __HEAD_FLAG_BE       1
>>>> +#else
>>>> +#define __HEAD_FLAG_BE       0
>>>> +#endif
>>>> +
>>>> +#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
>>>> +
>>>> +#define __HEAD_FLAGS ((__HEAD_FLAG_BE << 0) |        \
>>>> +                      (__HEAD_FLAG_PAGE_SIZE << 1))
>>>> +
>>>>  /*
>>>>   * Kernel startup entry point.
>>>>   * ---------------------------
>>>> @@ -83,9 +94,9 @@ efi_head:
>>>>       b       stext                           // branch to kernel start, magic
>>>>       .long   0                               // reserved
>>>>  #endif
>>>> -     .quad   _kernel_offset_le               // Image load offset from start of RAM, little-endian
>>>> -     .quad   _kernel_size_le                 // Effective size of kernel image, little-endian
>>>> -     .quad   _kernel_flags_le                // Informative flags, little-endian
>>>> +     le64    TEXT_OFFSET                     // Image load offset from start of RAM, little-endian
>>>> +     .long   _kernel_size_le, 0              // Effective size of kernel image, little-endian
>>>> +     le64    __HEAD_FLAGS                    // Informative flags, little-endian
>>>>       .quad   0                               // reserved
>>>>       .quad   0                               // reserved
>>>>       .quad   0                               // reserved
>>>> diff --git a/arch/arm64/kernel/image.h b/arch/arm64/kernel/image.h
>>>> index bc2abb8b1599..bb6b0e69d0a4 100644
>>>> --- a/arch/arm64/kernel/image.h
>>>> +++ b/arch/arm64/kernel/image.h
>>>> @@ -26,41 +26,26 @@
>>>>   * There aren't any ELF relocations we can use to endian-swap values known only
>>>>   * at link time (e.g. the subtraction of two symbol addresses), so we must get
>>>>   * the linker to endian-swap certain values before emitting them.
>>>> + * Note that this will not work for 64-bit values: these are resolved using
>>>> + * R_AARCH64_ABS64 relocations, which are fixed up at runtime rather than at
>>>> + * build time when building the PIE executable (for KASLR).
>>>>   */
>>>>  #ifdef CONFIG_CPU_BIG_ENDIAN
>>>> -#define DATA_LE64(data)                                      \
>>>> -     ((((data) & 0x00000000000000ff) << 56) |        \
>>>> -      (((data) & 0x000000000000ff00) << 40) |        \
>>>> -      (((data) & 0x0000000000ff0000) << 24) |        \
>>>> -      (((data) & 0x00000000ff000000) << 8)  |        \
>>>> -      (((data) & 0x000000ff00000000) >> 8)  |        \
>>>> -      (((data) & 0x0000ff0000000000) >> 24) |        \
>>>> -      (((data) & 0x00ff000000000000) >> 40) |        \
>>>> -      (((data) & 0xff00000000000000) >> 56))
>>>> +#define DATA_LE32(data)                              \
>>>> +     ((((data) & 0x000000ff) << 24) |        \
>>>> +      (((data) & 0x0000ff00) << 8)  |        \
>>>> +      (((data) & 0x00ff0000) >> 8)  |        \
>>>> +      (((data) & 0xff000000) >> 24))
>>>>  #else
>>>> -#define DATA_LE64(data) ((data) & 0xffffffffffffffff)
>>>> +#define DATA_LE32(data) ((data) & 0xffffffff)
>>>>  #endif
>>>>
>>>> -#ifdef CONFIG_CPU_BIG_ENDIAN
>>>> -#define __HEAD_FLAG_BE       1
>>>> -#else
>>>> -#define __HEAD_FLAG_BE       0
>>>> -#endif
>>>> -
>>>> -#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
>>>> -
>>>> -#define __HEAD_FLAGS ((__HEAD_FLAG_BE << 0) |        \
>>>> -                      (__HEAD_FLAG_PAGE_SIZE << 1))
>>>> -
>>>>  /*
>>>>   * These will output as part of the Image header, which should be little-endian
>>>> - * regardless of the endianness of the kernel. While constant values could be
>>>> - * endian swapped in head.S, all are done here for consistency.
>>>> + * regardless of the endianness of the kernel.
>>>>   */
>>>>  #define HEAD_SYMBOLS                                         \
>>>> -     _kernel_size_le         = DATA_LE64(_end - _text);      \
>>>> -     _kernel_offset_le       = DATA_LE64(TEXT_OFFSET);       \
>>>> -     _kernel_flags_le        = DATA_LE64(__HEAD_FLAGS);
>>>> +     _kernel_size_le         = DATA_LE32(_end - _text);
>>>>
>>>>  #ifdef CONFIG_EFI
>>>>
>>>> --
>>>> 2.5.0
>>>>

^ permalink raw reply	[flat|nested] 207+ messages in thread

* [PATCH v3 11/21] arm64: avoid R_AARCH64_ABS64 relocations for Image header fields
@ 2016-01-14  9:05           ` Ard Biesheuvel
  0 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-14  9:05 UTC (permalink / raw)
  To: linux-arm-kernel

On 14 January 2016 at 09:51, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 13 January 2016 at 19:48, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> On 13 January 2016 at 19:12, Mark Rutland <mark.rutland@arm.com> wrote:
>>> On Mon, Jan 11, 2016 at 02:19:04PM +0100, Ard Biesheuvel wrote:
>>>> Unfortunately, the current way of using the linker to emit build time
>>>> constants into the Image header will no longer work once we switch to
>>>> the use of PIE executables. The reason is that such constants are emitted
>>>> into the binary using R_AARCH64_ABS64 relocations, which we will resolve
>>>> at runtime, not at build time, and the places targeted by those
>>>> relocations will contain zeroes before that.
>>>>
>>>> So move back to assembly time constants or R_AARCH64_ABS32 relocations
>>>> (which, interestingly enough, do get resolved at build time)
>>>
>>> To me it seems very odd that ABS64 and ABS32 are treated differently,
>>> and it makes me somewhat uncomfortable becuase it feels like a bug.
>>>
>>> Do we know whether the inconsistency between ABS64 and ABS32 was
>>> deliberate?
>>>
>>> I couldn't spot anything declaring a difference in the AArch64 ELF
>>> spec, and I'm not sure where else to look.
>>>
>>
>> My assumption is that PIE only defers resolving R_AARCH64_ABS64
>> relocations since those are the only ones that can be used to refer to
>> memory addresses
>>
>
> OK, digging into the binutils source code, it turns out that indeed,
> ABSnn relocations where nn equals the ELFnn memory size are treated
> differently, but only if they have default visibility. This is simply
> a result of the fact the code path is shared between shared libraries
> and PIE executables, since PIE executable are fully linked. It also
> means that we can simply work around it by emitting the linker symbols
> as hidden.
>

... and the bad news is that, while emitting the symbols as hidden
turns them from R_AARCH64_ABS64 into a R_AARCH64_RELATIVE relocations,
it does not actually force the value to be emitted at build time.

So I am going to stick with the patch, but elaborate in a comment
about why R_AARCH64_ABSnn are treated differently if nn equals the
pointer size. (look at elfNN_aarch64_final_link_relocate() in binutils
if you are keen to look at the code yourself)

-- 
Ard.



>
>>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>>> ---
>>>>  arch/arm64/include/asm/assembler.h | 15 ++++++++
>>>>  arch/arm64/kernel/head.S           | 17 +++++++--
>>>>  arch/arm64/kernel/image.h          | 37 ++++++--------------
>>>>  3 files changed, 40 insertions(+), 29 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
>>>> index d8bfcc1ce923..e211af783a3d 100644
>>>> --- a/arch/arm64/include/asm/assembler.h
>>>> +++ b/arch/arm64/include/asm/assembler.h
>>>> @@ -222,4 +222,19 @@ lr       .req    x30             // link register
>>>>       .size   __pi_##x, . - x;        \
>>>>       ENDPROC(x)
>>>>
>>>> +     .macro  le16, val
>>>> +     .byte   \val & 0xff
>>>> +     .byte   (\val >> 8) & 0xff
>>>> +     .endm
>>>> +
>>>> +     .macro  le32, val
>>>> +     le16    \val
>>>> +     le16    (\val >> 16)
>>>> +     .endm
>>>> +
>>>> +     .macro  le64, val
>>>> +     le32    \val
>>>> +     le32    (\val >> 32)
>>>> +     .endm
>>>> +
>>>>  #endif       /* __ASM_ASSEMBLER_H */
>>>> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
>>>> index 350515276541..211f75e673f4 100644
>>>> --- a/arch/arm64/kernel/head.S
>>>> +++ b/arch/arm64/kernel/head.S
>>>> @@ -51,6 +51,17 @@
>>>>  #define KERNEL_START _text
>>>>  #define KERNEL_END   _end
>>>>
>>>> +#ifdef CONFIG_CPU_BIG_ENDIAN
>>>> +#define __HEAD_FLAG_BE       1
>>>> +#else
>>>> +#define __HEAD_FLAG_BE       0
>>>> +#endif
>>>> +
>>>> +#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
>>>> +
>>>> +#define __HEAD_FLAGS ((__HEAD_FLAG_BE << 0) |        \
>>>> +                      (__HEAD_FLAG_PAGE_SIZE << 1))
>>>> +
>>>>  /*
>>>>   * Kernel startup entry point.
>>>>   * ---------------------------
>>>> @@ -83,9 +94,9 @@ efi_head:
>>>>       b       stext                           // branch to kernel start, magic
>>>>       .long   0                               // reserved
>>>>  #endif
>>>> -     .quad   _kernel_offset_le               // Image load offset from start of RAM, little-endian
>>>> -     .quad   _kernel_size_le                 // Effective size of kernel image, little-endian
>>>> -     .quad   _kernel_flags_le                // Informative flags, little-endian
>>>> +     le64    TEXT_OFFSET                     // Image load offset from start of RAM, little-endian
>>>> +     .long   _kernel_size_le, 0              // Effective size of kernel image, little-endian
>>>> +     le64    __HEAD_FLAGS                    // Informative flags, little-endian
>>>>       .quad   0                               // reserved
>>>>       .quad   0                               // reserved
>>>>       .quad   0                               // reserved
>>>> diff --git a/arch/arm64/kernel/image.h b/arch/arm64/kernel/image.h
>>>> index bc2abb8b1599..bb6b0e69d0a4 100644
>>>> --- a/arch/arm64/kernel/image.h
>>>> +++ b/arch/arm64/kernel/image.h
>>>> @@ -26,41 +26,26 @@
>>>>   * There aren't any ELF relocations we can use to endian-swap values known only
>>>>   * at link time (e.g. the subtraction of two symbol addresses), so we must get
>>>>   * the linker to endian-swap certain values before emitting them.
>>>> + * Note that this will not work for 64-bit values: these are resolved using
>>>> + * R_AARCH64_ABS64 relocations, which are fixed up at runtime rather than at
>>>> + * build time when building the PIE executable (for KASLR).
>>>>   */
>>>>  #ifdef CONFIG_CPU_BIG_ENDIAN
>>>> -#define DATA_LE64(data)                                      \
>>>> -     ((((data) & 0x00000000000000ff) << 56) |        \
>>>> -      (((data) & 0x000000000000ff00) << 40) |        \
>>>> -      (((data) & 0x0000000000ff0000) << 24) |        \
>>>> -      (((data) & 0x00000000ff000000) << 8)  |        \
>>>> -      (((data) & 0x000000ff00000000) >> 8)  |        \
>>>> -      (((data) & 0x0000ff0000000000) >> 24) |        \
>>>> -      (((data) & 0x00ff000000000000) >> 40) |        \
>>>> -      (((data) & 0xff00000000000000) >> 56))
>>>> +#define DATA_LE32(data)                              \
>>>> +     ((((data) & 0x000000ff) << 24) |        \
>>>> +      (((data) & 0x0000ff00) << 8)  |        \
>>>> +      (((data) & 0x00ff0000) >> 8)  |        \
>>>> +      (((data) & 0xff000000) >> 24))
>>>>  #else
>>>> -#define DATA_LE64(data) ((data) & 0xffffffffffffffff)
>>>> +#define DATA_LE32(data) ((data) & 0xffffffff)
>>>>  #endif
>>>>
>>>> -#ifdef CONFIG_CPU_BIG_ENDIAN
>>>> -#define __HEAD_FLAG_BE       1
>>>> -#else
>>>> -#define __HEAD_FLAG_BE       0
>>>> -#endif
>>>> -
>>>> -#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
>>>> -
>>>> -#define __HEAD_FLAGS ((__HEAD_FLAG_BE << 0) |        \
>>>> -                      (__HEAD_FLAG_PAGE_SIZE << 1))
>>>> -
>>>>  /*
>>>>   * These will output as part of the Image header, which should be little-endian
>>>> - * regardless of the endianness of the kernel. While constant values could be
>>>> - * endian swapped in head.S, all are done here for consistency.
>>>> + * regardless of the endianness of the kernel.
>>>>   */
>>>>  #define HEAD_SYMBOLS                                         \
>>>> -     _kernel_size_le         = DATA_LE64(_end - _text);      \
>>>> -     _kernel_offset_le       = DATA_LE64(TEXT_OFFSET);       \
>>>> -     _kernel_flags_le        = DATA_LE64(__HEAD_FLAGS);
>>>> +     _kernel_size_le         = DATA_LE32(_end - _text);
>>>>
>>>>  #ifdef CONFIG_EFI
>>>>
>>>> --
>>>> 2.5.0
>>>>

^ permalink raw reply	[flat|nested] 207+ messages in thread

* [kernel-hardening] Re: [PATCH v3 11/21] arm64: avoid R_AARCH64_ABS64 relocations for Image header fields
@ 2016-01-14  9:05           ` Ard Biesheuvel
  0 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-14  9:05 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, kernel-hardening, Will Deacon, Catalin Marinas,
	Leif Lindholm, Kees Cook, linux-kernel, Stuart Yoder,
	Sharma Bhupesh, Arnd Bergmann, Marc Zyngier, Christoffer Dall

On 14 January 2016 at 09:51, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 13 January 2016 at 19:48, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> On 13 January 2016 at 19:12, Mark Rutland <mark.rutland@arm.com> wrote:
>>> On Mon, Jan 11, 2016 at 02:19:04PM +0100, Ard Biesheuvel wrote:
>>>> Unfortunately, the current way of using the linker to emit build time
>>>> constants into the Image header will no longer work once we switch to
>>>> the use of PIE executables. The reason is that such constants are emitted
>>>> into the binary using R_AARCH64_ABS64 relocations, which we will resolve
>>>> at runtime, not at build time, and the places targeted by those
>>>> relocations will contain zeroes before that.
>>>>
>>>> So move back to assembly time constants or R_AARCH64_ABS32 relocations
>>>> (which, interestingly enough, do get resolved at build time)
>>>
>>> To me it seems very odd that ABS64 and ABS32 are treated differently,
>>> and it makes me somewhat uncomfortable because it feels like a bug.
>>>
>>> Do we know whether the inconsistency between ABS64 and ABS32 was
>>> deliberate?
>>>
>>> I couldn't spot anything declaring a difference in the AArch64 ELF
>>> spec, and I'm not sure where else to look.
>>>
>>
>> My assumption is that PIE only defers resolving R_AARCH64_ABS64
>> relocations since those are the only ones that can be used to refer to
>> memory addresses
>>
>
> OK, digging into the binutils source code, it turns out that indeed,
> ABSnn relocations where nn equals the ELFnn memory size are treated
> differently, but only if they have default visibility. This is simply
> a result of the fact that the code path is shared between shared libraries
> and PIE executables, since PIE executables are fully linked. It also
> means that we can simply work around it by emitting the linker symbols
> as hidden.
>

... and the bad news is that, while emitting the symbols as hidden
turns them from R_AARCH64_ABS64 into R_AARCH64_RELATIVE relocations,
it does not actually force the value to be emitted at build time.

So I am going to stick with the patch, but elaborate in a comment
about why R_AARCH64_ABSnn are treated differently if nn equals the
pointer size. (look at elfNN_aarch64_final_link_relocate() in binutils
if you are keen to look at the code yourself)
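
(If you want to reproduce this outside the kernel build, here is a minimal,
untested sketch that should show the difference; the file name, the symbol
name and the cross prefix are only illustrative:

  $ cat reloc.s
          .globl  _start
  _start: ret
          .data
          .long   foo                     // emits R_AARCH64_ABS32
          .quad   foo                     // emits R_AARCH64_ABS64
  $ aarch64-linux-gnu-as -o reloc.o reloc.s
  $ aarch64-linux-gnu-ld -pie --defsym foo=0x1234 -o reloc reloc.o
  $ aarch64-linux-gnu-readelf -r reloc            # only the ABS64 slot should
                                                  # remain as a dynamic reloc
  $ aarch64-linux-gnu-objdump -s -j .data reloc   # ABS32 slot holds the value,
                                                  # ABS64 slot holds zeroes

i.e. only the pointer-sized relocation against the default-visibility symbol
should end up deferred to runtime.)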

-- 
Ard.



>
>>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>>> ---
>>>>  arch/arm64/include/asm/assembler.h | 15 ++++++++
>>>>  arch/arm64/kernel/head.S           | 17 +++++++--
>>>>  arch/arm64/kernel/image.h          | 37 ++++++--------------
>>>>  3 files changed, 40 insertions(+), 29 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
>>>> index d8bfcc1ce923..e211af783a3d 100644
>>>> --- a/arch/arm64/include/asm/assembler.h
>>>> +++ b/arch/arm64/include/asm/assembler.h
>>>> @@ -222,4 +222,19 @@ lr       .req    x30             // link register
>>>>       .size   __pi_##x, . - x;        \
>>>>       ENDPROC(x)
>>>>
>>>> +     .macro  le16, val
>>>> +     .byte   \val & 0xff
>>>> +     .byte   (\val >> 8) & 0xff
>>>> +     .endm
>>>> +
>>>> +     .macro  le32, val
>>>> +     le16    \val
>>>> +     le16    (\val >> 16)
>>>> +     .endm
>>>> +
>>>> +     .macro  le64, val
>>>> +     le32    \val
>>>> +     le32    (\val >> 32)
>>>> +     .endm
>>>> +
>>>>  #endif       /* __ASM_ASSEMBLER_H */
>>>> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
>>>> index 350515276541..211f75e673f4 100644
>>>> --- a/arch/arm64/kernel/head.S
>>>> +++ b/arch/arm64/kernel/head.S
>>>> @@ -51,6 +51,17 @@
>>>>  #define KERNEL_START _text
>>>>  #define KERNEL_END   _end
>>>>
>>>> +#ifdef CONFIG_CPU_BIG_ENDIAN
>>>> +#define __HEAD_FLAG_BE       1
>>>> +#else
>>>> +#define __HEAD_FLAG_BE       0
>>>> +#endif
>>>> +
>>>> +#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
>>>> +
>>>> +#define __HEAD_FLAGS ((__HEAD_FLAG_BE << 0) |        \
>>>> +                      (__HEAD_FLAG_PAGE_SIZE << 1))
>>>> +
>>>>  /*
>>>>   * Kernel startup entry point.
>>>>   * ---------------------------
>>>> @@ -83,9 +94,9 @@ efi_head:
>>>>       b       stext                           // branch to kernel start, magic
>>>>       .long   0                               // reserved
>>>>  #endif
>>>> -     .quad   _kernel_offset_le               // Image load offset from start of RAM, little-endian
>>>> -     .quad   _kernel_size_le                 // Effective size of kernel image, little-endian
>>>> -     .quad   _kernel_flags_le                // Informative flags, little-endian
>>>> +     le64    TEXT_OFFSET                     // Image load offset from start of RAM, little-endian
>>>> +     .long   _kernel_size_le, 0              // Effective size of kernel image, little-endian
>>>> +     le64    __HEAD_FLAGS                    // Informative flags, little-endian
>>>>       .quad   0                               // reserved
>>>>       .quad   0                               // reserved
>>>>       .quad   0                               // reserved
>>>> diff --git a/arch/arm64/kernel/image.h b/arch/arm64/kernel/image.h
>>>> index bc2abb8b1599..bb6b0e69d0a4 100644
>>>> --- a/arch/arm64/kernel/image.h
>>>> +++ b/arch/arm64/kernel/image.h
>>>> @@ -26,41 +26,26 @@
>>>>   * There aren't any ELF relocations we can use to endian-swap values known only
>>>>   * at link time (e.g. the subtraction of two symbol addresses), so we must get
>>>>   * the linker to endian-swap certain values before emitting them.
>>>> + * Note that this will not work for 64-bit values: these are resolved using
>>>> + * R_AARCH64_ABS64 relocations, which are fixed up at runtime rather than at
>>>> + * build time when building the PIE executable (for KASLR).
>>>>   */
>>>>  #ifdef CONFIG_CPU_BIG_ENDIAN
>>>> -#define DATA_LE64(data)                                      \
>>>> -     ((((data) & 0x00000000000000ff) << 56) |        \
>>>> -      (((data) & 0x000000000000ff00) << 40) |        \
>>>> -      (((data) & 0x0000000000ff0000) << 24) |        \
>>>> -      (((data) & 0x00000000ff000000) << 8)  |        \
>>>> -      (((data) & 0x000000ff00000000) >> 8)  |        \
>>>> -      (((data) & 0x0000ff0000000000) >> 24) |        \
>>>> -      (((data) & 0x00ff000000000000) >> 40) |        \
>>>> -      (((data) & 0xff00000000000000) >> 56))
>>>> +#define DATA_LE32(data)                              \
>>>> +     ((((data) & 0x000000ff) << 24) |        \
>>>> +      (((data) & 0x0000ff00) << 8)  |        \
>>>> +      (((data) & 0x00ff0000) >> 8)  |        \
>>>> +      (((data) & 0xff000000) >> 24))
>>>>  #else
>>>> -#define DATA_LE64(data) ((data) & 0xffffffffffffffff)
>>>> +#define DATA_LE32(data) ((data) & 0xffffffff)
>>>>  #endif
>>>>
>>>> -#ifdef CONFIG_CPU_BIG_ENDIAN
>>>> -#define __HEAD_FLAG_BE       1
>>>> -#else
>>>> -#define __HEAD_FLAG_BE       0
>>>> -#endif
>>>> -
>>>> -#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
>>>> -
>>>> -#define __HEAD_FLAGS ((__HEAD_FLAG_BE << 0) |        \
>>>> -                      (__HEAD_FLAG_PAGE_SIZE << 1))
>>>> -
>>>>  /*
>>>>   * These will output as part of the Image header, which should be little-endian
>>>> - * regardless of the endianness of the kernel. While constant values could be
>>>> - * endian swapped in head.S, all are done here for consistency.
>>>> + * regardless of the endianness of the kernel.
>>>>   */
>>>>  #define HEAD_SYMBOLS                                         \
>>>> -     _kernel_size_le         = DATA_LE64(_end - _text);      \
>>>> -     _kernel_offset_le       = DATA_LE64(TEXT_OFFSET);       \
>>>> -     _kernel_flags_le        = DATA_LE64(__HEAD_FLAGS);
>>>> +     _kernel_size_le         = DATA_LE32(_end - _text);
>>>>
>>>>  #ifdef CONFIG_EFI
>>>>
>>>> --
>>>> 2.5.0
>>>>

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 11/21] arm64: avoid R_AARCH64_ABS64 relocations for Image header fields
  2016-01-14  9:05           ` Ard Biesheuvel
  (?)
@ 2016-01-14 10:46             ` Mark Rutland
  -1 siblings, 0 replies; 207+ messages in thread
From: Mark Rutland @ 2016-01-14 10:46 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, kernel-hardening, Will Deacon, Catalin Marinas,
	Leif Lindholm, Kees Cook, linux-kernel, Stuart Yoder,
	Sharma Bhupesh, Arnd Bergmann, Marc Zyngier, Christoffer Dall

On Thu, Jan 14, 2016 at 10:05:42AM +0100, Ard Biesheuvel wrote:
> On 14 January 2016 at 09:51, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> > On 13 January 2016 at 19:48, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> >> On 13 January 2016 at 19:12, Mark Rutland <mark.rutland@arm.com> wrote:
> >>> On Mon, Jan 11, 2016 at 02:19:04PM +0100, Ard Biesheuvel wrote:
> >>>> Unfortunately, the current way of using the linker to emit build time
> >>>> constants into the Image header will no longer work once we switch to
> >>>> the use of PIE executables. The reason is that such constants are emitted
> >>>> into the binary using R_AARCH64_ABS64 relocations, which we will resolve
> >>>> at runtime, not at build time, and the places targeted by those
> >>>> relocations will contain zeroes before that.
> >>>>
> >>>> So move back to assembly time constants or R_AARCH64_ABS32 relocations
> >>>> (which, interestingly enough, do get resolved at build time)
> >>>
> >>> To me it seems very odd that ABS64 and ABS32 are treated differently,
> >>> and it makes me somewhat uncomfortable because it feels like a bug.
> >>>
> >>> Do we know whether the inconsistency between ABS64 and ABS32 was
> >>> deliberate?
> >>>
> >>> I couldn't spot anything declaring a difference in the AArch64 ELF
> >>> spec, and I'm not sure where else to look.
> >>>
> >>
> >> My assumption is that PIE only defers resolving R_AARCH64_ABS64
> >> relocations since those are the only ones that can be used to refer to
> >> memory addresses
> >>
> >
> > OK, digging into the binutils source code, it turns out that indeed,
> > ABSnn relocations where nn equals the ELFnn memory size are treated
> > differently, but only if they have default visibility. This is simply
> > a result of the fact that the code path is shared between shared libraries
> > and PIE executables, since PIE executables are fully linked. It also
> > means that we can simply work around it by emitting the linker symbols
> > as hidden.
> >
> 
> ... and the bad news is that, while emitting the symbols as hidden
> turns them from R_AARCH64_ABS64 into R_AARCH64_RELATIVE relocations,
> it does not actually force the value to be emitted at build time.
> 
> So I am going to stick with the patch, but elaborate in a comment
> about why R_AARCH64_ABSnn are treated differently if nn equals the
> pointer size. (look at elfNN_aarch64_final_link_relocate() in binutils
> if you are keen to look at the code yourself)

Ok. Thanks for digging into that.

One thing though: I would prefer if we could still keep all the LE64
image header values together, to have them dealt with consistently.

Could we hide the ABS32 usage behind some macros to do so, e.g.

in image.h:

#define DEFINE_IMAGE_LE64(sym, data) 				\
	sym##_lo32 = DATA_LE32(data & 0xffffffff);		\
	sym##_hi32 = DATA_LE32(data >> 32);

#define HEAD_SYMBOLS						\
	DEFINE_IMAGE_LE64(_kernel_size_le, _end - _text);	\
	DEFINE_IMAGE_LE64(_kernel_offset_le, TEXT_OFFSET);	\
	DEFINE_IMAGE_LE64(_kernel_flags_le, __HEAD_FLAGS);

and in head.S:

#define IMAGE_LE64(sym)	.long sym##_lo32, sym##_hi32

	...
	IMAGE_LE64(_kernel_offset_le)	// Image load offset from start of RAM, little-endian
	IMAGE_LE64(_kernel_size_le)	// Effective size of kernel image, little-endian
	IMAGE_LE64(_kernel_flags_le)	// Informative flags, little-endian
	...
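
(For illustration: on a little-endian 4K page build, __HEAD_FLAGS from the
patch above works out to (0 << 0) | (1 << 1) == 0x2, so the flags entry
would expand to

	_kernel_flags_le_lo32 = DATA_LE32(0x2);	/* 0x2 on LE, 0x02000000 on BE */
	_kernel_flags_le_hi32 = DATA_LE32(0x0);

and the two .long directives in head.S then emit those halves back to back,
so the header field still reads as a little-endian 0x2 whatever the kernel
endianness, while only ever needing ABS32 relocations.)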

Thanks,
Mark.

> 
> -- 
> Ard.
> 
> 
> 
> >
> >>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> >>>> ---
> >>>>  arch/arm64/include/asm/assembler.h | 15 ++++++++
> >>>>  arch/arm64/kernel/head.S           | 17 +++++++--
> >>>>  arch/arm64/kernel/image.h          | 37 ++++++--------------
> >>>>  3 files changed, 40 insertions(+), 29 deletions(-)
> >>>>
> >>>> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
> >>>> index d8bfcc1ce923..e211af783a3d 100644
> >>>> --- a/arch/arm64/include/asm/assembler.h
> >>>> +++ b/arch/arm64/include/asm/assembler.h
> >>>> @@ -222,4 +222,19 @@ lr       .req    x30             // link register
> >>>>       .size   __pi_##x, . - x;        \
> >>>>       ENDPROC(x)
> >>>>
> >>>> +     .macro  le16, val
> >>>> +     .byte   \val & 0xff
> >>>> +     .byte   (\val >> 8) & 0xff
> >>>> +     .endm
> >>>> +
> >>>> +     .macro  le32, val
> >>>> +     le16    \val
> >>>> +     le16    (\val >> 16)
> >>>> +     .endm
> >>>> +
> >>>> +     .macro  le64, val
> >>>> +     le32    \val
> >>>> +     le32    (\val >> 32)
> >>>> +     .endm
> >>>> +
> >>>>  #endif       /* __ASM_ASSEMBLER_H */
> >>>> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> >>>> index 350515276541..211f75e673f4 100644
> >>>> --- a/arch/arm64/kernel/head.S
> >>>> +++ b/arch/arm64/kernel/head.S
> >>>> @@ -51,6 +51,17 @@
> >>>>  #define KERNEL_START _text
> >>>>  #define KERNEL_END   _end
> >>>>
> >>>> +#ifdef CONFIG_CPU_BIG_ENDIAN
> >>>> +#define __HEAD_FLAG_BE       1
> >>>> +#else
> >>>> +#define __HEAD_FLAG_BE       0
> >>>> +#endif
> >>>> +
> >>>> +#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
> >>>> +
> >>>> +#define __HEAD_FLAGS ((__HEAD_FLAG_BE << 0) |        \
> >>>> +                      (__HEAD_FLAG_PAGE_SIZE << 1))
> >>>> +
> >>>>  /*
> >>>>   * Kernel startup entry point.
> >>>>   * ---------------------------
> >>>> @@ -83,9 +94,9 @@ efi_head:
> >>>>       b       stext                           // branch to kernel start, magic
> >>>>       .long   0                               // reserved
> >>>>  #endif
> >>>> -     .quad   _kernel_offset_le               // Image load offset from start of RAM, little-endian
> >>>> -     .quad   _kernel_size_le                 // Effective size of kernel image, little-endian
> >>>> -     .quad   _kernel_flags_le                // Informative flags, little-endian
> >>>> +     le64    TEXT_OFFSET                     // Image load offset from start of RAM, little-endian
> >>>> +     .long   _kernel_size_le, 0              // Effective size of kernel image, little-endian
> >>>> +     le64    __HEAD_FLAGS                    // Informative flags, little-endian
> >>>>       .quad   0                               // reserved
> >>>>       .quad   0                               // reserved
> >>>>       .quad   0                               // reserved
> >>>> diff --git a/arch/arm64/kernel/image.h b/arch/arm64/kernel/image.h
> >>>> index bc2abb8b1599..bb6b0e69d0a4 100644
> >>>> --- a/arch/arm64/kernel/image.h
> >>>> +++ b/arch/arm64/kernel/image.h
> >>>> @@ -26,41 +26,26 @@
> >>>>   * There aren't any ELF relocations we can use to endian-swap values known only
> >>>>   * at link time (e.g. the subtraction of two symbol addresses), so we must get
> >>>>   * the linker to endian-swap certain values before emitting them.
> >>>> + * Note that this will not work for 64-bit values: these are resolved using
> >>>> + * R_AARCH64_ABS64 relocations, which are fixed up at runtime rather than at
> >>>> + * build time when building the PIE executable (for KASLR).
> >>>>   */
> >>>>  #ifdef CONFIG_CPU_BIG_ENDIAN
> >>>> -#define DATA_LE64(data)                                      \
> >>>> -     ((((data) & 0x00000000000000ff) << 56) |        \
> >>>> -      (((data) & 0x000000000000ff00) << 40) |        \
> >>>> -      (((data) & 0x0000000000ff0000) << 24) |        \
> >>>> -      (((data) & 0x00000000ff000000) << 8)  |        \
> >>>> -      (((data) & 0x000000ff00000000) >> 8)  |        \
> >>>> -      (((data) & 0x0000ff0000000000) >> 24) |        \
> >>>> -      (((data) & 0x00ff000000000000) >> 40) |        \
> >>>> -      (((data) & 0xff00000000000000) >> 56))
> >>>> +#define DATA_LE32(data)                              \
> >>>> +     ((((data) & 0x000000ff) << 24) |        \
> >>>> +      (((data) & 0x0000ff00) << 8)  |        \
> >>>> +      (((data) & 0x00ff0000) >> 8)  |        \
> >>>> +      (((data) & 0xff000000) >> 24))
> >>>>  #else
> >>>> -#define DATA_LE64(data) ((data) & 0xffffffffffffffff)
> >>>> +#define DATA_LE32(data) ((data) & 0xffffffff)
> >>>>  #endif
> >>>>
> >>>> -#ifdef CONFIG_CPU_BIG_ENDIAN
> >>>> -#define __HEAD_FLAG_BE       1
> >>>> -#else
> >>>> -#define __HEAD_FLAG_BE       0
> >>>> -#endif
> >>>> -
> >>>> -#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
> >>>> -
> >>>> -#define __HEAD_FLAGS ((__HEAD_FLAG_BE << 0) |        \
> >>>> -                      (__HEAD_FLAG_PAGE_SIZE << 1))
> >>>> -
> >>>>  /*
> >>>>   * These will output as part of the Image header, which should be little-endian
> >>>> - * regardless of the endianness of the kernel. While constant values could be
> >>>> - * endian swapped in head.S, all are done here for consistency.
> >>>> + * regardless of the endianness of the kernel.
> >>>>   */
> >>>>  #define HEAD_SYMBOLS                                         \
> >>>> -     _kernel_size_le         = DATA_LE64(_end - _text);      \
> >>>> -     _kernel_offset_le       = DATA_LE64(TEXT_OFFSET);       \
> >>>> -     _kernel_flags_le        = DATA_LE64(__HEAD_FLAGS);
> >>>> +     _kernel_size_le         = DATA_LE32(_end - _text);
> >>>>
> >>>>  #ifdef CONFIG_EFI
> >>>>
> >>>> --
> >>>> 2.5.0
> >>>>
> 

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 11/21] arm64: avoid R_AARCH64_ABS64 relocations for Image header fields
  2016-01-14 10:46             ` Mark Rutland
  (?)
@ 2016-01-14 11:22               ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-14 11:22 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, kernel-hardening, Will Deacon, Catalin Marinas,
	Leif Lindholm, Kees Cook, linux-kernel, Stuart Yoder,
	Sharma Bhupesh, Arnd Bergmann, Marc Zyngier, Christoffer Dall

On 14 January 2016 at 11:46, Mark Rutland <mark.rutland@arm.com> wrote:
> On Thu, Jan 14, 2016 at 10:05:42AM +0100, Ard Biesheuvel wrote:
>> On 14 January 2016 at 09:51, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> > On 13 January 2016 at 19:48, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> >> On 13 January 2016 at 19:12, Mark Rutland <mark.rutland@arm.com> wrote:
>> >>> On Mon, Jan 11, 2016 at 02:19:04PM +0100, Ard Biesheuvel wrote:
>> >>>> Unfortunately, the current way of using the linker to emit build time
>> >>>> constants into the Image header will no longer work once we switch to
>> >>>> the use of PIE executables. The reason is that such constants are emitted
>> >>>> into the binary using R_AARCH64_ABS64 relocations, which we will resolve
>> >>>> at runtime, not at build time, and the places targeted by those
>> >>>> relocations will contain zeroes before that.
>> >>>>
>> >>>> So move back to assembly time constants or R_AARCH64_ABS32 relocations
>> >>>> (which, interestingly enough, do get resolved at build time)
>> >>>
>> >>> To me it seems very odd that ABS64 and ABS32 are treated differently,
>> >>> and it makes me somewhat uncomfortable because it feels like a bug.
>> >>>
>> >>> Do we know whether the inconsistency between ABS64 and ABS32 was
>> >>> deliberate?
>> >>>
>> >>> I couldn't spot anything declaring a difference in the AArch64 ELF
>> >>> spec, and I'm not sure where else to look.
>> >>>
>> >>
>> >> My assumption is that PIE only defers resolving R_AARCH64_ABS64
>> >> relocations since those are the only ones that can be used to refer to
>> >> memory addresses
>> >>
>> >
>> > OK, digging into the binutils source code, it turns out that indeed,
>> > ABSnn relocations where nn equals the ELFnn memory size are treated
>> > differently, but only if they have default visibility. This is simply
>> > a result of the fact that the code path is shared between shared libraries
>> > and PIE executables, since PIE executables are fully linked. It also
>> > means that we can simply work around it by emitting the linker symbols
>> > as hidden.
>> >
>>
>> ... and the bad news is that, while emitting the symbols as hidden
>> turns them from R_AARCH64_ABS64 into R_AARCH64_RELATIVE relocations,
>> it does not actually force the value to be emitted at build time.
>>
>> So I am going to stick with the patch, but elaborate in a comment
>> about why R_AARCH64_ABSnn are treated differently if nn equals the
>> pointer size. (look at elfNN_aarch64_final_link_relocate() in binutils
>> if you are keen to look at the code yourself)
>
> Ok. Thanks for digging into that.
>
> One thing though: I would prefer if we could still keep all the LE64
> image header values together, to have them dealt with consistently.
>
> Could we hide the ABS32 usage behind some macros to do so, e.g.
>
> in image.h:
>
> #define DEFINE_IMAGE_LE64(sym, data)                            \
>         sym##_lo32 = DATA_LE32(data & 0xffffffff);              \
>         sym##_hi32 = DATA_LE32(data >> 32);
>
> #define HEAD_SYMBOLS                                            \
>         DEFINE_IMAGE_LE64(_kernel_size_le, _end - _text);       \
>         DEFINE_IMAGE_LE64(_kernel_offset_le, TEXT_OFFSET);      \
>         DEFINE_IMAGE_LE64(_kernel_flags_le, __HEAD_FLAGS);
>

I will steal this

> and in head.S:
>
> #define IMAGE_LE64(sym) .long sym##_lo32, sym##_hi32
>
>         ...
>         IMAGE_LE64(_kernel_offset_le)   // Image load offset from start of RAM, little-endian
>         IMAGE_LE64(_kernel_size_le)     // Effective size of kernel image, little-endian
>         IMAGE_LE64(_kernel_flags_le)    // Informative flags, little-endian
>         ...

... and implement this with an asm macro.
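
(Roughly along these lines, as an untested sketch, with the macro name being
only illustrative:

	.macro	le64sym, sym
	.long	\sym\()_lo32
	.long	\sym\()_hi32
	.endm

so head.S can emit each field as e.g. "le64sym _kernel_offset_le" and the
two ABS32 halves stay hidden behind a single line per header field.)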

Thanks,
Ard.



>>
>> --
>> Ard.
>>
>>
>>
>> >
>> >>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> >>>> ---
>> >>>>  arch/arm64/include/asm/assembler.h | 15 ++++++++
>> >>>>  arch/arm64/kernel/head.S           | 17 +++++++--
>> >>>>  arch/arm64/kernel/image.h          | 37 ++++++--------------
>> >>>>  3 files changed, 40 insertions(+), 29 deletions(-)
>> >>>>
>> >>>> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
>> >>>> index d8bfcc1ce923..e211af783a3d 100644
>> >>>> --- a/arch/arm64/include/asm/assembler.h
>> >>>> +++ b/arch/arm64/include/asm/assembler.h
>> >>>> @@ -222,4 +222,19 @@ lr       .req    x30             // link register
>> >>>>       .size   __pi_##x, . - x;        \
>> >>>>       ENDPROC(x)
>> >>>>
>> >>>> +     .macro  le16, val
>> >>>> +     .byte   \val & 0xff
>> >>>> +     .byte   (\val >> 8) & 0xff
>> >>>> +     .endm
>> >>>> +
>> >>>> +     .macro  le32, val
>> >>>> +     le16    \val
>> >>>> +     le16    (\val >> 16)
>> >>>> +     .endm
>> >>>> +
>> >>>> +     .macro  le64, val
>> >>>> +     le32    \val
>> >>>> +     le32    (\val >> 32)
>> >>>> +     .endm
>> >>>> +
>> >>>>  #endif       /* __ASM_ASSEMBLER_H */
>> >>>> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
>> >>>> index 350515276541..211f75e673f4 100644
>> >>>> --- a/arch/arm64/kernel/head.S
>> >>>> +++ b/arch/arm64/kernel/head.S
>> >>>> @@ -51,6 +51,17 @@
>> >>>>  #define KERNEL_START _text
>> >>>>  #define KERNEL_END   _end
>> >>>>
>> >>>> +#ifdef CONFIG_CPU_BIG_ENDIAN
>> >>>> +#define __HEAD_FLAG_BE       1
>> >>>> +#else
>> >>>> +#define __HEAD_FLAG_BE       0
>> >>>> +#endif
>> >>>> +
>> >>>> +#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
>> >>>> +
>> >>>> +#define __HEAD_FLAGS ((__HEAD_FLAG_BE << 0) |        \
>> >>>> +                      (__HEAD_FLAG_PAGE_SIZE << 1))
>> >>>> +
>> >>>>  /*
>> >>>>   * Kernel startup entry point.
>> >>>>   * ---------------------------
>> >>>> @@ -83,9 +94,9 @@ efi_head:
>> >>>>       b       stext                           // branch to kernel start, magic
>> >>>>       .long   0                               // reserved
>> >>>>  #endif
>> >>>> -     .quad   _kernel_offset_le               // Image load offset from start of RAM, little-endian
>> >>>> -     .quad   _kernel_size_le                 // Effective size of kernel image, little-endian
>> >>>> -     .quad   _kernel_flags_le                // Informative flags, little-endian
>> >>>> +     le64    TEXT_OFFSET                     // Image load offset from start of RAM, little-endian
>> >>>> +     .long   _kernel_size_le, 0              // Effective size of kernel image, little-endian
>> >>>> +     le64    __HEAD_FLAGS                    // Informative flags, little-endian
>> >>>>       .quad   0                               // reserved
>> >>>>       .quad   0                               // reserved
>> >>>>       .quad   0                               // reserved
>> >>>> diff --git a/arch/arm64/kernel/image.h b/arch/arm64/kernel/image.h
>> >>>> index bc2abb8b1599..bb6b0e69d0a4 100644
>> >>>> --- a/arch/arm64/kernel/image.h
>> >>>> +++ b/arch/arm64/kernel/image.h
>> >>>> @@ -26,41 +26,26 @@
>> >>>>   * There aren't any ELF relocations we can use to endian-swap values known only
>> >>>>   * at link time (e.g. the subtraction of two symbol addresses), so we must get
>> >>>>   * the linker to endian-swap certain values before emitting them.
>> >>>> + * Note that this will not work for 64-bit values: these are resolved using
>> >>>> + * R_AARCH64_ABS64 relocations, which are fixed up at runtime rather than at
>> >>>> + * build time when building the PIE executable (for KASLR).
>> >>>>   */
>> >>>>  #ifdef CONFIG_CPU_BIG_ENDIAN
>> >>>> -#define DATA_LE64(data)                                      \
>> >>>> -     ((((data) & 0x00000000000000ff) << 56) |        \
>> >>>> -      (((data) & 0x000000000000ff00) << 40) |        \
>> >>>> -      (((data) & 0x0000000000ff0000) << 24) |        \
>> >>>> -      (((data) & 0x00000000ff000000) << 8)  |        \
>> >>>> -      (((data) & 0x000000ff00000000) >> 8)  |        \
>> >>>> -      (((data) & 0x0000ff0000000000) >> 24) |        \
>> >>>> -      (((data) & 0x00ff000000000000) >> 40) |        \
>> >>>> -      (((data) & 0xff00000000000000) >> 56))
>> >>>> +#define DATA_LE32(data)                              \
>> >>>> +     ((((data) & 0x000000ff) << 24) |        \
>> >>>> +      (((data) & 0x0000ff00) << 8)  |        \
>> >>>> +      (((data) & 0x00ff0000) >> 8)  |        \
>> >>>> +      (((data) & 0xff000000) >> 24))
>> >>>>  #else
>> >>>> -#define DATA_LE64(data) ((data) & 0xffffffffffffffff)
>> >>>> +#define DATA_LE32(data) ((data) & 0xffffffff)
>> >>>>  #endif
>> >>>>
>> >>>> -#ifdef CONFIG_CPU_BIG_ENDIAN
>> >>>> -#define __HEAD_FLAG_BE       1
>> >>>> -#else
>> >>>> -#define __HEAD_FLAG_BE       0
>> >>>> -#endif
>> >>>> -
>> >>>> -#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
>> >>>> -
>> >>>> -#define __HEAD_FLAGS ((__HEAD_FLAG_BE << 0) |        \
>> >>>> -                      (__HEAD_FLAG_PAGE_SIZE << 1))
>> >>>> -
>> >>>>  /*
>> >>>>   * These will output as part of the Image header, which should be little-endian
>> >>>> - * regardless of the endianness of the kernel. While constant values could be
>> >>>> - * endian swapped in head.S, all are done here for consistency.
>> >>>> + * regardless of the endianness of the kernel.
>> >>>>   */
>> >>>>  #define HEAD_SYMBOLS                                         \
>> >>>> -     _kernel_size_le         = DATA_LE64(_end - _text);      \
>> >>>> -     _kernel_offset_le       = DATA_LE64(TEXT_OFFSET);       \
>> >>>> -     _kernel_flags_le        = DATA_LE64(__HEAD_FLAGS);
>> >>>> +     _kernel_size_le         = DATA_LE32(_end - _text);
>> >>>>
>> >>>>  #ifdef CONFIG_EFI
>> >>>>
>> >>>> --
>> >>>> 2.5.0
>> >>>>
>>

^ permalink raw reply	[flat|nested] 207+ messages in thread

* [PATCH v3 11/21] arm64: avoid R_AARCH64_ABS64 relocations for Image header fields
@ 2016-01-14 11:22               ` Ard Biesheuvel
  0 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-14 11:22 UTC (permalink / raw)
  To: linux-arm-kernel

On 14 January 2016 at 11:46, Mark Rutland <mark.rutland@arm.com> wrote:
> On Thu, Jan 14, 2016 at 10:05:42AM +0100, Ard Biesheuvel wrote:
>> On 14 January 2016 at 09:51, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> > On 13 January 2016 at 19:48, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> >> On 13 January 2016 at 19:12, Mark Rutland <mark.rutland@arm.com> wrote:
>> >>> On Mon, Jan 11, 2016 at 02:19:04PM +0100, Ard Biesheuvel wrote:
>> >>>> Unfortunately, the current way of using the linker to emit build time
>> >>>> constants into the Image header will no longer work once we switch to
>> >>>> the use of PIE executables. The reason is that such constants are emitted
>> >>>> into the binary using R_AARCH64_ABS64 relocations, which we will resolve
>> >>>> at runtime, not at build time, and the places targeted by those
>> >>>> relocations will contain zeroes before that.
>> >>>>
>> >>>> So move back to assembly time constants or R_AARCH64_ABS32 relocations
>> >>>> (which, interestingly enough, do get resolved at build time)
>> >>>
>> >>> To me it seems very odd that ABS64 and ABS32 are treated differently,
>> >>> and it makes me somewhat uncomfortable becuase it feels like a bug.
>> >>>
>> >>> Do we know whether the inconsistency between ABS64 and ABS32 was
>> >>> deliberate?
>> >>>
>> >>> I couldn't spot anything declaring a difference in the AArch64 ELF
>> >>> spec, and I'm not sure where else to look.
>> >>>
>> >>
>> >> My assumption is that PIE only defers resolving R_AARCH64_ABS64
>> >> relocations since those are the only ones that can be used to refer to
>> >> memory addresses
>> >>
>> >
>> > OK, digging into the binutils source code, it turns out that indeed,
>> > ABSnn relocations where nn equals the ELFnn memory size are treated
>> > differently, but only if they have default visibility. This is simply
>> > a result of the fact the code path is shared between shared libraries
>> > and PIE executables, since PIE executable are fully linked. It also
>> > means that we can simply work around it by emitting the linker symbols
>> > as hidden.
>> >
>>
>> ... and the bad news is that, while emitting the symbols as hidden
>> turns them from R_AARCH64_ABS64 into a R_AARCH64_RELATIVE relocations,
>> it does not actually force the value to be emitted at build time.
>>
>> So I am going to stick with the patch, but elaborate in a comment
>> about why R_AARCH64_ABSnn are treated differently if nn equals the
>> pointer size. (look at elfNN_aarch64_final_link_relocate() in binutils
>> if you are keen to look at the code yourself)
>
> Ok. Thanks for digging into that.
>
> One thing though: I would prefer if we could still keep all the LE64
> image header values together, to have them dealt with consistently.
>
> Could we hide the ABS32 usage behind some macros to do so, e.g.
>
> in image.h:
>
> #define DEFINE_IMAGE_LE64(sym, data)                            \
>         sym##_lo32 = DATA_LE32(data & 0xffffffff);              \
>         sym##_hi32 = DATA_LE32(data >> 32);
>
> #define HEAD_SYMBOLS                                            \
>         DEFINE_IMAGE_LE64(_kernel_size_le, _end - _text);       \
>         DEFINE_IMAGE_LE64(_kernel_offset_le, TEXT_OFFSET);      \
>         DEFINE_IMAGE_LE64(_kernel_flags_le, __HEAD_FLAGS);
>

I will steal this

> and in head.S:
>
> #define IMAGE_LE64(sym) .long sym##_lo32, sym##_hi32
>
>         ...
>         IMAGE_LE64(_kernel_size_le)     // Image load offset from start of RAM, little-endian
>         IMAGE_LE64(_kernel_offset_le)   // Effective size of kernel image, little-endian
>         IMAGE_LE64(_kernel_flags_le)    // Informative flags, little-endian
>         ...

... and implement this with an asm macro.

Thanks,
Ard.



>>
>> --
>> Ard.
>>
>>
>>
>> >
>> >>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> >>>> ---
>> >>>>  arch/arm64/include/asm/assembler.h | 15 ++++++++
>> >>>>  arch/arm64/kernel/head.S           | 17 +++++++--
>> >>>>  arch/arm64/kernel/image.h          | 37 ++++++--------------
>> >>>>  3 files changed, 40 insertions(+), 29 deletions(-)
>> >>>>
>> >>>> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
>> >>>> index d8bfcc1ce923..e211af783a3d 100644
>> >>>> --- a/arch/arm64/include/asm/assembler.h
>> >>>> +++ b/arch/arm64/include/asm/assembler.h
>> >>>> @@ -222,4 +222,19 @@ lr       .req    x30             // link register
>> >>>>       .size   __pi_##x, . - x;        \
>> >>>>       ENDPROC(x)
>> >>>>
>> >>>> +     .macro  le16, val
>> >>>> +     .byte   \val & 0xff
>> >>>> +     .byte   (\val >> 8) & 0xff
>> >>>> +     .endm
>> >>>> +
>> >>>> +     .macro  le32, val
>> >>>> +     le16    \val
>> >>>> +     le16    (\val >> 16)
>> >>>> +     .endm
>> >>>> +
>> >>>> +     .macro  le64, val
>> >>>> +     le32    \val
>> >>>> +     le32    (\val >> 32)
>> >>>> +     .endm
>> >>>> +
>> >>>>  #endif       /* __ASM_ASSEMBLER_H */
>> >>>> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
>> >>>> index 350515276541..211f75e673f4 100644
>> >>>> --- a/arch/arm64/kernel/head.S
>> >>>> +++ b/arch/arm64/kernel/head.S
>> >>>> @@ -51,6 +51,17 @@
>> >>>>  #define KERNEL_START _text
>> >>>>  #define KERNEL_END   _end
>> >>>>
>> >>>> +#ifdef CONFIG_CPU_BIG_ENDIAN
>> >>>> +#define __HEAD_FLAG_BE       1
>> >>>> +#else
>> >>>> +#define __HEAD_FLAG_BE       0
>> >>>> +#endif
>> >>>> +
>> >>>> +#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
>> >>>> +
>> >>>> +#define __HEAD_FLAGS ((__HEAD_FLAG_BE << 0) |        \
>> >>>> +                      (__HEAD_FLAG_PAGE_SIZE << 1))
>> >>>> +
>> >>>>  /*
>> >>>>   * Kernel startup entry point.
>> >>>>   * ---------------------------
>> >>>> @@ -83,9 +94,9 @@ efi_head:
>> >>>>       b       stext                           // branch to kernel start, magic
>> >>>>       .long   0                               // reserved
>> >>>>  #endif
>> >>>> -     .quad   _kernel_offset_le               // Image load offset from start of RAM, little-endian
>> >>>> -     .quad   _kernel_size_le                 // Effective size of kernel image, little-endian
>> >>>> -     .quad   _kernel_flags_le                // Informative flags, little-endian
>> >>>> +     le64    TEXT_OFFSET                     // Image load offset from start of RAM, little-endian
>> >>>> +     .long   _kernel_size_le, 0              // Effective size of kernel image, little-endian
>> >>>> +     le64    __HEAD_FLAGS                    // Informative flags, little-endian
>> >>>>       .quad   0                               // reserved
>> >>>>       .quad   0                               // reserved
>> >>>>       .quad   0                               // reserved
>> >>>> diff --git a/arch/arm64/kernel/image.h b/arch/arm64/kernel/image.h
>> >>>> index bc2abb8b1599..bb6b0e69d0a4 100644
>> >>>> --- a/arch/arm64/kernel/image.h
>> >>>> +++ b/arch/arm64/kernel/image.h
>> >>>> @@ -26,41 +26,26 @@
>> >>>>   * There aren't any ELF relocations we can use to endian-swap values known only
>> >>>>   * at link time (e.g. the subtraction of two symbol addresses), so we must get
>> >>>>   * the linker to endian-swap certain values before emitting them.
>> >>>> + * Note that this will not work for 64-bit values: these are resolved using
>> >>>> + * R_AARCH64_ABS64 relocations, which are fixed up at runtime rather than at
>> >>>> + * build time when building the PIE executable (for KASLR).
>> >>>>   */
>> >>>>  #ifdef CONFIG_CPU_BIG_ENDIAN
>> >>>> -#define DATA_LE64(data)                                      \
>> >>>> -     ((((data) & 0x00000000000000ff) << 56) |        \
>> >>>> -      (((data) & 0x000000000000ff00) << 40) |        \
>> >>>> -      (((data) & 0x0000000000ff0000) << 24) |        \
>> >>>> -      (((data) & 0x00000000ff000000) << 8)  |        \
>> >>>> -      (((data) & 0x000000ff00000000) >> 8)  |        \
>> >>>> -      (((data) & 0x0000ff0000000000) >> 24) |        \
>> >>>> -      (((data) & 0x00ff000000000000) >> 40) |        \
>> >>>> -      (((data) & 0xff00000000000000) >> 56))
>> >>>> +#define DATA_LE32(data)                              \
>> >>>> +     ((((data) & 0x000000ff) << 24) |        \
>> >>>> +      (((data) & 0x0000ff00) << 8)  |        \
>> >>>> +      (((data) & 0x00ff0000) >> 8)  |        \
>> >>>> +      (((data) & 0xff000000) >> 24))
>> >>>>  #else
>> >>>> -#define DATA_LE64(data) ((data) & 0xffffffffffffffff)
>> >>>> +#define DATA_LE32(data) ((data) & 0xffffffff)
>> >>>>  #endif
>> >>>>
>> >>>> -#ifdef CONFIG_CPU_BIG_ENDIAN
>> >>>> -#define __HEAD_FLAG_BE       1
>> >>>> -#else
>> >>>> -#define __HEAD_FLAG_BE       0
>> >>>> -#endif
>> >>>> -
>> >>>> -#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
>> >>>> -
>> >>>> -#define __HEAD_FLAGS ((__HEAD_FLAG_BE << 0) |        \
>> >>>> -                      (__HEAD_FLAG_PAGE_SIZE << 1))
>> >>>> -
>> >>>>  /*
>> >>>>   * These will output as part of the Image header, which should be little-endian
>> >>>> - * regardless of the endianness of the kernel. While constant values could be
>> >>>> - * endian swapped in head.S, all are done here for consistency.
>> >>>> + * regardless of the endianness of the kernel.
>> >>>>   */
>> >>>>  #define HEAD_SYMBOLS                                         \
>> >>>> -     _kernel_size_le         = DATA_LE64(_end - _text);      \
>> >>>> -     _kernel_offset_le       = DATA_LE64(TEXT_OFFSET);       \
>> >>>> -     _kernel_flags_le        = DATA_LE64(__HEAD_FLAGS);
>> >>>> +     _kernel_size_le         = DATA_LE32(_end - _text);
>> >>>>
>> >>>>  #ifdef CONFIG_EFI
>> >>>>
>> >>>> --
>> >>>> 2.5.0
>> >>>>
>>

^ permalink raw reply	[flat|nested] 207+ messages in thread

* [kernel-hardening] Re: [PATCH v3 11/21] arm64: avoid R_AARCH64_ABS64 relocations for Image header fields
@ 2016-01-14 11:22               ` Ard Biesheuvel
  0 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-14 11:22 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, kernel-hardening, Will Deacon, Catalin Marinas,
	Leif Lindholm, Kees Cook, linux-kernel, Stuart Yoder,
	Sharma Bhupesh, Arnd Bergmann, Marc Zyngier, Christoffer Dall

On 14 January 2016 at 11:46, Mark Rutland <mark.rutland@arm.com> wrote:
> On Thu, Jan 14, 2016 at 10:05:42AM +0100, Ard Biesheuvel wrote:
>> On 14 January 2016 at 09:51, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> > On 13 January 2016 at 19:48, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> >> On 13 January 2016 at 19:12, Mark Rutland <mark.rutland@arm.com> wrote:
>> >>> On Mon, Jan 11, 2016 at 02:19:04PM +0100, Ard Biesheuvel wrote:
>> >>>> Unfortunately, the current way of using the linker to emit build time
>> >>>> constants into the Image header will no longer work once we switch to
>> >>>> the use of PIE executables. The reason is that such constants are emitted
>> >>>> into the binary using R_AARCH64_ABS64 relocations, which we will resolve
>> >>>> at runtime, not at build time, and the places targeted by those
>> >>>> relocations will contain zeroes before that.
>> >>>>
>> >>>> So move back to assembly time constants or R_AARCH64_ABS32 relocations
>> >>>> (which, interestingly enough, do get resolved at build time)
>> >>>
>> >>> To me it seems very odd that ABS64 and ABS32 are treated differently,
>> >>> and it makes me somewhat uncomfortable because it feels like a bug.
>> >>>
>> >>> Do we know whether the inconsistency between ABS64 and ABS32 was
>> >>> deliberate?
>> >>>
>> >>> I couldn't spot anything declaring a difference in the AArch64 ELF
>> >>> spec, and I'm not sure where else to look.
>> >>>
>> >>
>> >> My assumption is that PIE only defers resolving R_AARCH64_ABS64
>> >> relocations since those are the only ones that can be used to refer to
>> >> memory addresses
>> >>
>> >
>> > OK, digging into the binutils source code, it turns out that indeed,
>> > ABSnn relocations where nn equals the ELFnn pointer size are treated
>> > differently, but only if they have default visibility. This is simply
>> > a result of the fact that the code path is shared between shared libraries
>> > and PIE executables, since PIE executables are fully linked. It also
>> > means that we can simply work around it by emitting the linker symbols
>> > as hidden.
>> >
>>
>> ... and the bad news is that, while emitting the symbols as hidden
>> turns them from R_AARCH64_ABS64 into R_AARCH64_RELATIVE relocations,
>> it does not actually force the value to be emitted at build time.
>>
>> So I am going to stick with the patch, but elaborate in a comment
>> about why R_AARCH64_ABSnn are treated differently if nn equals the
>> pointer size. (look at elfNN_aarch64_final_link_relocate() in binutils
>> if you are keen to look at the code yourself)
>
> Ok. Thanks for digging into that.
>
> One thing though: I would prefer if we could still keep all the LE64
> image header values together, to have them dealt with consistently.
>
> Could we hide the ABS32 usage behind some macros to do so, e.g.
>
> in image.h:
>
> #define DEFINE_IMAGE_LE64(sym, data)                            \
>         sym##_lo32 = DATA_LE32((data) & 0xffffffff);            \
>         sym##_hi32 = DATA_LE32((data) >> 32);
>
> #define HEAD_SYMBOLS                                            \
>         DEFINE_IMAGE_LE64(_kernel_size_le, _end - _text);       \
>         DEFINE_IMAGE_LE64(_kernel_offset_le, TEXT_OFFSET);      \
>         DEFINE_IMAGE_LE64(_kernel_flags_le, __HEAD_FLAGS);
>

I will steal this

> and in head.S:
>
> #define IMAGE_LE64(sym) .long sym##_lo32, sym##_hi32
>
>         ...
>         IMAGE_LE64(_kernel_offset_le)   // Image load offset from start of RAM, little-endian
>         IMAGE_LE64(_kernel_size_le)     // Effective size of kernel image, little-endian
>         IMAGE_LE64(_kernel_flags_le)    // Informative flags, little-endian
>         ...

... and implement this with an asm macro.

Thanks,
Ard.
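
For illustration, one shape such an asm macro could take (the le64sym name
and the _lo32/_hi32 suffixes follow the DEFINE_IMAGE_LE64() proposal above;
a sketch only, not necessarily the form that ends up being merged):

	.macro	le64sym, sym
	.long	\sym\()_lo32		// low 32 bits, already little-endian via DATA_LE32()
	.long	\sym\()_hi32		// high 32 bits, already little-endian via DATA_LE32()
	.endm

so the header entries become, e.g., "le64sym _kernel_offset_le".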



>>
>> --
>> Ard.
>>
>>
>>
>> >
>> >>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> >>>> ---
>> >>>>  arch/arm64/include/asm/assembler.h | 15 ++++++++
>> >>>>  arch/arm64/kernel/head.S           | 17 +++++++--
>> >>>>  arch/arm64/kernel/image.h          | 37 ++++++--------------
>> >>>>  3 files changed, 40 insertions(+), 29 deletions(-)
>> >>>>
>> >>>> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
>> >>>> index d8bfcc1ce923..e211af783a3d 100644
>> >>>> --- a/arch/arm64/include/asm/assembler.h
>> >>>> +++ b/arch/arm64/include/asm/assembler.h
>> >>>> @@ -222,4 +222,19 @@ lr       .req    x30             // link register
>> >>>>       .size   __pi_##x, . - x;        \
>> >>>>       ENDPROC(x)
>> >>>>
>> >>>> +     .macro  le16, val
>> >>>> +     .byte   \val & 0xff
>> >>>> +     .byte   (\val >> 8) & 0xff
>> >>>> +     .endm
>> >>>> +
>> >>>> +     .macro  le32, val
>> >>>> +     le16    \val
>> >>>> +     le16    (\val >> 16)
>> >>>> +     .endm
>> >>>> +
>> >>>> +     .macro  le64, val
>> >>>> +     le32    \val
>> >>>> +     le32    (\val >> 32)
>> >>>> +     .endm
>> >>>> +
>> >>>>  #endif       /* __ASM_ASSEMBLER_H */
>> >>>> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
>> >>>> index 350515276541..211f75e673f4 100644
>> >>>> --- a/arch/arm64/kernel/head.S
>> >>>> +++ b/arch/arm64/kernel/head.S
>> >>>> @@ -51,6 +51,17 @@
>> >>>>  #define KERNEL_START _text
>> >>>>  #define KERNEL_END   _end
>> >>>>
>> >>>> +#ifdef CONFIG_CPU_BIG_ENDIAN
>> >>>> +#define __HEAD_FLAG_BE       1
>> >>>> +#else
>> >>>> +#define __HEAD_FLAG_BE       0
>> >>>> +#endif
>> >>>> +
>> >>>> +#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
>> >>>> +
>> >>>> +#define __HEAD_FLAGS ((__HEAD_FLAG_BE << 0) |        \
>> >>>> +                      (__HEAD_FLAG_PAGE_SIZE << 1))
>> >>>> +
>> >>>>  /*
>> >>>>   * Kernel startup entry point.
>> >>>>   * ---------------------------
>> >>>> @@ -83,9 +94,9 @@ efi_head:
>> >>>>       b       stext                           // branch to kernel start, magic
>> >>>>       .long   0                               // reserved
>> >>>>  #endif
>> >>>> -     .quad   _kernel_offset_le               // Image load offset from start of RAM, little-endian
>> >>>> -     .quad   _kernel_size_le                 // Effective size of kernel image, little-endian
>> >>>> -     .quad   _kernel_flags_le                // Informative flags, little-endian
>> >>>> +     le64    TEXT_OFFSET                     // Image load offset from start of RAM, little-endian
>> >>>> +     .long   _kernel_size_le, 0              // Effective size of kernel image, little-endian
>> >>>> +     le64    __HEAD_FLAGS                    // Informative flags, little-endian
>> >>>>       .quad   0                               // reserved
>> >>>>       .quad   0                               // reserved
>> >>>>       .quad   0                               // reserved
>> >>>> diff --git a/arch/arm64/kernel/image.h b/arch/arm64/kernel/image.h
>> >>>> index bc2abb8b1599..bb6b0e69d0a4 100644
>> >>>> --- a/arch/arm64/kernel/image.h
>> >>>> +++ b/arch/arm64/kernel/image.h
>> >>>> @@ -26,41 +26,26 @@
>> >>>>   * There aren't any ELF relocations we can use to endian-swap values known only
>> >>>>   * at link time (e.g. the subtraction of two symbol addresses), so we must get
>> >>>>   * the linker to endian-swap certain values before emitting them.
>> >>>> + * Note that this will not work for 64-bit values: these are resolved using
>> >>>> + * R_AARCH64_ABS64 relocations, which are fixed up at runtime rather than at
>> >>>> + * build time when building the PIE executable (for KASLR).
>> >>>>   */
>> >>>>  #ifdef CONFIG_CPU_BIG_ENDIAN
>> >>>> -#define DATA_LE64(data)                                      \
>> >>>> -     ((((data) & 0x00000000000000ff) << 56) |        \
>> >>>> -      (((data) & 0x000000000000ff00) << 40) |        \
>> >>>> -      (((data) & 0x0000000000ff0000) << 24) |        \
>> >>>> -      (((data) & 0x00000000ff000000) << 8)  |        \
>> >>>> -      (((data) & 0x000000ff00000000) >> 8)  |        \
>> >>>> -      (((data) & 0x0000ff0000000000) >> 24) |        \
>> >>>> -      (((data) & 0x00ff000000000000) >> 40) |        \
>> >>>> -      (((data) & 0xff00000000000000) >> 56))
>> >>>> +#define DATA_LE32(data)                              \
>> >>>> +     ((((data) & 0x000000ff) << 24) |        \
>> >>>> +      (((data) & 0x0000ff00) << 8)  |        \
>> >>>> +      (((data) & 0x00ff0000) >> 8)  |        \
>> >>>> +      (((data) & 0xff000000) >> 24))
>> >>>>  #else
>> >>>> -#define DATA_LE64(data) ((data) & 0xffffffffffffffff)
>> >>>> +#define DATA_LE32(data) ((data) & 0xffffffff)
>> >>>>  #endif
>> >>>>
>> >>>> -#ifdef CONFIG_CPU_BIG_ENDIAN
>> >>>> -#define __HEAD_FLAG_BE       1
>> >>>> -#else
>> >>>> -#define __HEAD_FLAG_BE       0
>> >>>> -#endif
>> >>>> -
>> >>>> -#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
>> >>>> -
>> >>>> -#define __HEAD_FLAGS ((__HEAD_FLAG_BE << 0) |        \
>> >>>> -                      (__HEAD_FLAG_PAGE_SIZE << 1))
>> >>>> -
>> >>>>  /*
>> >>>>   * These will output as part of the Image header, which should be little-endian
>> >>>> - * regardless of the endianness of the kernel. While constant values could be
>> >>>> - * endian swapped in head.S, all are done here for consistency.
>> >>>> + * regardless of the endianness of the kernel.
>> >>>>   */
>> >>>>  #define HEAD_SYMBOLS                                         \
>> >>>> -     _kernel_size_le         = DATA_LE64(_end - _text);      \
>> >>>> -     _kernel_offset_le       = DATA_LE64(TEXT_OFFSET);       \
>> >>>> -     _kernel_flags_le        = DATA_LE64(__HEAD_FLAGS);
>> >>>> +     _kernel_size_le         = DATA_LE32(_end - _text);
>> >>>>
>> >>>>  #ifdef CONFIG_EFI
>> >>>>
>> >>>> --
>> >>>> 2.5.0
>> >>>>
>>

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 12/21] arm64: avoid dynamic relocations in early boot code
  2016-01-11 13:19   ` Ard Biesheuvel
  (?)
@ 2016-01-14 17:09     ` Mark Rutland
  -1 siblings, 0 replies; 207+ messages in thread
From: Mark Rutland @ 2016-01-14 17:09 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	leif.lindholm, keescook, linux-kernel, stuart.yoder,
	bhupesh.sharma, arnd, marc.zyngier, christoffer.dall

On Mon, Jan 11, 2016 at 02:19:05PM +0100, Ard Biesheuvel wrote:
> Before implementing KASLR for arm64 by building a self-relocating PIE
> executable, we have to ensure that values we use before the relocation
> routine is executed are not subject to dynamic relocation themselves.
> This applies not only to virtual addresses, but also to values that are
> supplied by the linker at build time and relocated using R_AARCH64_ABS64
> relocations.
> 
> So instead, use assemble time constants, or force the use of static
> relocations by folding the constants into the instructions.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

I think we lose a bit of legibility due to the hoops we jump through for
the new literals. However, it is correct, and I've not managed to come
up with anything nicer.

FWIW:

Reviewed-by: Mark Rutland <mark.rutland@arm.com>

Thanks,
Mark.
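
To make the literal trick concrete, a reduced sketch of the two forms under
discussion (symbol names taken from the patch; illustrative only):

	// A literal-pool load of an absolute symbol is emitted via an
	// R_AARCH64_ABS64 relocation, which a PIE link turns into a dynamic
	// R_AARCH64_RELATIVE entry, so the literal slot still holds zero
	// this early in boot:
	ldr	x27, =__mmap_switched

	// A literal built from a symbol difference plus link-time constants
	// needs no dynamic relocation and is valid before the kernel has
	// relocated itself:
	ldr	x27, 0f
	...
	.align	3
0:	.quad	__mmap_switched - (_head - TEXT_OFFSET) + KIMAGE_VADDR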

> ---
>  arch/arm64/kernel/efi-entry.S |  2 +-
>  arch/arm64/kernel/head.S      | 39 +++++++++++++-------
>  2 files changed, 27 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/arm64/kernel/efi-entry.S b/arch/arm64/kernel/efi-entry.S
> index a773db92908b..f82036e02485 100644
> --- a/arch/arm64/kernel/efi-entry.S
> +++ b/arch/arm64/kernel/efi-entry.S
> @@ -61,7 +61,7 @@ ENTRY(entry)
>  	 */
>  	mov	x20, x0		// DTB address
>  	ldr	x0, [sp, #16]	// relocated _text address
> -	ldr	x21, =stext_offset
> +	movz	x21, #:abs_g0:stext_offset
>  	add	x21, x0, x21
>  
>  	/*
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 211f75e673f4..5dc8079cef77 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -78,12 +78,11 @@
>   * in the entry routines.
>   */
>  	__HEAD
> -
> +_head:
>  	/*
>  	 * DO NOT MODIFY. Image header expected by Linux boot-loaders.
>  	 */
>  #ifdef CONFIG_EFI
> -efi_head:
>  	/*
>  	 * This add instruction has no meaningful effect except that
>  	 * its opcode forms the magic "MZ" signature required by UEFI.
> @@ -105,14 +104,14 @@ efi_head:
>  	.byte	0x4d
>  	.byte	0x64
>  #ifdef CONFIG_EFI
> -	.long	pe_header - efi_head		// Offset to the PE header.
> +	.long	pe_header - _head		// Offset to the PE header.
>  #else
>  	.word	0				// reserved
>  #endif
>  
>  #ifdef CONFIG_EFI
>  	.globl	__efistub_stext_offset
> -	.set	__efistub_stext_offset, stext - efi_head
> +	.set	__efistub_stext_offset, stext - _head
>  	.align 3
>  pe_header:
>  	.ascii	"PE"
> @@ -135,7 +134,7 @@ optional_header:
>  	.long	_end - stext			// SizeOfCode
>  	.long	0				// SizeOfInitializedData
>  	.long	0				// SizeOfUninitializedData
> -	.long	__efistub_entry - efi_head	// AddressOfEntryPoint
> +	.long	__efistub_entry - _head		// AddressOfEntryPoint
>  	.long	__efistub_stext_offset		// BaseOfCode
>  
>  extra_header_fields:
> @@ -150,7 +149,7 @@ extra_header_fields:
>  	.short	0				// MinorSubsystemVersion
>  	.long	0				// Win32VersionValue
>  
> -	.long	_end - efi_head			// SizeOfImage
> +	.long	_end - _head			// SizeOfImage
>  
>  	// Everything before the kernel image is considered part of the header
>  	.long	__efistub_stext_offset		// SizeOfHeaders
> @@ -230,11 +229,13 @@ ENTRY(stext)
>  	 * On return, the CPU will be ready for the MMU to be turned on and
>  	 * the TCR will have been set.
>  	 */
> -	ldr	x27, =__mmap_switched		// address to jump to after
> +	ldr	x27, 0f				// address to jump to after
>  						// MMU has been enabled
>  	adr_l	lr, __enable_mmu		// return (PIC) address
>  	b	__cpu_setup			// initialise processor
>  ENDPROC(stext)
> +	.align	3
> +0:	.quad	__mmap_switched - (_head - TEXT_OFFSET) + KIMAGE_VADDR
>  
>  /*
>   * Preserve the arguments passed by the bootloader in x0 .. x3
> @@ -402,7 +403,8 @@ __create_page_tables:
>  	mov	x0, x26				// swapper_pg_dir
>  	ldr	x5, =KIMAGE_VADDR
>  	create_pgd_entry x0, x5, x3, x6
> -	ldr	x6, =KERNEL_END			// __va(KERNEL_END)
> +	ldr	w6, kernel_img_size
> +	add	x6, x6, x5
>  	mov	x3, x24				// phys offset
>  	create_block_map x0, x7, x3, x5, x6
>  
> @@ -419,6 +421,9 @@ __create_page_tables:
>  	mov	lr, x27
>  	ret
>  ENDPROC(__create_page_tables)
> +
> +kernel_img_size:
> +	.long	_end - (_head - TEXT_OFFSET)
>  	.ltorg
>  
>  /*
> @@ -426,6 +431,10 @@ ENDPROC(__create_page_tables)
>   */
>  	.set	initial_sp, init_thread_union + THREAD_START_SP
>  __mmap_switched:
> +	adr_l	x8, vectors			// load VBAR_EL1 with virtual
> +	msr	vbar_el1, x8			// vector table address
> +	isb
> +
>  	// Clear BSS
>  	adr_l	x0, __bss_start
>  	mov	x1, xzr
> @@ -612,13 +621,19 @@ ENTRY(secondary_startup)
>  	adrp	x26, swapper_pg_dir
>  	bl	__cpu_setup			// initialise processor
>  
> -	ldr	x21, =secondary_data
> -	ldr	x27, =__secondary_switched	// address to jump to after enabling the MMU
> +	ldr	x8, =KIMAGE_VADDR
> +	ldr	w9, 0f
> +	sub	x27, x8, w9, sxtw		// address to jump to after enabling the MMU
>  	b	__enable_mmu
>  ENDPROC(secondary_startup)
> +0:	.long	(_text - TEXT_OFFSET) - __secondary_switched
>  
>  ENTRY(__secondary_switched)
> -	ldr	x0, [x21]			// get secondary_data.stack
> +	adr_l	x5, vectors
> +	msr	vbar_el1, x5
> +	isb
> +
> +	ldr_l	x0, secondary_data		// get secondary_data.stack
>  	mov	sp, x0
>  	and	x0, x0, #~(THREAD_SIZE - 1)
>  	msr	sp_el0, x0			// save thread_info
> @@ -643,8 +658,6 @@ __enable_mmu:
>  	ubfx	x2, x1, #ID_AA64MMFR0_TGRAN_SHIFT, 4
>  	cmp	x2, #ID_AA64MMFR0_TGRAN_SUPPORTED
>  	b.ne	__no_granule_support
> -	ldr	x5, =vectors
> -	msr	vbar_el1, x5
>  	msr	ttbr0_el1, x25			// load TTBR0
>  	msr	ttbr1_el1, x26			// load TTBR1
>  	isb
> -- 
> 2.5.0
> 

^ permalink raw reply	[flat|nested] 207+ messages in thread


* Re: [PATCH v3 07/21] arm64: move kernel image to base of vmalloc area
  2016-01-13 13:51         ` Mark Rutland
  (?)
@ 2016-01-14 18:57           ` Mark Rutland
  -1 siblings, 0 replies; 207+ messages in thread
From: Mark Rutland @ 2016-01-14 18:57 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Kees Cook, Arnd Bergmann, kernel-hardening, Sharma Bhupesh,
	Catalin Marinas, Will Deacon, linux-kernel, Leif Lindholm,
	Stuart Yoder, Marc Zyngier, Christoffer Dall, linux-arm-kernel

On Wed, Jan 13, 2016 at 01:51:10PM +0000, Mark Rutland wrote:
> On Wed, Jan 13, 2016 at 09:39:41AM +0100, Ard Biesheuvel wrote:
> > On 12 January 2016 at 19:14, Mark Rutland <mark.rutland@arm.com> wrote:
> > > On Mon, Jan 11, 2016 at 02:19:00PM +0100, Ard Biesheuvel wrote:
> > >>  void __init kasan_init(void)
> > >>  {
> > >> +     u64 kimg_shadow_start, kimg_shadow_end;
> > >>       struct memblock_region *reg;
> > >>
> > >> +     kimg_shadow_start = round_down((u64)kasan_mem_to_shadow(_text),
> > >> +                                    SWAPPER_BLOCK_SIZE);
> > >> +     kimg_shadow_end = round_up((u64)kasan_mem_to_shadow(_end),
> > >> +                                SWAPPER_BLOCK_SIZE);
> > >
> > > This rounding looks suspect to me, given it's applied to the shadow
> > > addresses rather than the kimage addresses. That's roughly equivalent to
> > > kasan_mem_to_shadow(round_up(_end, 8 * SWAPPER_BLOCK_SIZE)).
> > >
> > > I don't think we need any rounding for the kimage addresses. The image
> > > end is page-granular (and the fine-grained mapping will reflect that).
> > > Any accesses between _end and round_up(_end, SWAPPER_BLOCK_SIZE) would be
> > > bugs (and would most likely fault) regardless of KASAN.
> > >
> > > Or am I just being thick here?
> > >
> > 
> > Well, the problem here is that vmemmap_populate() is used as a
> > surrogate vmalloc() since that is not available yet, and
> > vmemmap_populate() allocates in SWAPPER_BLOCK_SIZE granularity.

From a look at the git history, and a chat with Catalin, it sounds like
the SWAPPER_BLOCK_SIZE granularity is a historical artifact. It happened
to be easier to implement it that way at some point in the past, but
there's no reason the 4K/16K/64K cases can't all be handled by the same
code that would go down to PAGE_SIZE granularity, using sections if
possible.

I'll drop that on the TODO list.

> > If I remove the rounding, I get false positive kasan errors which I
> > have not quite diagnosed yet, but are probably due to the fact that
> > the rounding performed by vmemmap_populate() goes in the wrong
> > direction.

As far as I can see, it implicitly rounds the base down and the end up to
SWAPPER_BLOCK_SIZE granularity.

I can see that it might map too much memory, but I can't see why that
should trigger KASAN failures. Regardless of what was mapped KASAN
should stick to the region it cares about, and everything else should
stay out of that.

When do you see the failures, and are they in any way consistent?

Do you have an example to hand?

> I'll also take a peek.

I haven't managed to trigger KASAN failures with the rounding removed.
I'm using 4K pages, and running under KVM tool (no EFI, so the memory
map is a contiguous block).

What does your memory map look like?

Thanks,
Mark.
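
For reference, a minimal sketch of the equivalence mentioned above, assuming
the generic KASAN shadow scale of 8 bytes per shadow byte
(KASAN_SHADOW_SCALE_SHIFT == 3); the helper below is a stand-in, not the
kernel's own:

	#define SHADOW_SCALE_SHIFT	3

	static unsigned long shadow_of(unsigned long addr, unsigned long shadow_offset)
	{
		/* same shape as kasan_mem_to_shadow(): scale, then offset */
		return (addr >> SHADOW_SCALE_SHIFT) + shadow_offset;
	}

	/*
	 * If shadow_offset is SWAPPER_BLOCK_SIZE aligned, then
	 *   round_up(shadow_of(_end, off), SWAPPER_BLOCK_SIZE)
	 * covers roughly the same range as
	 *   shadow_of(round_up(_end, SWAPPER_BLOCK_SIZE << SHADOW_SCALE_SHIFT), off)
	 * i.e. rounding the shadow address corresponds to rounding the image
	 * end up to 8 * SWAPPER_BLOCK_SIZE, as noted above.
	 */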

^ permalink raw reply	[flat|nested] 207+ messages in thread


* Re: [PATCH v3 07/21] arm64: move kernel image to base of vmalloc area
  2016-01-14 18:57           ` Mark Rutland
  (?)
@ 2016-01-15  9:54             ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-15  9:54 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Kees Cook, Arnd Bergmann, kernel-hardening, Sharma Bhupesh,
	Catalin Marinas, Will Deacon, linux-kernel, Leif Lindholm,
	Stuart Yoder, Marc Zyngier, Christoffer Dall, linux-arm-kernel

On 14 January 2016 at 19:57, Mark Rutland <mark.rutland@arm.com> wrote:
> On Wed, Jan 13, 2016 at 01:51:10PM +0000, Mark Rutland wrote:
>> On Wed, Jan 13, 2016 at 09:39:41AM +0100, Ard Biesheuvel wrote:
>> > On 12 January 2016 at 19:14, Mark Rutland <mark.rutland@arm.com> wrote:
>> > > On Mon, Jan 11, 2016 at 02:19:00PM +0100, Ard Biesheuvel wrote:
>> > >>  void __init kasan_init(void)
>> > >>  {
>> > >> +     u64 kimg_shadow_start, kimg_shadow_end;
>> > >>       struct memblock_region *reg;
>> > >>
>> > >> +     kimg_shadow_start = round_down((u64)kasan_mem_to_shadow(_text),
>> > >> +                                    SWAPPER_BLOCK_SIZE);
>> > >> +     kimg_shadow_end = round_up((u64)kasan_mem_to_shadow(_end),
>> > >> +                                SWAPPER_BLOCK_SIZE);
>> > >
>> > > This rounding looks suspect to me, given it's applied to the shadow
>> > > addresses rather than the kimage addresses. That's roughly equivalent to
>> > > kasan_mem_to_shadow(round_up(_end, 8 * SWAPPER_BLOCK_SIZE)).
>> > >
>> > > I don't think we need any rounding for the kimage addresses. The image
>> > > end is page-granular (and the fine-grained mapping will reflect that).
>> > > Any accesses between _end and round_up(_end, SWAPPER_BLOCK_SIZE) would be
>> > > bugs (and would most likely fault) regardless of KASAN.
>> > >
>> > > Or am I just being thick here?
>> > >
>> >
>> > Well, the problem here is that vmemmap_populate() is used as a
>> > surrogate vmalloc() since that is not available yet, and
>> > vmemmap_populate() allocates in SWAPPER_BLOCK_SIZE granularity.
>
> From a look at the git history, and a chat with Catalin, it sounds like
> the SWAPPER_BLOCK_SIZE granularity is a historical artifact. It happened
> to be easier to implement it that way at some point in the past, but
> there's no reason the 4K/16K/64K cases can't all be handled by the same
> code that would go down to PAGE_SIZE granularity, using sections if
> possible.
>
> I'll drop that on the TODO list.
>

OK

>> > If I remove the rounding, I get false positive kasan errors which I
>> > have not quite diagnosed yet, but are probably due to the fact that
>> > the rounding performed by vmemmap_populate() goes in the wrong
>> > direction.
>
> As far as I can see, it implicitly rounds the base down and the end up to
> SWAPPER_BLOCK_SIZE granularity.
>
> I can see that it might map too much memory, but I can't see why that
> should trigger KASAN failures. Regardless of what was mapped KASAN
> should stick to the region it cares about, and everything else should
> stay out of that.
>
> When do you see the failures, and are they in any way consistent?
>
> Do you have an example to hand?
>

For some reason, this issue has evaporated, i.e., I can no longer
reproduce it on my WIP v4 branch.
So I will remove the rounding.

Thanks,
Ard.
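
(A sketch of what that would look like, based on the kasan_init() hunk
quoted above:

	kimg_shadow_start = (u64)kasan_mem_to_shadow(_text);
	kimg_shadow_end = (u64)kasan_mem_to_shadow(_end);

i.e. no SWAPPER_BLOCK_SIZE rounding applied to the shadow addresses.)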


>> I'll also take a peek.
>
> I haven't managed to trigger KASAN failures with the rounding removed.
> I'm using 4K pages, and running under KVM tool (no EFI, so the memory
> map is a contiguous block).
>
> What does your memory map look like?
>
> Thanks,
> Mark.

^ permalink raw reply	[flat|nested] 207+ messages in thread


* Re: [PATCH v3 07/21] arm64: move kernel image to base of vmalloc area
  2016-01-15  9:54             ` Ard Biesheuvel
  (?)
@ 2016-01-15 11:23               ` Mark Rutland
  -1 siblings, 0 replies; 207+ messages in thread
From: Mark Rutland @ 2016-01-15 11:23 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Kees Cook, Arnd Bergmann, kernel-hardening, Sharma Bhupesh,
	Catalin Marinas, Will Deacon, linux-kernel, Leif Lindholm,
	Stuart Yoder, Marc Zyngier, Christoffer Dall, linux-arm-kernel

On Fri, Jan 15, 2016 at 10:54:26AM +0100, Ard Biesheuvel wrote:
> On 14 January 2016 at 19:57, Mark Rutland <mark.rutland@arm.com> wrote:
> > On Wed, Jan 13, 2016 at 01:51:10PM +0000, Mark Rutland wrote:
> >> On Wed, Jan 13, 2016 at 09:39:41AM +0100, Ard Biesheuvel wrote:
> >> > If I remove the rounding, I get false positive kasan errors which I
> >> > have not quite diagnosed yet, but are probably due to the fact that
> >> > the rounding performed by vmemmap_populate() goes in the wrong
> >> > direction.
> >
> > As far as I can see, it implicitly rounds the base down and the end up to
> > SWAPPER_BLOCK_SIZE granularity.
> >
> > I can see that it might map too much memory, but I can't see why that
> > should trigger KASAN failures. Regardless of what was mapped KASAN
> > should stick to the region it cares about, and everything else should
> > stay out of that.
> >
> > When do you see the failures, and are they in any way consistent?
> >
> > Do you have an example to hand?
> >
> 
> For some reason, this issue has evaporated, i.e., I can no longer
> reproduce it on my WIP v4 branch.
> So I will remove the rounding.

Ok.

I'll let you know if I stumble across anything that looks like a
potential cause of the KASAN failures, and I'll try to give v4 a go at
some point soon.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 207+ messages in thread


* Re: [PATCH v3 18/21] efi: stub: implement efi_get_random_bytes() based on EFI_RNG_PROTOCOL
  2016-01-11 13:19   ` Ard Biesheuvel
  (?)
@ 2016-01-21 15:42     ` Matt Fleming
  -1 siblings, 0 replies; 207+ messages in thread
From: Matt Fleming @ 2016-01-21 15:42 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel,
	stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall

On Mon, 11 Jan, at 02:19:12PM, Ard Biesheuvel wrote:
> This exposes the firmware's implementation of EFI_RNG_PROTOCOL via a new
> function efi_get_random_bytes().
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
>  drivers/firmware/efi/libstub/Makefile  |  2 +-
>  drivers/firmware/efi/libstub/efistub.h |  3 ++
>  drivers/firmware/efi/libstub/random.c  | 35 ++++++++++++++++++++
>  include/linux/efi.h                    |  5 ++-
>  4 files changed, 43 insertions(+), 2 deletions(-)

[...]

> @@ -0,0 +1,35 @@
> +/*
> + * Copyright (C) 2016 Linaro Ltd;  <ard.biesheuvel@linaro.org>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + */
> +
> +#include <linux/efi.h>
> +#include <asm/efi.h>
> +
> +#include "efistub.h"
> +
> +struct efi_rng_protocol_t {
> +	efi_status_t (*get_info)(struct efi_rng_protocol_t *,
> +				 unsigned long *, efi_guid_t *);
> +	efi_status_t (*get_rng)(struct efi_rng_protocol_t *,
> +				efi_guid_t *, unsigned long, u8 *out);
> +};

This is not the usual naming convention for EFI structs; it should
be either 'struct efi_rng_protocol' or 'efi_rng_protocol_t'.

But apart from that, this patch looks fine.

Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
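
For clarity, the first option would look like the following (a sketch based
on the struct quoted above, with only the naming changed):

	struct efi_rng_protocol {
		efi_status_t (*get_info)(struct efi_rng_protocol *,
					 unsigned long *, efi_guid_t *);
		efi_status_t (*get_rng)(struct efi_rng_protocol *,
					efi_guid_t *, unsigned long, u8 *out);
	};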

^ permalink raw reply	[flat|nested] 207+ messages in thread


* Re: [PATCH v3 19/21] efi: stub: add implementation of efi_random_alloc()
  2016-01-11 13:19   ` Ard Biesheuvel
  (?)
@ 2016-01-21 16:10     ` Matt Fleming
  -1 siblings, 0 replies; 207+ messages in thread
From: Matt Fleming @ 2016-01-21 16:10 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel,
	stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall

On Mon, 11 Jan, at 02:19:13PM, Ard Biesheuvel wrote:
> This implements efi_random_alloc(), which allocates a chunk of memory of
> a certain size at a certain alignment, and uses the random_seed argument
> it receives to randomize the offset of the allocation.

s/offset/address/ ?

I see what you're getting at with the word "offset" but ultimately,
this is a memory allocation function, and it returns an address.

"offset" implies to me that the implementation allocates a larger
memory chunk than is required and returns an address that is >= the
start of the bigger-than-required-allocation.

> This is implemented by iterating over the UEFI memory map, counting the
> number of suitable slots (aligned offsets) within each region, and picking
> a random number between 0 and 'number of slots - 1' to select the slot.
> This should guarantee that each possible offset is equally likely to be
> chosen.
> 
> Suggested-by: Kees Cook <keescook@chromium.org>
> Cc: Matt Fleming <matt@codeblueprint.co.uk>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
>  drivers/firmware/efi/libstub/efistub.h |  4 +
>  drivers/firmware/efi/libstub/random.c  | 85 ++++++++++++++++++++
>  2 files changed, 89 insertions(+)
> 
> diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
> index 206b7252b9d1..7a38e29da53d 100644
> --- a/drivers/firmware/efi/libstub/efistub.h
> +++ b/drivers/firmware/efi/libstub/efistub.h
> @@ -46,4 +46,8 @@ void efi_get_virtmap(efi_memory_desc_t *memory_map, unsigned long map_size,
>  efi_status_t efi_get_random_bytes(efi_system_table_t *sys_table,
>  				  unsigned long size, u8 *out);
>  
> +efi_status_t efi_random_alloc(efi_system_table_t *sys_table_arg,
> +			      unsigned long size, unsigned long align_bits,
> +			      unsigned long *addr, unsigned long random_seed);
> +
>  #endif
> diff --git a/drivers/firmware/efi/libstub/random.c b/drivers/firmware/efi/libstub/random.c
> index f539b1e31459..d4829824508c 100644
> --- a/drivers/firmware/efi/libstub/random.c
> +++ b/drivers/firmware/efi/libstub/random.c
> @@ -33,3 +33,88 @@ efi_status_t efi_get_random_bytes(efi_system_table_t *sys_table,
>  
>  	return rng->get_rng(rng, NULL, size, out);
>  }
> +
> +/*
> + * Return a weight for a memory entry depending on how many offsets it covers
> + * that are suitably aligned and supply enough room for the allocation.
> + */
> +static unsigned long get_entry_weight(efi_memory_desc_t *md, unsigned long size,
> +				      unsigned long align_bits)
> +{
> +	u64 start, end;
> +
> +	if (md->type != EFI_CONVENTIONAL_MEMORY)
> +		return 0;
> +
> +	if (!(md->attribute & EFI_MEMORY_WB))
> +		return 0;

This could do with a comment. When would EFI_CONVENTIONAL_MEMORY not
have this attribute capability in the memory map?

> +
> +	start = round_up(md->phys_addr, 1 << align_bits);
> +	end = round_down(md->phys_addr + md->num_pages * EFI_PAGE_SIZE - size,
> +			 1 << align_bits);
> +
> +	if (start >= end)
> +		return 0;
> +
> +	return (end - start) >> align_bits;
> +}
> +
> +/*
> + * The UEFI memory descriptors have a virtual address field that is only used
> + * when installing the virtual mapping using SetVirtualAddressMap(). Since it
> + * is unused here, we can reuse it to keep track of each descriptor's weight.
> + */
> +#define MD_WEIGHT(md)	((md)->virt_addr)
> +
> +efi_status_t efi_random_alloc(efi_system_table_t *sys_table_arg,
> +			      unsigned long size, unsigned long align_bits,
> +			      unsigned long *addr, unsigned long random_seed)
> +{
> +	unsigned long map_size, desc_size, max_weight = 0, target;
> +	efi_memory_desc_t *memory_map;
> +	efi_status_t status = EFI_NOT_FOUND;
> +	int l;

Could you pick a more descriptive variable name?

> +
> +	status = efi_get_memory_map(sys_table_arg, &memory_map, &map_size,
> +				    &desc_size, NULL, NULL);
> +	if (status != EFI_SUCCESS)
> +		return status;
> +
> +	/* assign each entry in the memory map a weight */
> +	for (l = 0; l < map_size; l += desc_size) {
> +		efi_memory_desc_t *md = (void *)memory_map + l;
> +		unsigned long weight;
> +
> +		weight = get_entry_weight(md, size, align_bits);
> +		MD_WEIGHT(md) = weight;
> +		max_weight += weight;
> +	}
> +
> +	/* find a random number between 0 and max_weight */
> +	target = (max_weight * (u16)random_seed) >> 16;
> +
> +	/* find the entry whose accumulated weight covers the target */
> +	for (l = 0; l < map_size; l += desc_size) {
> +		efi_memory_desc_t *md = (void *)memory_map + l;
> +
> +		if (target < MD_WEIGHT(md)) {
> +			unsigned long pages;
> +
> +			*addr = round_up(md->phys_addr, 1 << align_bits) +
> +				(target << align_bits);
> +			pages = round_up(size, EFI_PAGE_SIZE) / EFI_PAGE_SIZE;
> +
> +			status = efi_call_early(allocate_pages,
> +						EFI_ALLOCATE_ADDRESS,
> +						EFI_LOADER_DATA,
> +						pages,
> +						(efi_physical_addr_t *)addr);

You're mixing data types here. efi_physical_addr_t is always 64 bits,
but 'addr' is unsigned long, which is 32 bits on 32-bit platforms.
This cast isn't safe.
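
As an illustrative sketch only (not necessarily the shape of the final
fix), the narrowing cast could be avoided by letting the EFI call operate
on a properly sized temporary and copying the result back:

	efi_physical_addr_t alloc_addr = *addr;

	/* sketch: allocate_pages always takes a 64-bit physical address */
	status = efi_call_early(allocate_pages, EFI_ALLOCATE_ADDRESS,
				EFI_LOADER_DATA, pages, &alloc_addr);
	if (status == EFI_SUCCESS)
		*addr = alloc_addr;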

> +			break;
> +		}
> +		target -= MD_WEIGHT(md);

I think this needs a comment.

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 18/21] efi: stub: implement efi_get_random_bytes() based on EFI_RNG_PROTOCOL
  2016-01-21 15:42     ` Matt Fleming
  (?)
@ 2016-01-21 16:12       ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-21 16:12 UTC (permalink / raw)
  To: Matt Fleming
  Cc: linux-arm-kernel, kernel-hardening, Will Deacon, Catalin Marinas,
	Mark Rutland, Leif Lindholm, Kees Cook, linux-kernel,
	Stuart Yoder, Sharma Bhupesh, Arnd Bergmann, Marc Zyngier,
	Christoffer Dall

On 21 January 2016 at 16:42, Matt Fleming <matt@codeblueprint.co.uk> wrote:
> On Mon, 11 Jan, at 02:19:12PM, Ard Biesheuvel wrote:
>> This exposes the firmware's implementation of EFI_RNG_PROTOCOL via a new
>> function efi_get_random_bytes().
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> ---
>>  drivers/firmware/efi/libstub/Makefile  |  2 +-
>>  drivers/firmware/efi/libstub/efistub.h |  3 ++
>>  drivers/firmware/efi/libstub/random.c  | 35 ++++++++++++++++++++
>>  include/linux/efi.h                    |  5 ++-
>>  4 files changed, 43 insertions(+), 2 deletions(-)
>
> [...]
>
>> @@ -0,0 +1,35 @@
>> +/*
>> + * Copyright (C) 2016 Linaro Ltd;  <ard.biesheuvel@linaro.org>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + */
>> +
>> +#include <linux/efi.h>
>> +#include <asm/efi.h>
>> +
>> +#include "efistub.h"
>> +
>> +struct efi_rng_protocol_t {
>> +     efi_status_t (*get_info)(struct efi_rng_protocol_t *,
>> +                              unsigned long *, efi_guid_t *);
>> +     efi_status_t (*get_rng)(struct efi_rng_protocol_t *,
>> +                             efi_guid_t *, unsigned long, u8 *out);
>> +};
>
> This is not the usual naming convention for EFI structs; it should be
> either 'struct efi_rng_protocol' or 'efi_rng_protocol_t'.
>

OK, I will change that.
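
For reference, a minimal sketch of the renamed definition, using the plain
struct spelling (either of the two suggestions would do):

	struct efi_rng_protocol {
		efi_status_t (*get_info)(struct efi_rng_protocol *,
					 unsigned long *, efi_guid_t *);
		efi_status_t (*get_rng)(struct efi_rng_protocol *,
					efi_guid_t *, unsigned long, u8 *out);
	};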

> But apart from that, this patch looks fine.
>
> Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>

Thanks

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 19/21] efi: stub: add implementation of efi_random_alloc()
  2016-01-21 16:10     ` Matt Fleming
  (?)
@ 2016-01-21 16:16       ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-21 16:16 UTC (permalink / raw)
  To: Matt Fleming
  Cc: linux-arm-kernel, kernel-hardening, Will Deacon, Catalin Marinas,
	Mark Rutland, Leif Lindholm, Kees Cook, linux-kernel,
	Stuart Yoder, Sharma Bhupesh, Arnd Bergmann, Marc Zyngier,
	Christoffer Dall

On 21 January 2016 at 17:10, Matt Fleming <matt@codeblueprint.co.uk> wrote:
> On Mon, 11 Jan, at 02:19:13PM, Ard Biesheuvel wrote:
>> This implements efi_random_alloc(), which allocates a chunk of memory of
>> a certain size at a certain alignment, and uses the random_seed argument
>> it receives to randomize the offset of the allocation.
>
> s/offset/address/ ?
>
> I see what you're getting at with the word "offset" but ultimately,
> this is a memory allocation function, and it returns an address.
>
> "offset" implies to me that the implementation allocates a larger
> memory chunk than is required and returns an address that is >= the
> start of the bigger-than-required-allocation.
>

Well, offset is horribly overloaded in our world, so let's stick with 'address'.

>> This is implemented by iterating over the UEFI memory map, counting the
>> number of suitable slots (aligned offsets) within each region, and picking
>> a random number between 0 and 'number of slots - 1' to select the slot.
>> This should guarantee that each possible offset is equally likely to be
>> chosen.
>>
>> Suggested-by: Kees Cook <keescook@chromium.org>
>> Cc: Matt Fleming <matt@codeblueprint.co.uk>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> ---
>>  drivers/firmware/efi/libstub/efistub.h |  4 +
>>  drivers/firmware/efi/libstub/random.c  | 85 ++++++++++++++++++++
>>  2 files changed, 89 insertions(+)
>>
>> diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
>> index 206b7252b9d1..7a38e29da53d 100644
>> --- a/drivers/firmware/efi/libstub/efistub.h
>> +++ b/drivers/firmware/efi/libstub/efistub.h
>> @@ -46,4 +46,8 @@ void efi_get_virtmap(efi_memory_desc_t *memory_map, unsigned long map_size,
>>  efi_status_t efi_get_random_bytes(efi_system_table_t *sys_table,
>>                                 unsigned long size, u8 *out);
>>
>> +efi_status_t efi_random_alloc(efi_system_table_t *sys_table_arg,
>> +                           unsigned long size, unsigned long align_bits,
>> +                           unsigned long *addr, unsigned long random_seed);
>> +
>>  #endif
>> diff --git a/drivers/firmware/efi/libstub/random.c b/drivers/firmware/efi/libstub/random.c
>> index f539b1e31459..d4829824508c 100644
>> --- a/drivers/firmware/efi/libstub/random.c
>> +++ b/drivers/firmware/efi/libstub/random.c
>> @@ -33,3 +33,88 @@ efi_status_t efi_get_random_bytes(efi_system_table_t *sys_table,
>>
>>       return rng->get_rng(rng, NULL, size, out);
>>  }
>> +
>> +/*
>> + * Return a weight for a memory entry depending on how many offsets it covers
>> + * that are suitably aligned and supply enough room for the allocation.
>> + */
>> +static unsigned long get_entry_weight(efi_memory_desc_t *md, unsigned long size,
>> +                                   unsigned long align_bits)
>> +{
>> +     u64 start, end;
>> +
>> +     if (md->type != EFI_CONVENTIONAL_MEMORY)
>> +             return 0;
>> +
>> +     if (!(md->attribute & EFI_MEMORY_WB))
>> +             return 0;
>
> This could do with a comment. When would EFI_CONVENTIONAL_MEMORY not
> have this attribute capability in the memory map?
>

Actually, I think I should drop it instead. The other alloc functions
only check for EFI_CONVENTIONAL_MEMORY, and this is intended to be
generic code. Also, I have never seen a system with
EFI_CONVENTIONAL_MEMORY with the WB bit cleared.
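
For illustration, the entry filter would then reduce to the type check
alone (sketch):

	/* sketch: only conventional memory is eligible for the random allocation */
	if (md->type != EFI_CONVENTIONAL_MEMORY)
		return 0;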

>> +
>> +     start = round_up(md->phys_addr, 1 << align_bits);
>> +     end = round_down(md->phys_addr + md->num_pages * EFI_PAGE_SIZE - size,
>> +                      1 << align_bits);
>> +
>> +     if (start >= end)
>> +             return 0;
>> +
>> +     return (end - start) >> align_bits;
>> +}
>> +
>> +/*
>> + * The UEFI memory descriptors have a virtual address field that is only used
>> + * when installing the virtual mapping using SetVirtualAddressMap(). Since it
>> + * is unused here, we can reuse it to keep track of each descriptor's weight.
>> + */
>> +#define MD_WEIGHT(md)        ((md)->virt_addr)
>> +
>> +efi_status_t efi_random_alloc(efi_system_table_t *sys_table_arg,
>> +                           unsigned long size, unsigned long align_bits,
>> +                           unsigned long *addr, unsigned long random_seed)
>> +{
>> +     unsigned long map_size, desc_size, max_weight = 0, target;
>> +     efi_memory_desc_t *memory_map;
>> +     efi_status_t status = EFI_NOT_FOUND;
>> +     int l;
>
> Could you pick a more descriptive variable name?
>

Sure :-)

>> +
>> +     status = efi_get_memory_map(sys_table_arg, &memory_map, &map_size,
>> +                                 &desc_size, NULL, NULL);
>> +     if (status != EFI_SUCCESS)
>> +             return status;
>> +
>> +     /* assign each entry in the memory map a weight */
>> +     for (l = 0; l < map_size; l += desc_size) {
>> +             efi_memory_desc_t *md = (void *)memory_map + l;
>> +             unsigned long weight;
>> +
>> +             weight = get_entry_weight(md, size, align_bits);
>> +             MD_WEIGHT(md) = weight;
>> +             max_weight += weight;
>> +     }
>> +
>> +     /* find a random number between 0 and max_weight */
>> +     target = (max_weight * (u16)random_seed) >> 16;
>> +
>> +     /* find the entry whose accumulated weight covers the target */
>> +     for (l = 0; l < map_size; l += desc_size) {
>> +             efi_memory_desc_t *md = (void *)memory_map + l;
>> +
>> +             if (target < MD_WEIGHT(md)) {
>> +                     unsigned long pages;
>> +
>> +                     *addr = round_up(md->phys_addr, 1 << align_bits) +
>> +                             (target << align_bits);
>> +                     pages = round_up(size, EFI_PAGE_SIZE) / EFI_PAGE_SIZE;
>> +
>> +                     status = efi_call_early(allocate_pages,
>> +                                             EFI_ALLOCATE_ADDRESS,
>> +                                             EFI_LOADER_DATA,
>> +                                             pages,
>> +                                             (efi_physical_addr_t *)addr);
>
> You're mixing data types here. efi_physical_addr_t is always 64 bits,
> but 'addr' is unsigned long, which is 32 bits on 32-bit platforms.
> This cast isn't safe.
>

OK, I will fix that.

>> +                     break;
>> +             }
>> +             target -= MD_WEIGHT(md);
>
> I think this needs a comment.

Sure. Note that in my local version, I already replaced max_weight
with total_weight since it wasn't entirely accurate. So I'll try to
pick a better name for target as well.
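
A possible shape for that comment, purely as a sketch (the wording and the
final variable names are up to you):

	/*
	 * 'target' indexes the suitably aligned slots across all entries,
	 * so consume this entry's share of slots before moving on to the
	 * next descriptor.
	 */
	target -= MD_WEIGHT(md);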

Thanks,
Ard.

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 20/21] efi: stub: use high allocation for converted command line
  2016-01-11 13:19   ` Ard Biesheuvel
  (?)
@ 2016-01-21 16:20     ` Matt Fleming
  -1 siblings, 0 replies; 207+ messages in thread
From: Matt Fleming @ 2016-01-21 16:20 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel,
	stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall

On Mon, 11 Jan, at 02:19:14PM, Ard Biesheuvel wrote:
> Before we can move the command line processing before the allocation
> of the kernel, which is required for detecting the 'nokaslr' option
> which controls that allocation, move the converted command line higher
> up in memory, to prevent it from interfering with the kernel itself.
> 
> Since x86 needs the address to fit in 32 bits, use UINT_MAX as the upper
> bound there. Otherwise, use ULONG_MAX (i.e., no limit)
> 
> Cc: Matt Fleming <matt@codeblueprint.co.uk>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
>  arch/x86/include/asm/efi.h                     |  2 ++
>  drivers/firmware/efi/libstub/efi-stub-helper.c | 14 +++++++++++++-
>  2 files changed, 15 insertions(+), 1 deletion(-)
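
As context, the per-architecture cap described above could look roughly
like the sketch below; the macro name is an assumption for illustration,
not taken from the patch:

	/* illustrative only: in the x86 arch header, keep the address below 4 GB */
	#define MAX_CMDLINE_ADDRESS	UINT_MAX

	/* illustrative only: generic fallback in the stub helper (no limit) */
	#ifndef MAX_CMDLINE_ADDRESS
	#define MAX_CMDLINE_ADDRESS	ULONG_MAX
	#endif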

Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 21/21] arm64: efi: invoke EFI_RNG_PROTOCOL to supply KASLR randomness
  2016-01-11 13:19   ` Ard Biesheuvel
  (?)
@ 2016-01-21 16:31     ` Matt Fleming
  -1 siblings, 0 replies; 207+ messages in thread
From: Matt Fleming @ 2016-01-21 16:31 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	mark.rutland, leif.lindholm, keescook, linux-kernel,
	stuart.yoder, bhupesh.sharma, arnd, marc.zyngier,
	christoffer.dall

On Mon, 11 Jan, at 02:19:15PM, Ard Biesheuvel wrote:
> Since arm64 does not use a decompressor that supplies an execution
> environment in which it is, to some extent, feasible to provide a source
> of randomness, the arm64 KASLR kernel depends on the bootloader to supply
> some random bits in register x1 upon kernel entry.
> 
> On UEFI systems, we can use the EFI_RNG_PROTOCOL, if supplied, to obtain
> some random bits. At the same time, use it to randomize the offset of the
> kernel Image in physical memory.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
>  arch/arm64/Kconfig                             |  5 ++
>  arch/arm64/kernel/efi-entry.S                  |  7 +-
>  drivers/firmware/efi/libstub/arm-stub.c        | 17 ++---
>  drivers/firmware/efi/libstub/arm64-stub.c      | 67 +++++++++++++++-----
>  drivers/firmware/efi/libstub/efi-stub-helper.c | 10 +++
>  drivers/firmware/efi/libstub/efistub.h         |  2 +
>  6 files changed, 82 insertions(+), 26 deletions(-)

[...]

> diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c
> index 2a7a3015d7e0..e8a3b8cd53cc 100644
> --- a/drivers/firmware/efi/libstub/efi-stub-helper.c
> +++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
> @@ -32,6 +32,10 @@
>  
>  static unsigned long __chunk_size = EFI_READ_CHUNK_SIZE;
>  
> +#ifdef CONFIG_RANDOMIZE_BASE
> +bool __nokaslr;
> +#endif
> +
>  /*
>   * Allow the platform to override the allocation granularity: this allows
>   * systems that have the capability to run with a larger page size to deal
> @@ -317,6 +321,12 @@ efi_status_t efi_parse_options(char *cmdline)
>  {
>  	char *str;
>  
> +#ifdef CONFIG_RANDOMIZE_BASE
> +	str = strstr(cmdline, "nokaslr");
> +	if (str && (str == cmdline || *(str - 1) == ' '))
> +		__nokaslr = true;
> +#endif
> +
>  	/*
>  	 * If no EFI parameters were specified on the cmdline we've got
>  	 * nothing to do.

Could we not keep the "nokaslr" parsing inside of arm-stub.c? It's not
really specific to EFI and doesn't make use of any of the code in
efi_parse_options() anyhow.

As an added bonus, __nokaslr could then become static.
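
Moving it over could look roughly like this in arm-stub.c (a sketch under
that assumption; the helper name is made up for illustration):

	static bool __nokaslr;

	/* sketch: same "nokaslr" word match as above, now local to arm-stub.c */
	static void check_nokaslr(char *cmdline)
	{
		char *str = strstr(cmdline, "nokaslr");

		if (str && (str == cmdline || *(str - 1) == ' '))
			__nokaslr = true;
	}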

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 08/21] arm64: add support for module PLTs
  2016-01-11 13:19   ` Ard Biesheuvel
  (?)
@ 2016-01-22 16:55     ` Mark Rutland
  -1 siblings, 0 replies; 207+ messages in thread
From: Mark Rutland @ 2016-01-22 16:55 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, kernel-hardening, will.deacon, catalin.marinas,
	leif.lindholm, keescook, linux-kernel, stuart.yoder,
	bhupesh.sharma, arnd, marc.zyngier, christoffer.dall

Hi Ard,

This looks good.

My comments below are mostly nits, and much of the rest probably betrays
my lack of familiarity with ELF.

On Mon, Jan 11, 2016 at 02:19:01PM +0100, Ard Biesheuvel wrote:
> This adds support for emitting PLTs at module load time for relative
> branches that are out of range. This is a prerequisite for KASLR, which
> may place the kernel and the modules anywhere in the vmalloc area,
> making it more likely that branch target offsets exceed the maximum
> range of +/- 128 MB.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
>  arch/arm64/Kconfig              |   9 ++
>  arch/arm64/Makefile             |   6 +-
>  arch/arm64/include/asm/module.h |  11 ++
>  arch/arm64/kernel/Makefile      |   1 +
>  arch/arm64/kernel/module-plts.c | 137 ++++++++++++++++++++
>  arch/arm64/kernel/module.c      |  12 ++
>  arch/arm64/kernel/module.lds    |   4 +
>  7 files changed, 179 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index ffa3c549a4ba..778df20bf623 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -363,6 +363,7 @@ config ARM64_ERRATUM_843419
>  	bool "Cortex-A53: 843419: A load or store might access an incorrect address"
>  	depends on MODULES
>  	default y
> +	select ARM64_MODULE_CMODEL_LARGE
>  	help
>  	  This option builds kernel modules using the large memory model in
>  	  order to avoid the use of the ADRP instruction, which can cause
> @@ -702,6 +703,14 @@ config ARM64_LSE_ATOMICS
>  
>  endmenu
>  
> +config ARM64_MODULE_CMODEL_LARGE
> +	bool
> +
> +config ARM64_MODULE_PLTS
> +	bool
> +	select ARM64_MODULE_CMODEL_LARGE
> +	select HAVE_MOD_ARCH_SPECIFIC
> +
>  endmenu
>  
>  menu "Boot options"
> diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
> index cd822d8454c0..db462980c6be 100644
> --- a/arch/arm64/Makefile
> +++ b/arch/arm64/Makefile
> @@ -41,10 +41,14 @@ endif
>  
>  CHECKFLAGS	+= -D__aarch64__
>  
> -ifeq ($(CONFIG_ARM64_ERRATUM_843419), y)
> +ifeq ($(CONFIG_ARM64_MODULE_CMODEL_LARGE), y)
>  KBUILD_CFLAGS_MODULE	+= -mcmodel=large
>  endif
>  
> +ifeq ($(CONFIG_ARM64_MODULE_PLTS),y)
> +KBUILD_LDFLAGS_MODULE	+= -T $(srctree)/arch/arm64/kernel/module.lds
> +endif
> +
>  # Default value
>  head-y		:= arch/arm64/kernel/head.o
>  
> diff --git a/arch/arm64/include/asm/module.h b/arch/arm64/include/asm/module.h
> index e80e232b730e..7b8cd3dc9d8e 100644
> --- a/arch/arm64/include/asm/module.h
> +++ b/arch/arm64/include/asm/module.h
> @@ -20,4 +20,15 @@
>  
>  #define MODULE_ARCH_VERMAGIC	"aarch64"
>  
> +#ifdef CONFIG_ARM64_MODULE_PLTS
> +struct mod_arch_specific {
> +	struct elf64_shdr	*core_plt;
> +	struct elf64_shdr	*init_plt;
> +	int			core_plt_count;
> +	int			init_plt_count;
> +};
> +#endif
> +
> +u64 get_module_plt(struct module *mod, void *loc, u64 val);
> +
>  #endif /* __ASM_MODULE_H */
> diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
> index 474691f8b13a..f42b0fff607f 100644
> --- a/arch/arm64/kernel/Makefile
> +++ b/arch/arm64/kernel/Makefile
> @@ -30,6 +30,7 @@ arm64-obj-$(CONFIG_COMPAT)		+= sys32.o kuser32.o signal32.o 	\
>  					   ../../arm/kernel/opcodes.o
>  arm64-obj-$(CONFIG_FUNCTION_TRACER)	+= ftrace.o entry-ftrace.o
>  arm64-obj-$(CONFIG_MODULES)		+= arm64ksyms.o module.o
> +arm64-obj-$(CONFIG_ARM64_MODULE_PLTS)	+= module-plts.o
>  arm64-obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o perf_callchain.o
>  arm64-obj-$(CONFIG_HW_PERF_EVENTS)	+= perf_event.o
>  arm64-obj-$(CONFIG_HAVE_HW_BREAKPOINT)	+= hw_breakpoint.o
> diff --git a/arch/arm64/kernel/module-plts.c b/arch/arm64/kernel/module-plts.c
> new file mode 100644
> index 000000000000..4a8ef9ea01ee
> --- /dev/null
> +++ b/arch/arm64/kernel/module-plts.c
> @@ -0,0 +1,137 @@
> +/*
> + * Copyright (C) 2014-2015 Linaro Ltd. <ard.biesheuvel@linaro.org>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/elf.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +
> +struct plt_entry {
> +	__le32	mov0;	/* movn	x16, #0x....			*/
> +	__le32	mov1;	/* movk	x16, #0x...., lsl #16		*/
> +	__le32	mov2;	/* movk	x16, #0x...., lsl #32		*/
> +	__le32	br;	/* br	x16				*/
> +} __aligned(8);

We only need natural alignment for the instructions, so what's the
alignment for? I can't see that anything else cares.

It might be worth a comment regarding why we use x16 (i.e. because the
AAPCS says that, as IP0, it is valid for veneers/PLTs to clobber).
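
E.g., above the struct definition, something along these lines (wording
is only a suggestion):

	/*
	 * The PLT entry clobbers x16 (IP0), which the AAPCS64 designates as
	 * an intra-procedure-call scratch register that veneers and PLT
	 * code are allowed to corrupt.
	 */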

> +static bool in_init(const struct module *mod, void *addr)
> +{
> +	return (u64)addr - (u64)mod->module_init < mod->init_size;
> +}
> +
> +u64 get_module_plt(struct module *mod, void *loc, u64 val)
> +{
> +	struct plt_entry entry = {
> +		cpu_to_le32(0x92800010 | (((~val      ) & 0xffff)) << 5),
> +		cpu_to_le32(0xf2a00010 | ((( val >> 16) & 0xffff)) << 5),
> +		cpu_to_le32(0xf2c00010 | ((( val >> 32) & 0xffff)) << 5),
> +		cpu_to_le32(0xd61f0200)
> +	}, *plt;

It would be nice if we could un-magic this, though I see that reusing
the existing insn or reloc_insn code is painful here.

> +	int i, *count;
> +
> +	if (in_init(mod, loc)) {
> +		plt = (struct plt_entry *)mod->arch.init_plt->sh_addr;
> +		count = &mod->arch.init_plt_count;
> +	} else {
> +		plt = (struct plt_entry *)mod->arch.core_plt->sh_addr;
> +		count = &mod->arch.core_plt_count;
> +	}
> +
> +	/* Look for an existing entry pointing to 'val' */
> +	for (i = 0; i < *count; i++)
> +		if (plt[i].mov0 == entry.mov0 &&
> +		    plt[i].mov1 == entry.mov1 &&
> +		    plt[i].mov2 == entry.mov2)
> +			return (u64)&plt[i];

I think that at the cost of redundantly comparing the br x16, you could
simplify this by comparing the whole struct, e.g.

	for (i = 0; i < *count; i++)
		if (plt[i] == entry)
			return (u64)&plt[i];

Which would also work if we change the veneer for some reason.

> +
> +	i = (*count)++;

Given that i == *count at the end of the loop, you could just increment
*count here.
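
I.e., something like:

	/* i == *count here, so the new entry goes at the end of the array */
	plt[i] = entry;
	(*count)++;
	return (u64)&plt[i];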

> +	plt[i] = entry;
> +	return (u64)&plt[i];
> +}
> +
> +static int duplicate_rel(Elf64_Addr base, const Elf64_Rela *rela, int num)

Perhaps: static bool is_duplicate_rel
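
I.e., a bool-returning version along the lines of:

	static bool is_duplicate_rel(Elf64_Addr base, const Elf64_Rela *rela,
				     int num)
	{
		int i;

		for (i = 0; i < num; i++)
			if (rela[i].r_info == rela[num].r_info &&
			    rela[i].r_addend == rela[num].r_addend)
				return true;

		return false;
	}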

> +{
> +	int i;
> +
> +	for (i = 0; i < num; i++) {
> +		if (rela[i].r_info == rela[num].r_info &&
> +		    rela[i].r_addend == rela[num].r_addend)
> +			return 1;
> +	}
> +	return 0;
> +}
> +
> +/* Count how many PLT entries we may need */
> +static unsigned int count_plts(Elf64_Addr base, const Elf64_Rela *rela, int num)
> +{
> +	unsigned int ret = 0;
> +	int i;
> +
> +	/*
> +	 * Sure, this is order(n^2), but it's usually short, and not
> +	 * time critical
> +	 */
> +	for (i = 0; i < num; i++)
> +		switch (ELF64_R_TYPE(rela[i].r_info)) {
> +		case R_AARCH64_JUMP26:
> +		case R_AARCH64_CALL26:
> +			if (!duplicate_rel(base, rela, i))
> +				ret++;
> +			break;
> +		}

While braces aren't strictly required on the for loop, I think it would
look better with them, given that the contained logic is non-trivial.

> +	return ret;
> +}
> +
> +int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
> +			      char *secstrings, struct module *mod)
> +{
> +	unsigned long core_plts = 0, init_plts = 0;
> +	Elf64_Shdr *s, *sechdrs_end = sechdrs + ehdr->e_shnum;
> +
> +	/*
> +	 * To store the PLTs, we expand the .text section for core module code
> +	 * and the .init.text section for initialization code.
> +	 */

That comment is a bit misleading, given we don't touch .text and
.init.text, but rather .core.plt and .init.plt, relying on
layout_sections to group those with .text and .init.text.
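
E.g. something along the lines of:

	/*
	 * Reserve room for the PLTs in two dedicated sections, .core.plt and
	 * .init.plt; layout_sections() will then place them together with
	 * .text and .init.text, respectively.
	 */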

> +	for (s = sechdrs; s < sechdrs_end; ++s)
> +		if (strcmp(".core.plt", secstrings + s->sh_name) == 0)
> +			mod->arch.core_plt = s;
> +		else if (strcmp(".init.plt", secstrings + s->sh_name) == 0)
> +			mod->arch.init_plt = s;

This would be nicer with braces.

> +
> +	if (!mod->arch.core_plt || !mod->arch.init_plt) {
> +		pr_err("%s: sections missing\n", mod->name);
> +		return -ENOEXEC;
> +	}
> +
> +	for (s = sechdrs + 1; s < sechdrs_end; ++s) {

Could we have a comment as to why we skip the first Shdr? I recall it's
in some way special, but I can't recall why/how.

> +		const Elf64_Rela *rels = (void *)ehdr + s->sh_offset;
> +		int numrels = s->sh_size / sizeof(Elf64_Rela);
> +		Elf64_Shdr *dstsec = sechdrs + s->sh_info;
> +
> +		if (s->sh_type != SHT_RELA)
> +			continue;

We only have RELA, and no REL?

Thanks,
Mark.

> +
> +		if (strstr(secstrings + s->sh_name, ".init"))
> +			init_plts += count_plts(dstsec->sh_addr, rels, numrels);
> +		else
> +			core_plts += count_plts(dstsec->sh_addr, rels, numrels);
> +	}
> +
> +	mod->arch.core_plt->sh_type = SHT_NOBITS;
> +	mod->arch.core_plt->sh_flags = SHF_EXECINSTR | SHF_ALLOC;
> +	mod->arch.core_plt->sh_addralign = L1_CACHE_BYTES;
> +	mod->arch.core_plt->sh_size = core_plts * sizeof(struct plt_entry);
> +	mod->arch.core_plt_count = 0;
> +
> +	mod->arch.init_plt->sh_type = SHT_NOBITS;
> +	mod->arch.init_plt->sh_flags = SHF_EXECINSTR | SHF_ALLOC;
> +	mod->arch.init_plt->sh_addralign = L1_CACHE_BYTES;
> +	mod->arch.init_plt->sh_size = init_plts * sizeof(struct plt_entry);
> +	mod->arch.init_plt_count = 0;
> +	pr_debug("%s: core.plt=%lld, init.plt=%lld\n", __func__,
> +		 mod->arch.core_plt->sh_size, mod->arch.init_plt->sh_size);
> +	return 0;
> +}
> diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
> index 93e970231ca9..3a298b0e21bb 100644
> --- a/arch/arm64/kernel/module.c
> +++ b/arch/arm64/kernel/module.c
> @@ -38,6 +38,11 @@ void *module_alloc(unsigned long size)
>  				GFP_KERNEL, PAGE_KERNEL_EXEC, 0,
>  				NUMA_NO_NODE, __builtin_return_address(0));
>  
> +	if (!p && IS_ENABLED(CONFIG_ARM64_MODULE_PLTS))
> +		p = __vmalloc_node_range(size, MODULE_ALIGN, VMALLOC_START,
> +				VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_EXEC, 0,
> +				NUMA_NO_NODE, __builtin_return_address(0));
> +
>  	if (p && (kasan_module_alloc(p, size) < 0)) {
>  		vfree(p);
>  		return NULL;
> @@ -361,6 +366,13 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
>  		case R_AARCH64_CALL26:
>  			ovf = reloc_insn_imm(RELOC_OP_PREL, loc, val, 2, 26,
>  					     AARCH64_INSN_IMM_26);
> +
> +			if (IS_ENABLED(CONFIG_ARM64_MODULE_PLTS) &&
> +			    ovf == -ERANGE) {
> +				val = get_module_plt(me, loc, val);
> +				ovf = reloc_insn_imm(RELOC_OP_PREL, loc, val, 2,
> +						     26, AARCH64_INSN_IMM_26);
> +			}
>  			break;
>  
>  		default:
> diff --git a/arch/arm64/kernel/module.lds b/arch/arm64/kernel/module.lds
> new file mode 100644
> index 000000000000..3682fa107918
> --- /dev/null
> +++ b/arch/arm64/kernel/module.lds
> @@ -0,0 +1,4 @@
> +SECTIONS {
> +        .core.plt : { BYTE(0) }
> +        .init.plt : { BYTE(0) }
> +}
> -- 
> 2.5.0
> 

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 08/21] arm64: add support for module PLTs
  2016-01-22 16:55     ` Mark Rutland
  (?)
@ 2016-01-22 17:06       ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-22 17:06 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, kernel-hardening, Will Deacon, Catalin Marinas,
	Leif Lindholm, Kees Cook, linux-kernel, Stuart Yoder,
	Sharma Bhupesh, Arnd Bergmann, Marc Zyngier, Christoffer Dall

On 22 January 2016 at 17:55, Mark Rutland <mark.rutland@arm.com> wrote:
> Hi Ard,
>
> This looks good.
>

Thanks for taking a look. I must say that this looks slightly
different now in my upcoming v4: I got rid of the O(n^2) loops in
favor of sorting the RELA section (iff it relocates an executable
section).
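
Roughly speaking, the idea is to sort the branch relocations so that
duplicates end up adjacent, and then only compare each entry with its
predecessor. A sketch (not the actual v4 code) using sort() from
<linux/sort.h>, with a simplified, non-const count_plts() signature:

	static int cmp_rela(const void *a, const void *b)
	{
		const Elf64_Rela *x = a, *y = b;

		/* sort by type/symbol (r_info) first, then by addend */
		if (x->r_info != y->r_info)
			return x->r_info < y->r_info ? -1 : 1;
		if (x->r_addend != y->r_addend)
			return x->r_addend < y->r_addend ? -1 : 1;
		return 0;
	}

	static unsigned int count_plts(Elf64_Rela *rela, int num)
	{
		unsigned int ret = 0;
		int i;

		/* sorting makes duplicate branch targets adjacent */
		sort(rela, num, sizeof(Elf64_Rela), cmp_rela, NULL);

		for (i = 0; i < num; i++) {
			switch (ELF64_R_TYPE(rela[i].r_info)) {
			case R_AARCH64_JUMP26:
			case R_AARCH64_CALL26:
				if (i == 0 || cmp_rela(&rela[i - 1], &rela[i]))
					ret++;
				break;
			}
		}
		return ret;
	}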

> My comments below are mostly nits, and much of the rest probably betrays
> my lack of familiarity with ELF.
>
> On Mon, Jan 11, 2016 at 02:19:01PM +0100, Ard Biesheuvel wrote:
>> This adds support for emitting PLTs at module load time for relative
>> branches that are out of range. This is a prerequisite for KASLR, which
>> may place the kernel and the modules anywhere in the vmalloc area,
>> making it more likely that branch target offsets exceed the maximum
>> range of +/- 128 MB.
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> ---
>>  arch/arm64/Kconfig              |   9 ++
>>  arch/arm64/Makefile             |   6 +-
>>  arch/arm64/include/asm/module.h |  11 ++
>>  arch/arm64/kernel/Makefile      |   1 +
>>  arch/arm64/kernel/module-plts.c | 137 ++++++++++++++++++++
>>  arch/arm64/kernel/module.c      |  12 ++
>>  arch/arm64/kernel/module.lds    |   4 +
>>  7 files changed, 179 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index ffa3c549a4ba..778df20bf623 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -363,6 +363,7 @@ config ARM64_ERRATUM_843419
>>       bool "Cortex-A53: 843419: A load or store might access an incorrect address"
>>       depends on MODULES
>>       default y
>> +     select ARM64_MODULE_CMODEL_LARGE
>>       help
>>         This option builds kernel modules using the large memory model in
>>         order to avoid the use of the ADRP instruction, which can cause
>> @@ -702,6 +703,14 @@ config ARM64_LSE_ATOMICS
>>
>>  endmenu
>>
>> +config ARM64_MODULE_CMODEL_LARGE
>> +     bool
>> +
>> +config ARM64_MODULE_PLTS
>> +     bool
>> +     select ARM64_MODULE_CMODEL_LARGE
>> +     select HAVE_MOD_ARCH_SPECIFIC
>> +
>>  endmenu
>>
>>  menu "Boot options"
>> diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
>> index cd822d8454c0..db462980c6be 100644
>> --- a/arch/arm64/Makefile
>> +++ b/arch/arm64/Makefile
>> @@ -41,10 +41,14 @@ endif
>>
>>  CHECKFLAGS   += -D__aarch64__
>>
>> -ifeq ($(CONFIG_ARM64_ERRATUM_843419), y)
>> +ifeq ($(CONFIG_ARM64_MODULE_CMODEL_LARGE), y)
>>  KBUILD_CFLAGS_MODULE += -mcmodel=large
>>  endif
>>
>> +ifeq ($(CONFIG_ARM64_MODULE_PLTS),y)
>> +KBUILD_LDFLAGS_MODULE        += -T $(srctree)/arch/arm64/kernel/module.lds
>> +endif
>> +
>>  # Default value
>>  head-y               := arch/arm64/kernel/head.o
>>
>> diff --git a/arch/arm64/include/asm/module.h b/arch/arm64/include/asm/module.h
>> index e80e232b730e..7b8cd3dc9d8e 100644
>> --- a/arch/arm64/include/asm/module.h
>> +++ b/arch/arm64/include/asm/module.h
>> @@ -20,4 +20,15 @@
>>
>>  #define MODULE_ARCH_VERMAGIC "aarch64"
>>
>> +#ifdef CONFIG_ARM64_MODULE_PLTS
>> +struct mod_arch_specific {
>> +     struct elf64_shdr       *core_plt;
>> +     struct elf64_shdr       *init_plt;
>> +     int                     core_plt_count;
>> +     int                     init_plt_count;
>> +};
>> +#endif
>> +
>> +u64 get_module_plt(struct module *mod, void *loc, u64 val);
>> +
>>  #endif /* __ASM_MODULE_H */
>> diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
>> index 474691f8b13a..f42b0fff607f 100644
>> --- a/arch/arm64/kernel/Makefile
>> +++ b/arch/arm64/kernel/Makefile
>> @@ -30,6 +30,7 @@ arm64-obj-$(CONFIG_COMPAT)          += sys32.o kuser32.o signal32.o         \
>>                                          ../../arm/kernel/opcodes.o
>>  arm64-obj-$(CONFIG_FUNCTION_TRACER)  += ftrace.o entry-ftrace.o
>>  arm64-obj-$(CONFIG_MODULES)          += arm64ksyms.o module.o
>> +arm64-obj-$(CONFIG_ARM64_MODULE_PLTS)        += module-plts.o
>>  arm64-obj-$(CONFIG_PERF_EVENTS)              += perf_regs.o perf_callchain.o
>>  arm64-obj-$(CONFIG_HW_PERF_EVENTS)   += perf_event.o
>>  arm64-obj-$(CONFIG_HAVE_HW_BREAKPOINT)       += hw_breakpoint.o
>> diff --git a/arch/arm64/kernel/module-plts.c b/arch/arm64/kernel/module-plts.c
>> new file mode 100644
>> index 000000000000..4a8ef9ea01ee
>> --- /dev/null
>> +++ b/arch/arm64/kernel/module-plts.c
>> @@ -0,0 +1,137 @@
>> +/*
>> + * Copyright (C) 2014-2015 Linaro Ltd. <ard.biesheuvel@linaro.org>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + */
>> +
>> +#include <linux/elf.h>
>> +#include <linux/kernel.h>
>> +#include <linux/module.h>
>> +
>> +struct plt_entry {
>> +     __le32  mov0;   /* movn x16, #0x....                    */
>> +     __le32  mov1;   /* movk x16, #0x...., lsl #16           */
>> +     __le32  mov2;   /* movk x16, #0x...., lsl #32           */
>> +     __le32  br;     /* br   x16                             */
>> +} __aligned(8);
>
> We only need natural alignment for the instructions, so what's the
> alignment for? I can't see that anything else cares.
>

This allows the compiler to emit a single load for the first two
fields when performing the comparison in the loop below. All of this
is somewhat moot now, since the sorting of the section causes the
duplicates to be adjacent, and I only have to compare against the last
veneer that was emitted.

> It might be worth a comment regarding why why use x16 (i.e. because the
> AAPCS says that as IP0 it is valid for veneers/PLTs to clobber).
>

Yep.

>> +static bool in_init(const struct module *mod, void *addr)
>> +{
>> +     return (u64)addr - (u64)mod->module_init < mod->init_size;
>> +}
>> +
>> +u64 get_module_plt(struct module *mod, void *loc, u64 val)
>> +{
>> +     struct plt_entry entry = {
>> +             cpu_to_le32(0x92800010 | (((~val      ) & 0xffff)) << 5),
>> +             cpu_to_le32(0xf2a00010 | ((( val >> 16) & 0xffff)) << 5),
>> +             cpu_to_le32(0xf2c00010 | ((( val >> 32) & 0xffff)) << 5),
>> +             cpu_to_le32(0xd61f0200)
>> +     }, *plt;
>
> It would be nice if we could un-magic this, though I see that reusing
> the existing insn or reloc_insn code is painful here.
>

Well, I could #define PLT0, PLT1, PLT2, etc., and document them a bit
better, but pulling in all the instruction machinery just to emit the
exact same instructions each time seems a bit overkill IMO.
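
FWIW, a sketch of what that might look like (the names are invented):

	/* opcode templates targeting x16; the 16-bit immediate lives in bits [20:5] */
	#define PLT_MOVN_X16		0x92800010	/* movn x16, #imm16 */
	#define PLT_MOVK_X16_LSL16	0xf2a00010	/* movk x16, #imm16, lsl #16 */
	#define PLT_MOVK_X16_LSL32	0xf2c00010	/* movk x16, #imm16, lsl #32 */
	#define PLT_BR_X16		0xd61f0200	/* br x16 */

	struct plt_entry entry = {
		cpu_to_le32(PLT_MOVN_X16       | ((~val        & 0xffff) << 5)),
		cpu_to_le32(PLT_MOVK_X16_LSL16 | (((val >> 16) & 0xffff) << 5)),
		cpu_to_le32(PLT_MOVK_X16_LSL32 | (((val >> 32) & 0xffff) << 5)),
		cpu_to_le32(PLT_BR_X16)
	}, *plt;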

>> +     int i, *count;
>> +
>> +     if (in_init(mod, loc)) {
>> +             plt = (struct plt_entry *)mod->arch.init_plt->sh_addr;
>> +             count = &mod->arch.init_plt_count;
>> +     } else {
>> +             plt = (struct plt_entry *)mod->arch.core_plt->sh_addr;
>> +             count = &mod->arch.core_plt_count;
>> +     }
>> +
>> +     /* Look for an existing entry pointing to 'val' */
>> +     for (i = 0; i < *count; i++)
>> +             if (plt[i].mov0 == entry.mov0 &&
>> +                 plt[i].mov1 == entry.mov1 &&
>> +                 plt[i].mov2 == entry.mov2)
>> +                     return (u64)&plt[i];
>
> I think that at the cost of redundantly comparing the br x16, you could
> simplify this by comparing the whole struct, e.g.
>
>         for (i = 0; i < *count; i++)
>                 if (plt[i] == entry)

You can use struct types in assignments, but not in comparisons,
strangely enough.
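
A whole-struct comparison is still possible with memcmp(), though, since
the entry contains no padding; e.g. (sketch):

	/* compares all four instructions, including the trailing br x16 */
	for (i = 0; i < *count; i++)
		if (memcmp(&plt[i], &entry, sizeof(entry)) == 0)
			return (u64)&plt[i];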

>                         return (u64)&plt[i];
>
> Which would also work if we change the veneer for some reason.
>
>> +
>> +     i = (*count)++;
>
> given i == *count at the end of the loop, you could just increment
> *count here.
>
>> +     plt[i] = entry;
>> +     return (u64)&plt[i];
>> +}
>> +
>> +static int duplicate_rel(Elf64_Addr base, const Elf64_Rela *rela, int num)
>
> Perhaps: static bool is_duplicate_rel
>
>> +{
>> +     int i;
>> +
>> +     for (i = 0; i < num; i++) {
>> +             if (rela[i].r_info == rela[num].r_info &&
>> +                 rela[i].r_addend == rela[num].r_addend)
>> +                     return 1;
>> +     }
>> +     return 0;
>> +}
>> +
>> +/* Count how many PLT entries we may need */
>> +static unsigned int count_plts(Elf64_Addr base, const Elf64_Rela *rela, int num)
>> +{
>> +     unsigned int ret = 0;
>> +     int i;
>> +
>> +     /*
>> +      * Sure, this is order(n^2), but it's usually short, and not
>> +      * time critical
>> +      */
>> +     for (i = 0; i < num; i++)
>> +             switch (ELF64_R_TYPE(rela[i].r_info)) {
>> +             case R_AARCH64_JUMP26:
>> +             case R_AARCH64_CALL26:
>> +                     if (!duplicate_rel(base, rela, i))
>> +                             ret++;
>> +                     break;
>> +             }
>
> While braces aren't strictly required on the for loop, i think it would
> look better with them given the contained logic is non-trivial.
>

Indeed. I will add them

>> +     return ret;
>> +}
>> +
>> +int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
>> +                           char *secstrings, struct module *mod)
>> +{
>> +     unsigned long core_plts = 0, init_plts = 0;
>> +     Elf64_Shdr *s, *sechdrs_end = sechdrs + ehdr->e_shnum;
>> +
>> +     /*
>> +      * To store the PLTs, we expand the .text section for core module code
>> +      * and the .init.text section for initialization code.
>> +      */
>
> That comment is a bit misleading, given we don't touch .text and
> .init.text, but rather .core.plt and .init.plt, relying on
> layout_sections to group those with .text and .init.text.
>

ok

>> +     for (s = sechdrs; s < sechdrs_end; ++s)
>> +             if (strcmp(".core.plt", secstrings + s->sh_name) == 0)
>> +                     mod->arch.core_plt = s;
>> +             else if (strcmp(".init.plt", secstrings + s->sh_name) == 0)
>> +                     mod->arch.init_plt = s;
>
> This would be nicer with braces.
>

ok

>> +
>> +     if (!mod->arch.core_plt || !mod->arch.init_plt) {
>> +             pr_err("%s: sections missing\n", mod->name);
>> +             return -ENOEXEC;
>> +     }
>> +
>> +     for (s = sechdrs + 1; s < sechdrs_end; ++s) {
>
> Could we have a comment as to why we skip the first Shdr? I recall it's
> in some way special, but I can't recall why/how.
>

I don't remember exactly, and some of this code originated on ia64 IIRC.
It is probably better to simply start from [0].
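
FWIW, starting from sechdrs[0] should be harmless: the ELF spec reserves
section header index 0 as an all-zeroes SHT_NULL entry, so the existing
SHT_RELA check filters it out anyway. A one-line comment could capture
that, e.g.:

	/* sechdrs[0] is the reserved SHT_NULL entry; the SHT_RELA check skips it */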

>> +             const Elf64_Rela *rels = (void *)ehdr + s->sh_offset;
>> +             int numrels = s->sh_size / sizeof(Elf64_Rela);
>> +             Elf64_Shdr *dstsec = sechdrs + s->sh_info;
>> +
>> +             if (s->sh_type != SHT_RELA)
>> +                     continue;
>
> We only have RELA, and no REL?
>

Nope.

arch/arm64/Kconfig:86:  select MODULES_USE_ELF_RELA

As I said, this code will look different in the next version, but I
will make sure to take your review points into account.

Thanks,
Ard.

^ permalink raw reply	[flat|nested] 207+ messages in thread

* [PATCH v3 08/21] arm64: add support for module PLTs
@ 2016-01-22 17:06       ` Ard Biesheuvel
  0 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-22 17:06 UTC (permalink / raw)
  To: linux-arm-kernel

On 22 January 2016 at 17:55, Mark Rutland <mark.rutland@arm.com> wrote:
> Hi Ard,
>
> This looks good.
>

Thanks for taking a look. I must say that this looks slightly
different now in my upcoming v4: I got rid of the O(n^2) loops in
favor of sorting the RELA section (iff it relocates an executable
section)

> My comments below are mostly nits, and much of the rest probably betrays
> my lack of familiarity with ELF.
>
> On Mon, Jan 11, 2016 at 02:19:01PM +0100, Ard Biesheuvel wrote:
>> This adds support for emitting PLTs at module load time for relative
>> branches that are out of range. This is a prerequisite for KASLR, which
>> may place the kernel and the modules anywhere in the vmalloc area,
>> making it more likely that branch target offsets exceed the maximum
>> range of +/- 128 MB.
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> ---
>>  arch/arm64/Kconfig              |   9 ++
>>  arch/arm64/Makefile             |   6 +-
>>  arch/arm64/include/asm/module.h |  11 ++
>>  arch/arm64/kernel/Makefile      |   1 +
>>  arch/arm64/kernel/module-plts.c | 137 ++++++++++++++++++++
>>  arch/arm64/kernel/module.c      |  12 ++
>>  arch/arm64/kernel/module.lds    |   4 +
>>  7 files changed, 179 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index ffa3c549a4ba..778df20bf623 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -363,6 +363,7 @@ config ARM64_ERRATUM_843419
>>       bool "Cortex-A53: 843419: A load or store might access an incorrect address"
>>       depends on MODULES
>>       default y
>> +     select ARM64_MODULE_CMODEL_LARGE
>>       help
>>         This option builds kernel modules using the large memory model in
>>         order to avoid the use of the ADRP instruction, which can cause
>> @@ -702,6 +703,14 @@ config ARM64_LSE_ATOMICS
>>
>>  endmenu
>>
>> +config ARM64_MODULE_CMODEL_LARGE
>> +     bool
>> +
>> +config ARM64_MODULE_PLTS
>> +     bool
>> +     select ARM64_MODULE_CMODEL_LARGE
>> +     select HAVE_MOD_ARCH_SPECIFIC
>> +
>>  endmenu
>>
>>  menu "Boot options"
>> diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
>> index cd822d8454c0..db462980c6be 100644
>> --- a/arch/arm64/Makefile
>> +++ b/arch/arm64/Makefile
>> @@ -41,10 +41,14 @@ endif
>>
>>  CHECKFLAGS   += -D__aarch64__
>>
>> -ifeq ($(CONFIG_ARM64_ERRATUM_843419), y)
>> +ifeq ($(CONFIG_ARM64_MODULE_CMODEL_LARGE), y)
>>  KBUILD_CFLAGS_MODULE += -mcmodel=large
>>  endif
>>
>> +ifeq ($(CONFIG_ARM64_MODULE_PLTS),y)
>> +KBUILD_LDFLAGS_MODULE        += -T $(srctree)/arch/arm64/kernel/module.lds
>> +endif
>> +
>>  # Default value
>>  head-y               := arch/arm64/kernel/head.o
>>
>> diff --git a/arch/arm64/include/asm/module.h b/arch/arm64/include/asm/module.h
>> index e80e232b730e..7b8cd3dc9d8e 100644
>> --- a/arch/arm64/include/asm/module.h
>> +++ b/arch/arm64/include/asm/module.h
>> @@ -20,4 +20,15 @@
>>
>>  #define MODULE_ARCH_VERMAGIC "aarch64"
>>
>> +#ifdef CONFIG_ARM64_MODULE_PLTS
>> +struct mod_arch_specific {
>> +     struct elf64_shdr       *core_plt;
>> +     struct elf64_shdr       *init_plt;
>> +     int                     core_plt_count;
>> +     int                     init_plt_count;
>> +};
>> +#endif
>> +
>> +u64 get_module_plt(struct module *mod, void *loc, u64 val);
>> +
>>  #endif /* __ASM_MODULE_H */
>> diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
>> index 474691f8b13a..f42b0fff607f 100644
>> --- a/arch/arm64/kernel/Makefile
>> +++ b/arch/arm64/kernel/Makefile
>> @@ -30,6 +30,7 @@ arm64-obj-$(CONFIG_COMPAT)          += sys32.o kuser32.o signal32.o         \
>>                                          ../../arm/kernel/opcodes.o
>>  arm64-obj-$(CONFIG_FUNCTION_TRACER)  += ftrace.o entry-ftrace.o
>>  arm64-obj-$(CONFIG_MODULES)          += arm64ksyms.o module.o
>> +arm64-obj-$(CONFIG_ARM64_MODULE_PLTS)        += module-plts.o
>>  arm64-obj-$(CONFIG_PERF_EVENTS)              += perf_regs.o perf_callchain.o
>>  arm64-obj-$(CONFIG_HW_PERF_EVENTS)   += perf_event.o
>>  arm64-obj-$(CONFIG_HAVE_HW_BREAKPOINT)       += hw_breakpoint.o
>> diff --git a/arch/arm64/kernel/module-plts.c b/arch/arm64/kernel/module-plts.c
>> new file mode 100644
>> index 000000000000..4a8ef9ea01ee
>> --- /dev/null
>> +++ b/arch/arm64/kernel/module-plts.c
>> @@ -0,0 +1,137 @@
>> +/*
>> + * Copyright (C) 2014-2015 Linaro Ltd. <ard.biesheuvel@linaro.org>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + */
>> +
>> +#include <linux/elf.h>
>> +#include <linux/kernel.h>
>> +#include <linux/module.h>
>> +
>> +struct plt_entry {
>> +     __le32  mov0;   /* movn x16, #0x....                    */
>> +     __le32  mov1;   /* movk x16, #0x...., lsl #16           */
>> +     __le32  mov2;   /* movk x16, #0x...., lsl #32           */
>> +     __le32  br;     /* br   x16                             */
>> +} __aligned(8);
>
> We only need natural alignment for the instructions, so what's the
> alignment for? I can't see that anything else cares.
>

This allows the compiler to emit a single load for the first two
fields when performing the comparison in the loop below. All of this
is somewhat moot now, since the sorting of the section causes the
duplicates to be adjacent, and I only have to compare against the last
veneer that was emitted.

> It might be worth a comment regarding why why use x16 (i.e. because the
> AAPCS says that as IP0 it is valid for veneers/PLTs to clobber).
>

Yep.

>> +static bool in_init(const struct module *mod, void *addr)
>> +{
>> +     return (u64)addr - (u64)mod->module_init < mod->init_size;
>> +}
>> +
>> +u64 get_module_plt(struct module *mod, void *loc, u64 val)
>> +{
>> +     struct plt_entry entry = {
>> +             cpu_to_le32(0x92800010 | (((~val      ) & 0xffff)) << 5),
>> +             cpu_to_le32(0xf2a00010 | ((( val >> 16) & 0xffff)) << 5),
>> +             cpu_to_le32(0xf2c00010 | ((( val >> 32) & 0xffff)) << 5),
>> +             cpu_to_le32(0xd61f0200)
>> +     }, *plt;
>
> It would be nice if we could un-magic this, though I see that reusing
> the existing insn or reloc_insn code is painful here.
>

Well, I could #define PLT0 PLT1 PLT2 etc, and document them a bit
better, but having all the instruction machinery for emitting the
exact same instructions each time seems a bit overkill imo.

>> +     int i, *count;
>> +
>> +     if (in_init(mod, loc)) {
>> +             plt = (struct plt_entry *)mod->arch.init_plt->sh_addr;
>> +             count = &mod->arch.init_plt_count;
>> +     } else {
>> +             plt = (struct plt_entry *)mod->arch.core_plt->sh_addr;
>> +             count = &mod->arch.core_plt_count;
>> +     }
>> +
>> +     /* Look for an existing entry pointing to 'val' */
>> +     for (i = 0; i < *count; i++)
>> +             if (plt[i].mov0 == entry.mov0 &&
>> +                 plt[i].mov1 == entry.mov1 &&
>> +                 plt[i].mov2 == entry.mov2)
>> +                     return (u64)&plt[i];
>
> I think that at the cost of redundantly comparing the br x16, you could
> simplify this by comparing the whole struct, e.g.
>
>         for (i = 0; i < *count; i++)
>                 if (plt[i] == entry)

You can use struct types in assignments, but not in comparisons,
strangely enough

>                         return (u64)&plt[i];
>
> Which would also work if we change the veneer for some reason.
>
>> +
>> +     i = (*count)++;
>
> given i == *count at the end of the loop, you could just increment
> *count here.
>
>> +     plt[i] = entry;
>> +     return (u64)&plt[i];
>> +}
>> +
>> +static int duplicate_rel(Elf64_Addr base, const Elf64_Rela *rela, int num)
>
> Perhaps: static bool is_duplicate_rel
>
>> +{
>> +     int i;
>> +
>> +     for (i = 0; i < num; i++) {
>> +             if (rela[i].r_info == rela[num].r_info &&
>> +                 rela[i].r_addend == rela[num].r_addend)
>> +                     return 1;
>> +     }
>> +     return 0;
>> +}
>> +
>> +/* Count how many PLT entries we may need */
>> +static unsigned int count_plts(Elf64_Addr base, const Elf64_Rela *rela, int num)
>> +{
>> +     unsigned int ret = 0;
>> +     int i;
>> +
>> +     /*
>> +      * Sure, this is order(n^2), but it's usually short, and not
>> +      * time critical
>> +      */
>> +     for (i = 0; i < num; i++)
>> +             switch (ELF64_R_TYPE(rela[i].r_info)) {
>> +             case R_AARCH64_JUMP26:
>> +             case R_AARCH64_CALL26:
>> +                     if (!duplicate_rel(base, rela, i))
>> +                             ret++;
>> +                     break;
>> +             }
>
> While braces aren't strictly required on the for loop, i think it would
> look better with them given the contained logic is non-trivial.
>

Indeed. I will add them

>> +     return ret;
>> +}
>> +
>> +int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
>> +                           char *secstrings, struct module *mod)
>> +{
>> +     unsigned long core_plts = 0, init_plts = 0;
>> +     Elf64_Shdr *s, *sechdrs_end = sechdrs + ehdr->e_shnum;
>> +
>> +     /*
>> +      * To store the PLTs, we expand the .text section for core module code
>> +      * and the .init.text section for initialization code.
>> +      */
>
> That comment is a bit misleading, given we don't touch .text and
> .init.text, but rather .core.plt and .init.plt, relying on
> layout_sections to group those with .text and .init.text.
>

ok

>> +     for (s = sechdrs; s < sechdrs_end; ++s)
>> +             if (strcmp(".core.plt", secstrings + s->sh_name) == 0)
>> +                     mod->arch.core_plt = s;
>> +             else if (strcmp(".init.plt", secstrings + s->sh_name) == 0)
>> +                     mod->arch.init_plt = s;
>
> This would be nicer with braces.
>

ok

>> +
>> +     if (!mod->arch.core_plt || !mod->arch.init_plt) {
>> +             pr_err("%s: sections missing\n", mod->name);
>> +             return -ENOEXEC;
>> +     }
>> +
>> +     for (s = sechdrs + 1; s < sechdrs_end; ++s) {
>
> Could we have a comment as to why we skip the first Shdr? I recall it's
> in some way special, but I can't recall why/how.
>

I don't remember exactly, and some of this code originated on ia64 IIRC.
Probably better to simply start from [0]

>> +             const Elf64_Rela *rels = (void *)ehdr + s->sh_offset;
>> +             int numrels = s->sh_size / sizeof(Elf64_Rela);
>> +             Elf64_Shdr *dstsec = sechdrs + s->sh_info;
>> +
>> +             if (s->sh_type != SHT_RELA)
>> +                     continue;
>
> We only have RELA, and no REL?
>

Nope.

arch/arm64/Kconfig:86:  select MODULES_USE_ELF_RELA

As I said, this code will look different in the next version, but I
will make sure to take your review points.

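(Roughly, the v4 approach sorts the branch relocations so that duplicates
end up adjacent and a single pass can spot them -- a sketch only, the
actual comparator in v4 may differ:)

        static int cmp_rela(const void *a, const void *b)
        {
                const Elf64_Rela *x = a, *y = b;

                /* sort by type and symbol index, then by addend */
                if (x->r_info != y->r_info)
                        return x->r_info < y->r_info ? -1 : 1;
                if (x->r_addend != y->r_addend)
                        return x->r_addend < y->r_addend ? -1 : 1;
                return 0;
        }

        /* e.g. sort(rela, numrels, sizeof(*rela), cmp_rela, NULL); after
         * which a duplicate is simply an entry equal to its predecessor. */
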
Thanks,
Ard.

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 08/21] arm64: add support for module PLTs
  2016-01-22 17:06       ` Ard Biesheuvel
  (?)
@ 2016-01-22 17:19         ` Mark Rutland
  -1 siblings, 0 replies; 207+ messages in thread
From: Mark Rutland @ 2016-01-22 17:19 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, kernel-hardening, Will Deacon, Catalin Marinas,
	Leif Lindholm, Kees Cook, linux-kernel, Stuart Yoder,
	Sharma Bhupesh, Arnd Bergmann, Marc Zyngier, Christoffer Dall

On Fri, Jan 22, 2016 at 06:06:52PM +0100, Ard Biesheuvel wrote:
> On 22 January 2016 at 17:55, Mark Rutland <mark.rutland@arm.com> wrote:
> >> +static bool in_init(const struct module *mod, void *addr)
> >> +{
> >> +     return (u64)addr - (u64)mod->module_init < mod->init_size;
> >> +}
> >> +
> >> +u64 get_module_plt(struct module *mod, void *loc, u64 val)
> >> +{
> >> +     struct plt_entry entry = {
> >> +             cpu_to_le32(0x92800010 | (((~val      ) & 0xffff)) << 5),
> >> +             cpu_to_le32(0xf2a00010 | ((( val >> 16) & 0xffff)) << 5),
> >> +             cpu_to_le32(0xf2c00010 | ((( val >> 32) & 0xffff)) << 5),
> >> +             cpu_to_le32(0xd61f0200)
> >> +     }, *plt;
> >
> > It would be nice if we could un-magic this, though I see that reusing
> > the existing insn or reloc_insn code is painful here.
> >
> 
> Well, I could #define PLT0 PLT1 PLT2 etc, and document them a bit
> better, but having all the instruction machinery for emitting the
> exact same instructions each time seems a bit overkill imo.

Well, almost the same (the target address does change after all).

I agree that this looks more complicated using the insn machinery, based
on local experimentation. Oh well...
 
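(For the record, the magic constants above decode as follows; the #define
names here are purely illustrative, not something the patch defines:)

        #define PLT_MOVN_X16            0x92800010      /* movn x16, #imm16          */
        #define PLT_MOVK_X16_LSL16      0xf2a00010      /* movk x16, #imm16, lsl #16 */
        #define PLT_MOVK_X16_LSL32      0xf2c00010      /* movk x16, #imm16, lsl #32 */
        #define PLT_BR_X16              0xd61f0200      /* br   x16                  */
        /* Rd/Rn = x16 is already baked into the low bits of each opcode;
         * the 16-bit immediate occupies bits [20:5], hence the
         * "(val & 0xffff) << 5" shifts in the initializers. */
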
> >> +     int i, *count;
> >> +
> >> +     if (in_init(mod, loc)) {
> >> +             plt = (struct plt_entry *)mod->arch.init_plt->sh_addr;
> >> +             count = &mod->arch.init_plt_count;
> >> +     } else {
> >> +             plt = (struct plt_entry *)mod->arch.core_plt->sh_addr;
> >> +             count = &mod->arch.core_plt_count;
> >> +     }
> >> +
> >> +     /* Look for an existing entry pointing to 'val' */
> >> +     for (i = 0; i < *count; i++)
> >> +             if (plt[i].mov0 == entry.mov0 &&
> >> +                 plt[i].mov1 == entry.mov1 &&
> >> +                 plt[i].mov2 == entry.mov2)
> >> +                     return (u64)&plt[i];
> >
> > I think that at the cost of redundantly comparing the br x16, you could
> > simplify this by comparing the whole struct, e.g.
> >
> >         for (i = 0; i < *count; i++)
> >                 if (plt[i] == entry)
> 
> You can use struct types in assignments, but not in comparisons,
> strangely enough

Ah, sorry for the noise.

> >> +     for (s = sechdrs + 1; s < sechdrs_end; ++s) {
> >
> > Could we have a comment as to why we skip the first Shdr? I recall it's
> > in some way special, but I can't recall why/how.
> >
> 
> I don't remember exactly, and some of this code originated on ia64 IIRC.
> Probably better to simply start from [0]

Ok.

> >> +             const Elf64_Rela *rels = (void *)ehdr + s->sh_offset;
> >> +             int numrels = s->sh_size / sizeof(Elf64_Rela);
> >> +             Elf64_Shdr *dstsec = sechdrs + s->sh_info;
> >> +
> >> +             if (s->sh_type != SHT_RELA)
> >> +                     continue;
> >
> > We only have RELA, and no REL?
> >
> 
> Nope.
> 
> arch/arm64/Kconfig:86:  select MODULES_USE_ELF_RELA

Evidently I didn't do enough background reading.

> As I said, this code will look different in the next version, but I
> will make sure to take your review points.

Cheers! :)

Mark.

^ permalink raw reply	[flat|nested] 207+ messages in thread

* Re: [PATCH v3 07/21] arm64: move kernel image to base of vmalloc area
  2016-01-15 11:23               ` Mark Rutland
  (?)
@ 2016-01-27 14:31                 ` Ard Biesheuvel
  -1 siblings, 0 replies; 207+ messages in thread
From: Ard Biesheuvel @ 2016-01-27 14:31 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Kees Cook, Arnd Bergmann, kernel-hardening, Sharma Bhupesh,
	Catalin Marinas, Will Deacon, linux-kernel, Leif Lindholm,
	Stuart Yoder, Marc Zyngier, Christoffer Dall, linux-arm-kernel

On 15 January 2016 at 12:23, Mark Rutland <mark.rutland@arm.com> wrote:
> On Fri, Jan 15, 2016 at 10:54:26AM +0100, Ard Biesheuvel wrote:
>> On 14 January 2016 at 19:57, Mark Rutland <mark.rutland@arm.com> wrote:
>> > On Wed, Jan 13, 2016 at 01:51:10PM +0000, Mark Rutland wrote:
>> >> On Wed, Jan 13, 2016 at 09:39:41AM +0100, Ard Biesheuvel wrote:
>> >> > If I remove the rounding, I get false positive kasan errors which I
>> >> > have not quite diagnosed yet, but are probably due to the fact that
>> >> > the rounding performed by vmemmap_populate() goes in the wrong
>> >> > direction.
>> >
>> > As far as I can see, it implicitly rounds the base down and end up to
>> > SWAPPER_BLOCK_SIZE granularity.
>> >
>> > I can see that it might map too much memory, but I can't see why that
>> > should trigger KASAN failures. Regardless of what was mapped KASAN
>> > should stick to the region it cares about, and everything else should
>> > stay out of that.
>> >
>> > When do you see the failures, and are they in any way consistent?
>> >
>> > Do you have an example to hand?
>> >
>>
>> For some reason, this issue has evaporated, i.e., I can no longer
>> reproduce it on my WIP v4 branch.
>> So I will remove the rounding.
>
> Ok.
>
> I'll let you know if I stumble across anything that looks like a
> potential cause of the KASAN failures, and I'll try to give v4 a go at
> some point soon.
>

OK, I managed to track this down (I think). The issue here is that,
while vmemmap_populate() does the right thing wrt the start and end
boundaries, populate_zero_shadow() will map the adjoining regions down
to page granularity, replacing vmemmap_populate()'s PMD block mappings
with PMD table mappings. So I need to put back the rounding (I removed
it in v4).

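(Concretely, something along these lines -- a sketch only; the exact
symbol names used in the series may differ:)

        u64 kimg_shadow_start, kimg_shadow_end;

        /*
         * Round the kernel image's shadow region out to SWAPPER_BLOCK_SIZE so
         * that populate_zero_shadow(), which maps the adjoining regions at
         * page granularity, cannot split the PMD block mappings that
         * vmemmap_populate() created for the image shadow.
         */
        kimg_shadow_start = round_down((u64)kasan_mem_to_shadow(_text),
                                       SWAPPER_BLOCK_SIZE);
        kimg_shadow_end = round_up((u64)kasan_mem_to_shadow(_end),
                                   SWAPPER_BLOCK_SIZE);
        vmemmap_populate(kimg_shadow_start, kimg_shadow_end, NUMA_NO_NODE);
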
Thanks,
Ard.

^ permalink raw reply	[flat|nested] 207+ messages in thread

end of thread, other threads:[~2016-01-27 14:31 UTC | newest]

Thread overview: 207+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-11 13:18 [PATCH v3 00/21] arm64: implement support for KASLR Ard Biesheuvel
2016-01-11 13:18 ` [kernel-hardening] " Ard Biesheuvel
2016-01-11 13:18 ` Ard Biesheuvel
2016-01-11 13:18 ` [PATCH v3 01/21] of/fdt: make memblock minimum physical address arch configurable Ard Biesheuvel
2016-01-11 13:18   ` [kernel-hardening] " Ard Biesheuvel
2016-01-11 13:18   ` Ard Biesheuvel
2016-01-11 13:18 ` [PATCH v3 02/21] arm64: introduce KIMAGE_VADDR as the virtual base of the kernel region Ard Biesheuvel
2016-01-11 13:18   ` [kernel-hardening] " Ard Biesheuvel
2016-01-11 13:18   ` Ard Biesheuvel
2016-01-11 16:31   ` Mark Rutland
2016-01-11 16:31     ` [kernel-hardening] " Mark Rutland
2016-01-11 16:31     ` Mark Rutland
2016-01-11 13:18 ` [PATCH v3 03/21] arm64: pgtable: add dummy pud_index() and pmd_index() definitions Ard Biesheuvel
2016-01-11 13:18   ` [kernel-hardening] " Ard Biesheuvel
2016-01-11 13:18   ` Ard Biesheuvel
2016-01-11 17:40   ` Mark Rutland
2016-01-11 17:40     ` [kernel-hardening] " Mark Rutland
2016-01-11 17:40     ` Mark Rutland
2016-01-12 17:25     ` Ard Biesheuvel
2016-01-12 17:25       ` [kernel-hardening] " Ard Biesheuvel
2016-01-12 17:25       ` Ard Biesheuvel
2016-01-11 13:18 ` [PATCH v3 04/21] arm64: decouple early fixmap init from linear mapping Ard Biesheuvel
2016-01-11 13:18   ` [kernel-hardening] " Ard Biesheuvel
2016-01-11 13:18   ` Ard Biesheuvel
2016-01-11 16:09   ` Mark Rutland
2016-01-11 16:09     ` [kernel-hardening] " Mark Rutland
2016-01-11 16:09     ` Mark Rutland
2016-01-11 16:15     ` Ard Biesheuvel
2016-01-11 16:15       ` [kernel-hardening] " Ard Biesheuvel
2016-01-11 16:15       ` Ard Biesheuvel
2016-01-11 16:27       ` Mark Rutland
2016-01-11 16:27         ` [kernel-hardening] " Mark Rutland
2016-01-11 16:27         ` Mark Rutland
2016-01-11 16:51         ` Mark Rutland
2016-01-11 16:51           ` [kernel-hardening] " Mark Rutland
2016-01-11 16:51           ` Mark Rutland
2016-01-11 17:08           ` Ard Biesheuvel
2016-01-11 17:08             ` [kernel-hardening] " Ard Biesheuvel
2016-01-11 17:08             ` Ard Biesheuvel
2016-01-11 17:15             ` Ard Biesheuvel
2016-01-11 17:15               ` [kernel-hardening] " Ard Biesheuvel
2016-01-11 17:15               ` Ard Biesheuvel
2016-01-11 17:21               ` Mark Rutland
2016-01-11 17:21                 ` [kernel-hardening] " Mark Rutland
2016-01-11 17:21                 ` Mark Rutland
2016-01-11 13:18 ` [PATCH v3 05/21] arm64: kvm: deal with kernel symbols outside of " Ard Biesheuvel
2016-01-11 13:18   ` [kernel-hardening] " Ard Biesheuvel
2016-01-11 13:18   ` Ard Biesheuvel
2016-01-12 12:36   ` Mark Rutland
2016-01-12 12:36     ` [kernel-hardening] " Mark Rutland
2016-01-12 12:36     ` Mark Rutland
2016-01-12 13:23     ` Ard Biesheuvel
2016-01-12 13:23       ` [kernel-hardening] " Ard Biesheuvel
2016-01-12 13:23       ` Ard Biesheuvel
2016-01-11 13:18 ` [PATCH v3 06/21] arm64: pgtable: implement static [pte|pmd|pud]_offset variants Ard Biesheuvel
2016-01-11 13:18   ` [kernel-hardening] " Ard Biesheuvel
2016-01-11 13:18   ` Ard Biesheuvel
2016-01-11 16:24   ` Mark Rutland
2016-01-11 16:24     ` [kernel-hardening] " Mark Rutland
2016-01-11 16:24     ` Mark Rutland
2016-01-11 17:28     ` Ard Biesheuvel
2016-01-11 17:28       ` [kernel-hardening] " Ard Biesheuvel
2016-01-11 17:28       ` Ard Biesheuvel
2016-01-11 17:31       ` Mark Rutland
2016-01-11 17:31         ` [kernel-hardening] " Mark Rutland
2016-01-11 17:31         ` Mark Rutland
2016-01-11 13:19 ` [PATCH v3 07/21] arm64: move kernel image to base of vmalloc area Ard Biesheuvel
2016-01-11 13:19   ` [kernel-hardening] " Ard Biesheuvel
2016-01-11 13:19   ` Ard Biesheuvel
2016-01-12 18:14   ` Mark Rutland
2016-01-12 18:14     ` [kernel-hardening] " Mark Rutland
2016-01-12 18:14     ` Mark Rutland
2016-01-13  8:39     ` Ard Biesheuvel
2016-01-13  8:39       ` [kernel-hardening] " Ard Biesheuvel
2016-01-13  8:39       ` Ard Biesheuvel
2016-01-13  9:58       ` Ard Biesheuvel
2016-01-13  9:58         ` [kernel-hardening] " Ard Biesheuvel
2016-01-13  9:58         ` Ard Biesheuvel
2016-01-13 11:11         ` Mark Rutland
2016-01-13 11:11           ` [kernel-hardening] " Mark Rutland
2016-01-13 11:11           ` Mark Rutland
2016-01-13 11:14           ` Ard Biesheuvel
2016-01-13 11:14             ` [kernel-hardening] " Ard Biesheuvel
2016-01-13 11:14             ` Ard Biesheuvel
2016-01-13 13:51       ` Mark Rutland
2016-01-13 13:51         ` [kernel-hardening] " Mark Rutland
2016-01-13 13:51         ` Mark Rutland
2016-01-13 15:50         ` Ard Biesheuvel
2016-01-13 15:50           ` [kernel-hardening] " Ard Biesheuvel
2016-01-13 15:50           ` Ard Biesheuvel
2016-01-13 16:26           ` Mark Rutland
2016-01-13 16:26             ` [kernel-hardening] " Mark Rutland
2016-01-13 16:26             ` Mark Rutland
2016-01-14 18:57         ` Mark Rutland
2016-01-14 18:57           ` [kernel-hardening] " Mark Rutland
2016-01-14 18:57           ` Mark Rutland
2016-01-15  9:54           ` Ard Biesheuvel
2016-01-15  9:54             ` [kernel-hardening] " Ard Biesheuvel
2016-01-15  9:54             ` Ard Biesheuvel
2016-01-15 11:23             ` Mark Rutland
2016-01-15 11:23               ` [kernel-hardening] " Mark Rutland
2016-01-15 11:23               ` Mark Rutland
2016-01-27 14:31               ` Ard Biesheuvel
2016-01-27 14:31                 ` [kernel-hardening] " Ard Biesheuvel
2016-01-27 14:31                 ` Ard Biesheuvel
2016-01-11 13:19 ` [PATCH v3 08/21] arm64: add support for module PLTs Ard Biesheuvel
2016-01-11 13:19   ` [kernel-hardening] " Ard Biesheuvel
2016-01-11 13:19   ` Ard Biesheuvel
2016-01-22 16:55   ` Mark Rutland
2016-01-22 16:55     ` [kernel-hardening] " Mark Rutland
2016-01-22 16:55     ` Mark Rutland
2016-01-22 17:06     ` Ard Biesheuvel
2016-01-22 17:06       ` [kernel-hardening] " Ard Biesheuvel
2016-01-22 17:06       ` Ard Biesheuvel
2016-01-22 17:19       ` Mark Rutland
2016-01-22 17:19         ` [kernel-hardening] " Mark Rutland
2016-01-22 17:19         ` Mark Rutland
2016-01-11 13:19 ` [PATCH v3 09/21] extable: add support for relative extables to search and sort routines Ard Biesheuvel
2016-01-11 13:19   ` [kernel-hardening] " Ard Biesheuvel
2016-01-11 13:19   ` Ard Biesheuvel
2016-01-11 13:19 ` [PATCH v3 10/21] arm64: switch to relative exception tables Ard Biesheuvel
2016-01-11 13:19   ` [kernel-hardening] " Ard Biesheuvel
2016-01-11 13:19   ` Ard Biesheuvel
2016-01-11 13:19 ` [PATCH v3 11/21] arm64: avoid R_AARCH64_ABS64 relocations for Image header fields Ard Biesheuvel
2016-01-11 13:19   ` [kernel-hardening] " Ard Biesheuvel
2016-01-11 13:19   ` Ard Biesheuvel
2016-01-13 18:12   ` Mark Rutland
2016-01-13 18:12     ` [kernel-hardening] " Mark Rutland
2016-01-13 18:12     ` Mark Rutland
2016-01-13 18:48     ` Ard Biesheuvel
2016-01-13 18:48       ` [kernel-hardening] " Ard Biesheuvel
2016-01-13 18:48       ` Ard Biesheuvel
2016-01-14  8:51       ` Ard Biesheuvel
2016-01-14  8:51         ` [kernel-hardening] " Ard Biesheuvel
2016-01-14  8:51         ` Ard Biesheuvel
2016-01-14  9:05         ` Ard Biesheuvel
2016-01-14  9:05           ` [kernel-hardening] " Ard Biesheuvel
2016-01-14  9:05           ` Ard Biesheuvel
2016-01-14 10:46           ` Mark Rutland
2016-01-14 10:46             ` [kernel-hardening] " Mark Rutland
2016-01-14 10:46             ` Mark Rutland
2016-01-14 11:22             ` Ard Biesheuvel
2016-01-14 11:22               ` [kernel-hardening] " Ard Biesheuvel
2016-01-14 11:22               ` Ard Biesheuvel
2016-01-11 13:19 ` [PATCH v3 12/21] arm64: avoid dynamic relocations in early boot code Ard Biesheuvel
2016-01-11 13:19   ` [kernel-hardening] " Ard Biesheuvel
2016-01-11 13:19   ` Ard Biesheuvel
2016-01-14 17:09   ` Mark Rutland
2016-01-14 17:09     ` [kernel-hardening] " Mark Rutland
2016-01-14 17:09     ` Mark Rutland
2016-01-11 13:19 ` [PATCH v3 13/21] arm64: allow kernel Image to be loaded anywhere in physical memory Ard Biesheuvel
2016-01-11 13:19   ` [kernel-hardening] " Ard Biesheuvel
2016-01-11 13:19   ` Ard Biesheuvel
2016-01-11 13:19 ` [PATCH v3 14/21] arm64: redefine SWAPPER_TABLE_SHIFT for use in asm code Ard Biesheuvel
2016-01-11 13:19   ` [kernel-hardening] " Ard Biesheuvel
2016-01-11 13:19   ` Ard Biesheuvel
2016-01-11 13:19 ` [PATCH v3 14/21] arm64: [re]define SWAPPER_TABLE_[SHIFT|SIZE] " Ard Biesheuvel
2016-01-11 13:19   ` [kernel-hardening] " Ard Biesheuvel
2016-01-11 13:19   ` Ard Biesheuvel
2016-01-11 13:26   ` Ard Biesheuvel
2016-01-11 13:26     ` [kernel-hardening] " Ard Biesheuvel
2016-01-11 13:26     ` Ard Biesheuvel
2016-01-11 13:19 ` [PATCH v3 15/21] arm64: split elf relocs into a separate header Ard Biesheuvel
2016-01-11 13:19   ` [kernel-hardening] " Ard Biesheuvel
2016-01-11 13:19   ` Ard Biesheuvel
2016-01-11 13:19 ` [PATCH v3 16/21] scripts/sortextable: add support for ET_DYN binaries Ard Biesheuvel
2016-01-11 13:19   ` [kernel-hardening] " Ard Biesheuvel
2016-01-11 13:19   ` Ard Biesheuvel
2016-01-11 13:19 ` [PATCH v3 17/21] arm64: add support for a relocatable kernel and KASLR Ard Biesheuvel
2016-01-11 13:19   ` [kernel-hardening] " Ard Biesheuvel
2016-01-11 13:19   ` Ard Biesheuvel
2016-01-11 13:19 ` [PATCH v3 18/21] efi: stub: implement efi_get_random_bytes() based on EFI_RNG_PROTOCOL Ard Biesheuvel
2016-01-11 13:19   ` [kernel-hardening] " Ard Biesheuvel
2016-01-11 13:19   ` Ard Biesheuvel
2016-01-21 15:42   ` Matt Fleming
2016-01-21 15:42     ` [kernel-hardening] " Matt Fleming
2016-01-21 15:42     ` Matt Fleming
2016-01-21 16:12     ` Ard Biesheuvel
2016-01-21 16:12       ` [kernel-hardening] " Ard Biesheuvel
2016-01-21 16:12       ` Ard Biesheuvel
2016-01-11 13:19 ` [PATCH v3 19/21] efi: stub: add implementation of efi_random_alloc() Ard Biesheuvel
2016-01-11 13:19   ` [kernel-hardening] " Ard Biesheuvel
2016-01-11 13:19   ` Ard Biesheuvel
2016-01-21 16:10   ` Matt Fleming
2016-01-21 16:10     ` [kernel-hardening] " Matt Fleming
2016-01-21 16:10     ` Matt Fleming
2016-01-21 16:16     ` Ard Biesheuvel
2016-01-21 16:16       ` [kernel-hardening] " Ard Biesheuvel
2016-01-21 16:16       ` Ard Biesheuvel
2016-01-11 13:19 ` [PATCH v3 20/21] efi: stub: use high allocation for converted command line Ard Biesheuvel
2016-01-11 13:19   ` [kernel-hardening] " Ard Biesheuvel
2016-01-11 13:19   ` Ard Biesheuvel
2016-01-21 16:20   ` Matt Fleming
2016-01-21 16:20     ` [kernel-hardening] " Matt Fleming
2016-01-21 16:20     ` Matt Fleming
2016-01-11 13:19 ` [PATCH v3 21/21] arm64: efi: invoke EFI_RNG_PROTOCOL to supply KASLR randomness Ard Biesheuvel
2016-01-11 13:19   ` [kernel-hardening] " Ard Biesheuvel
2016-01-11 13:19   ` Ard Biesheuvel
2016-01-21 16:31   ` Matt Fleming
2016-01-21 16:31     ` [kernel-hardening] " Matt Fleming
2016-01-21 16:31     ` Matt Fleming
2016-01-11 22:07 ` [PATCH v3 00/21] arm64: implement support for KASLR Kees Cook
2016-01-11 22:07   ` [kernel-hardening] " Kees Cook
2016-01-11 22:07   ` Kees Cook
2016-01-12  7:17   ` Ard Biesheuvel
2016-01-12  7:17     ` [kernel-hardening] " Ard Biesheuvel
2016-01-12  7:17     ` Ard Biesheuvel
