* [PATCH v9 00/11] arm64: kexec: add kexec_file_load() support
@ 2018-04-25  6:26 ` AKASHI Takahiro
  0 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-04-25  6:26 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd
  Cc: ard.biesheuvel, james.morse, bhsharma, kexec, linux-arm-kernel,
	linux-kernel, AKASHI Takahiro

This is the ninth round of implementing kexec_file_load() support
on arm64.[1] Most of the code is based on kexec-tools.


This patch series enables us to
  * load the kernel by passing its file descriptor, instead of a
    user-filled buffer, to the kexec_file_load() system call, and
  * optionally verify its signature at load time for trusted boot.
Kernel virtual address randomization is also supported as of v9.

Unlike with the kexec_load() system call, and as we discussed a long time
ago, users are not allowed to provide a device tree to the 2nd kernel
explicitly; instead, the dt blob of the first kernel is re-used
internally.

To make the kexec command use the kexec_file_load() system call instead
of kexec_load(), the '-s' option must be specified. See [2] for the
necessary kexec-tools patch.
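
For readers unfamiliar with the interface, here is a minimal userspace
sketch of what such a call looks like. It is not taken from kexec-tools or
from this series; the helper name load_kernel_by_fd() is hypothetical, and
the fallback value of KEXEC_FILE_NO_INITRAMFS is assumed to match
include/uapi/linux/kexec.h.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed to match include/uapi/linux/kexec.h if the header lacks it. */
#ifndef KEXEC_FILE_NO_INITRAMFS
#define KEXEC_FILE_NO_INITRAMFS 0x00000004
#endif

/*
 * Hypothetical helper (illustration only): hand the kernel, and optionally
 * an initrd, to the running kernel by file descriptor.
 * cmdline must be a NUL-terminated string (may be empty).
 * Returns 0 on success or a negative errno value.
 */
int load_kernel_by_fd(const char *kernel_path, const char *initrd_path,
		      const char *cmdline)
{
	unsigned long flags = 0;
	int kernel_fd, initrd_fd = -1;
	int ret = 0;

	kernel_fd = open(kernel_path, O_RDONLY | O_CLOEXEC);
	if (kernel_fd < 0)
		return -errno;

	if (initrd_path) {
		initrd_fd = open(initrd_path, O_RDONLY | O_CLOEXEC);
		if (initrd_fd < 0) {
			ret = -errno;
			goto out;
		}
	} else {
		/* No initrd: tell the kernel to ignore initrd_fd. */
		flags |= KEXEC_FILE_NO_INITRAMFS;
	}

#ifdef SYS_kexec_file_load
	/* cmdline_len counts the terminating NUL byte. */
	if (syscall(SYS_kexec_file_load, kernel_fd, initrd_fd,
		    strlen(cmdline) + 1, cmdline, flags) < 0)
		ret = -errno;
#else
	/* The installed headers do not know the system call number. */
	(void)cmdline;
	(void)flags;
	ret = -ENOSYS;
#endif

out:
	if (initrd_fd >= 0)
		close(initrd_fd);
	close(kernel_fd);
	return ret;
}
```

The caller needs CAP_SYS_BOOT for the system call itself to succeed. Note
that there is no argument for a device tree, in line with the restriction
described above.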

To analyze a generated crash dump file, use the latest master branch of
the crash utility[3]. I always try to submit patches to fix any
inconsistencies introduced by the latest kernel.

For kernel image verification, a signature must be presented along with
the binary itself. A signature is basically a hash value calculated over
the whole binary data and encrypted with a key which is authenticated
by one of the system's trusted certificates.
Any attempt to read and load a to-be-kexec-ed kernel image through
a system call will be checked, and blocked if the binary's hash value
doesn't match its associated signature.

There are two methods available now:
1. implementing an arch-specific verification hook of kexec_file_load()
2. utilizing the IMA (Integrity Measurement Architecture)[4] appraisal
   framework

Before v7, I believed that my patches supported only (1), but I am now
confident that (2) comes for free if IMA is enabled and properly
configured.


(1) Arch-specific verification hook
If CONFIG_KEXEC_VERIFY_SIG is enabled, kexec_file_load() invokes an
arch-defined (and hence file-format-specific) hook function to check the
validity of the kernel binary.

On x86, a signature is embedded into the PE file (Microsoft's format)
header of the binary. Since arm64's "Image" can also be seen as a PE file
as long as CONFIG_EFI is enabled, we adopt this format for kernel signing.

As in the case of UEFI applications, we can create a signed kernel image:
    $ sbsign --key ${KEY} --cert ${CERT} Image

You may want to use certs/signing_key.pem, which is intended to be used
for module signing (CONFIG_MODULE_SIG), as both ${KEY} and ${CERT} for
testing purposes.


(2) IMA appraisal-based
IMA was first introduced in Linux in order to meet the TCG (Trusted
Computing Group) requirement that all sensitive files be *measured*
before they are read/executed, so as to detect any untrusted
changes/modifications. The appraisal feature, which allows us to ensure
the integrity of files and even prevent them from being read/executed,
was added later.

Meanwhile, kexec_file_load() was merged in v3.17 and later evolved to
enable IMA-appraisal type verification with commit b804defe4297 ("kexec:
replace call to copy_file_from_fd() with kernel version").

In this scheme, a signature is stored in an extended file attribute,
"security.ima", while a decryption key is held in a dedicated keyring,
".ima" or "_ima".  The whole verification process is confined to
a secure API, kernel_read_file_from_fd(), called by kexec_file_load().

    Please note that powerpc is one of the two architectures now
    supporting KEXEC_FILE, and that it wishes to extend IMA so that
    a signature may be appended to the "vmlinux" file[5], like module
    signing, instead of using an extended file attribute.

While IMA is meant to be used with a TPM (Trusted Platform Module) on
secure platforms, it is still usable without one. Here is an example
procedure for trying out the feature using a self-signed root CA for
demo/test purposes:

 1) Generate the needed keys and certificates, following the "Generate
    trusted keys" section in the README of ima-evm-utils[6].

 2) Build the kernel with the following kernel configurations, specifying
    "ima-local-ca.pem" for CONFIG_SYSTEM_TRUSTED_KEYS:
	CONFIG_EXT4_FS_SECURITY
	CONFIG_INTEGRITY_SIGNATURE
	CONFIG_INTEGRITY_ASYMMETRIC_KEYS
	CONFIG_INTEGRITY_TRUSTED_KEYRING
	CONFIG_IMA
	CONFIG_IMA_WRITE_POLICY
	CONFIG_IMA_READ_POLICY
	CONFIG_IMA_APPRAISE
	CONFIG_IMA_APPRAISE_BOOTPARAM
	CONFIG_SYSTEM_TRUSTED_KEYS
    Please note that CONFIG_KEXEC_VERIFY_SIG is not, and actually should
    not be, enabled.

 3) Sign (label) the kernel image binary to be kexec-ed on the target
    filesystem:
    $ evmctl ima_sign --key /path/to/private_key.pem /your/Image

 4) Add a command line parameter and boot the kernel:
    ima_appraise=enforce

 On the live system:
 5) Set a security policy:
    $ mount -t securityfs none /sys/kernel/security
    $ echo "appraise func=KEXEC_KERNEL_CHECK appraise_type=imasig" \
      > /sys/kernel/security/ima/policy

 6) Add a key for IMA:
    $ keyctl padd asymmetric my_ima_key %:.ima < /path/to/x509_ima.der
    (or evmctl import /path/to/x509_ima.der <ima_keyring_id>)

 7) Then try kexec as normal.


Concerns (or future work):
* Support for physical address randomization
* Signature verification of big-endian kernels with CONFIG_KEXEC_VERIFY_SIG
  While a big-endian kernel can support kernel signing, I'm not sure that
  its Image can be recognized as being in PE format, because the x86
  standard defines only a little-endian-based format.
* Support for vmlinux loading

  [1] http://git.linaro.org/people/takahiro.akashi/linux-aarch64.git
	branch:arm64/kexec_file
  [2] http://git.linaro.org/people/takahiro.akashi/kexec-tools.git
	branch:arm64/kexec_file
  [3] http://github.com/crash-utility/crash.git
  [4] https://sourceforge.net/p/linux-ima/wiki/Home/
  [5] http://lkml.iu.edu//hypermail/linux/kernel/1707.0/03669.html
  [6] https://sourceforge.net/p/linux-ima/ima-evm-utils/ci/master/tree/


Changes in v9 (April 25, 2018)
* rebased to v4.17-rc
* remove preparatory patches on generic/x86/ppc code
  They have now been merged in v4.17-rc1.
* allocate memory based on memblock list instead of system resources
  This will prevent reserved regions, particularly UEFI/ACPI data,
  from being corrupted.
* correct dt property names, linux,initrd-*, in newly-created dtb
  "linux," was missing.
* remove alignment requirement for initrd loading
* add kaslr (kernel virtual address randomization) support
* misc code clean-up
* revise commit messages

Changes in v8 (Feb 22, 2018)
* introduce ARCH_HAS_KEXEC_PURGATORY so that arm64 will be able to skip
  purgatory
* remove "ifdef CONFIG_X86_64" stuff from a re-factored function,
  prepare_elf64_headers(), making its interface more generic
  (The original patch was split into two for easier review.)
* modify cpu_soft_restart() so as to let the 2nd kernel jump into its entry
  code directly without requiring purgatory in case of kexec_file_load
* remove CONFIG_KEXEC_FILE_IMAGE_FMT and introduce
  CONFIG_KEXEC_IMAGE_VERIFY_SIG, much like x86 but quite redundant
  for now.
* In addition, update/modify dependencies of KEXEC_IMAGE_VERIFY_SIG

Changes in v7 (Dec 4, 2017)
* rebased to v4.15-rc2
* re-organize the patch set to separate KEXEC_FILE_VERIFY_SIG-related
  code from the others
* revamp factored-out code in kernel/kexec_file.c due to the changes
  in original x86 code
* redefine walk_sys_ram_res_rev() prototype due to change of callback
  type in the counterpart, walk_sys_ram_res()
* make KEXEC_FILE_IMAGE_FMT default on if KEXEC_FILE selected

Changes in v6 (Oct 24, 2017)
* fix a for-loop bug in _kexec_kernel_image_probe() per Julien

Changes in v5 (Oct 10, 2017)
* fix kbuild errors around patch #3
per Julien's comments,
* fix a bug in walk_system_ram_res_rev() with some cleanup
* modify fdt_setprop_range() to use vmalloc()
* modify fill_property() to use memset()

Changes in v4 (Oct 2, 2017)
* reinstate x86's arch_kexec_kernel_image_load()
* rename weak arch_kexec_kernel_xxx() to _kexec_kernel_xxx() for
  better re-use
* constify kexec_file_loaders[]

Changes in v3 (Sep 15, 2017)
* fix kbuild test error
* factor out arch_kexec_kernel_*() & arch_kimage_file_post_load_cleanup()
* remove CONFIG_CRASH_CORE guard from kexec_file.c
* add vmapped kernel region to vmcore for gdb backtracing
  (see prepare_elf64_headers())
* merge asm/kexec_file.h into asm/kexec.h
* and some cleanups

Changes in v2 (Sep 8, 2017)
* move core-header-related functions from crash_core.c to kexec_file.c
* drop hash-check code from purgatory
* modify purgatory asm to remove arch_kexec_apply_relocations_add()
* drop older kernel support
* drop vmlinux support (at least, for this series)


Patch #1 to #8 are the essential part of KEXEC_FILE support
(additionally allowing for IMA-based verification):
  Patch #1 to #2 are preparatory patches on the generic side.
  Patch #3 to #8 are to enable kexec_file_load on arm64.

Patch #9 to #10 are for KEXEC_VERIFY_SIG (arch-specific verification)
support, and patch #11 adds kaslr support.

AKASHI Takahiro (11):
  asm-generic: add kexec_file_load system call to unistd.h
  kexec_file: make kexec_image_post_load_cleanup_default() global
  arm64: kexec_file: invoke the kernel without purgatory
  arm64: kexec_file: allocate memory walking through memblock list
  arm64: kexec_file: load initrd and device-tree
  arm64: kexec_file: allow for loading Image-format kernel
  arm64: kexec_file: add crash dump support
  arm64: enable KEXEC_FILE config
  include: pe.h: remove message[] from mz header definition
  arm64: kexec_file: add kernel signature verification support
  arm64: kexec_file: add kaslr support

 arch/arm64/Kconfig                     |  34 ++
 arch/arm64/include/asm/kexec.h         |  86 +++++
 arch/arm64/kernel/Makefile             |   3 +-
 arch/arm64/kernel/cpu-reset.S          |   6 +-
 arch/arm64/kernel/kexec_image.c        |  99 ++++++
 arch/arm64/kernel/machine_kexec.c      |  11 +-
 arch/arm64/kernel/machine_kexec_file.c | 427 +++++++++++++++++++++++++
 arch/arm64/kernel/relocate_kernel.S    |   3 +-
 include/linux/kexec.h                  |   1 +
 include/linux/pe.h                     |   2 +-
 include/uapi/asm-generic/unistd.h      |   4 +-
 kernel/kexec_file.c                    |   2 +-
 12 files changed, 668 insertions(+), 10 deletions(-)
 create mode 100644 arch/arm64/kernel/kexec_image.c
 create mode 100644 arch/arm64/kernel/machine_kexec_file.c

-- 
2.17.0

* [PATCH v9 01/11] asm-generic: add kexec_file_load system call to unistd.h
  2018-04-25  6:26 ` AKASHI Takahiro
  (?)
@ 2018-04-25  6:26   ` AKASHI Takahiro
  -1 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-04-25  6:26 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd
  Cc: ard.biesheuvel, james.morse, bhsharma, kexec, linux-arm-kernel,
	linux-kernel, AKASHI Takahiro

The initial user of this system call number is arm64.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
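For readers unfamiliar with the __SYSCALL() table, here is a minimal
user-space sketch of the idea, not the kernel's actual implementation.
Every DEMO_/demo_ name is hypothetical. It assumes each table line expands
to a designated initializer and that the syscall count is one past the
highest assigned number, which is why the count has to be bumped together
with the new entry.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for real syscall entry points. */
static long demo_statx(void)           { return 291; }
static long demo_kexec_file_load(void) { return 292; }

#define DEMO_NR_statx            291
#define DEMO_NR_kexec_file_load  292
/* One past the highest assigned number; grows with every new entry. */
#define DEMO_NR_syscalls         293

/* A stale count would otherwise overflow the table initializer below. */
static_assert(DEMO_NR_kexec_file_load < DEMO_NR_syscalls,
	      "syscall count must be bumped with the new number");

typedef long (*demo_syscall_fn)(void);

/* Each __SYSCALL-style line becomes one designated initializer. */
#define DEMO_SYSCALL(nr, sym) [nr] = sym,

static const demo_syscall_fn demo_table[DEMO_NR_syscalls] = {
	DEMO_SYSCALL(DEMO_NR_statx, demo_statx)
	DEMO_SYSCALL(DEMO_NR_kexec_file_load, demo_kexec_file_load)
};

/* Out-of-range or unassigned numbers report "not implemented". */
long demo_dispatch(unsigned int nr)
{
	if (nr >= DEMO_NR_syscalls || !demo_table[nr])
		return -ENOSYS;
	return demo_table[nr]();
}
```

With this layout, demo_dispatch(292) reaches the new handler, while any
number at or above DEMO_NR_syscalls is rejected before the table is
indexed.
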
 include/uapi/asm-generic/unistd.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 8bcb186c6f67..745bad1d8269 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -732,9 +732,11 @@ __SYSCALL(__NR_pkey_alloc,    sys_pkey_alloc)
 __SYSCALL(__NR_pkey_free,     sys_pkey_free)
 #define __NR_statx 291
 __SYSCALL(__NR_statx,     sys_statx)
+#define __NR_kexec_file_load 292
+__SYSCALL(__NR_kexec_file_load,     sys_kexec_file_load)
 
 #undef __NR_syscalls
-#define __NR_syscalls 292
+#define __NR_syscalls 293
 
 /*
  * 32 bit systems traditionally used different
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH v9 02/11] kexec_file: make kexec_image_post_load_cleanup_default() global
  2018-04-25  6:26 ` AKASHI Takahiro
  (?)
@ 2018-04-25  6:26   ` AKASHI Takahiro
  -1 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-04-25  6:26 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd
  Cc: ard.biesheuvel, james.morse, bhsharma, kexec, linux-arm-kernel,
	linux-kernel, AKASHI Takahiro

Change this function from static to global so that arm64 can implement
its own arch_kimage_file_post_load_cleanup() later using
kexec_image_post_load_cleanup_default().

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
---
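The intended use can be sketched in user space as below. This is only an
illustration under assumptions: the demo_ structs and functions are
hypothetical stand-ins for struct kimage, struct kexec_file_ops and the
arch hook, and the arch-private buffer is made up for the example.

```c
#include <assert.h>
#include <stdlib.h>

struct demo_file_ops {
	int (*cleanup)(void *loader_data);
};

struct demo_image {
	const struct demo_file_ops *fops;
	void *loader_data;
	void *arch_buf;		/* hypothetical arch-private allocation */
};

/* Counts how often the loader's cleanup ran (for the example only). */
int demo_cleanup_calls;

int demo_loader_cleanup(void *loader_data)
{
	(void)loader_data;
	demo_cleanup_calls++;
	return 0;
}

/* Generic default: nothing to do unless the loader registered a cleanup. */
int demo_post_load_cleanup_default(struct demo_image *image)
{
	if (!image->fops || !image->fops->cleanup)
		return 0;
	return image->fops->cleanup(image->loader_data);
}

/* Arch hook: release arch-private data, then defer to the default. */
int demo_arch_post_load_cleanup(struct demo_image *image)
{
	free(image->arch_buf);
	image->arch_buf = NULL;
	return demo_post_load_cleanup_default(image);
}
```

The arch hook only works if the default is visible outside its own file,
which is the reason for dropping "static" here.
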
 include/linux/kexec.h | 1 +
 kernel/kexec_file.c   | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 9e4e638fb505..49ab758f4d91 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -143,6 +143,7 @@ extern const struct kexec_file_ops * const kexec_file_loaders[];
 
 int kexec_image_probe_default(struct kimage *image, void *buf,
 			      unsigned long buf_len);
+int kexec_image_post_load_cleanup_default(struct kimage *image);
 
 /**
  * struct kexec_buf - parameters for finding a place for a buffer in memory
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 75d8e7cf040e..eef89d9b1f03 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -78,7 +78,7 @@ void * __weak arch_kexec_kernel_image_load(struct kimage *image)
 	return kexec_image_load_default(image);
 }
 
-static int kexec_image_post_load_cleanup_default(struct kimage *image)
+int kexec_image_post_load_cleanup_default(struct kimage *image)
 {
 	if (!image->fops || !image->fops->cleanup)
 		return 0;
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH v9 03/11] arm64: kexec_file: invoke the kernel without purgatory
  2018-04-25  6:26 ` AKASHI Takahiro
  (?)
@ 2018-04-25  6:26   ` AKASHI Takahiro
  -1 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-04-25  6:26 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd
  Cc: ard.biesheuvel, james.morse, bhsharma, kexec, linux-arm-kernel,
	linux-kernel, AKASHI Takahiro

On arm64, purgatory would do almost nothing, so just invoke the secondary
kernel directly by jumping into its entry code.

In this case, cpu_soft_restart() must be called with the dtb address in
the fifth argument. The behavior stays compatible with the kexec_load
case as long as that argument is null.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
---
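The argument selection in machine_kexec() can be read as the small helper
below. It is a sketch under assumptions, not code from this series:
demo_kimage and demo_restart_dtb_arg() are hypothetical names, and it
assumes an image loaded via kexec_load leaves dtb_mem at zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant parts of struct kimage. */
struct demo_kimage {
	void *purgatory_buf;	/* non-NULL only if a purgatory was loaded */
	uint64_t dtb_mem;	/* physical address of the dtb segment */
};

/* Value for the last restart argument, which ends up in x0. */
uint64_t demo_restart_dtb_arg(const struct demo_kimage *image)
{
	/* If a purgatory was loaded, keep the old behaviour and pass 0. */
	if (image->purgatory_buf)
		return 0;
	/* Otherwise the kernel is entered directly and needs the dtb. */
	return image->dtb_mem;
}
```

A helper of this shape would also keep the #ifdef out of the argument
list, at the cost of one more function.
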
 arch/arm64/kernel/cpu-reset.S       |  6 +++---
 arch/arm64/kernel/machine_kexec.c   | 11 +++++++++--
 arch/arm64/kernel/relocate_kernel.S |  3 ++-
 3 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kernel/cpu-reset.S b/arch/arm64/kernel/cpu-reset.S
index 8021b46c9743..391df91328ac 100644
--- a/arch/arm64/kernel/cpu-reset.S
+++ b/arch/arm64/kernel/cpu-reset.S
@@ -24,9 +24,9 @@
  *
  * @el2_switch: Flag to indicate a swich to EL2 is needed.
  * @entry: Location to jump to for soft reset.
- * arg0: First argument passed to @entry.
- * arg1: Second argument passed to @entry.
- * arg2: Third argument passed to @entry.
+ * arg0: First argument passed to @entry. (relocation list)
+ * arg1: Second argument passed to @entry. (physical kernel entry)
+ * arg2: Third argument passed to @entry. (physical dtb address)
  *
  * Put the CPU into the same state as it would be if it had been reset, and
  * branch to what would be the reset vector. It must be executed with the
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index f76ea92dff91..f7dbba00be10 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -205,10 +205,17 @@ void machine_kexec(struct kimage *kimage)
 	 * uses physical addressing to relocate the new image to its final
 	 * position and transfers control to the image entry point when the
 	 * relocation is complete.
+	 * In case of kexec_file_load syscall, we directly start the kernel,
+	 * skipping purgatory.
 	 */
-
 	cpu_soft_restart(kimage != kexec_crash_image,
-		reboot_code_buffer_phys, kimage->head, kimage->start, 0);
+		reboot_code_buffer_phys, kimage->head, kimage->start,
+#ifdef CONFIG_KEXEC_FILE
+				kimage->purgatory_info.purgatory_buf ?
+						0 : kimage->arch.dtb_mem);
+#else
+				0);
+#endif
 
 	BUG(); /* Should never get here. */
 }
diff --git a/arch/arm64/kernel/relocate_kernel.S b/arch/arm64/kernel/relocate_kernel.S
index f407e422a720..95fd94209aae 100644
--- a/arch/arm64/kernel/relocate_kernel.S
+++ b/arch/arm64/kernel/relocate_kernel.S
@@ -32,6 +32,7 @@
 ENTRY(arm64_relocate_new_kernel)
 
 	/* Setup the list loop variables. */
+	mov	x18, x2				/* x18 = dtb address */
 	mov	x17, x1				/* x17 = kimage_start */
 	mov	x16, x0				/* x16 = kimage_head */
 	raw_dcache_line_size x15, x0		/* x15 = dcache line size */
@@ -107,7 +108,7 @@ ENTRY(arm64_relocate_new_kernel)
 	isb
 
 	/* Start new image. */
-	mov	x0, xzr
+	mov	x0, x18
 	mov	x1, xzr
 	mov	x2, xzr
 	mov	x3, xzr
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH v9 04/11] arm64: kexec_file: allocate memory walking through memblock list
  2018-04-25  6:26 ` AKASHI Takahiro
  (?)
@ 2018-04-25  6:26   ` AKASHI Takahiro
  -1 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-04-25  6:26 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd
  Cc: ard.biesheuvel, james.morse, bhsharma, kexec, linux-arm-kernel,
	linux-kernel, AKASHI Takahiro

We need to prevent firmware-reserved memory regions, particularly the EFI
memory map and ACPI tables, from being corrupted when loading the
kernel/initrd (or other kexec buffers). We also want to support memory
allocation in a top-down manner in addition to the default bottom-up.
So let's have an arm64-specific arch_kexec_walk_mem() which searches
for available memory ranges in the usable memblock list,
i.e. !NOMAP & !reserved, instead of the system resource tree.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
---
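The walk can be pictured with the user-space sketch below. It is an
illustration only: all demo_ names are hypothetical, the free ranges are
assumed to be sorted, non-empty and half-open [start, end), and the
callback sees an inclusive end as a struct resource would.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_range    { uint64_t start, end; };	/* half-open: [start, end) */
struct demo_resource { uint64_t start, end; };	/* inclusive: [start, end] */

typedef int (*demo_walk_fn)(struct demo_resource *res, void *arg);

/* Visit free ranges in either direction until the callback returns non-zero. */
int demo_walk_mem(const struct demo_range *free_ranges, size_t n,
		  int top_down, demo_walk_fn func, void *arg)
{
	int ret = 0;

	for (size_t k = 0; k < n; k++) {
		const struct demo_range *r =
			&free_ranges[top_down ? n - 1 - k : k];
		struct demo_resource res = {
			.start = r->start,
			.end = r->end - 1,	/* convert to inclusive end */
		};

		ret = func(&res, arg);
		if (ret)
			break;
	}
	return ret;
}

/* Hypothetical callback: accept the first range large enough. */
struct demo_request { uint64_t size; uint64_t placed_at; };

int demo_first_fit(struct demo_resource *res, void *arg)
{
	struct demo_request *req = arg;

	if (res->end - res->start + 1 < req->size)
		return 0;	/* too small, keep walking */
	req->placed_at = res->start;
	return 1;		/* non-zero stops the walk */
}
```

Walking the same list from the other end is all that top-down placement
needs here; the callback itself does not change.
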
 arch/arm64/kernel/Makefile             |  3 +-
 arch/arm64/kernel/machine_kexec_file.c | 57 ++++++++++++++++++++++++++
 2 files changed, 59 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/kernel/machine_kexec_file.c

diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index bf825f38d206..2f2b2757ae7a 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -48,8 +48,9 @@ arm64-obj-$(CONFIG_ARM64_ACPI_PARKING_PROTOCOL)	+= acpi_parking_protocol.o
 arm64-obj-$(CONFIG_PARAVIRT)		+= paravirt.o
 arm64-obj-$(CONFIG_RANDOMIZE_BASE)	+= kaslr.o
 arm64-obj-$(CONFIG_HIBERNATION)		+= hibernate.o hibernate-asm.o
-arm64-obj-$(CONFIG_KEXEC)		+= machine_kexec.o relocate_kernel.o	\
+arm64-obj-$(CONFIG_KEXEC_CORE)		+= machine_kexec.o relocate_kernel.o	\
 					   cpu-reset.o
+arm64-obj-$(CONFIG_KEXEC_FILE)		+= machine_kexec_file.o
 arm64-obj-$(CONFIG_ARM64_RELOC_TEST)	+= arm64-reloc-test.o
 arm64-reloc-test-y := reloc_test_core.o reloc_test_syms.o
 arm64-obj-$(CONFIG_CRASH_DUMP)		+= crash_dump.o
diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
new file mode 100644
index 000000000000..f9ebf54ca247
--- /dev/null
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -0,0 +1,57 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * kexec_file for arm64
+ *
+ * Copyright (C) 2018 Linaro Limited
+ * Author: AKASHI Takahiro <takahiro.akashi@linaro.org>
+ *
+ * Most code is derived from arm64 port of kexec-tools
+ */
+
+#define pr_fmt(fmt) "kexec_file: " fmt
+
+#include <linux/ioport.h>
+#include <linux/kernel.h>
+#include <linux/kexec.h>
+#include <linux/memblock.h>
+
+int arch_kexec_walk_mem(struct kexec_buf *kbuf,
+				int (*func)(struct resource *, void *))
+{
+	phys_addr_t start, end;
+	struct resource res;
+	u64 i;
+	int ret = 0;
+
+	if (kbuf->image->type == KEXEC_TYPE_CRASH)
+		return func(&crashk_res, kbuf);
+
+	if (kbuf->top_down)
+		for_each_mem_range_rev(i, &memblock.memory, &memblock.reserved,
+				NUMA_NO_NODE, MEMBLOCK_NONE,
+				&start, &end, NULL) {
+			if (!memblock_is_map_memory(start))
+				continue;
+
+			res.start = start;
+			res.end = end - 1;	/* resource end is inclusive */
+			ret = func(&res, kbuf);
+			if (ret)
+				break;
+		}
+	else
+		for_each_mem_range(i, &memblock.memory, &memblock.reserved,
+				NUMA_NO_NODE, MEMBLOCK_NONE,
+				&start, &end, NULL) {
+			if (!memblock_is_map_memory(start))
+				continue;
+
+			res.start = start;
+			res.end = end - 1;	/* resource end is inclusive */
+			ret = func(&res, kbuf);
+			if (ret)
+				break;
+		}
+
+	return ret;
+}
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH v9 05/11] arm64: kexec_file: load initrd and device-tree
  2018-04-25  6:26 ` AKASHI Takahiro
  (?)
@ 2018-04-25  6:26   ` AKASHI Takahiro
  -1 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-04-25  6:26 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd
  Cc: ard.biesheuvel, james.morse, bhsharma, kexec, linux-arm-kernel,
	linux-kernel, AKASHI Takahiro

load_other_segments() is expected to allocate and place all the necessary
memory segments other than the kernel, including the initrd and the
device-tree blob (and the elf core header for crash).
While most of the code was borrowed from kexec-tools' counterpart,
users are not allowed to specify a dtb explicitly; instead, the dtb
presented by the boot loader is reused.
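
As an illustration only, the address window used for the initrd can be
sketched in stand-alone C. The struct and helper names below
(placement_window, round_down_pow2, initrd_window) are hypothetical and
are not part of this patch; only the arithmetic (nothing below the end of
the kernel segment, nothing beyond 32GB above the kernel's 1GB-aligned
base) is taken from load_other_segments().

```c
#include <assert.h>
#include <stdint.h>

#define SZ_1G (UINT64_C(1) << 30)

/* Hypothetical stand-in for the buf_min/buf_max pair of struct kexec_buf. */
struct placement_window {
	uint64_t min;	/* lowest address the buffer may use */
	uint64_t max;	/* upper bound of the search window */
};

/* Round x down to a multiple of align; align must be a power of two. */
static uint64_t round_down_pow2(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return x & ~(align - 1);
}

/*
 * Window for the initrd: start right after the kernel segment and stay
 * within 32GB of the 1GB-aligned base that contains the kernel.
 */
static struct placement_window initrd_window(uint64_t kern_mem,
					      uint64_t kern_memsz)
{
	struct placement_window w;

	w.min = kern_mem + kern_memsz;
	w.max = round_down_pow2(kern_mem, SZ_1G) + 32 * SZ_1G;
	return w;
}
```

The dtb uses the same lower bound but no upper bound (ULONG_MAX) and is
searched top-down, so it tends to land at the high end of memory.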

arch_kimage_file_post_load_cleanup() is responsible for freeing the
arm64-specific data allocated in load_other_segments().
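
The data in question is the duplicated device-tree buffer, which
setup_dtb() sizes as the original blob plus room for the properties it
adds. fdt_prop_len() is not defined in this patch, so the following is
only a sketch of how such an estimate could be computed, assuming the
usual flattened device tree layout (a 4-byte token plus 4-byte len and
nameoff fields, the value padded to 4 bytes, and the name in the strings
block). estimate_prop_len() and estimate_extra_dtb_space() are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FDT_TAGSIZE 4u	/* structure-block tokens and padding are 4-byte units */

/* Round len up to the next multiple of the 4-byte tag size. */
static size_t fdt_tag_align(size_t len)
{
	return (len + FDT_TAGSIZE - 1) & ~(size_t)(FDT_TAGSIZE - 1);
}

/*
 * Upper bound on the space one new property needs: token + len + nameoff
 * (3 x 4 bytes), the padded value, and the NUL-terminated name in the
 * strings block.
 */
static size_t estimate_prop_len(const char *name, size_t value_len)
{
	assert(name != NULL);
	return 3 * FDT_TAGSIZE + fdt_tag_align(value_len) + strlen(name) + 1;
}

/* Extra bytes to reserve on top of the original blob's total size. */
static size_t estimate_extra_dtb_space(int have_initrd, const char *cmdline)
{
	size_t extra = 0;

	if (have_initrd)
		extra += estimate_prop_len("linux,initrd-start", 8)
			+ estimate_prop_len("linux,initrd-end", 8);
	if (cmdline)
		extra += estimate_prop_len("bootargs", strlen(cmdline) + 1);
	return extra;
}
```

Over-estimating is harmless here because the buffer is packed again with
fdt_pack() before it is handed to kexec_add_buffer().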

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/kexec.h         |  16 +++
 arch/arm64/kernel/machine_kexec_file.c | 160 +++++++++++++++++++++++++
 2 files changed, 176 insertions(+)

diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index e17f0529a882..e4de1223715f 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -93,6 +93,22 @@ static inline void crash_prepare_suspend(void) {}
 static inline void crash_post_resume(void) {}
 #endif
 
+#ifdef CONFIG_KEXEC_FILE
+#define ARCH_HAS_KIMAGE_ARCH
+
+struct kimage_arch {
+	int kern_segment;
+	phys_addr_t dtb_mem;
+	void *dtb_buf;
+};
+
+struct kimage;
+
+extern int load_other_segments(struct kimage *image,
+		char *initrd, unsigned long initrd_len,
+		char *cmdline, unsigned long cmdline_len);
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif
diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
index f9ebf54ca247..b3b9b1725d8a 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -13,7 +13,26 @@
 #include <linux/ioport.h>
 #include <linux/kernel.h>
 #include <linux/kexec.h>
+#include <linux/libfdt.h>
 #include <linux/memblock.h>
+#include <linux/of_fdt.h>
+#include <linux/types.h>
+#include <asm/byteorder.h>
+
+static int __dt_root_addr_cells;
+static int __dt_root_size_cells;
+
+const struct kexec_file_ops * const kexec_file_loaders[] = {
+	NULL
+};
+
+int arch_kimage_file_post_load_cleanup(struct kimage *image)
+{
+	vfree(image->arch.dtb_buf);
+	image->arch.dtb_buf = NULL;
+
+	return kexec_image_post_load_cleanup_default(image);
+}
 
 int arch_kexec_walk_mem(struct kexec_buf *kbuf,
 				int (*func)(struct resource *, void *))
@@ -55,3 +74,144 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf,
 
 	return ret;
 }
+
+static int setup_dtb(struct kimage *image,
+		unsigned long initrd_load_addr, unsigned long initrd_len,
+		char *cmdline, unsigned long cmdline_len,
+		char **dtb_buf, size_t *dtb_buf_len)
+{
+	char *buf = NULL;
+	size_t buf_size;
+	int nodeoffset;
+	u64 value;
+	int range_len;
+	int ret;
+
+	/* duplicate dt blob */
+	buf_size = fdt_totalsize(initial_boot_params);
+	range_len = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
+
+	if (initrd_load_addr)
+		buf_size += fdt_prop_len("linux,initrd-start", sizeof(u64))
+				+ fdt_prop_len("linux,initrd-end", sizeof(u64));
+
+	if (cmdline)
+		buf_size += fdt_prop_len("bootargs", cmdline_len + 1);
+
+	buf = vmalloc(buf_size);
+	if (!buf) {
+		ret = -ENOMEM;
+		goto out_err;
+	}
+
+	ret = fdt_open_into(initial_boot_params, buf, buf_size);
+	if (ret)
+		goto out_err;
+
+	nodeoffset = fdt_path_offset(buf, "/chosen");
+	if (nodeoffset < 0)
+		goto out_err;
+
+	/* add bootargs */
+	if (cmdline) {
+		ret = fdt_setprop(buf, nodeoffset, "bootargs",
+						cmdline, cmdline_len + 1);
+		if (ret)
+			goto out_err;
+	}
+
+	/* add initrd-* */
+	if (initrd_load_addr) {
+		value = cpu_to_fdt64(initrd_load_addr);
+		ret = fdt_setprop(buf, nodeoffset, "linux,initrd-start",
+				&value, sizeof(value));
+		if (ret)
+			goto out_err;
+
+		value = cpu_to_fdt64(initrd_load_addr + initrd_len);
+		ret = fdt_setprop(buf, nodeoffset, "linux,initrd-end",
+				&value, sizeof(value));
+		if (ret)
+			goto out_err;
+	}
+
+	/* trim the buffer */
+	fdt_pack(buf);
+	*dtb_buf = buf;
+	*dtb_buf_len = fdt_totalsize(buf);
+
+	return 0;
+
+out_err:
+	vfree(buf);
+	return ret;
+}
+
+int load_other_segments(struct kimage *image,
+			char *initrd, unsigned long initrd_len,
+			char *cmdline, unsigned long cmdline_len)
+{
+	struct kexec_segment *kern_seg;
+	struct kexec_buf kbuf;
+	unsigned long initrd_load_addr = 0;
+	char *dtb = NULL;
+	unsigned long dtb_len = 0;
+	int ret = 0;
+
+	kern_seg = &image->segment[image->arch.kern_segment];
+	kbuf.image = image;
+	/* don't allocate anything below the kernel */
+	kbuf.buf_min = kern_seg->mem + kern_seg->memsz;
+
+	/* load initrd */
+	if (initrd) {
+		kbuf.buffer = initrd;
+		kbuf.bufsz = initrd_len;
+		kbuf.memsz = initrd_len;
+		kbuf.buf_align = 0;
+		/* within 1GB-aligned window of up to 32GB in size */
+		kbuf.buf_max = round_down(kern_seg->mem, SZ_1G)
+						+ (unsigned long)SZ_1G * 32;
+		kbuf.top_down = false;
+
+		ret = kexec_add_buffer(&kbuf);
+		if (ret)
+			goto out_err;
+		initrd_load_addr = kbuf.mem;
+
+		pr_debug("Loaded initrd at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
+				initrd_load_addr, initrd_len, initrd_len);
+	}
+
+	/* load dtb blob */
+	ret = setup_dtb(image, initrd_load_addr, initrd_len,
+				cmdline, cmdline_len, &dtb, &dtb_len);
+	if (ret) {
+		pr_err("Preparing for new dtb failed\n");
+		goto out_err;
+	}
+
+	kbuf.buffer = dtb;
+	kbuf.bufsz = dtb_len;
+	kbuf.memsz = dtb_len;
+	/* not across 2MB boundary */
+	kbuf.buf_align = SZ_2M;
+	kbuf.buf_max = ULONG_MAX;
+	kbuf.top_down = true;
+
+	ret = kexec_add_buffer(&kbuf);
+	if (ret)
+		goto out_err;
+	image->arch.dtb_mem = kbuf.mem;
+	image->arch.dtb_buf = dtb;
+
+	pr_debug("Loaded dtb at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
+			kbuf.mem, dtb_len, dtb_len);
+
+	return 0;
+
+out_err:
+	vfree(dtb);
+	image->arch.dtb_buf = NULL;
+	return ret;
+}
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH v9 06/11] arm64: kexec_file: allow for loading Image-format kernel
  2018-04-25  6:26 ` AKASHI Takahiro
  (?)
@ 2018-04-25  6:26   ` AKASHI Takahiro
  -1 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-04-25  6:26 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd
  Cc: ard.biesheuvel, james.morse, bhsharma, kexec, linux-arm-kernel,
	linux-kernel, AKASHI Takahiro

This patch provides kexec_file_ops for the "Image"-format kernel. In this
implementation, a binary is always loaded at the fixed offset given in the
text_offset field of its header.
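
For illustration, the header check and the text_offset adjustment can be
sketched in stand-alone C. The field offsets follow struct
arm64_image_header as defined in this patch; the helper names
(read_le64, parse_image_header, kernel_entry) and struct image_layout are
hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARM64_HDR_SIZE		64	/* sizeof(struct arm64_image_header) */
#define ARM64_HDR_TEXT_OFFSET	8	/* u64, little endian */
#define ARM64_HDR_IMAGE_SIZE	16	/* u64, little endian */
#define ARM64_HDR_MAGIC		56	/* "ARM\x64" */

struct image_layout {
	uint64_t text_offset;
	uint64_t image_size;
};

/* Read a little-endian u64 regardless of host byte order. */
static uint64_t read_le64(const uint8_t *p)
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Returns 0 and fills *out on success, -1 if buf is not a usable Image. */
static int parse_image_header(const uint8_t *buf, size_t len,
			      struct image_layout *out)
{
	static const uint8_t magic[4] = { 'A', 'R', 'M', 0x64 };

	assert(buf != NULL && out != NULL);
	if (len < ARM64_HDR_SIZE)
		return -1;
	if (memcmp(buf + ARM64_HDR_MAGIC, magic, sizeof(magic)) != 0)
		return -1;

	out->text_offset = read_le64(buf + ARM64_HDR_TEXT_OFFSET);
	out->image_size = read_le64(buf + ARM64_HDR_IMAGE_SIZE);
	/* Mirrors the patch: a zero text_offset is rejected. */
	if (out->text_offset == 0)
		return -1;
	return 0;
}

/*
 * The segment is requested with memsz = image_size + text_offset at a
 * 2MB-aligned base; the kernel itself then starts text_offset bytes in.
 */
static uint64_t kernel_entry(uint64_t aligned_base,
			     const struct image_layout *l)
{
	return aligned_base + l->text_offset;
}
```

With a 2MB-aligned base of 0x40000000 and a text_offset of 0x80000, the
kernel would start at 0x40080000, which is also the value image_load()
stores in image->start.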

Regarding signature verification for trusted boot, this patch doesn't
contain CONFIG_KEXEC_VERIFY_SIG support, which is to be added later
in this series, but file-attribute-based verification is still a viable
option by enabling the IMA security subsystem.

You can sign (label) a to-be-kexec'ed kernel image on the target file
system with:
    $ evmctl ima_sign --key /path/to/private_key.pem Image

On a live system, you must have IMA enforced with, at least, the following
security policy:
    "appraise func=KEXEC_KERNEL_CHECK appraise_type=imasig"

See more details about IMA here:
    https://sourceforge.net/p/linux-ima/wiki/Home/

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/kexec.h         | 50 ++++++++++++++++
 arch/arm64/kernel/Makefile             |  2 +-
 arch/arm64/kernel/kexec_image.c        | 79 ++++++++++++++++++++++++++
 arch/arm64/kernel/machine_kexec_file.c |  1 +
 4 files changed, 131 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/kernel/kexec_image.c

diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index e4de1223715f..3cba4161818a 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -102,6 +102,56 @@ struct kimage_arch {
 	void *dtb_buf;
 };
 
+/**
+ * struct arm64_image_header - arm64 kernel image header
+ *
+ * @pe_sig: Optional PE format 'MZ' signature
+ * @branch_code: Instruction to branch to stext
+ * @text_offset: Image load offset, little endian
+ * @image_size: Effective image size, little endian
+ * @flags:
+ *	Bit 0: Kernel endianness. 0=little endian, 1=big endian
+ * @reserved: Reserved
+ * @magic: Magic number, "ARM\x64"
+ * @pe_header: Optional offset to a PE format header
+ **/
+
+struct arm64_image_header {
+	u8 pe_sig[2];
+	u8 pad[2];
+	u32 branch_code;
+	u64 text_offset;
+	u64 image_size;
+	u64 flags;
+	u64 reserved[3];
+	u8 magic[4];
+	u32 pe_header;
+};
+
+static const u8 arm64_image_magic[4] = {'A', 'R', 'M', 0x64U};
+
+/**
+ * arm64_header_check_magic - Helper to check the arm64 image header.
+ *
+ * Returns non-zero if header is OK.
+ */
+
+static inline int arm64_header_check_magic(const struct arm64_image_header *h)
+{
+	if (!h)
+		return 0;
+
+	if (!h->text_offset)
+		return 0;
+
+	return (h->magic[0] == arm64_image_magic[0]
+		&& h->magic[1] == arm64_image_magic[1]
+		&& h->magic[2] == arm64_image_magic[2]
+		&& h->magic[3] == arm64_image_magic[3]);
+}
+
+extern const struct kexec_file_ops kexec_image_ops;
+
 struct kimage;
 
 extern int load_other_segments(struct kimage *image,
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 2f2b2757ae7a..1e110aa571dd 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -50,7 +50,7 @@ arm64-obj-$(CONFIG_RANDOMIZE_BASE)	+= kaslr.o
 arm64-obj-$(CONFIG_HIBERNATION)		+= hibernate.o hibernate-asm.o
 arm64-obj-$(CONFIG_KEXEC_CORE)		+= machine_kexec.o relocate_kernel.o	\
 					   cpu-reset.o
-arm64-obj-$(CONFIG_KEXEC_FILE)		+= machine_kexec_file.o
+arm64-obj-$(CONFIG_KEXEC_FILE)		+= machine_kexec_file.o kexec_image.o
 arm64-obj-$(CONFIG_ARM64_RELOC_TEST)	+= arm64-reloc-test.o
 arm64-reloc-test-y := reloc_test_core.o reloc_test_syms.o
 arm64-obj-$(CONFIG_CRASH_DUMP)		+= crash_dump.o
diff --git a/arch/arm64/kernel/kexec_image.c b/arch/arm64/kernel/kexec_image.c
new file mode 100644
index 000000000000..4dd524ad6611
--- /dev/null
+++ b/arch/arm64/kernel/kexec_image.c
@@ -0,0 +1,79 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Kexec image loader
+ *
+ * Copyright (C) 2018 Linaro Limited
+ * Author: AKASHI Takahiro <takahiro.akashi@linaro.org>
+ */
+
+#define pr_fmt(fmt)	"kexec_file(Image): " fmt
+
+#include <linux/err.h>
+#include <linux/errno.h>
+#include <linux/kernel.h>
+#include <linux/kexec.h>
+#include <asm/byteorder.h>
+#include <asm/memory.h>
+
+static int image_probe(const char *kernel_buf, unsigned long kernel_len)
+{
+	const struct arm64_image_header *h;
+
+	h = (const struct arm64_image_header *)(kernel_buf);
+
+	if ((kernel_len < sizeof(*h)) || !arm64_header_check_magic(h))
+		return -EINVAL;
+
+	return 0;
+}
+
+static void *image_load(struct kimage *image,
+				char *kernel, unsigned long kernel_len,
+				char *initrd, unsigned long initrd_len,
+				char *cmdline, unsigned long cmdline_len)
+{
+	struct kexec_buf kbuf;
+	struct arm64_image_header *h = (struct arm64_image_header *)kernel;
+	unsigned long text_offset;
+	int ret;
+
+	/* Load the kernel */
+	kbuf.image = image;
+	kbuf.buf_min = 0;
+	kbuf.buf_max = ULONG_MAX;
+	kbuf.top_down = false;
+
+	kbuf.buffer = kernel;
+	kbuf.bufsz = kernel_len;
+	kbuf.memsz = le64_to_cpu(h->image_size);
+	text_offset = le64_to_cpu(h->text_offset);
+	kbuf.buf_align = SZ_2M;
+
+	/* Adjust kernel segment with TEXT_OFFSET */
+	kbuf.memsz += text_offset;
+
+	ret = kexec_add_buffer(&kbuf);
+	if (ret)
+		goto out;
+
+	image->arch.kern_segment = image->nr_segments - 1;
+	image->segment[image->arch.kern_segment].mem += text_offset;
+	image->segment[image->arch.kern_segment].memsz -= text_offset;
+	image->start = image->segment[image->arch.kern_segment].mem;
+
+	pr_debug("Loaded kernel at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
+				image->segment[image->arch.kern_segment].mem,
+				kbuf.bufsz, kbuf.memsz);
+
+	/* Load additional data */
+	ret = load_other_segments(image, initrd, initrd_len,
+				cmdline, cmdline_len);
+
+out:
+	return ERR_PTR(ret);
+}
+
+const struct kexec_file_ops kexec_image_ops = {
+	.probe = image_probe,
+	.load = image_load,
+};
diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
index b3b9b1725d8a..37c0a9dc2e47 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -23,6 +23,7 @@ static int __dt_root_addr_cells;
 static int __dt_root_size_cells;
 
 const struct kexec_file_ops * const kexec_file_loaders[] = {
+	&kexec_image_ops,
 	NULL
 };
 
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH v9 07/11] arm64: kexec_file: add crash dump support
  2018-04-25  6:26 ` AKASHI Takahiro
  (?)
@ 2018-04-25  6:26   ` AKASHI Takahiro
  -1 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-04-25  6:26 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd
  Cc: ard.biesheuvel, james.morse, bhsharma, kexec, linux-arm-kernel,
	linux-kernel, AKASHI Takahiro

Enabling crash dump (kdump) involves
* preparing the contents of the ELF header of a core dump file,
  /proc/vmcore, using crash_prepare_elf64_headers(), and
* adding two device tree properties, "linux,usable-memory-range" and
  "linux,elfcorehdr", which represent, respectively, the memory range
  to be used by the crash dump kernel and the header's location.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/kexec.h         |   4 +
 arch/arm64/kernel/kexec_image.c        |   9 +-
 arch/arm64/kernel/machine_kexec_file.c | 202 +++++++++++++++++++++++++
 3 files changed, 213 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 3cba4161818a..77f05bcf6a42 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -100,6 +100,10 @@ struct kimage_arch {
 	int kern_segment;
 	phys_addr_t dtb_mem;
 	void *dtb_buf;
+	/* Core ELF header buffer */
+	void *elf_headers;
+	unsigned long elf_headers_sz;
+	unsigned long elf_load_addr;
 };
 
 /**
diff --git a/arch/arm64/kernel/kexec_image.c b/arch/arm64/kernel/kexec_image.c
index 4dd524ad6611..2b3baf7285e0 100644
--- a/arch/arm64/kernel/kexec_image.c
+++ b/arch/arm64/kernel/kexec_image.c
@@ -39,8 +39,13 @@ static void *image_load(struct kimage *image,
 
 	/* Load the kernel */
 	kbuf.image = image;
-	kbuf.buf_min = 0;
-	kbuf.buf_max = ULONG_MAX;
+	if (image->type == KEXEC_TYPE_CRASH) {
+		kbuf.buf_min = crashk_res.start;
+		kbuf.buf_max = crashk_res.end + 1;
+	} else {
+		kbuf.buf_min = 0;
+		kbuf.buf_max = ULONG_MAX;
+	}
 	kbuf.top_down = false;
 
 	kbuf.buffer = kernel;
diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
index 37c0a9dc2e47..ec674f4d267c 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -17,6 +17,7 @@
 #include <linux/memblock.h>
 #include <linux/of_fdt.h>
 #include <linux/types.h>
+#include <linux/vmalloc.h>
 #include <asm/byteorder.h>
 
 static int __dt_root_addr_cells;
@@ -32,6 +33,10 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image)
 	vfree(image->arch.dtb_buf);
 	image->arch.dtb_buf = NULL;
 
+	vfree(image->arch.elf_headers);
+	image->arch.elf_headers = NULL;
+	image->arch.elf_headers_sz = 0;
+
 	return kexec_image_post_load_cleanup_default(image);
 }
 
@@ -76,6 +81,78 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf,
 	return ret;
 }
 
+static int __init arch_kexec_file_init(void)
+{
+	/* Those values are used later on loading the kernel */
+	__dt_root_addr_cells = dt_root_addr_cells;
+	__dt_root_size_cells = dt_root_size_cells;
+
+	return 0;
+}
+late_initcall(arch_kexec_file_init);
+
+#define FDT_ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))
+#define FDT_TAGALIGN(x)	(FDT_ALIGN((x), FDT_TAGSIZE))
+
+static int fdt_prop_len(const char *prop_name, int len)
+{
+	return (strlen(prop_name) + 1) +
+		sizeof(struct fdt_property) +
+		FDT_TAGALIGN(len);
+}
+
+static bool cells_size_fitted(unsigned long base, unsigned long size)
+{
+	/* if *_cells >= 2, cells can hold 64-bit values anyway */
+	if ((__dt_root_addr_cells == 1) && (base >= (1ULL << 32)))
+		return false;
+
+	if ((__dt_root_size_cells == 1) && (size >= (1ULL << 32)))
+		return false;
+
+	return true;
+}
+
+static void fill_property(void *buf, u64 val64, int cells)
+{
+	u32 val32;
+
+	if (cells == 1) {
+		val32 = cpu_to_fdt32((u32)val64);
+		memcpy(buf, &val32, sizeof(val32));
+	} else {
+		memset(buf, 0, cells * sizeof(u32) - sizeof(u64));
+		buf += cells * sizeof(u32) - sizeof(u64);
+
+		val64 = cpu_to_fdt64(val64);
+		memcpy(buf, &val64, sizeof(val64));
+	}
+}
+
+static int fdt_setprop_range(void *fdt, int nodeoffset, const char *name,
+				unsigned long addr, unsigned long size)
+{
+	void *buf, *prop;
+	size_t buf_size;
+	int result;
+
+	buf_size = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
+	prop = buf = vmalloc(buf_size);
+	if (!buf)
+		return -ENOMEM;
+
+	fill_property(prop, addr, __dt_root_addr_cells);
+	prop += __dt_root_addr_cells * sizeof(u32);
+
+	fill_property(prop, size, __dt_root_size_cells);
+
+	result = fdt_setprop(fdt, nodeoffset, name, buf, buf_size);
+
+	vfree(buf);
+
+	return result;
+}
+
 static int setup_dtb(struct kimage *image,
 		unsigned long initrd_load_addr, unsigned long initrd_len,
 		char *cmdline, unsigned long cmdline_len,
@@ -88,10 +165,26 @@ static int setup_dtb(struct kimage *image,
 	int range_len;
 	int ret;
 
+	/* check ranges against root's #address-cells and #size-cells */
+	if (image->type == KEXEC_TYPE_CRASH &&
+		(!cells_size_fitted(image->arch.elf_load_addr,
+				image->arch.elf_headers_sz) ||
+		 !cells_size_fitted(crashk_res.start,
+				crashk_res.end - crashk_res.start + 1))) {
+		pr_err("Crash memory region doesn't fit into DT's root cell sizes.\n");
+		ret = -EINVAL;
+		goto out_err;
+	}
+
 	/* duplicate dt blob */
 	buf_size = fdt_totalsize(initial_boot_params);
 	range_len = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
 
+	if (image->type == KEXEC_TYPE_CRASH)
+		buf_size += fdt_prop_len("linux,elfcorehdr", range_len)
+				+ fdt_prop_len("linux,usable-memory-range",
+								range_len);
+
 	if (initrd_load_addr)
 		buf_size += fdt_prop_len("linux,initrd-start", sizeof(u64))
 				+ fdt_prop_len("linux,initrd-end", sizeof(u64));
@@ -113,6 +206,23 @@ static int setup_dtb(struct kimage *image,
 	if (nodeoffset < 0)
 		goto out_err;
 
+	if (image->type == KEXEC_TYPE_CRASH) {
+		/* add linux,elfcorehdr */
+		ret = fdt_setprop_range(buf, nodeoffset, "linux,elfcorehdr",
+				image->arch.elf_load_addr,
+				image->arch.elf_headers_sz);
+		if (ret)
+			goto out_err;
+
+		/* add linux,usable-memory-range */
+		ret = fdt_setprop_range(buf, nodeoffset,
+				"linux,usable-memory-range",
+				crashk_res.start,
+				crashk_res.end - crashk_res.start + 1);
+		if (ret)
+			goto out_err;
+	}
+
 	/* add bootargs */
 	if (cmdline) {
 		ret = fdt_setprop(buf, nodeoffset, "bootargs",
@@ -148,17 +258,109 @@ static int setup_dtb(struct kimage *image,
 	return ret;
 }
 
+static int get_nr_ranges_callback(struct resource *res, void *arg)
+{
+	unsigned int *nr_ranges = arg;
+
+	(*nr_ranges)++;
+	return 0;
+}
+
+static int add_mem_range_callback(struct resource *res, void *arg)
+{
+	struct crash_mem *cmem = arg;
+
+	cmem->ranges[cmem->nr_ranges].start = res->start;
+	cmem->ranges[cmem->nr_ranges].end = res->end;
+	cmem->nr_ranges++;
+
+	return 0;
+}
+
+static struct crash_mem *get_crash_memory_ranges(void)
+{
+	unsigned int nr_ranges;
+	struct crash_mem *cmem;
+
+	nr_ranges = 1; /* for exclusion of crashkernel region */
+	walk_system_ram_res(0, -1, &nr_ranges, get_nr_ranges_callback);
+
+	cmem = vmalloc(sizeof(struct crash_mem) +
+			sizeof(struct crash_mem_range) * nr_ranges);
+	if (!cmem)
+		return NULL;
+
+	cmem->max_nr_ranges = nr_ranges;
+	cmem->nr_ranges = 0;
+	walk_system_ram_res(0, -1, cmem, add_mem_range_callback);
+
+	/* Exclude crashkernel region */
+	if (crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end)) {
+		vfree(cmem);
+		return NULL;
+	}
+
+	return cmem;
+}
+
+static int prepare_elf_headers(void **addr, unsigned long *sz)
+{
+	struct crash_mem *cmem;
+	int ret = 0;
+
+	cmem = get_crash_memory_ranges();
+	if (!cmem)
+		return -ENOMEM;
+
+	ret = crash_prepare_elf64_headers(cmem, true, addr, sz);
+
+	vfree(cmem);
+	return ret;
+}
+
 int load_other_segments(struct kimage *image,
 			char *initrd, unsigned long initrd_len,
 			char *cmdline, unsigned long cmdline_len)
 {
 	struct kexec_segment *kern_seg;
 	struct kexec_buf kbuf;
+	void *hdrs_addr;
+	unsigned long hdrs_sz;
 	unsigned long initrd_load_addr = 0;
 	char *dtb = NULL;
 	unsigned long dtb_len = 0;
 	int ret = 0;
 
+	/* load elf core header */
+	if (image->type == KEXEC_TYPE_CRASH) {
+		ret = prepare_elf_headers(&hdrs_addr, &hdrs_sz);
+		if (ret) {
+			pr_err("Preparing elf core header failed\n");
+			goto out_err;
+		}
+
+		kbuf.image = image;
+		kbuf.buffer = hdrs_addr;
+		kbuf.bufsz = hdrs_sz;
+		kbuf.memsz = hdrs_sz;
+		kbuf.buf_align = PAGE_SIZE;
+		kbuf.buf_min = crashk_res.start;
+		kbuf.buf_max = crashk_res.end + 1;
+		kbuf.top_down = true;
+
+		ret = kexec_add_buffer(&kbuf);
+		if (ret) {
+			vfree(hdrs_addr);
+			goto out_err;
+		}
+		image->arch.elf_headers = hdrs_addr;
+		image->arch.elf_headers_sz = hdrs_sz;
+		image->arch.elf_load_addr = kbuf.mem;
+
+		pr_debug("Loaded elf core header at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
+				 image->arch.elf_load_addr, hdrs_sz, hdrs_sz);
+	}
+
 	kern_seg = &image->segment[image->arch.kern_segment];
 	kbuf.image = image;
 	/* not allocate anything below the kernel */
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH v9 07/11] arm64: kexec_file: add crash dump support
@ 2018-04-25  6:26   ` AKASHI Takahiro
  0 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-04-25  6:26 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd
  Cc: ard.biesheuvel, bhsharma, kexec, linux-kernel, AKASHI Takahiro,
	james.morse, linux-arm-kernel

Enabling crash dump (kdump) includes
* prepare contents of ELF header of a core dump file, /proc/vmcore,
  using crash_prepare_elf64_headers(), and
* add two device tree properties, "linux,usable-memory-range" and
  "linux,elfcorehdr", which represent repsectively a memory range
  to be used by crash dump kernel and the header's location

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/kexec.h         |   4 +
 arch/arm64/kernel/kexec_image.c        |   9 +-
 arch/arm64/kernel/machine_kexec_file.c | 202 +++++++++++++++++++++++++
 3 files changed, 213 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 3cba4161818a..77f05bcf6a42 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -100,6 +100,10 @@ struct kimage_arch {
 	int kern_segment;
 	phys_addr_t dtb_mem;
 	void *dtb_buf;
+	/* Core ELF header buffer */
+	void *elf_headers;
+	unsigned long elf_headers_sz;
+	unsigned long elf_load_addr;
 };
 
 /**
diff --git a/arch/arm64/kernel/kexec_image.c b/arch/arm64/kernel/kexec_image.c
index 4dd524ad6611..2b3baf7285e0 100644
--- a/arch/arm64/kernel/kexec_image.c
+++ b/arch/arm64/kernel/kexec_image.c
@@ -39,8 +39,13 @@ static void *image_load(struct kimage *image,
 
 	/* Load the kernel */
 	kbuf.image = image;
-	kbuf.buf_min = 0;
-	kbuf.buf_max = ULONG_MAX;
+	if (image->type == KEXEC_TYPE_CRASH) {
+		kbuf.buf_min = crashk_res.start;
+		kbuf.buf_max = crashk_res.end + 1;
+	} else {
+		kbuf.buf_min = 0;
+		kbuf.buf_max = ULONG_MAX;
+	}
 	kbuf.top_down = false;
 
 	kbuf.buffer = kernel;
diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
index 37c0a9dc2e47..ec674f4d267c 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -17,6 +17,7 @@
 #include <linux/memblock.h>
 #include <linux/of_fdt.h>
 #include <linux/types.h>
+#include <linux/vmalloc.h>
 #include <asm/byteorder.h>
 
 static int __dt_root_addr_cells;
@@ -32,6 +33,10 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image)
 	vfree(image->arch.dtb_buf);
 	image->arch.dtb_buf = NULL;
 
+	vfree(image->arch.elf_headers);
+	image->arch.elf_headers = NULL;
+	image->arch.elf_headers_sz = 0;
+
 	return kexec_image_post_load_cleanup_default(image);
 }
 
@@ -76,6 +81,78 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf,
 	return ret;
 }
 
+static int __init arch_kexec_file_init(void)
+{
+	/* Those values are used later on loading the kernel */
+	__dt_root_addr_cells = dt_root_addr_cells;
+	__dt_root_size_cells = dt_root_size_cells;
+
+	return 0;
+}
+late_initcall(arch_kexec_file_init);
+
+#define FDT_ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))
+#define FDT_TAGALIGN(x)	(FDT_ALIGN((x), FDT_TAGSIZE))
+
+static int fdt_prop_len(const char *prop_name, int len)
+{
+	return (strlen(prop_name) + 1) +
+		sizeof(struct fdt_property) +
+		FDT_TAGALIGN(len);
+}
+
+static bool cells_size_fitted(unsigned long base, unsigned long size)
+{
+	/* if *_cells >= 2, cells can hold 64-bit values anyway */
+	if ((__dt_root_addr_cells == 1) && (base >= (1ULL << 32)))
+		return false;
+
+	if ((__dt_root_size_cells == 1) && (size >= (1ULL << 32)))
+		return false;
+
+	return true;
+}
+
+static void fill_property(void *buf, u64 val64, int cells)
+{
+	u32 val32;
+
+	if (cells == 1) {
+		val32 = cpu_to_fdt32((u32)val64);
+		memcpy(buf, &val32, sizeof(val32));
+	} else {
+		memset(buf, 0, cells * sizeof(u32) - sizeof(u64));
+		buf += cells * sizeof(u32) - sizeof(u64);
+
+		val64 = cpu_to_fdt64(val64);
+		memcpy(buf, &val64, sizeof(val64));
+	}
+}
+
+static int fdt_setprop_range(void *fdt, int nodeoffset, const char *name,
+				unsigned long addr, unsigned long size)
+{
+	void *buf, *prop;
+	size_t buf_size;
+	int result;
+
+	buf_size = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
+	prop = buf = vmalloc(buf_size);
+	if (!buf)
+		return -ENOMEM;
+
+	fill_property(prop, addr, __dt_root_addr_cells);
+	prop += __dt_root_addr_cells * sizeof(u32);
+
+	fill_property(prop, size, __dt_root_size_cells);
+
+	result = fdt_setprop(fdt, nodeoffset, name, buf, buf_size);
+
+	vfree(buf);
+
+	return result;
+}
+
 static int setup_dtb(struct kimage *image,
 		unsigned long initrd_load_addr, unsigned long initrd_len,
 		char *cmdline, unsigned long cmdline_len,
@@ -88,10 +165,26 @@ static int setup_dtb(struct kimage *image,
 	int range_len;
 	int ret;
 
+	/* check ranges against root's #address-cells and #size-cells */
+	if (image->type == KEXEC_TYPE_CRASH &&
+		(!cells_size_fitted(image->arch.elf_load_addr,
+				image->arch.elf_headers_sz) ||
+		 !cells_size_fitted(crashk_res.start,
+				crashk_res.end - crashk_res.start + 1))) {
+		pr_err("Crash memory region doesn't fit into DT's root cell sizes.\n");
+		ret = -EINVAL;
+		goto out_err;
+	}
+
 	/* duplicate dt blob */
 	buf_size = fdt_totalsize(initial_boot_params);
 	range_len = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
 
+	if (image->type == KEXEC_TYPE_CRASH)
+		buf_size += fdt_prop_len("linux,elfcorehdr", range_len)
+				+ fdt_prop_len("linux,usable-memory-range",
+								range_len);
+
 	if (initrd_load_addr)
 		buf_size += fdt_prop_len("linux,initrd-start", sizeof(u64))
 				+ fdt_prop_len("linux,initrd-end", sizeof(u64));
@@ -113,6 +206,23 @@ static int setup_dtb(struct kimage *image,
 	if (nodeoffset < 0)
 		goto out_err;
 
+	if (image->type == KEXEC_TYPE_CRASH) {
+		/* add linux,elfcorehdr */
+		ret = fdt_setprop_range(buf, nodeoffset, "linux,elfcorehdr",
+				image->arch.elf_load_addr,
+				image->arch.elf_headers_sz);
+		if (ret)
+			goto out_err;
+
+		/* add linux,usable-memory-range */
+		ret = fdt_setprop_range(buf, nodeoffset,
+				"linux,usable-memory-range",
+				crashk_res.start,
+				crashk_res.end - crashk_res.start + 1);
+		if (ret)
+			goto out_err;
+	}
+
 	/* add bootargs */
 	if (cmdline) {
 		ret = fdt_setprop(buf, nodeoffset, "bootargs",
@@ -148,17 +258,109 @@ static int setup_dtb(struct kimage *image,
 	return ret;
 }
 
+static int get_nr_ranges_callback(struct resource *res, void *arg)
+{
+	unsigned int *nr_ranges = arg;
+
+	(*nr_ranges)++;
+	return 0;
+}
+
+static int add_mem_range_callback(struct resource *res, void *arg)
+{
+	struct crash_mem *cmem = arg;
+
+	cmem->ranges[cmem->nr_ranges].start = res->start;
+	cmem->ranges[cmem->nr_ranges].end = res->end;
+	cmem->nr_ranges++;
+
+	return 0;
+}
+
+static struct crash_mem *get_crash_memory_ranges(void)
+{
+	unsigned int nr_ranges;
+	struct crash_mem *cmem;
+
+	nr_ranges = 1; /* for exclusion of crashkernel region */
+	walk_system_ram_res(0, -1, &nr_ranges, get_nr_ranges_callback);
+
+	cmem = vmalloc(sizeof(struct crash_mem) +
+			sizeof(struct crash_mem_range) * nr_ranges);
+	if (!cmem)
+		return NULL;
+
+	cmem->max_nr_ranges = nr_ranges;
+	cmem->nr_ranges = 0;
+	walk_system_ram_res(0, -1, cmem, add_mem_range_callback);
+
+	/* Exclude crashkernel region */
+	if (crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end)) {
+		vfree(cmem);
+		return NULL;
+	}
+
+	return cmem;
+}
+
+static int prepare_elf_headers(void **addr, unsigned long *sz)
+{
+	struct crash_mem *cmem;
+	int ret = 0;
+
+	cmem = get_crash_memory_ranges();
+	if (!cmem)
+		return -ENOMEM;
+
+	ret = crash_prepare_elf64_headers(cmem, true, addr, sz);
+
+	vfree(cmem);
+	return ret;
+}
+
 int load_other_segments(struct kimage *image,
 			char *initrd, unsigned long initrd_len,
 			char *cmdline, unsigned long cmdline_len)
 {
 	struct kexec_segment *kern_seg;
 	struct kexec_buf kbuf;
+	void *hdrs_addr;
+	unsigned long hdrs_sz;
 	unsigned long initrd_load_addr = 0;
 	char *dtb = NULL;
 	unsigned long dtb_len = 0;
 	int ret = 0;
 
+	/* load elf core header */
+	if (image->type == KEXEC_TYPE_CRASH) {
+		ret = prepare_elf_headers(&hdrs_addr, &hdrs_sz);
+		if (ret) {
+			pr_err("Preparing elf core header failed\n");
+			goto out_err;
+		}
+
+		kbuf.image = image;
+		kbuf.buffer = hdrs_addr;
+		kbuf.bufsz = hdrs_sz;
+		kbuf.memsz = hdrs_sz;
+		kbuf.buf_align = PAGE_SIZE;
+		kbuf.buf_min = crashk_res.start;
+		kbuf.buf_max = crashk_res.end + 1;
+		kbuf.top_down = true;
+
+		ret = kexec_add_buffer(&kbuf);
+		if (ret) {
+			vfree(hdrs_addr);
+			goto out_err;
+		}
+		image->arch.elf_headers = hdrs_addr;
+		image->arch.elf_headers_sz = hdrs_sz;
+		image->arch.elf_load_addr = kbuf.mem;
+
+		pr_debug("Loaded elf core header at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
+				 image->arch.elf_load_addr, hdrs_sz, hdrs_sz);
+	}
+
 	kern_seg = &image->segment[image->arch.kern_segment];
 	kbuf.image = image;
 	/* not allocate anything below the kernel */
-- 
2.17.0
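
For readers following get_crash_memory_ranges() above: a minimal userspace
sketch of why one extra slot is reserved before the crashkernel region is
excluded. Cutting a hole out of the middle of a RAM range splits it in two.
The types and exclude_range() here are hypothetical stand-ins, not the
kernel's struct crash_mem or crash_exclude_mem_range(); range ends are
assumed inclusive, as with struct resource.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RANGES 8	/* arbitrary capacity for this sketch */

/* Hypothetical stand-ins for the kernel types; ends are inclusive. */
struct range {
	unsigned long long start, end;
};

struct range_list {
	size_t max_nr;
	size_t nr;
	struct range r[MAX_RANGES];
};

/* Size of an inclusive range, as in "crashk_res.end - crashk_res.start + 1". */
static unsigned long long range_size(struct range x)
{
	return x.end - x.start + 1;
}

/*
 * Remove [start, end] from the list, assuming it lies within one entry.
 * Returns 0 on success, -1 if a split is needed but no slot is left.
 */
static int exclude_range(struct range_list *l,
			 unsigned long long start, unsigned long long end)
{
	size_t i, j;

	assert(l && l->nr <= l->max_nr && l->max_nr <= MAX_RANGES);

	for (i = 0; i < l->nr; i++) {
		struct range *cur = &l->r[i];

		if (start < cur->start || end > cur->end)
			continue;	/* hole is not inside this entry */

		if (start == cur->start && end == cur->end) {
			/* whole entry removed: close the gap */
			for (j = i; j + 1 < l->nr; j++)
				l->r[j] = l->r[j + 1];
			l->nr--;
		} else if (start == cur->start) {
			cur->start = end + 1;	/* trim the front */
		} else if (end == cur->end) {
			cur->end = start - 1;	/* trim the back */
		} else {
			/* split in two: this is what the extra slot is for */
			if (l->nr == l->max_nr)
				return -1;
			for (j = l->nr; j > i + 1; j--)
				l->r[j] = l->r[j - 1];
			l->r[i + 1].start = end + 1;
			l->r[i + 1].end = cur->end;
			cur->end = start - 1;
			l->nr++;
		}
		return 0;
	}
	return 0;	/* nothing contained the hole; list unchanged */
}
```

With one RAM range and max_nr = 2, excluding a region in the middle leaves
two ranges; with max_nr = 1 the same call fails, which is the case the
"nr_ranges = 1" initial value is meant to avoid.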


^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH v9 08/11] arm64: enable KEXEC_FILE config
  2018-04-25  6:26 ` AKASHI Takahiro
  (?)
@ 2018-04-25  6:26   ` AKASHI Takahiro
  -1 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-04-25  6:26 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd
  Cc: ard.biesheuvel, james.morse, bhsharma, kexec, linux-arm-kernel,
	linux-kernel, AKASHI Takahiro

Modify arm64/Kconfig to enable kexec_file_load support.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
---
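
For illustration only, a minimal userspace sketch of what "file based"
means here: the caller passes open file descriptors for the kernel and
initramfs rather than a list of segments. load_kernel_file() is a
hypothetical helper name; the raw syscall(2) form is used on the
assumption that libc provides no wrapper, and the call needs CAP_SYS_BOOT
to succeed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef KEXEC_FILE_NO_INITRAMFS
#define KEXEC_FILE_NO_INITRAMFS 0x00000004	/* value from the uapi header */
#endif

/*
 * Hypothetical helper: hand a kernel file (and optionally an initramfs)
 * to kexec_file_load(). Returns 0 on success, -1 with errno set.
 */
static int load_kernel_file(const char *kernel_path, const char *initrd_path,
			    const char *cmdline)
{
	unsigned long flags = 0;
	int kernel_fd, initrd_fd = -1;
	int saved;
	long ret;

	assert(kernel_path && cmdline);

	kernel_fd = open(kernel_path, O_RDONLY);
	if (kernel_fd < 0)
		return -1;

	if (initrd_path) {
		initrd_fd = open(initrd_path, O_RDONLY);
		if (initrd_fd < 0) {
			saved = errno;
			close(kernel_fd);
			errno = saved;
			return -1;
		}
	} else {
		/* no initramfs: say so instead of passing a valid fd */
		flags |= KEXEC_FILE_NO_INITRAMFS;
	}

#ifdef SYS_kexec_file_load
	/* cmdline_len counts the terminating NUL byte */
	ret = syscall(SYS_kexec_file_load, kernel_fd, initrd_fd,
		      (unsigned long)strlen(cmdline) + 1, cmdline, flags);
#else
	errno = ENOSYS;
	ret = -1;
#endif

	saved = errno;
	close(kernel_fd);
	if (initrd_fd >= 0)
		close(initrd_fd);
	errno = saved;

	return ret ? -1 : 0;
}
```

A caller would use something like load_kernel_file("/boot/Image", NULL,
"console=ttyAMA0"), where the path and command line are placeholders.
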
 arch/arm64/Kconfig | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index eb2cf4938f6d..d8f0dcdb8b96 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -847,6 +847,16 @@ config KEXEC
 	  but it is independent of the system firmware.   And like a reboot
 	  you can start any kernel with it, not just Linux.
 
+config KEXEC_FILE
+	bool "kexec file based system call"
+	select KEXEC_CORE
+	select BUILD_BIN2C
+	help
+	  This is a new version of the kexec system call. It is file
+	  based and takes file descriptors for the kernel and initramfs
+	  as system call arguments, as opposed to the list of segments
+	  accepted by the previous system call.
+
 config CRASH_DUMP
 	bool "Build kdump crash kernel"
 	help
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH v9 09/11] include: pe.h: remove message[] from mz header definition
  2018-04-25  6:26 ` AKASHI Takahiro
  (?)
@ 2018-04-25  6:26   ` AKASHI Takahiro
  -1 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-04-25  6:26 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd
  Cc: ard.biesheuvel, james.morse, bhsharma, kexec, linux-arm-kernel,
	linux-kernel, AKASHI Takahiro

Turn message[] into a flexible array member so that it no longer
contributes to the size of the mz header definition.

This change is crucial for enabling kexec_file_load on arm64 because
arm64's "Image" binary, although in PE format, doesn't have any data for
it, and accordingly the following check in pefile_parse_binary() would
otherwise fail:

	chkaddr(cursor, mz->peaddr, sizeof(*pe));

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David S. Miller <davem@davemloft.net>
---
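
For illustration only, a userspace sketch of the size arithmetic behind
the failing check. The two structs are simplified models of struct mz_hdr
(60 bytes of legacy fields collapsed into one array), addr_ok() is a
function form of the chkaddr() bounds test, and the values 0x40 for
peaddr and 24 for the pe header size are assumptions about a typical
arm64 Image, not taken from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model: fixed-size message[] is counted in sizeof(). */
struct mz_hdr_fixed {
	uint8_t  legacy[60];	/* fields before peaddr, collapsed */
	uint32_t peaddr;	/* address of pe header */
	char     message[64];
};

/* Simplified model: flexible array member adds nothing to sizeof(). */
struct mz_hdr_flex {
	uint8_t  legacy[60];
	uint32_t peaddr;	/* address of pe header */
	char     message[];
};

/*
 * The object of size s at offset x must start at or after base and
 * fit inside a buffer of datalen bytes.
 */
static bool addr_ok(size_t base, size_t x, size_t s, size_t datalen)
{
	return !(x < base || s >= datalen || x > datalen - s);
}

/* The parser's cursor sits just past the mz header when peaddr is checked. */
static bool pe_header_reachable(size_t mz_hdr_size, uint32_t peaddr,
				size_t pe_hdr_size, size_t datalen)
{
	assert(datalen >= mz_hdr_size);
	return addr_ok(mz_hdr_size, peaddr, pe_hdr_size, datalen);
}
```

With the fixed array the cursor is at 128, so a pe header at 0x40 is
rejected as lying "before" the cursor; with the flexible array the cursor
is at 64 and the same header is accepted.
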
 include/linux/pe.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/pe.h b/include/linux/pe.h
index 143ce75be5f0..3482b18a48b5 100644
--- a/include/linux/pe.h
+++ b/include/linux/pe.h
@@ -166,7 +166,7 @@ struct mz_hdr {
 	uint16_t oem_info;	/* oem specific */
 	uint16_t reserved1[10];	/* reserved */
 	uint32_t peaddr;	/* address of pe header */
-	char     message[64];	/* message to print */
+	char     message[];	/* message to print */
 };
 
 struct mz_reloc {
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH v9 10/11] arm64: kexec_file: add kernel signature verification support
  2018-04-25  6:26 ` AKASHI Takahiro
  (?)
@ 2018-04-25  6:26   ` AKASHI Takahiro
  -1 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-04-25  6:26 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd
  Cc: ard.biesheuvel, james.morse, bhsharma, kexec, linux-arm-kernel,
	linux-kernel, AKASHI Takahiro

With this patch, kernel verification can be done without the IMA security
subsystem enabled. Turn on CONFIG_KEXEC_VERIFY_SIG instead.

On x86, a signature is embedded into the PE file (Microsoft's format)
header of the binary. Since arm64's "Image" can also be seen as a PE file
as long as CONFIG_EFI is enabled, we adopt this format for kernel signing.

You can create a signed kernel image with:
    $ sbsign --key ${KEY} --cert ${CERT} Image

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
---
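
For illustration only, a userspace sketch of the two header checks this
patch relies on: the arm64 "ARM\x64" magic and the PE 'MZ' signature.
It works on a raw byte buffer instead of struct arm64_image_header; the
function names are hypothetical, and the offsets (magic at byte 56 of a
64-byte header, 'MZ' at byte 0) are taken from the arm64 boot protocol
as I understand it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IMAGE_HDR_SIZE  64	/* length of the arm64 Image header */
#define IMAGE_MAGIC_OFF 56	/* offset of the "ARM\x64" magic */

static const uint8_t image_magic[4] = { 'A', 'R', 'M', 0x64 };
static const uint8_t pe_sig[2] = { 'M', 'Z' };

/* Non-zero magic match means the buffer looks like an arm64 Image. */
static bool image_has_magic(const uint8_t *buf, size_t len)
{
	assert(buf);
	return len >= IMAGE_HDR_SIZE &&
	       memcmp(buf + IMAGE_MAGIC_OFF, image_magic,
		      sizeof(image_magic)) == 0;
}

/* 'MZ' at the very start means the Image doubles as a PE file. */
static bool image_has_pe_sig(const uint8_t *buf, size_t len)
{
	assert(buf);
	return len >= sizeof(pe_sig) &&
	       memcmp(buf, pe_sig, sizeof(pe_sig)) == 0;
}
```

An Image built without CONFIG_EFI would pass the first check but not the
second, which is the case the pr_debug() in image_probe() reports as
"PE format: no".
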
 arch/arm64/Kconfig              | 24 ++++++++++++++++++++++++
 arch/arm64/include/asm/kexec.h  | 16 ++++++++++++++++
 arch/arm64/kernel/kexec_image.c | 15 +++++++++++++++
 3 files changed, 55 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index d8f0dcdb8b96..5c772601840d 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -857,6 +857,30 @@ config KEXEC_FILE
 	  for kernel and initramfs as opposed to list of segments as
 	  accepted by previous system call.
 
+config KEXEC_VERIFY_SIG
+	bool "Verify kernel signature during kexec_file_load() syscall"
+	depends on KEXEC_FILE
+	help
+	  Select this option to verify the signature of the loaded kernel
+	  image. If configured, any attempt to load an image without a
+	  valid signature will fail.
+
+	  In addition to this option, you need to enable signature
+	  verification for the corresponding kernel image type being
+	  loaded in order for this to work.
+
+config KEXEC_IMAGE_VERIFY_SIG
+	bool "Enable Image signature verification support"
+	default y
+	depends on KEXEC_VERIFY_SIG
+	depends on EFI && SIGNED_PE_FILE_VERIFICATION
+	help
+	  Enable Image signature verification support.
+
+comment "Image signature verification is not yet available"
+	depends on KEXEC_VERIFY_SIG
+	depends on !EFI || !SIGNED_PE_FILE_VERIFICATION
+
 config CRASH_DUMP
 	bool "Build kdump crash kernel"
 	help
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 77f05bcf6a42..891f2484969d 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -133,6 +133,7 @@ struct arm64_image_header {
 };
 
 static const u8 arm64_image_magic[4] = {'A', 'R', 'M', 0x64U};
+static const u8 arm64_image_pe_sig[2] = {'M', 'Z'};
 
 /**
  * arm64_header_check_magic - Helper to check the arm64 image header.
@@ -154,6 +155,21 @@ static inline int arm64_header_check_magic(const struct arm64_image_header *h)
 		&& h->magic[3] == arm64_image_magic[3]);
 }
 
+/**
+ * arm64_header_check_pe_sig - Helper to check the arm64 image header.
+ *
+ * Returns non-zero if 'MZ' signature is found.
+ */
+
+static inline int arm64_header_check_pe_sig(const struct arm64_image_header *h)
+{
+	if (!h)
+		return 0;
+
+	return (h->pe_sig[0] == arm64_image_pe_sig[0]
+		&& h->pe_sig[1] == arm64_image_pe_sig[1]);
+}
+
 extern const struct kexec_file_ops kexec_image_ops;
 
 struct kimage;
diff --git a/arch/arm64/kernel/kexec_image.c b/arch/arm64/kernel/kexec_image.c
index 2b3baf7285e0..7c11beefe65f 100644
--- a/arch/arm64/kernel/kexec_image.c
+++ b/arch/arm64/kernel/kexec_image.c
@@ -12,6 +12,7 @@
 #include <linux/errno.h>
 #include <linux/kernel.h>
 #include <linux/kexec.h>
+#include <linux/verification.h>
 #include <asm/byteorder.h>
 #include <asm/memory.h>
 
@@ -24,6 +25,9 @@ static int image_probe(const char *kernel_buf, unsigned long kernel_len)
 	if ((kernel_len < sizeof(*h)) || !arm64_header_check_magic(h))
 		return -EINVAL;
 
+	pr_debug("PE format: %s\n",
+			(arm64_header_check_pe_sig(h) ? "yes" : "no"));
+
 	return 0;
 }
 
@@ -78,7 +82,18 @@ static void *image_load(struct kimage *image,
 	return ERR_PTR(ret);
 }
 
+#ifdef CONFIG_KEXEC_IMAGE_VERIFY_SIG
+static int image_verify_sig(const char *kernel, unsigned long kernel_len)
+{
+	return verify_pefile_signature(kernel, kernel_len, NULL,
+				       VERIFYING_KEXEC_PE_SIGNATURE);
+}
+#endif
+
 const struct kexec_file_ops kexec_image_ops = {
 	.probe = image_probe,
 	.load = image_load,
+#ifdef CONFIG_KEXEC_IMAGE_VERIFY_SIG
+	.verify_sig = image_verify_sig,
+#endif
 };
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* [PATCH v9 11/11] arm64: kexec_file: add kaslr support
  2018-04-25  6:26 ` AKASHI Takahiro
  (?)
@ 2018-04-25  6:26   ` AKASHI Takahiro
  -1 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-04-25  6:26 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd
  Cc: ard.biesheuvel, james.morse, bhsharma, kexec, linux-arm-kernel,
	linux-kernel, AKASHI Takahiro

Adding "kaslr-seed" to the dtb triggers kaslr, or kernel virtual
address randomization, at secondary kernel boot. We always do this as
it will do no harm on a kaslr-incapable kernel.

We don't have any "switch" to turn off this feature directly, but it
can still be suppressed by passing "nokaslr" as a kernel boot argument.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
---
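
For illustration only, a simplified userspace model of how the booted
kernel is expected to react to the seed and to "nokaslr". This is an
assumption about the consumer side, not code from this series: the
function names are hypothetical, and the rule "randomize only for a
non-zero seed and no nokaslr argument" is my reading of the arm64 early
kaslr setup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if "nokaslr" appears as a whole space-separated word. */
static bool cmdline_has_nokaslr(const char *cmdline)
{
	static const char opt[] = "nokaslr";
	const size_t n = sizeof(opt) - 1;
	const char *p = cmdline;

	assert(cmdline);

	while ((p = strstr(p, opt)) != NULL) {
		bool starts = (p == cmdline) || (p[-1] == ' ');
		bool ends = (p[n] == '\0') || (p[n] == ' ');

		if (starts && ends)
			return true;
		p += n;	/* keep scanning past this partial match */
	}
	return false;
}

/*
 * Simplified model: randomize only if a non-zero seed was passed in
 * the dtb and "nokaslr" was not given on the command line.
 */
static bool kaslr_would_randomize(uint64_t seed, const char *cmdline)
{
	return seed != 0 && !cmdline_has_nokaslr(cmdline);
}
```

Under this model a kernel that never reads the property is unaffected,
which matches the "no harm on a kaslr-incapable kernel" remark above.
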
 arch/arm64/kernel/machine_kexec_file.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
index ec674f4d267c..762f9102899c 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -16,6 +16,7 @@
 #include <linux/libfdt.h>
 #include <linux/memblock.h>
 #include <linux/of_fdt.h>
+#include <linux/random.h>
 #include <linux/types.h>
 #include <linux/vmalloc.h>
 #include <asm/byteorder.h>
@@ -246,6 +247,12 @@ static int setup_dtb(struct kimage *image,
 			goto out_err;
 	}
 
+	/* add kaslr-seed */
+	get_random_bytes(&value, sizeof(value));
+	ret = fdt_setprop(buf, nodeoffset, "kaslr-seed", &value, sizeof(value));
+	if (ret)
+		goto out_err;
+
 	/* trim a buffer */
 	fdt_pack(buf);
 	*dtb_buf = buf;
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 02/11] kexec_file: make kexec_image_post_load_cleanup_default() global
  2018-04-25  6:26   ` AKASHI Takahiro
  (?)
@ 2018-04-28  9:45     ` Dave Young
  -1 siblings, 0 replies; 156+ messages in thread
From: Dave Young @ 2018-04-28  9:45 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	bhe, arnd, ard.biesheuvel, bhsharma, kexec, linux-kernel,
	james.morse, linux-arm-kernel

On 04/25/18 at 03:26pm, AKASHI Takahiro wrote:
> Change this function from static to global so that arm64 can implement
> its own arch_kimage_file_post_load_cleanup() later using
> kexec_image_post_load_cleanup_default().
> 
> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> Cc: Dave Young <dyoung@redhat.com>
> Cc: Vivek Goyal <vgoyal@redhat.com>
> Cc: Baoquan He <bhe@redhat.com>
> ---
>  include/linux/kexec.h | 1 +
>  kernel/kexec_file.c   | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index 9e4e638fb505..49ab758f4d91 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -143,6 +143,7 @@ extern const struct kexec_file_ops * const kexec_file_loaders[];
>  
>  int kexec_image_probe_default(struct kimage *image, void *buf,
>  			      unsigned long buf_len);
> +int kexec_image_post_load_cleanup_default(struct kimage *image);
>  
>  /**
>   * struct kexec_buf - parameters for finding a place for a buffer in memory
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index 75d8e7cf040e..eef89d9b1f03 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -78,7 +78,7 @@ void * __weak arch_kexec_kernel_image_load(struct kimage *image)
>  	return kexec_image_load_default(image);
>  }
>  
> -static int kexec_image_post_load_cleanup_default(struct kimage *image)
> +int kexec_image_post_load_cleanup_default(struct kimage *image)
>  {
>  	if (!image->fops || !image->fops->cleanup)
>  		return 0;
> -- 
> 2.17.0
> 
> 
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec

Acked-by: Dave Young <dyoung@redhat.com>

Thanks
Dave

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 02/11] kexec_file: make kexec_image_post_load_cleanup_default() global
  2018-04-25  6:26   ` AKASHI Takahiro
  (?)
@ 2018-05-01 17:46     ` James Morse
  -1 siblings, 0 replies; 156+ messages in thread
From: James Morse @ 2018-05-01 17:46 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd, ard.biesheuvel, bhsharma, kexec,
	linux-arm-kernel, linux-kernel

Hi Akashi,

On 25/04/18 07:26, AKASHI Takahiro wrote:
> Change this function from static to global so that arm64 can implement
> its own arch_kimage_file_post_load_cleanup() later using
> kexec_image_post_load_cleanup_default().

Do we need to call kexec_image_post_load_cleanup_default()? All it does is call
the image-type fops->cleanup(), which you don't implement in this series.

Is this just in case someone adds cleanup() later and is surprised that only the
arch-level helper is called?


Thanks,

James

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 03/11] arm64: kexec_file: invoke the kernel without purgatory
  2018-04-25  6:26   ` AKASHI Takahiro
  (?)
@ 2018-05-01 17:46     ` James Morse
  -1 siblings, 0 replies; 156+ messages in thread
From: James Morse @ 2018-05-01 17:46 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd, ard.biesheuvel, bhsharma, kexec,
	linux-arm-kernel, linux-kernel

Hi Akashi,

On 25/04/18 07:26, AKASHI Takahiro wrote:
> On arm64, purugatory would do almosty nothing. So just invoke secondary
> kernel directy by jumping into its entry code.

(Nits: purgatory, almost, directly)


> While, in this case, cpu_soft_restart() must be called with dtb address
> in the fifth argument, the behavior still stays compatible with kexec_load
> case as long as the argument is null.


> diff --git a/arch/arm64/kernel/cpu-reset.S b/arch/arm64/kernel/cpu-reset.S
> index 8021b46c9743..391df91328ac 100644
> --- a/arch/arm64/kernel/cpu-reset.S
> +++ b/arch/arm64/kernel/cpu-reset.S
> @@ -24,9 +24,9 @@
>   *
>   * @el2_switch: Flag to indicate a swich to EL2 is needed.

(Nit: switch)

>   * @entry: Location to jump to for soft reset.
> - * arg0: First argument passed to @entry.
> - * arg1: Second argument passed to @entry.
> - * arg2: Third argument passed to @entry.
> + * arg0: First argument passed to @entry. (relocation list)
> + * arg1: Second argument passed to @entry.(physcal kernel entry)

(Nit: physical)


> + * arg2: Third argument passed to @entry. (physical dtb address)
>   *
>   * Put the CPU into the same state as it would be if it had been reset, and
>   * branch to what would be the reset vector. It must be executed with the
> diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
> index f76ea92dff91..f7dbba00be10 100644
> --- a/arch/arm64/kernel/machine_kexec.c
> +++ b/arch/arm64/kernel/machine_kexec.c
> @@ -205,10 +205,17 @@ void machine_kexec(struct kimage *kimage)
>  	 * uses physical addressing to relocate the new image to its final
>  	 * position and transfers control to the image entry point when the
>  	 * relocation is complete.
> +	 * In case of kexec_file_load syscall, we directly start the kernel,
> +	 * skipping purgatory.

We're not really skipping purgatory; purgatory doesn't exist! For regular kexec
the image/payload we run is up to kexec-tools. For kexec_file_load it's a
kernel image. Purgatory is a kexec-tools-ism.


>  	cpu_soft_restart(kimage != kexec_crash_image,
> -		reboot_code_buffer_phys, kimage->head, kimage->start, 0);
> +		reboot_code_buffer_phys, kimage->head, kimage->start,
> +#ifdef CONFIG_KEXEC_FILE
> +				kimage->purgatory_info.purgatory_buf ?
> +						0 : kimage->arch.dtb_mem);
> +#else
> +				0);
> +#endif

Where does kimage->arch.dtb_mem come from? This patch won't build until patch 8
adds the config option, which is going to make bisecting any kexec side-effects
tricky.

purgatory_buf seems to only be set in kexec_purgatory_setup_kbuf(), called from
kexec_load_purgatory(), which we don't use. How does this get a value?

Would it be better to always use kimage->arch.dtb_mem, and ensure that is 0 for
regular kexec (as we can't know where the dtb is)? (image_arg may then be a
better name).
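
A minimal user-space sketch of the "always use dtb_mem" idea, for illustration
only. The `*_sketch` types and `restart_dtb_arg()` are hypothetical stand-ins,
not kernel definitions; the assumption is that the regular kexec_load path
leaves `arch.dtb_mem` at 0, so no #ifdef or purgatory_buf test is needed at the
call site.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct kimage_arch: only the field used here. */
struct kimage_arch_sketch {
	uint64_t dtb_mem;	/* physical dtb address; assumed 0 for regular kexec */
};

/* Hypothetical stand-in for struct kimage. */
struct kimage_sketch {
	uint64_t head;
	uint64_t start;
	struct kimage_arch_sketch arch;
};

/*
 * Value that would be passed as the last argument of cpu_soft_restart().
 * Regular kexec yields 0 (same as before the patch), kexec_file_load
 * yields the dtb address, with no conditional compilation involved.
 */
static uint64_t restart_dtb_arg(const struct kimage_sketch *image)
{
	assert(image);
	return image->arch.dtb_mem;
}
```

With this shape the caller passes `restart_dtb_arg(kimage)` unconditionally,
and the kexec_load case stays compatible as long as the field is zeroed.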


Thanks,

James

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 04/11] arm64: kexec_file: allocate memory walking through memblock list
  2018-04-25  6:26   ` AKASHI Takahiro
  (?)
@ 2018-05-01 17:46     ` James Morse
  -1 siblings, 0 replies; 156+ messages in thread
From: James Morse @ 2018-05-01 17:46 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd, ard.biesheuvel, bhsharma, kexec,
	linux-arm-kernel, linux-kernel

Hi Akashi,

On 25/04/18 07:26, AKASHI Takahiro wrote:
> We need to prevent firmware-reserved memory regions, particularly EFI
> memory map as well as ACPI tables, from being corrupted by loading
> kernel/initrd (or other kexec buffers). We also want to support memory
> allocation in top-down manner in addition to default bottom-up.
> So let's have arm64 specific arch_kexec_walk_mem() which will search
> for available memory ranges in usable memblock list,
> i.e. !NOMAP & !reserved, 

> instead of system resource tree.

Didn't we try to fix the system-resource-tree in order to fix regular-kexec to
be safe in the EFI-memory-map/ACPI-tables case?

It would be good to avoid having two ways of doing this, and I would like to
avoid having extra arch code...


> diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> new file mode 100644
> index 000000000000..f9ebf54ca247
> --- /dev/null
> +++ b/arch/arm64/kernel/machine_kexec_file.c
> @@ -0,0 +1,57 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * kexec_file for arm64
> + *
> + * Copyright (C) 2018 Linaro Limited
> + * Author: AKASHI Takahiro <takahiro.akashi@linaro.org>
> + *

> + * Most code is derived from arm64 port of kexec-tools

How does kexec-tools walk memblock?


> + */
> +
> +#define pr_fmt(fmt) "kexec_file: " fmt
> +
> +#include <linux/ioport.h>
> +#include <linux/kernel.h>
> +#include <linux/kexec.h>
> +#include <linux/memblock.h>
> +
> +int arch_kexec_walk_mem(struct kexec_buf *kbuf,
> +				int (*func)(struct resource *, void *))
> +{
> +	phys_addr_t start, end;
> +	struct resource res;
> +	u64 i;
> +	int ret = 0;
> +
> +	if (kbuf->image->type == KEXEC_TYPE_CRASH)
> +		return func(&crashk_res, kbuf);
> +
> +	if (kbuf->top_down)
> +		for_each_mem_range_rev(i, &memblock.memory, &memblock.reserved,
> +				NUMA_NO_NODE, MEMBLOCK_NONE,
> +				&start, &end, NULL) {

for_each_free_mem_range_reverse() is a more readable version of this helper.

> +			if (!memblock_is_map_memory(start))
> +				continue;

Passing MEMBLOCK_NONE means this walk will never find MEMBLOCK_NOMAP memory.


> +			res.start = start;
> +			res.end = end;
> +			ret = func(&res, kbuf);
> +			if (ret)
> +				break;
> +		}
> +	else
> +		for_each_mem_range(i, &memblock.memory, &memblock.reserved,
> +				NUMA_NO_NODE, MEMBLOCK_NONE,
> +				&start, &end, NULL) {

for_each_free_mem_range()?

> +			if (!memblock_is_map_memory(start))
> +				continue;
> +
> +			res.start = start;
> +			res.end = end;
> +			ret = func(&res, kbuf);
> +			if (ret)
> +				break;
> +		}
> +
> +	return ret;
> +}
> 

With these changes, what we have is almost:
arch/powerpc/kernel/machine_kexec_file_64.c::arch_kexec_walk_mem() !
(the difference being powerpc doesn't yet support crash-kernels here)

If the argument is that walking memblock gives a better answer than the stringy
walk_system_ram_res() thing, is there any mileage in moving this code into
kexec_file.c, and using it if !IS_ENABLED(CONFIG_ARCH_DISCARD_MEMBLOCK)?

This would save arm64/powerpc having near-identical implementations.
32bit arm keeps memblock if it has kexec, so it may be useful there too if
kexec_file_load() support is added.
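
A rough user-space sketch of what such a shared walker could look like. It is
not the kernel implementation: a plain array stands in for the free memblock
ranges that for_each_free_mem_range() / for_each_free_mem_range_reverse() would
yield, and every `*_sketch` name, `walk_free_ranges()` and `first_fit()` is
hypothetical. Range ends are treated as exclusive here, which is an assumption
of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One free range; 'end' is exclusive in this sketch. */
struct range_sketch {
	uint64_t start;
	uint64_t end;
};

typedef int (*walk_fn)(const struct range_sketch *r, void *arg);

/*
 * Visit free ranges bottom-up or top-down, stopping at the first callback
 * that returns non-zero, and return that value (0 if none did).
 */
static int walk_free_ranges(const struct range_sketch *free_ranges, size_t n,
			    int top_down, walk_fn func, void *arg)
{
	int ret = 0;

	assert(func);
	for (size_t k = 0; k < n; k++) {
		size_t idx = top_down ? n - 1 - k : k;

		ret = func(&free_ranges[idx], arg);
		if (ret)
			break;
	}
	return ret;
}

/* Example callback state: take the first range that is large enough. */
struct fit_sketch {
	uint64_t need;
	uint64_t found_start;
};

static int first_fit(const struct range_sketch *r, void *arg)
{
	struct fit_sketch *fit = arg;

	if (r->end - r->start < fit->need)
		return 0;	/* too small, keep walking */
	fit->found_start = r->start;
	return 1;		/* found, stop the walk */
}
```

The only per-direction difference is the visiting order, which is why a single
helper selected by `top_down` could serve more than one architecture.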


Thanks,

James

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 06/11] arm64: kexec_file: allow for loading Image-format kernel
  2018-04-25  6:26   ` AKASHI Takahiro
  (?)
@ 2018-05-01 17:46     ` James Morse
  -1 siblings, 0 replies; 156+ messages in thread
From: James Morse @ 2018-05-01 17:46 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd, ard.biesheuvel, bhsharma, kexec,
	linux-arm-kernel, linux-kernel

Hi Akashi,

On 25/04/18 07:26, AKASHI Takahiro wrote:
> This patch provides kexec_file_ops for "Image"-format kernel. In this
> implementation, a binary is always loaded with a fixed offset identified
> in text_offset field of its header.


> diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
> index e4de1223715f..3cba4161818a 100644
> --- a/arch/arm64/include/asm/kexec.h
> +++ b/arch/arm64/include/asm/kexec.h
> @@ -102,6 +102,56 @@ struct kimage_arch {
>  	void *dtb_buf;
>  };
>  
> +/**
> + * struct arm64_image_header - arm64 kernel image header
> + *
> + * @pe_sig: Optional PE format 'MZ' signature
> + * @branch_code: Instruction to branch to stext
> + * @text_offset: Image load offset, little endian
> + * @image_size: Effective image size, little endian
> + * @flags:
> + *	Bit 0: Kernel endianness. 0=little endian, 1=big endian

Page size? What about 'phys_base'? (whatever that is...)
Probably best to refer to Documentation/arm64/booting.txt here; it's the
authoritative source of what these fields mean.


> + * @reserved: Reserved
> + * @magic: Magic number, "ARM\x64"
> + * @pe_header: Optional offset to a PE format header
> + **/
> +
> +struct arm64_image_header {
> +	u8 pe_sig[2];
> +	u8 pad[2];
> +	u32 branch_code;
> +	u64 text_offset;
> +	u64 image_size;
> +	u64 flags;

__le64 as appropriate here would let tools like sparse catch any missing endian
conversion bugs.


> +	u64 reserved[3];
> +	u8 magic[4];
> +	u32 pe_header;
> +};

I'm surprised we don't have a definition for this already; I guess it's always
done in asm. We have kernel/image.h that holds some of this stuff. If we are
going to validate the flags, is it worth adding the code there (and moving it
to include/asm)?


> +static const u8 arm64_image_magic[4] = {'A', 'R', 'M', 0x64U};

Any chance this magic could be a pre-processor symbol shared with head.S?


> +
> +/**
> + * arm64_header_check_magic - Helper to check the arm64 image header.
> + *
> + * Returns non-zero if header is OK.
> + */
> +
> +static inline int arm64_header_check_magic(const struct arm64_image_header *h)
> +{
> +	if (!h)
> +		return 0;
> +
> +	if (!h->text_offset)
> +		return 0;
> +
> +	return (h->magic[0] == arm64_image_magic[0]
> +		&& h->magic[1] == arm64_image_magic[1]
> +		&& h->magic[2] == arm64_image_magic[2]
> +		&& h->magic[3] == arm64_image_magic[3]);

memcmp()? Or just define it as a 32bit value?
I guess you skip the MZ prefix as it's not present for !EFI?

Could we check branch_code is non-zero, and text-offset points within image-size?


We could check that this platform supports the page-size/endian config that this
Image was built with... We get a message from the EFI stub if the page-size
can't be supported, it would be nice to do the same here (as we can).

(no idea if kexec-tools checks this stuff, it probably can't get at the id
registers to know)
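
The checks suggested above could be combined roughly as follows. This is a
userspace sketch under stated assumptions, not the patch's code: the struct
mirrors the layout quoted above with fixed-width types, get_le64() and
arm64_header_check() are hypothetical helpers, and the "text_offset within
image_size" rule is just the suggestion from this review.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same field layout as the struct in the patch, with userspace types. */
struct arm64_image_header {
	uint8_t  pe_sig[2];
	uint8_t  pad[2];
	uint32_t branch_code;
	uint64_t text_offset;	/* little endian */
	uint64_t image_size;	/* little endian */
	uint64_t flags;		/* little endian */
	uint64_t reserved[3];
	uint8_t  magic[4];
	uint32_t pe_header;
};

static_assert(sizeof(struct arm64_image_header) == 64,
	      "the Image header is expected to be 64 bytes");

static const uint8_t arm64_image_magic[4] = { 'A', 'R', 'M', 0x64 };

/* Decode a little-endian 64-bit value regardless of host endianness. */
static uint64_t get_le64(const void *p)
{
	const uint8_t *b = p;
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | b[i];
	return v;
}

/* Returns 1 if the header passes every check, 0 otherwise. */
static int arm64_header_check(const struct arm64_image_header *h)
{
	uint64_t text_offset, image_size;

	if (!h)
		return 0;

	/* One memcmp() instead of four byte comparisons. */
	if (memcmp(h->magic, arm64_image_magic, sizeof(h->magic)))
		return 0;

	/* The first instruction must exist. */
	if (!h->branch_code)
		return 0;

	text_offset = get_le64(&h->text_offset);
	image_size = get_le64(&h->image_size);

	/* As in the patch, a zero text_offset is rejected. */
	if (!text_offset)
		return 0;

	/* Suggested in review: the load offset must lie inside the image. */
	if (text_offset >= image_size)
		return 0;

	return 1;
}
```

Decoding the fields byte by byte keeps the sketch independent of host
endianness; in the kernel, __le64 fields with le64_to_cpu() would serve the
same purpose and let sparse flag missing conversions.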


> diff --git a/arch/arm64/kernel/kexec_image.c b/arch/arm64/kernel/kexec_image.c
> new file mode 100644
> index 000000000000..4dd524ad6611
> --- /dev/null
> +++ b/arch/arm64/kernel/kexec_image.c
> @@ -0,0 +1,79 @@

> +static void *image_load(struct kimage *image,
> +				char *kernel, unsigned long kernel_len,
> +				char *initrd, unsigned long initrd_len,
> +				char *cmdline, unsigned long cmdline_len)
> +{
> +	struct kexec_buf kbuf;
> +	struct arm64_image_header *h = (struct arm64_image_header *)kernel;
> +	unsigned long text_offset;
> +	int ret;
> +
> +	/* Load the kernel */
> +	kbuf.image = image;
> +	kbuf.buf_min = 0;
> +	kbuf.buf_max = ULONG_MAX;
> +	kbuf.top_down = false;
> +
> +	kbuf.buffer = kernel;
> +	kbuf.bufsz = kernel_len;
> +	kbuf.memsz = le64_to_cpu(h->image_size);
> +	text_offset = le64_to_cpu(h->text_offset);
> +	kbuf.buf_align = SZ_2M;

> +	/* Adjust kernel segment with TEXT_OFFSET */
> +	kbuf.memsz += text_offset;
> +
> +	ret = kexec_add_buffer(&kbuf);
> +	if (ret)
> +		goto out;
> +
> +	image->arch.kern_segment = image->nr_segments - 1;

You only seem to use kern_segment here, and in load_other_segments() called
below. Could it not be a local variable that is passed in, instead of
arch-specific data we keep forever?


> +	image->segment[image->arch.kern_segment].mem += text_offset;
> +	image->segment[image->arch.kern_segment].memsz -= text_offset;
> +	image->start = image->segment[image->arch.kern_segment].mem;
> +
> +	pr_debug("Loaded kernel at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
> +				image->segment[image->arch.kern_segment].mem,
> +				kbuf.bufsz, kbuf.memsz);
> +
> +	/* Load additional data */
> +	ret = load_other_segments(image, initrd, initrd_len,
> +				cmdline, cmdline_len);
> +
> +out:
> +	return ERR_PTR(ret);
> +}
Looks good,

Thanks,

James

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 02/11] kexec_file: make kexec_image_post_load_cleanup_default() global
  2018-05-01 17:46     ` James Morse
  (?)
@ 2018-05-07  4:40       ` AKASHI Takahiro
  -1 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-05-07  4:40 UTC (permalink / raw)
  To: James Morse
  Cc: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd, ard.biesheuvel, bhsharma, kexec,
	linux-arm-kernel, linux-kernel

On Tue, May 01, 2018 at 06:46:04PM +0100, James Morse wrote:
> Hi Akashi,
> 
> On 25/04/18 07:26, AKASHI Takahiro wrote:
> > Change this function from static to global so that arm64 can implement
> > its own arch_kimage_file_post_load_cleanup() later using
> > kexec_image_post_load_cleanup_default().
> 
> Do we need to call kexec_image_post_load_cleanup_default()? All it does is call
> the image-type fops->cleanup(), which you don't implement in this series.
> 
> Is this just-in-case someone adds cleanup() later and is surprised only the
> arch-level helper is called?

Yes, we don't want to miss either of two possibilities:
- some common clean-up code is added to kexec_image_post_load_cleanup_default()
- some format-specific (e.g. Image) clean-up code is added to fops->cleanup()
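
A minimal userspace mock of that call chain, to show why routing through the
default helper covers both cases. The struct fields, the free() of a dtb
buffer and demo_cleanup() are simplified assumptions for illustration, not
the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for the kernel structures. */
struct kexec_file_ops {
	int (*cleanup)(void *loader_data);
};

struct kimage {
	const struct kexec_file_ops *fops;
	void *image_loader_data;
	void *dtb_buf;			/* stands in for arch-specific data */
};

/* Common helper: only forwards to the format-specific hook, if any. */
static int kexec_image_post_load_cleanup_default(struct kimage *image)
{
	if (!image->fops || !image->fops->cleanup)
		return 0;

	return image->fops->cleanup(image->image_loader_data);
}

/* Arch hook: release arch data first, then defer to the common helper. */
static int arch_kimage_file_post_load_cleanup(struct kimage *image)
{
	assert(image);

	free(image->dtb_buf);
	image->dtb_buf = NULL;

	return kexec_image_post_load_cleanup_default(image);
}

/* Example format-specific hook: counts how often it was called. */
static int demo_cleanup(void *loader_data)
{
	(*(int *)loader_data)++;
	return 0;
}
```

With this shape, code added later to either the default helper or a
format's cleanup() hook is picked up without touching the arch function.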

-Takahiro AKASHI

> 
> 
> Thanks,
> 
> James

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 03/11] arm64: kexec_file: invoke the kernel without purgatory
  2018-05-01 17:46     ` James Morse
  (?)
@ 2018-05-07  5:22       ` AKASHI Takahiro
  -1 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-05-07  5:22 UTC (permalink / raw)
  To: James Morse
  Cc: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd, ard.biesheuvel, bhsharma, kexec,
	linux-arm-kernel, linux-kernel

On Tue, May 01, 2018 at 06:46:06PM +0100, James Morse wrote:
> Hi Akashi,
> 
> On 25/04/18 07:26, AKASHI Takahiro wrote:
> > On arm64, purugatory would do almosty nothing. So just invoke secondary
> > kernel directy by jumping into its entry code.
> 
> (Nits: purgatory, almost, directly)

Oops, I thought I had run a spell check before ...

> 
> > While, in this case, cpu_soft_restart() must be called with dtb address
> > in the fifth argument, the behavior still stays compatible with kexec_load
> > case as long as the argument is null.
> 
> 
> > diff --git a/arch/arm64/kernel/cpu-reset.S b/arch/arm64/kernel/cpu-reset.S
> > index 8021b46c9743..391df91328ac 100644
> > --- a/arch/arm64/kernel/cpu-reset.S
> > +++ b/arch/arm64/kernel/cpu-reset.S
> > @@ -24,9 +24,9 @@
> >   *
> >   * @el2_switch: Flag to indicate a swich to EL2 is needed.
> 
> (Nit: switch)

ditto

> >   * @entry: Location to jump to for soft reset.
> > - * arg0: First argument passed to @entry.
> > - * arg1: Second argument passed to @entry.
> > - * arg2: Third argument passed to @entry.
> > + * arg0: First argument passed to @entry. (relocation list)
> > + * arg1: Second argument passed to @entry.(physcal kernel entry)
> 
> (Nit: physical)

ditto
> 
> > + * arg2: Third argument passed to @entry. (physical dtb address)
> >   *
> >   * Put the CPU into the same state as it would be if it had been reset, and
> >   * branch to what would be the reset vector. It must be executed with the
> > diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
> > index f76ea92dff91..f7dbba00be10 100644
> > --- a/arch/arm64/kernel/machine_kexec.c
> > +++ b/arch/arm64/kernel/machine_kexec.c
> > @@ -205,10 +205,17 @@ void machine_kexec(struct kimage *kimage)
> >  	 * uses physical addressing to relocate the new image to its final
> >  	 * position and transfers control to the image entry point when the
> >  	 * relocation is complete.
> > +	 * In case of kexec_file_load syscall, we directly start the kernel,
> > +	 * skipping purgatory.
> 
> We're not really skipping purgatory, purgatory doesn't exist! For regular kexec
> the image/payload we run is up to kexec-tools. For kexec_file_load its a
> kernel-image. Purgatory is a kexec-tools-ism.

You are right, but in general the generic kexec code expects purgatory to
exist, and it does exist on all architectures, for both kexec_load() and
kexec_file_load(), with arm64's kexec_file_load() as the only exception.
So it would be nice to have an explicit note here.

> 
> >  	cpu_soft_restart(kimage != kexec_crash_image,
> > -		reboot_code_buffer_phys, kimage->head, kimage->start, 0);
> > +		reboot_code_buffer_phys, kimage->head, kimage->start,
> > +#ifdef CONFIG_KEXEC_FILE
> > +				kimage->purgatory_info.purgatory_buf ?
> > +						0 : kimage->arch.dtb_mem);
> > +#else
> > +				0);
> > +#endif
> 
> Where does kimage->arch.dtb_mem come from? This patch won't build until patch 8
> adds the config option, which is going to make bisecting any kexec side-effects
> tricky.

CONFIG_KEXEC_FILE is also used in patches #4, #5 and #6.
I don't know how we can fix this as long as the implementation is divided
into several patches.
(So bisecting doesn't work anyway.)

> purgatory_buf seems to only be set in kexec_purgatory_setup_kbuf(), called from
> kexec_load_purgatory(), which we don't use. How does this get a value?
> 
> Would it be better to always use kimage->arch.dtb_mem, and ensure that is 0 for
> regular kexec (as we can't know where the dtb is)? (image_arg may then be a
> better name).

The problem is that arch.dtb_mem is currently defined only if
CONFIG_KEXEC_FILE is set. So I would like to
- merge this patch with patch #8
- change the condition to
	#ifdef CONFIG_KEXEC_FILE
				kimage->file_mode ? kimage->arch.dtb_mem : 0);
	#else
				0);
	#endif
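
For comparison, the alternative raised in the review (always carry the dtb
address and keep it zero for regular kexec) might look like this as a
userspace sketch. It assumes dtb_mem exists unconditionally, which is not
the case in this series; kexec_dtb_arg() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins; here dtb_mem is present in every configuration. */
struct kimage_arch {
	uint64_t dtb_mem;
};

struct kimage {
	bool file_mode;		/* true when loaded via kexec_file_load() */
	struct kimage_arch arch;
};

/*
 * Value for the last argument of cpu_soft_restart(): the physical dtb
 * address for kexec_file_load(), 0 for kexec_load(), where the kernel
 * cannot know where user space placed the dtb.
 */
static uint64_t kexec_dtb_arg(const struct kimage *kimage)
{
	assert(kimage);

	return kimage->file_mode ? kimage->arch.dtb_mem : 0;
}
```

Keeping the selection in one helper would leave the call site free of
#ifdef blocks inside the argument list.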

Thanks,
-Takahiro AKASHI

> 
> Thanks,
> 
> James

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 04/11] arm64: kexec_file: allocate memory walking through memblock list
  2018-05-01 17:46     ` James Morse
  (?)
@ 2018-05-07  5:59       ` AKASHI Takahiro
  -1 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-05-07  5:59 UTC (permalink / raw)
  To: James Morse
  Cc: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd, ard.biesheuvel, bhsharma, kexec,
	linux-arm-kernel, linux-kernel

James,

On Tue, May 01, 2018 at 06:46:09PM +0100, James Morse wrote:
> Hi Akashi,
> 
> On 25/04/18 07:26, AKASHI Takahiro wrote:
> > We need to prevent firmware-reserved memory regions, particularly EFI
> > memory map as well as ACPI tables, from being corrupted by loading
> > kernel/initrd (or other kexec buffers). We also want to support memory
> > allocation in top-down manner in addition to default bottom-up.
> > So let's have arm64 specific arch_kexec_walk_mem() which will search
> > for available memory ranges in usable memblock list,
> > i.e. !NOMAP & !reserved, 
> 
> > instead of system resource tree.
> 
> Didn't we try to fix the system-resource-tree in order to fix regular-kexec to
> be safe in the EFI-memory-map/ACPI-tables case?
> 
> It would be good to avoid having two ways of doing this, and I would like to
> avoid having extra arch code...

I know what you mean.
/proc/iomem, or the system resource tree, is in my opinion not the best place
to describe the kernel's memory usage; it is better suited to describing the
*physical* hardware layout. As we are still discussing "reserved" memory,
I don't want to depend on it.
With the memblock list, we will have more accurate control over memory
usage.
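
To make the idea concrete, here is a self-contained userspace sketch of
walking "memory minus reserved" in either direction and handing each free
range to a callback, which is roughly what the memblock iterators provide.
All names are hypothetical, ranges are half-open [start, end), and the
reserved array is assumed to be sorted and non-overlapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct range {
	uint64_t start;
	uint64_t end;		/* exclusive */
};

typedef int (*walk_fn)(const struct range *hole, void *arg);

/* Report the parts of one memory region that no reserved range covers. */
static int walk_region(const struct range *mem, const struct range *rsv,
		       size_t nr_rsv, bool top_down, walk_fn func, void *arg)
{
	uint64_t cursor = top_down ? mem->end : mem->start;
	struct range hole;
	size_t n;
	int ret;

	for (n = 0; n < nr_rsv; n++) {
		const struct range *r = &rsv[top_down ? nr_rsv - 1 - n : n];

		if (r->end <= mem->start || r->start >= mem->end)
			continue;	/* does not touch this region */

		if (top_down) {
			hole.start = r->end;
			hole.end = cursor;
			cursor = r->start;
		} else {
			hole.start = cursor;
			hole.end = r->start;
			cursor = r->end;
		}
		if (hole.start < hole.end) {
			ret = func(&hole, arg);
			if (ret)
				return ret;	/* callback asked to stop */
		}
	}

	/* Whatever is left between the cursor and the far edge. */
	hole.start = top_down ? mem->start : cursor;
	hole.end = top_down ? cursor : mem->end;
	if (hole.start < hole.end)
		return func(&hole, arg);
	return 0;
}

/* Walk every memory region; stop as soon as the callback returns non-zero. */
static int walk_free_ranges(const struct range *mem, size_t nr_mem,
			    const struct range *rsv, size_t nr_rsv,
			    bool top_down, walk_fn func, void *arg)
{
	size_t n;
	int ret = 0;

	assert(func);
	for (n = 0; n < nr_mem && !ret; n++)
		ret = walk_region(&mem[top_down ? nr_mem - 1 - n : n],
				  rsv, nr_rsv, top_down, func, arg);
	return ret;
}

/* Example callback: accept the first hole that is large enough. */
struct fit {
	uint64_t size;
	struct range found;
};

static int first_fit(const struct range *hole, void *arg)
{
	struct fit *f = arg;

	if (hole->end - hole->start < f->size)
		return 0;	/* too small, keep walking */
	f->found = *hole;
	return 1;
}
```

The non-zero return from the callback plays the same role as in
arch_kexec_walk_mem(): it ends the walk once a suitable range is found.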

> 
> > diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> > new file mode 100644
> > index 000000000000..f9ebf54ca247
> > --- /dev/null
> > +++ b/arch/arm64/kernel/machine_kexec_file.c
> > @@ -0,0 +1,57 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * kexec_file for arm64
> > + *
> > + * Copyright (C) 2018 Linaro Limited
> > + * Author: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > + *
> 
> > + * Most code is derived from arm64 port of kexec-tools
> 
> How does kexec-tools walk memblock?

I will remove this comment from this patch.
It is meant for the rest of the code, which will be added by
succeeding patches (patch #5 and #7).


> 
> > + */
> > +
> > +#define pr_fmt(fmt) "kexec_file: " fmt
> > +
> > +#include <linux/ioport.h>
> > +#include <linux/kernel.h>
> > +#include <linux/kexec.h>
> > +#include <linux/memblock.h>
> > +
> > +int arch_kexec_walk_mem(struct kexec_buf *kbuf,
> > +				int (*func)(struct resource *, void *))
> > +{
> > +	phys_addr_t start, end;
> > +	struct resource res;
> > +	u64 i;
> > +	int ret = 0;
> > +
> > +	if (kbuf->image->type == KEXEC_TYPE_CRASH)
> > +		return func(&crashk_res, kbuf);
> > +
> > +	if (kbuf->top_down)
> > +		for_each_mem_range_rev(i, &memblock.memory, &memblock.reserved,
> > +				NUMA_NO_NODE, MEMBLOCK_NONE,
> > +				&start, &end, NULL) {
> 
> for_each_free_mem_range_reverse() is a more readable version of this helper.

OK. I used to use my own limited list of reserved memory instead of
memblock.reserved here to exclude verbose ranges.


> > +			if (!memblock_is_map_memory(start))
> > +				continue;
> 
> Passing MEMBLOCK_NONE means this walk will never find MEMBLOCK_NOMAP memory.

Sure, I confirmed it.

> 
> > +			res.start = start;
> > +			res.end = end;
> > +			ret = func(&res, kbuf);
> > +			if (ret)
> > +				break;
> > +		}
> > +	else
> > +		for_each_mem_range(i, &memblock.memory, &memblock.reserved,
> > +				NUMA_NO_NODE, MEMBLOCK_NONE,
> > +				&start, &end, NULL) {
> 
> for_each_free_mem_range()?

OK.

> > +			if (!memblock_is_map_memory(start))
> > +				continue;
> > +
> > +			res.start = start;
> > +			res.end = end;
> > +			ret = func(&res, kbuf);
> > +			if (ret)
> > +				break;
> > +		}
> > +
> > +	return ret;
> > +}
> > 
> 
> With these changes, what we have is almost:
> arch/powerpc/kernel/machine_kexec_file_64.c::arch_kexec_walk_mem() !
> (the difference being powerpc doesn't yet support crash-kernels here)
> 
> If the argument is walking memblock gives a better answer than the stringy
> walk_system_ram_res() thing, is there any mileage in moving this code into
> kexec_file.c, and using it if !IS_ENABLED(CONFIG_ARCH_DISCARD_MEMBLOCK)?
> 
> This would save arm64/powerpc having near-identical implementations.
> 32bit arm keeps memblock if it has kexec, so it may be useful there too if
> kexec_file_load() support is added.

Thanks. I had forgotten about ppc.
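
The walk discussed above (visit each free range bottom-up or top-down, hand
it to a callback, and stop as soon as the callback returns non-zero) can be
sketched in plain userspace C. This is only an illustration under stated
assumptions: struct free_range, walk_free_ranges(), struct hole_request and
first_fit() are hypothetical names invented for this sketch, the range array
stands in for the memblock free list, and `end` is assumed to be inclusive.
It is not the kernel implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one free memory range; not a kernel type. */
struct free_range {
	uint64_t start;
	uint64_t end;	/* inclusive (an assumption of this sketch) */
};

/*
 * Walk ranges[] (assumed sorted by ascending address) and call func()
 * on each entry, from the top when top_down is set. A non-zero return
 * from func() stops the walk and is passed back to the caller.
 */
static int walk_free_ranges(const struct free_range *ranges, size_t n,
			    int top_down,
			    int (*func)(const struct free_range *, void *),
			    void *arg)
{
	size_t k;
	int ret = 0;

	for (k = 0; k < n; k++) {
		size_t idx = top_down ? n - 1 - k : k;

		ret = func(&ranges[idx], arg);
		if (ret)
			break;
	}
	return ret;
}

/* Hypothetical request: find a range that can hold `size` bytes. */
struct hole_request {
	uint64_t size;
	uint64_t found_start;
};

/* Example callback: accept the first range that is large enough. */
static int first_fit(const struct free_range *r, void *arg)
{
	struct hole_request *req = arg;

	if (r->end - r->start + 1 < req->size)
		return 0;	/* too small, keep walking */
	req->found_start = r->start;
	return 1;		/* stop the walk */
}
```

With the same callback, only the `top_down` flag decides whether the lowest
or the highest suitable range is chosen, which is why a single shared walker
could plausibly serve more than one architecture.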

-Takahiro AKASHI


> 
> Thanks,
> 
> James

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 06/11] arm64: kexec_file: allow for loading Image-format kernel
  2018-05-01 17:46     ` James Morse
  (?)
@ 2018-05-07  7:21       ` AKASHI Takahiro
  -1 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-05-07  7:21 UTC (permalink / raw)
  To: James Morse
  Cc: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd, ard.biesheuvel, bhsharma, kexec,
	linux-arm-kernel, linux-kernel

James,

On Tue, May 01, 2018 at 06:46:11PM +0100, James Morse wrote:
> Hi Akashi,
> 
> On 25/04/18 07:26, AKASHI Takahiro wrote:
> > This patch provides kexec_file_ops for "Image"-format kernel. In this
> > implementation, a binary is always loaded with a fixed offset identified
> > in text_offset field of its header.
> 
> 
> > diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
> > index e4de1223715f..3cba4161818a 100644
> > --- a/arch/arm64/include/asm/kexec.h
> > +++ b/arch/arm64/include/asm/kexec.h
> > @@ -102,6 +102,56 @@ struct kimage_arch {
> >  	void *dtb_buf;
> >  };
> >  
> > +/**
> > + * struct arm64_image_header - arm64 kernel image header
> > + *
> > + * @pe_sig: Optional PE format 'MZ' signature
> > + * @branch_code: Instruction to branch to stext
> > + * @text_offset: Image load offset, little endian
> > + * @image_size: Effective image size, little endian
> > + * @flags:
> > + *	Bit 0: Kernel endianness. 0=little endian, 1=big endian
> 
> Page size? What about 'phys_base'?, (whatever that is...)
> Probably best to refer to Documentation/arm64/booting.txt here, its the
> authoritative source of what these fields mean.

While we don't care about the other bit fields for now, I will add the
reference to the Documentation file.

> 
> > + * @reserved: Reserved
> > + * @magic: Magic number, "ARM\x64"
> > + * @pe_header: Optional offset to a PE format header
> > + **/
> > +
> > +struct arm64_image_header {
> > +	u8 pe_sig[2];
> > +	u8 pad[2];
> > +	u32 branch_code;
> > +	u64 text_offset;
> > +	u64 image_size;
> > +	u64 flags;
> 
> __le64 as appropriate here would let tools like sparse catch any missing endian
> conversion bugs.

OK.

> 
> > +	u64 reserved[3];
> > +	u8 magic[4];
> > +	u32 pe_header;
> > +};
> 
> I'm surprised we don't have a definition for this already, I guess its always
> done in asm. We have kernel/image.h that holds some of this stuff, if we are
> going to validate the flags, is it worth adding the code there, (and moving it
> to include/asm)?

A comment at the beginning of this file says,
    #ifndef LINKER_SCRIPT
    #error This file should only be included in vmlinux.lds.S
    #endif
Let me think about it.

> 
> > +static const u8 arm64_image_magic[4] = {'A', 'R', 'M', 0x64U};
> 
> Any chance this magic could be a pre-processor symbol shared with head.S?

OK.

> 
> > +
> > +/**
> > + * arm64_header_check_magic - Helper to check the arm64 image header.
> > + *
> > + * Returns non-zero if header is OK.
> > + */
> > +
> > +static inline int arm64_header_check_magic(const struct arm64_image_header *h)
> > +{
> > +	if (!h)
> > +		return 0;
> > +
> > +	if (!h->text_offset)
> > +		return 0;
> > +
> > +	return (h->magic[0] == arm64_image_magic[0]
> > +		&& h->magic[1] == arm64_image_magic[1]
> > +		&& h->magic[2] == arm64_image_magic[2]
> > +		&& h->magic[3] == arm64_image_magic[3]);
> 
> memcmp()? Or just define it as a 32bit value?

OK. As you know, I have always tried to keep the code from diverging
from kexec-tools for maintainability reasons.
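
As a rough illustration of the memcmp() suggestion, the magic check can be
written against a raw byte buffer in userspace C. This is a sketch, not the
patch's code: the HDR_* offsets are derived from the field order of the
quoted struct arm64_image_header, and image_header_ok(), image_text_offset()
and read_le64() are hypothetical helper names. Unlike the quoted helper, the
sketch checks only the magic and the buffer length.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offsets within the 64-byte header, following the quoted struct. */
#define HDR_TEXT_OFFSET	8
#define HDR_IMAGE_SIZE	16
#define HDR_MAGIC	56
#define HDR_SIZE	64

static const uint8_t image_magic[4] = { 'A', 'R', 'M', 0x64 };

/* Decode a little-endian 64-bit field regardless of host endianness. */
static uint64_t read_le64(const uint8_t *p)
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Returns non-zero if buf is long enough and carries the Image magic. */
static int image_header_ok(const uint8_t *buf, size_t len)
{
	if (!buf || len < HDR_SIZE)
		return 0;
	return memcmp(buf + HDR_MAGIC, image_magic, sizeof(image_magic)) == 0;
}

/* Read the text_offset field; the caller must have checked the header. */
static uint64_t image_text_offset(const uint8_t *buf)
{
	return read_le64(buf + HDR_TEXT_OFFSET);
}
```

Decoding the fields byte by byte is one way to sidestep the missing endian
conversions that the __le64 annotation is meant to catch.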

> I guess you skip the MZ prefix as its not present for !EFI?

CONFIG_KEXEC_IMAGE_VERIFY_SIG depends on the fact that the file
format is PE (that is, EFI is enabled).


> Could we check branch_code is non-zero, and text-offset points within image-size?

We could do it, but I don't think this check is very useful.

> 
> We could check that this platform supports the page-size/endian config that this
> Image was built with... We get a message from the EFI stub if the page-size
> can't be supported, it would be nice to do the same here (as we can).

There is no restriction on page-size or endianness for kexec.
What will be the purpose of this check?

> (no idea if kexec-tool checks this stuff, it probably can't get at the id
> registers to know)
> 
> 
> > diff --git a/arch/arm64/kernel/kexec_image.c b/arch/arm64/kernel/kexec_image.c
> > new file mode 100644
> > index 000000000000..4dd524ad6611
> > --- /dev/null
> > +++ b/arch/arm64/kernel/kexec_image.c
> > @@ -0,0 +1,79 @@
> 
> > +static void *image_load(struct kimage *image,
> > +				char *kernel, unsigned long kernel_len,
> > +				char *initrd, unsigned long initrd_len,
> > +				char *cmdline, unsigned long cmdline_len)
> > +{
> > +	struct kexec_buf kbuf;
> > +	struct arm64_image_header *h = (struct arm64_image_header *)kernel;
> > +	unsigned long text_offset;
> > +	int ret;
> > +
> > +	/* Load the kernel */
> > +	kbuf.image = image;
> > +	kbuf.buf_min = 0;
> > +	kbuf.buf_max = ULONG_MAX;
> > +	kbuf.top_down = false;
> > +
> > +	kbuf.buffer = kernel;
> > +	kbuf.bufsz = kernel_len;
> > +	kbuf.memsz = le64_to_cpu(h->image_size);
> > +	text_offset = le64_to_cpu(h->text_offset);
> > +	kbuf.buf_align = SZ_2M;
> 
> > +	/* Adjust kernel segment with TEXT_OFFSET */
> > +	kbuf.memsz += text_offset;
> > +
> > +	ret = kexec_add_buffer(&kbuf);
> > +	if (ret)
> > +		goto out;
> > +
> > +	image->arch.kern_segment = image->nr_segments - 1;
> 
> You only seem to use kern_segment here, and in load_other_segments() called
> below. Could it not be a local variable passed in? Instead of arch-specific data
> we keep forever?

No, kern_segment is also used in load_other_segments() in machine_kexec_file.c.
To optimize memory hole allocation logic in locate_mem_hole_callback(),
we need to know the exact range of kernel image (start and end).

(A known drawback of this code is that it assumes Image occupies only one
segment; if vmlinux is supported in the future, the kernel would occupy two
segments, one for text and one for data.)
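
The address arithmetic applied to the kernel segment in image_load() can be
illustrated with a small userspace C sketch. It assumes a free hole starting
at hole_start and ignores any rounding the real allocator might apply to the
size; struct kernel_placement, align_up() and place_image() are hypothetical
names used only here.

```c
#include <stdint.h>

#define SZ_2M 0x200000ULL

/* Hypothetical result type for this sketch; not a kernel structure. */
struct kernel_placement {
	uint64_t mem;	/* physical address where the Image starts */
	uint64_t memsz;	/* bytes occupied by the Image */
};

/* Round addr up to the next multiple of align (a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Reserve image_size + text_offset bytes at a 2MB-aligned base, then
 * move the start forward by text_offset and shrink the size back, so
 * the Image itself ends up at base + text_offset.
 */
static struct kernel_placement place_image(uint64_t hole_start,
					   uint64_t image_size,
					   uint64_t text_offset)
{
	struct kernel_placement p;
	uint64_t base = align_up(hole_start, SZ_2M);
	uint64_t reserved = image_size + text_offset;

	p.mem = base + text_offset;
	p.memsz = reserved - text_offset;
	return p;
}
```

For example, a hole at 0x40100000 with an image_size of 0x1200000 and a
text_offset of 0x80000 gives a 2MB-aligned base of 0x40200000, so the Image
starts at 0x40280000 and still occupies 0x1200000 bytes.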

> 
> > +	image->segment[image->arch.kern_segment].mem += text_offset;
> > +	image->segment[image->arch.kern_segment].memsz -= text_offset;
> > +	image->start = image->segment[image->arch.kern_segment].mem;
> > +
> > +	pr_debug("Loaded kernel at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
> > +				image->segment[image->arch.kern_segment].mem,
> > +				kbuf.bufsz, kbuf.memsz);
> > +
> > +	/* Load additional data */
> > +	ret = load_other_segments(image, initrd, initrd_len,
> > +				cmdline, cmdline_len);
> > +
> > +out:
> > +	return ERR_PTR(ret);
> > +}
> Looks good,

Thank you for the thorough review.

-Takahiro AKASHI


> Thanks,
> 
> James

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 03/11] arm64: kexec_file: invoke the kernel without purgatory
  2018-05-07  5:22       ` AKASHI Takahiro
  (?)
@ 2018-05-11 17:03         ` James Morse
  -1 siblings, 0 replies; 156+ messages in thread
From: James Morse @ 2018-05-11 17:03 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd, ard.biesheuvel, bhsharma, kexec,
	linux-arm-kernel, linux-kernel

Hi Akashi,

On 07/05/18 06:22, AKASHI Takahiro wrote:
> On Tue, May 01, 2018 at 06:46:06PM +0100, James Morse wrote:
>> On 25/04/18 07:26, AKASHI Takahiro wrote:
>>> diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
>>> index f76ea92dff91..f7dbba00be10 100644
>>> --- a/arch/arm64/kernel/machine_kexec.c
>>> +++ b/arch/arm64/kernel/machine_kexec.c
>>> @@ -205,10 +205,17 @@ void machine_kexec(struct kimage *kimage)

>>>  	cpu_soft_restart(kimage != kexec_crash_image,
>>> -		reboot_code_buffer_phys, kimage->head, kimage->start, 0);
>>> +		reboot_code_buffer_phys, kimage->head, kimage->start,
>>> +#ifdef CONFIG_KEXEC_FILE
>>> +				kimage->purgatory_info.purgatory_buf ?
>>> +						0 : kimage->arch.dtb_mem);
>>> +#else
>>> +				0);
>>> +#endif


>> purgatory_buf seems to only be set in kexec_purgatory_setup_kbuf(), called from
>> kexec_load_purgatory(), which we don't use. How does this get a value?
>>
>> Would it be better to always use kimage->arch.dtb_mem, and ensure that is 0 for
>> regular kexec (as we can't know where the dtb is)? (image_arg may then be a
>> better name).
> 
> The problem is arch.dtb_mem is currently defined only if CONFIG_KEXEC_FILE.

I thought it was ARCH_HAS_KIMAGE_ARCH, which we can define all the time if
that's what we want.


> So I would like to
> - merge this patch with patch#8
> - change the condition
>         #ifdef CONFIG_KEXEC_FILE
>        				kimage->file_mode ? kimage->arch.dtb_mem : 0);
>         #else
>         			0);
>         #endif

If we can avoid even this #ifdef by always having kimage->arch, I'd prefer that.
If we do that, 'dtb_mem' would need something that indicates it's for kexec_file,
as kexec has a DTB too, we just don't know where it is...
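
To illustrate the idea, here is a minimal sketch, assuming the arch data is
always present and that the field is simply left at 0 for regular kexec. The
struct and function names (kimage_sketch, prepare_regular(), prepare_file(),
image_arg()) are hypothetical stand-ins, not the kernel's definitions; only
dtb_mem mirrors the field discussed above.

```c
/* Hypothetical stand-ins for struct kimage and struct kimage_arch. */
struct kimage_arch_sketch {
	unsigned long dtb_mem;	/* physical address of the DTB, 0 if unknown */
};

struct kimage_sketch {
	struct kimage_arch_sketch arch;	/* always present, no #ifdef needed */
};

/* Regular kexec_load(): the kernel does not know where the DTB is. */
static void prepare_regular(struct kimage_sketch *kimage)
{
	kimage->arch.dtb_mem = 0;
}

/* kexec_file_load(): the kernel placed the DTB itself, so it can record it. */
static void prepare_file(struct kimage_sketch *kimage, unsigned long dtb_phys)
{
	kimage->arch.dtb_mem = dtb_phys;
}

/* Value handed to the next stage; no conditional compilation at the call site. */
static unsigned long image_arg(const struct kimage_sketch *kimage)
{
	return kimage->arch.dtb_mem;
}
```

With this shape the caller would pass image_arg(kimage) unconditionally, and
the regular kexec path keeps its current behaviour of passing 0.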


Thanks,

James

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 06/11] arm64: kexec_file: allow for loading Image-format kernel
  2018-05-07  7:21       ` AKASHI Takahiro
  (?)
@ 2018-05-11 17:07         ` James Morse
  -1 siblings, 0 replies; 156+ messages in thread
From: James Morse @ 2018-05-11 17:07 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd, ard.biesheuvel, bhsharma, kexec,
	linux-arm-kernel, linux-kernel

Hi Akashi,

On 07/05/18 08:21, AKASHI Takahiro wrote:
> On Tue, May 01, 2018 at 06:46:11PM +0100, James Morse wrote:
>> On 25/04/18 07:26, AKASHI Takahiro wrote:
>>> This patch provides kexec_file_ops for "Image"-format kernel. In this
>>> implementation, a binary is always loaded with a fixed offset identified
>>> in text_offset field of its header.

>>> diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
>>> index e4de1223715f..3cba4161818a 100644
>>> --- a/arch/arm64/include/asm/kexec.h
>>> +++ b/arch/arm64/include/asm/kexec.h
>>> @@ -102,6 +102,56 @@ struct kimage_arch {
>>>  	void *dtb_buf;
>>>  };
>>>  
>>> +/**
>>> + * struct arm64_image_header - arm64 kernel image header
>>> + *
>>> + * @pe_sig: Optional PE format 'MZ' signature
>>> + * @branch_code: Instruction to branch to stext
>>> + * @text_offset: Image load offset, little endian
>>> + * @image_size: Effective image size, little endian
>>> + * @flags:
>>> + *	Bit 0: Kernel endianness. 0=little endian, 1=big endian
>>
>> Page size? What about 'phys_base'?, (whatever that is...)
>> Probably best to refer to Documentation/arm64/booting.txt here, its the
>> authoritative source of what these fields mean.
> 
> While we don't care other bit fields for now, I will add the reference
> to the Documentation file.

Thanks, I don't want to create a second, incomplete set of documentation!


>>> +	u64 reserved[3];
>>> +	u8 magic[4];
>>> +	u32 pe_header;
>>> +};
>>
>> I'm surprised we don't have a definition for this already, I guess its always
>> done in asm. We have kernel/image.h that holds some of this stuff, if we are
>> going to validate the flags, is it worth adding the code there, (and moving it
>> to include/asm)?
> 
> A comment at the beginning of this file says,
>     #ifndef LINKER_SCRIPT
>     #error This file should only be included in vmlinux.lds.S
>     #endif
> Let me think about.

Ah, I missed that.

Having two definitions of something makes me nervous that they can become
different... looks like that header belongs to the linker, and shouldn't be used
here then.


>> I guess you skip the MZ prefix as its not present for !EFI?
> 
> CONFIG_KEXEC_IMAGE_VERIFY_SIG depends on the fact that the file
> format is PE (that is, EFI is enabled).

So if signature checking is enabled, it's already been checked.


>> Could we check branch_code is non-zero, and text-offset points within image-size?
> 
> We could do it, but I don't think this check is very useful.
> 
>>
>> We could check that this platform supports the page-size/endian config that this
>> Image was built with... We get a message from the EFI stub if the page-size
>> can't be supported, it would be nice to do the same here (as we can).
> 
> There is no restriction on page-size or endianness for kexec.

No, but it won't boot if the hardware doesn't support it. The kernel will spin
at a magic address, which is difficult to debug without JTAG. The bug report
will be "it didn't boot".


> What will be the purpose of this check?

These values are in the header so that the bootloader can check them, then print
a meaningful error. Here, kexec_file_load() is playing the part of the bootloader.

I'm assuming kexec_file_load() can only be used to kexec linux... unlike regular
kexec. Is this where I'm going wrong?
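
As a rough sketch of the kind of check meant here, assuming the flags layout
given in Documentation/arm64/booting.txt (bit 0 for endianness, bits 1-2 for
page size) and the "ARM\x64" magic. All sketch_* names and the hw_mask
parameter are hypothetical; how the supported page sizes would be obtained
from the hardware is not shown, and flags is assumed to have been converted
with le64_to_cpu() already.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_FLAG_BE		(1ULL << 0)	/* bit 0: 1 = big endian */
#define SKETCH_PAGE_SHIFT	1		/* bits 1-2: page size */
#define SKETCH_PAGE_MASK	0x3ULL

/* The magic field must read "ARM\x64". */
static bool sketch_magic_ok(const uint8_t magic[4])
{
	static const uint8_t expected[4] = { 'A', 'R', 'M', 0x64 };

	return memcmp(magic, expected, sizeof(expected)) == 0;
}

static bool sketch_is_big_endian(uint64_t flags)
{
	return (flags & SKETCH_FLAG_BE) != 0;
}

/* Returns the page size in bytes, or 0 if the header leaves it unspecified. */
static unsigned long sketch_page_size(uint64_t flags)
{
	switch ((flags >> SKETCH_PAGE_SHIFT) & SKETCH_PAGE_MASK) {
	case 1:
		return 4096;
	case 2:
		return 16384;
	case 3:
		return 65536;
	default:
		return 0;
	}
}

/*
 * hw_mask has bit N set if the hardware supports the page size encoded as
 * field value N (hypothetical encoding chosen for this sketch only).
 */
static bool sketch_page_size_supported(uint64_t flags, unsigned int hw_mask)
{
	unsigned int field = (flags >> SKETCH_PAGE_SHIFT) & SKETCH_PAGE_MASK;

	if (field == 0)
		return true;	/* unspecified, nothing to check */
	return (hw_mask & (1u << field)) != 0;
}
```

A loader could reject the image with a meaningful error when
sketch_page_size_supported() returns false, instead of letting the new kernel
spin early in boot.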


>>> diff --git a/arch/arm64/kernel/kexec_image.c b/arch/arm64/kernel/kexec_image.c
>>> new file mode 100644
>>> index 000000000000..4dd524ad6611
>>> --- /dev/null
>>> +++ b/arch/arm64/kernel/kexec_image.c
>>> @@ -0,0 +1,79 @@
>>
>>> +static void *image_load(struct kimage *image,
>>> +				char *kernel, unsigned long kernel_len,
>>> +				char *initrd, unsigned long initrd_len,
>>> +				char *cmdline, unsigned long cmdline_len)
>>> +{
>>> +	struct kexec_buf kbuf;
>>> +	struct arm64_image_header *h = (struct arm64_image_header *)kernel;
>>> +	unsigned long text_offset;
>>> +	int ret;
>>> +
>>> +	/* Load the kernel */
>>> +	kbuf.image = image;
>>> +	kbuf.buf_min = 0;
>>> +	kbuf.buf_max = ULONG_MAX;
>>> +	kbuf.top_down = false;
>>> +
>>> +	kbuf.buffer = kernel;
>>> +	kbuf.bufsz = kernel_len;
>>> +	kbuf.memsz = le64_to_cpu(h->image_size);
>>> +	text_offset = le64_to_cpu(h->text_offset);
>>> +	kbuf.buf_align = SZ_2M;
>>
>>> +	/* Adjust kernel segment with TEXT_OFFSET */
>>> +	kbuf.memsz += text_offset;
>>> +
>>> +	ret = kexec_add_buffer(&kbuf);
>>> +	if (ret)
>>> +		goto out;
>>> +
>>> +	image->arch.kern_segment = image->nr_segments - 1;
>>
>> You only seem to use kern_segment here, and in load_other_segments() called
>> below. Could it not be a local variable passed in? Instead of arch-specific data
>> we keep forever?
> 
> No, kern_segment is also used in load_other_segments() in machine_kexec_file.c.
> To optimize memory hole allocation logic in locate_mem_hole_callback(),
> we need to know the exact range of kernel image (start and end).

That's the second user. My badly-made point is that one calls the other, but
passes the data via a struct that lives until kexec. (It's not important, just
an indicator that this worked differently in the past and hasn't been cleaned
up.) I meant something like [0].


Thanks,

James


[0] a diff is worth a thousand words:
--------------------%<--------------------
diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
index 762f9102899c..c50ce844f09e 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -325,11 +325,10 @@ static int prepare_elf_headers(void **addr, unsigned long *sz)
        return ret;
 }

-int load_other_segments(struct kimage *image,
+int load_other_segments(struct kimage *image, struct kexec_segment *kern_seg,
                        char *initrd, unsigned long initrd_len,
                        char *cmdline, unsigned long cmdline_len)
 {
-       struct kexec_segment *kern_seg;
        struct kexec_buf kbuf;
        void *hdrs_addr;
        unsigned long hdrs_sz;
@@ -368,7 +367,6 @@ int load_other_segments(struct kimage *image,
                                 image->arch.elf_load_addr, hdrs_sz, hdrs_sz);
        }

-       kern_seg = &image->segment[image->arch.kern_segment];
        kbuf.image = image;
        /* not allocate anything below the kernel */
        kbuf.buf_min = kern_seg->mem + kern_seg->memsz;
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 891f2484969d..085cb69293ca 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -173,8 +172,10 @@ static inline int arm64_header_check_pe_sig(const struct arm64_image_header *h)
 extern const struct kexec_file_ops kexec_image_ops;

 struct kimage;
+struct kexec_segment;

 extern int load_other_segments(struct kimage *image,
+               struct kexec_segment *kern_seg,
                char *initrd, unsigned long initrd_len,
                char *cmdline, unsigned long cmdline_len);
 #endif
diff --git a/arch/arm64/kernel/kexec_image.c b/arch/arm64/kernel/kexec_image.c
index 7c11beefe65f..0e032d30a79c 100644
--- a/arch/arm64/kernel/kexec_image.c
+++ b/arch/arm64/kernel/kexec_image.c
@@ -37,6 +37,7 @@ static void *image_load(struct kimage *image,
                                char *cmdline, unsigned long cmdline_len)
 {
        struct kexec_buf kbuf;
+       struct kexec_segment *kern_seg;
        struct arm64_image_header *h = (struct arm64_image_header *)kernel;
        unsigned long text_offset;
        int ret;
@@ -65,17 +66,17 @@ static void *image_load(struct kimage *image,
        if (ret)
                goto out;

-       image->arch.kern_segment = image->nr_segments - 1;
-       image->segment[image->arch.kern_segment].mem += text_offset;
-       image->segment[image->arch.kern_segment].memsz -= text_offset;
-       image->start = image->segment[image->arch.kern_segment].mem;
+       kern_seg = &image->segment[image->nr_segments - 1];
+       kern_seg->mem += text_offset;
+       kern_seg->memsz -= text_offset;
+       image->start = kern_seg->mem;

        pr_debug("Loaded kernel at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
-                               image->segment[image->arch.kern_segment].mem,
+                               kern_seg->mem,
                                kbuf.bufsz, kbuf.memsz);

        /* Load additional data */
-       ret = load_other_segments(image, initrd, initrd_len,
+       ret = load_other_segments(image, kern_seg, initrd, initrd_len,
                                cmdline, cmdline_len);

 out:
--------------------%<--------------------
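
For reference, the segment arithmetic in image_load() (the same before and
after the diff above) can be sketched as follows. This is a simplified model
under stated assumptions: segment_sketch, place_kernel() and
first_free_after_kernel() are hypothetical names, hole_base stands for the
2MB-aligned address that the allocator would return, and any page rounding of
memsz done by the real allocator is ignored.

```c
/* Hypothetical stand-in for struct kexec_segment (only the fields used here). */
struct segment_sketch {
	unsigned long mem;	/* physical start of the segment */
	unsigned long memsz;	/* size of the segment in memory */
};

/*
 * Reserve image_size + text_offset bytes at a 2MB-aligned base, then shift
 * the segment up by text_offset so that the image itself starts at
 * base + text_offset and keeps its original size.
 */
static struct segment_sketch place_kernel(unsigned long hole_base,
					  unsigned long image_size,
					  unsigned long text_offset)
{
	struct segment_sketch seg;

	/* What the allocation request covers. */
	seg.mem = hole_base;
	seg.memsz = image_size + text_offset;

	/* Adjustment applied once the buffer has been added. */
	seg.mem += text_offset;
	seg.memsz -= text_offset;

	return seg;
}

/* Lowest address usable for the other segments (nothing below the kernel). */
static unsigned long first_free_after_kernel(const struct segment_sketch *seg)
{
	return seg->mem + seg->memsz;
}
```

For example, with hole_base = 0x80000000, image_size = 0x1000000 and
text_offset = 0x80000, the kernel segment starts at 0x80080000 and the
following buffers are placed at or above 0x81080000.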

^ permalink raw reply related	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 04/11] arm64: kexec_file: allocate memory walking through memblock list
  2018-05-07  5:59       ` AKASHI Takahiro
  (?)
@ 2018-05-15  4:35         ` AKASHI Takahiro
  -1 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-05-15  4:35 UTC (permalink / raw)
  To: James Morse, catalin.marinas, will.deacon, dhowells, vgoyal,
	herbert, davem, dyoung, bhe, arnd, ard.biesheuvel, bhsharma,
	kexec, linux-arm-kernel, linux-kernel

James,

On Mon, May 07, 2018 at 02:59:07PM +0900, AKASHI Takahiro wrote:
> James,
> 
> On Tue, May 01, 2018 at 06:46:09PM +0100, James Morse wrote:
> > Hi Akashi,
> > 
> > On 25/04/18 07:26, AKASHI Takahiro wrote:
> > > We need to prevent firmware-reserved memory regions, particularly EFI
> > > memory map as well as ACPI tables, from being corrupted by loading
> > > kernel/initrd (or other kexec buffers). We also want to support memory
> > > allocation in top-down manner in addition to default bottom-up.
> > > So let's have arm64 specific arch_kexec_walk_mem() which will search
> > > for available memory ranges in usable memblock list,
> > > i.e. !NOMAP & !reserved, 
> > 
> > > instead of system resource tree.
> > 
> > Didn't we try to fix the system-resource-tree in order to fix regular-kexec to
> > be safe in the EFI-memory-map/ACPI-tables case?
> > 
> > It would be good to avoid having two ways of doing this, and I would like to
> > avoid having extra arch code...
> 
> I know what you mean.
> /proc/iomem, or the system resource tree, is in my opinion not the best
> place to describe the kernel's memory usage; it is meant to describe the
> *physical* hardware layout. As we are still discussing "reserved" memory,
> I don't want to depend on it.
> With the memblock list, we will have more accurate control over memory
> usage.

If you have no further objections, I will take the memblock approach
(and factor out powerpc's arch_kexec_walk_mem()).

Thanks,
-Takahiro AKASHI


> > 
> > > diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> > > new file mode 100644
> > > index 000000000000..f9ebf54ca247
> > > --- /dev/null
> > > +++ b/arch/arm64/kernel/machine_kexec_file.c
> > > @@ -0,0 +1,57 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +/*
> > > + * kexec_file for arm64
> > > + *
> > > + * Copyright (C) 2018 Linaro Limited
> > > + * Author: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > > + *
> > 
> > > + * Most code is derived from arm64 port of kexec-tools
> > 
> > How does kexec-tools walk memblock?
> 
> Will remove this comment from this patch.
> Obviously, this comment is for the rest of the code which will be
> added to succeeding patches (patch #5 and #7).
> 
> 
> > 
> > > + */
> > > +
> > > +#define pr_fmt(fmt) "kexec_file: " fmt
> > > +
> > > +#include <linux/ioport.h>
> > > +#include <linux/kernel.h>
> > > +#include <linux/kexec.h>
> > > +#include <linux/memblock.h>
> > > +
> > > +int arch_kexec_walk_mem(struct kexec_buf *kbuf,
> > > +				int (*func)(struct resource *, void *))
> > > +{
> > > +	phys_addr_t start, end;
> > > +	struct resource res;
> > > +	u64 i;
> > > +	int ret = 0;
> > > +
> > > +	if (kbuf->image->type == KEXEC_TYPE_CRASH)
> > > +		return func(&crashk_res, kbuf);
> > > +
> > > +	if (kbuf->top_down)
> > > +		for_each_mem_range_rev(i, &memblock.memory, &memblock.reserved,
> > > +				NUMA_NO_NODE, MEMBLOCK_NONE,
> > > +				&start, &end, NULL) {
> > 
> > for_each_free_mem_range_reverse() is a more readable version of this helper.
> 
> OK. I used to use my own limited list of reserved memory instead of
> memblock.reserved here to exclude verbose ranges.
> 
> 
> > > +			if (!memblock_is_map_memory(start))
> > > +				continue;
> > 
> > Passing MEMBLOCK_NONE means this walk will never find MEMBLOCK_NOMAP memory.
> 
> Sure, I confirmed it.
> 
> > 
> > > +			res.start = start;
> > > +			res.end = end;
> > > +			ret = func(&res, kbuf);
> > > +			if (ret)
> > > +				break;
> > > +		}
> > > +	else
> > > +		for_each_mem_range(i, &memblock.memory, &memblock.reserved,
> > > +				NUMA_NO_NODE, MEMBLOCK_NONE,
> > > +				&start, &end, NULL) {
> > 
> > for_each_free_mem_range()?
> 
> OK.
> 
> > > +			if (!memblock_is_map_memory(start))
> > > +				continue;
> > > +
> > > +			res.start = start;
> > > +			res.end = end;
> > > +			ret = func(&res, kbuf);
> > > +			if (ret)
> > > +				break;
> > > +		}
> > > +
> > > +	return ret;
> > > +}
> > > 
> > 
> > With these changes, what we have is almost:
> > arch/powerpc/kernel/machine_kexec_file_64.c::arch_kexec_walk_mem() !
> > (the difference being powerpc doesn't yet support crash-kernels here)
> > 
> > If the argument is walking memblock gives a better answer than the stringy
> > walk_system_ram_res() thing, is there any mileage in moving this code into
> > kexec_file.c, and using it if !IS_ENABLED(CONFIG_ARCH_DISCARD_MEMBLOCK)?
> > 
> > This would save arm64/powerpc having near-identical implementations.
> > 32bit arm keeps memblock if it has kexec, so it may be useful there too if
> > kexec_file_load() support is added.
> 
> Thanks. I had forgotten about ppc.
> 
> -Takahiro AKASHI
> 
> 
> > 
> > Thanks,
> > 
> > James
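
The walk discussed in this review can be illustrated with a small userspace
sketch. It is not kernel code and does not use the memblock API: the names
struct range, collect_free(), walk_free_ranges() and find_hole() are
hypothetical, ranges are [start, end) with an exclusive end, and the reserved
list is assumed to be sorted and non-overlapping. The idea is the same as in
the patch: visit memory that is present and not reserved, bottom-up or
top-down, and stop as soon as the callback returns non-zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FREE 64	/* arbitrary bound for this sketch */

/* Hypothetical stand-in for a memblock region: [start, end), end exclusive. */
struct range {
	uint64_t start;
	uint64_t end;
};

typedef int (*range_fn)(const struct range *free, void *arg);

/* Subtract the sorted, non-overlapping 'rsv' list from every 'mem' bank. */
static size_t collect_free(const struct range *mem, size_t nr_mem,
			   const struct range *rsv, size_t nr_rsv,
			   struct range *out)
{
	size_t n = 0;

	for (size_t m = 0; m < nr_mem; m++) {
		uint64_t cur = mem[m].start;

		for (size_t r = 0; r < nr_rsv && cur < mem[m].end; r++) {
			if (rsv[r].end <= cur)
				continue;	/* reservation lies below cursor */
			if (rsv[r].start >= mem[m].end)
				break;		/* reservation lies above this bank */
			if (rsv[r].start > cur && n < MAX_FREE)
				out[n++] = (struct range){ cur, rsv[r].start };
			cur = rsv[r].end;
		}
		if (cur < mem[m].end && n < MAX_FREE)
			out[n++] = (struct range){ cur, mem[m].end };
	}
	return n;
}

/* Call 'fn' on each free range; a non-zero return value ends the walk. */
static int walk_free_ranges(const struct range *mem, size_t nr_mem,
			    const struct range *rsv, size_t nr_rsv,
			    bool top_down, range_fn fn, void *arg)
{
	struct range free[MAX_FREE];
	size_t n = collect_free(mem, nr_mem, rsv, nr_rsv, free);
	int ret = 0;

	assert(fn != NULL);
	for (size_t i = 0; i < n && !ret; i++)
		ret = fn(&free[top_down ? n - 1 - i : i], arg);
	return ret;
}

/* Example callback: accept the first range that is large enough. */
struct hole_req {
	uint64_t size;	/* bytes wanted */
	uint64_t found;	/* start of the accepted range */
};

static int find_hole(const struct range *free, void *arg)
{
	struct hole_req *req = arg;

	if (free->end - free->start < req->size)
		return 0;	/* too small, keep walking */
	req->found = free->start;
	return 1;		/* stop the walk */
}
```

With one bank [0x1000, 0x9000) and reservations at [0x2000, 0x3000) and
[0x5000, 0x6000), a request for 0x2000 bytes lands at 0x3000 when walking
bottom-up and at 0x6000 when walking top-down.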

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 03/11] arm64: kexec_file: invoke the kernel without purgatory
  2018-05-11 17:03         ` James Morse
  (?)
@ 2018-05-15  4:45           ` AKASHI Takahiro
  -1 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-05-15  4:45 UTC (permalink / raw)
  To: James Morse
  Cc: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd, ard.biesheuvel, bhsharma, kexec,
	linux-arm-kernel, linux-kernel

James,

On Fri, May 11, 2018 at 06:03:49PM +0100, James Morse wrote:
> Hi Akashi,
> 
> On 07/05/18 06:22, AKASHI Takahiro wrote:
> > On Tue, May 01, 2018 at 06:46:06PM +0100, James Morse wrote:
> >> On 25/04/18 07:26, AKASHI Takahiro wrote:
> >>> diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
> >>> index f76ea92dff91..f7dbba00be10 100644
> >>> --- a/arch/arm64/kernel/machine_kexec.c
> >>> +++ b/arch/arm64/kernel/machine_kexec.c
> >>> @@ -205,10 +205,17 @@ void machine_kexec(struct kimage *kimage)
> 
> >>>  	cpu_soft_restart(kimage != kexec_crash_image,
> >>> -		reboot_code_buffer_phys, kimage->head, kimage->start, 0);
> >>> +		reboot_code_buffer_phys, kimage->head, kimage->start,
> >>> +#ifdef CONFIG_KEXEC_FILE
> >>> +				kimage->purgatory_info.purgatory_buf ?
> >>> +						0 : kimage->arch.dtb_mem);
> >>> +#else
> >>> +				0);
> >>> +#endif
> 
> 
> >> purgatory_buf seems to only be set in kexec_purgatory_setup_kbuf(), called from
> >> kexec_load_purgatory(), which we don't use. How does this get a value?
> >>
> >> Would it be better to always use kimage->arch.dtb_mem, and ensure that is 0 for
> >> regular kexec (as we can't know where the dtb is)? (image_arg may then be a
> >> better name).
> > 
> > The problem is arch.dtb_mem is currently defined only if CONFIG_KEXEC_FILE.
> 
> I thought it was ARCH_HAS_KIMAGE_ARCH, which we can define all the time if
> that's what we want.
> 
> 
> > So I would like to
> > - merge this patch with patch#8
> > - change the condition
> >         #ifdef CONFIG_KEXEC_FILE
> >        				kimage->file_mode ? kimage->arch.dtb_mem : 0);
> >         #else
> >         			0);
> >         #endif
> 
> If we can avoid even this #ifdef by always having kimage->arch, I'd prefer that.
> If we do that 'dtb_mem' would need something that indicates it's for kexec_file,
> as kexec has a DTB too, we just don't know where it is...

OK, but I want a minimal kimage->arch to always exist.
How about this?

| #define ARCH_HAS_KIMAGE_ARCH
|
| struct kimage_arch {
| 	phys_addr_t dtb_mem;
| #ifdef CONFIG_KEXEC_FILE
| 	void *dtb_buf;
| 	/* Core ELF header buffer */
| 	void *elf_headers;
| 	unsigned long elf_headers_sz;
| 	unsigned long elf_load_addr;
| #endif

| void machine_kexec(struct kimage *kimage)
| {
| 	...
| 	cpu_soft_restart(kimage != kexec_crash_image,
| 		reboot_code_buffer_phys, kimage->head, kimage->start,
| 						kimage->arch.dtb_mem);

Thanks
-Takahiro AKASHI

> 
> 
> Thanks,
> 
> James
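
The outcome of this exchange can be modelled in a few lines of userspace C.
This is only a sketch under stated assumptions: struct kimage_sketch,
set_dtb() and restart_arg() are hypothetical names, and the real struct
kimage and cpu_soft_restart() are not used. It shows why an unconditional
dtb_mem field that stays 0 for regular kexec removes the need for an #ifdef
at the call site.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down mirror of the proposed struct kimage_arch. */
struct kimage_arch_sketch {
	uint64_t dtb_mem;	/* physical address of the DTB, 0 if unknown */
};

struct kimage_sketch {
	bool file_mode;		/* true when loaded via kexec_file_load() */
	struct kimage_arch_sketch arch;
};

/* Only the kexec_file_load() path knows where it placed the DTB. */
static void set_dtb(struct kimage_sketch *image, uint64_t dtb_phys)
{
	assert(image != NULL);
	image->arch.dtb_mem = image->file_mode ? dtb_phys : 0;
}

/* Value passed as the last argument when restarting into the new kernel. */
static uint64_t restart_arg(const struct kimage_sketch *image)
{
	return image->arch.dtb_mem;	/* no #ifdef and no file_mode test */
}
```

The decision is made once, at load time, so the restart path can read the
field unconditionally.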

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 06/11] arm64: kexec_file: allow for loading Image-format kernel
  2018-05-11 17:07         ` James Morse
  (?)
@ 2018-05-15  5:13           ` AKASHI Takahiro
  -1 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-05-15  5:13 UTC (permalink / raw)
  To: James Morse
  Cc: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd, ard.biesheuvel, bhsharma, kexec,
	linux-arm-kernel, linux-kernel

James,

On Fri, May 11, 2018 at 06:07:06PM +0100, James Morse wrote:
> Hi Akashi,
> 
> On 07/05/18 08:21, AKASHI Takahiro wrote:
> > On Tue, May 01, 2018 at 06:46:11PM +0100, James Morse wrote:
> >> On 25/04/18 07:26, AKASHI Takahiro wrote:
> >>> This patch provides kexec_file_ops for "Image"-format kernel. In this
> >>> implementation, a binary is always loaded with a fixed offset identified
> >>> in text_offset field of its header.
> 
> >>> diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
> >>> index e4de1223715f..3cba4161818a 100644
> >>> --- a/arch/arm64/include/asm/kexec.h
> >>> +++ b/arch/arm64/include/asm/kexec.h
> >>> @@ -102,6 +102,56 @@ struct kimage_arch {
> >>>  	void *dtb_buf;
> >>>  };
> >>>  
> >>> +/**
> >>> + * struct arm64_image_header - arm64 kernel image header
> >>> + *
> >>> + * @pe_sig: Optional PE format 'MZ' signature

To be precise, this is NOT a PE signature but the MS-DOS header's magic number.
(There is a separate "PE" signature in the PE COFF file header pointed to by
'pe_header'.)
I will correct the field name.

> >>> + * @branch_code: Instruction to branch to stext
> >>> + * @text_offset: Image load offset, little endian
> >>> + * @image_size: Effective image size, little endian
> >>> + * @flags:
> >>> + *	Bit 0: Kernel endianness. 0=little endian, 1=big endian
> >>
> >> Page size? What about 'phys_base'?, (whatever that is...)
> >> Probably best to refer to Documentation/arm64/booting.txt here, it's the
> >> authoritative source of what these fields mean.
> > 
> > While we don't care about the other bit fields for now, I will add a
> > reference to the Documentation file.
> 
> Thanks, I don't want to create a second, incomplete set of documentation!

I will leave only a minimal description of the parameters here.

> 
> 
> >>> +	u64 reserved[3];
> >>> +	u8 magic[4];
> >>> +	u32 pe_header;
> >>> +};
> >>
> >> I'm surprised we don't have a definition for this already, I guess it's always
> >> done in asm. We have kernel/image.h that holds some of this stuff, if we are
> >> going to validate the flags, is it worth adding the code there, (and moving it
> >> to include/asm)?
> > 
> > A comment at the beginning of this file says,
> >     #ifndef LINKER_SCRIPT
> >     #error This file should only be included in vmlinux.lds.S
> >     #endif
> > Let me think about it.
> 
> Ah, I missed that.
> 
> Having two definitions of something makes me nervous that they can become
> different... looks like that header belongs to the linker, and shouldn't be used
> here then.

OK.

> 
> >> I guess you skip the MZ prefix as its not present for !EFI?

Correct, but the MZ check in the probe function only prints an informational message.

> > 
> > CONFIG_KEXEC_IMAGE_VERIFY_SIG depends on the fact that the file
> > format is PE (that is, EFI is enabled).
> 
> So if the signature checking is enabled, it's already been checked.

The signature in a file, whether MZ or PE, is actually checked
in verify_pefile_signature().

> 
> >> Could we check branch_code is non-zero, and text-offset points within image-size?
> > 
> > We could do it, but I don't think this check is very useful.
> > 
> >>
> >> We could check that this platform supports the page-size/endian config that this
> >> Image was built with... We get a message from the EFI stub if the page-size
> >> can't be supported, it would be nice to do the same here (as we can).
> > 
> > There is no restriction on page-size or endianness for kexec.
> 
> No, but it won't boot if the hardware doesn't support it. The kernel will spin
> at a magic address that is, difficult, to debug without JTAG. The bug report
> will be "it didn't boot".

OK.
I have added sanity checks for CPU features, endianness, and page size.
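
As an illustration of what such checks can look like (a sketch, not the code
from this series): the flag layout below follows
Documentation/arm64/booting.txt (bit 0 is endianness, bits 1-2 are page
size), while struct cpu_caps, image_flags_supported() and image_magic_ok()
are hypothetical names that stand in for the kernel's real CPU feature
queries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Flag layout as documented in Documentation/arm64/booting.txt. */
#define HDR_FLAG_BE		(1ULL << 0)	/* bit 0: 1 = big endian */
#define HDR_FLAG_PAGE_SHIFT	1		/* bits 1-2: page size */
#define HDR_FLAG_PAGE_MASK	0x3ULL

enum hdr_page_size { PAGE_UNSPEC = 0, PAGE_4K = 1, PAGE_16K = 2, PAGE_64K = 3 };

/* Hypothetical description of what the running CPU supports. */
struct cpu_caps {
	bool mixed_endian;	/* can run a kernel of the other endianness */
	bool running_be;	/* endianness of the current kernel */
	bool page_4k, page_16k, page_64k;
};

/* The header magic is the four bytes "ARM\x64". */
static bool image_magic_ok(const uint8_t magic[4])
{
	return memcmp(magic, "ARM\x64", 4) == 0;
}

/* 'flags' must already be converted from little endian to CPU order. */
static bool image_flags_supported(uint64_t flags, const struct cpu_caps *caps)
{
	bool image_be = (flags & HDR_FLAG_BE) != 0;

	assert(caps != NULL);
	if (image_be != caps->running_be && !caps->mixed_endian)
		return false;	/* endianness switch not possible */

	switch ((flags >> HDR_FLAG_PAGE_SHIFT) & HDR_FLAG_PAGE_MASK) {
	case PAGE_4K:
		return caps->page_4k;
	case PAGE_16K:
		return caps->page_16k;
	case PAGE_64K:
		return caps->page_64k;
	default:
		return true;	/* unspecified: nothing to check */
	}
}
```

Rejecting an unsupported image at load time turns "it didn't boot" into an
error that the caller of kexec_file_load() can see.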

> 
> > What will be the purpose of this check?
> 
> These values are in the header so that the bootloader can check them, then print
> a meaningful error. Here, kexec_file_load() is playing the part of the bootloader.
> 
> I'm assuming kexec_file_load() can only be used to kexec linux... unlike regular
> kexec. Is this where I'm going wrong?
> 
> 
> >>> diff --git a/arch/arm64/kernel/kexec_image.c b/arch/arm64/kernel/kexec_image.c
> >>> new file mode 100644
> >>> index 000000000000..4dd524ad6611
> >>> --- /dev/null
> >>> +++ b/arch/arm64/kernel/kexec_image.c
> >>> @@ -0,0 +1,79 @@
> >>
> >>> +static void *image_load(struct kimage *image,
> >>> +				char *kernel, unsigned long kernel_len,
> >>> +				char *initrd, unsigned long initrd_len,
> >>> +				char *cmdline, unsigned long cmdline_len)
> >>> +{
> >>> +	struct kexec_buf kbuf;
> >>> +	struct arm64_image_header *h = (struct arm64_image_header *)kernel;
> >>> +	unsigned long text_offset;
> >>> +	int ret;
> >>> +
> >>> +	/* Load the kernel */
> >>> +	kbuf.image = image;
> >>> +	kbuf.buf_min = 0;
> >>> +	kbuf.buf_max = ULONG_MAX;
> >>> +	kbuf.top_down = false;
> >>> +
> >>> +	kbuf.buffer = kernel;
> >>> +	kbuf.bufsz = kernel_len;
> >>> +	kbuf.memsz = le64_to_cpu(h->image_size);
> >>> +	text_offset = le64_to_cpu(h->text_offset);
> >>> +	kbuf.buf_align = SZ_2M;
> >>
> >>> +	/* Adjust kernel segment with TEXT_OFFSET */
> >>> +	kbuf.memsz += text_offset;
> >>> +
> >>> +	ret = kexec_add_buffer(&kbuf);
> >>> +	if (ret)
> >>> +		goto out;
> >>> +
> >>> +	image->arch.kern_segment = image->nr_segments - 1;
> >>
> >> You only seem to use kern_segment here, and in load_other_segments() called
> >> below. Could it not be a local variable passed in? Instead of arch-specific data
> >> we keep forever?
> > 
> > No, kern_segment is also used in load_other_segments() in machine_kexec_file.c.
> > To optimize memory hole allocation logic in locate_mem_hole_callback(),
> > we need to know the exact range of kernel image (start and end).
> 
> That's the second user. My badly-made point is one calls the other, but passes
> the data via some until-kexec lifetime struct. (it's not important, just an
> indicator this worked differently in the past and hasn't been cleaned up).
> I meant something like [0].

OK, but instead of adding kern_seg, I want to change the interface to:

| extern int load_other_segments(struct kimage *image,
|		unsigned long kernel_load_addr, unsigned long kernel_size,
|		char *initrd, unsigned long initrd_len,
|		char *cmdline, unsigned long cmdline_len);

This way, we will later be able to address the issue I mentioned in
my previous e-mail. (If we support vmlinux, the kernel occupies two segments,
for text and data respectively.)
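
As a rough illustration of what the proposed interface would let the callee compute, here is a minimal sketch. The helper name other_segments_min() is hypothetical and does not appear in the patch; it only mirrors the existing "kbuf.buf_min = kern_seg->mem + kern_seg->memsz" logic using the two new parameters.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of the patch): the lowest address at
 * which the other segments (initrd, dtb, ELF headers) may be placed
 * when the kernel's range is passed as a plain address and size.
 */
static unsigned long other_segments_min(unsigned long kernel_load_addr,
					unsigned long kernel_size)
{
	/* Guard against wrap-around of the end address. */
	assert(kernel_load_addr + kernel_size >= kernel_load_addr);

	/* Same value as kern_seg->mem + kern_seg->memsz in the current code. */
	return kernel_load_addr + kernel_size;
}
```

With an address and a size, the caller decides which range counts as "the kernel", so a range spanning more than one segment could presumably be described without changing the callee.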

Thanks,
-Takahiro AKASHI


> 
> Thanks,
> 
> James
> 
> 
> [0] a diff is worth a thousand words:
> --------------------%<--------------------
> diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> index 762f9102899c..c50ce844f09e 100644
> --- a/arch/arm64/kernel/machine_kexec_file.c
> +++ b/arch/arm64/kernel/machine_kexec_file.c
> @@ -325,11 +325,10 @@ static int prepare_elf_headers(void **addr, unsigned long *sz)
>         return ret;
>  }
> 
> -int load_other_segments(struct kimage *image,
> +int load_other_segments(struct kimage *image, struct kexec_segment *kern_seg,
>                         char *initrd, unsigned long initrd_len,
>                         char *cmdline, unsigned long cmdline_len)
>  {
> -       struct kexec_segment *kern_seg;
>         struct kexec_buf kbuf;
>         void *hdrs_addr;
>         unsigned long hdrs_sz;
> @@ -368,7 +367,6 @@ int load_other_segments(struct kimage *image,
>                                  image->arch.elf_load_addr, hdrs_sz, hdrs_sz);
>         }
> 
> -       kern_seg = &image->segment[image->arch.kern_segment];
>         kbuf.image = image;
>         /* not allocate anything below the kernel */
>         kbuf.buf_min = kern_seg->mem + kern_seg->memsz;
> diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
> index 891f2484969d..085cb69293ca 100644
> --- a/arch/arm64/include/asm/kexec.h
> +++ b/arch/arm64/include/asm/kexec.h
> @@ -173,8 +172,10 @@ static inline int arm64_header_check_pe_sig(const struct arm64_image_header *h)
>  extern const struct kexec_file_ops kexec_image_ops;
> 
>  struct kimage;
> +struct kexec_segment;
> 
>  extern int load_other_segments(struct kimage *image,
> +               struct kexec_segment *kern_seg,
>                 char *initrd, unsigned long initrd_len,
>                 char *cmdline, unsigned long cmdline_len);
>  #endif
> diff --git a/arch/arm64/kernel/kexec_image.c b/arch/arm64/kernel/kexec_image.c
> index 7c11beefe65f..0e032d30a79c 100644
> --- a/arch/arm64/kernel/kexec_image.c
> +++ b/arch/arm64/kernel/kexec_image.c
> @@ -37,6 +37,7 @@ static void *image_load(struct kimage *image,
>                                 char *cmdline, unsigned long cmdline_len)
>  {
>         struct kexec_buf kbuf;
> +       struct kexec_segment *kern_seg;
>         struct arm64_image_header *h = (struct arm64_image_header *)kernel;
>         unsigned long text_offset;
>         int ret;
> @@ -65,17 +66,17 @@ static void *image_load(struct kimage *image,
>         if (ret)
>                 goto out;
> 
> -       image->arch.kern_segment = image->nr_segments - 1;
> -       image->segment[image->arch.kern_segment].mem += text_offset;
> -       image->segment[image->arch.kern_segment].memsz -= text_offset;
> -       image->start = image->segment[image->arch.kern_segment].mem;
> +       kern_seg = &image->segment[image->nr_segments - 1];
> +       kern_seg->mem += text_offset;
> +       kern_seg->memsz -= text_offset;
> +       image->start = kern_seg->mem;
> 
>         pr_debug("Loaded kernel at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
> -                               image->segment[image->arch.kern_segment].mem,
> +                               kern_seg->mem,
>                                 kbuf.bufsz, kbuf.memsz);
> 
>         /* Load additional data */
> -       ret = load_other_segments(image, initrd, initrd_len,
> +       ret = load_other_segments(image, kern_seg, initrd, initrd_len,
>                                 cmdline, cmdline_len);
> 
>  out:
> --------------------%<--------------------
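
The text_offset adjustment in image_load() above can be checked with a small stand-alone sketch. The struct and function names below (seg_sketch, place_image) are made up for illustration, and the allocator is replaced by a caller-supplied base address, so this only models the arithmetic, not kexec_add_buffer() itself.

```c
#include <assert.h>

/* Stand-in for the two struct kexec_segment fields used here. */
struct seg_sketch {
	unsigned long mem;	/* physical start of the segment */
	unsigned long memsz;	/* bytes reserved in memory */
};

/*
 * Model of the adjustment: reserve image_size + text_offset bytes at a
 * 2MB-aligned base, then move the segment start up by text_offset and
 * shrink the size again, so the segment covers only the image itself.
 */
static struct seg_sketch place_image(unsigned long base,
				     unsigned long image_size,
				     unsigned long text_offset)
{
	struct seg_sketch seg;

	/* kbuf.buf_align = SZ_2M in the patch; assumed satisfied by base. */
	assert((base & (0x200000UL - 1)) == 0);

	seg.mem = base;
	seg.memsz = image_size + text_offset;	/* kbuf.memsz += text_offset */

	seg.mem += text_offset;			/* kern_seg->mem += text_offset */
	seg.memsz -= text_offset;		/* kern_seg->memsz -= text_offset */

	return seg;
}
```

For example, with a base of 0x80000000, an image size of 0x1000000 and a text_offset of 0x80000, the segment ends up starting at 0x80080000 with its size back at 0x1000000.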

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 04/11] arm64: kexec_file: allocate memory walking through memblock list
  2018-05-15  4:35         ` AKASHI Takahiro
  (?)
@ 2018-05-15 16:17           ` James Morse
  -1 siblings, 0 replies; 156+ messages in thread
From: James Morse @ 2018-05-15 16:17 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd, ard.biesheuvel, bhsharma, kexec,
	linux-arm-kernel, linux-kernel

Hi Akashi,

On 15/05/18 05:35, AKASHI Takahiro wrote:
> On Mon, May 07, 2018 at 02:59:07PM +0900, AKASHI Takahiro wrote:
>> On Tue, May 01, 2018 at 06:46:09PM +0100, James Morse wrote:
>>> On 25/04/18 07:26, AKASHI Takahiro wrote:
>>>> We need to prevent firmware-reserved memory regions, particularly EFI
>>>> memory map as well as ACPI tables, from being corrupted by loading
>>>> kernel/initrd (or other kexec buffers). We also want to support memory
>>>> allocation in top-down manner in addition to default bottom-up.
>>>> So let's have arm64 specific arch_kexec_walk_mem() which will search
>>>> for available memory ranges in usable memblock list,
>>>> i.e. !NOMAP & !reserved, 
>>>
>>>> instead of system resource tree.
>>>
>>> Didn't we try to fix the system-resource-tree in order to fix regular-kexec to
>>> be safe in the EFI-memory-map/ACPI-tables case?
>>>
>>> It would be good to avoid having two ways of doing this, and I would like to
>>> avoid having extra arch code...
>>
>> I know what you mean.
>> /proc/iomem or system resource is, in my opinion, not the best place to
>> describe memory usage of kernel but rather to describe *physical* hardware
>> layout. As we are still discussing about "reserved" memory, I don't want
>> to depend on it.

I agree. We have funny stuff that isn't hardware-layout, but is important for
the next boot. The kernel doesn't have an ABI to support when it queries the
list itself.
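
As a rough illustration of what walking the list directly looks like, here is
a user-space sketch that picks a usable range either bottom-up or top-down
while skipping NOMAP and reserved regions. It is a simplification under
stated assumptions: struct region and find_usable() are made-up names, and
the real memblock keeps memory and reserved ranges in separate lists rather
than as per-region flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a memblock region; illustration only. */
struct region {
	uint64_t base;
	uint64_t size;
	bool nomap;	/* firmware-owned, e.g. EFI memory map or ACPI tables */
	bool reserved;	/* already claimed by the running kernel */
};

/*
 * Return the start of a usable range of 'size' bytes, or 0 if none fits.
 * Regions are assumed to be sorted by ascending base address.
 */
static uint64_t find_usable(const struct region *r, size_t n,
			    uint64_t size, bool top_down)
{
	assert(r || n == 0);

	for (size_t i = 0; i < n; i++) {
		const struct region *cur = top_down ? &r[n - 1 - i] : &r[i];

		if (cur->nomap || cur->reserved || cur->size < size)
			continue;
		/* top-down allocations sit at the end of the region */
		return top_down ? cur->base + cur->size - size : cur->base;
	}
	return 0;
}
```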


>> Along with memblock list, we will have more accurate control over memory
>> usage.

>>> If the argument is walking memblock gives a better answer than the stringy
>>> walk_system_ram_res() thing, is there any mileage in moving this code into
>>> kexec_file.c, and using it if !IS_ENABLED(CONFIG_ARCH_DISCARD_MEMBLOCK)?
>>>
>>> This would save arm64/powerpc having near-identical implementations.
>>> 32bit arm keeps memblock if it has kexec, so it may be useful there too if
>>> kexec_file_load() support is added.

> If you don't have further objection, I will take memblock approach
> (with factoring out powerpc's arch_kexec_walk_mem()).

If we're agreed that the memblock walking is generic, then it would be quicker
to make the arm64 version as close as possible to powerpc's and merge them in a
later series. (This saves a cross-arch dependency.)

With that,
Reviewed-by: James Morse <james.morse@arm.com>


Thanks,

James

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 05/11] arm64: kexec_file: load initrd and device-tree
  2018-04-25  6:26   ` AKASHI Takahiro
  (?)
@ 2018-05-15 16:20     ` James Morse
  -1 siblings, 0 replies; 156+ messages in thread
From: James Morse @ 2018-05-15 16:20 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd, ard.biesheuvel, bhsharma, kexec,
	linux-arm-kernel, linux-kernel

Hi Akashi,

On 25/04/18 07:26, AKASHI Takahiro wrote:
> load_other_segments() is expected to allocate and place all the necessary
> memory segments other than kernel, including initrd and device-tree
> blob (and elf core header for crash).
> While most of the code was borrowed from kexec-tools' counterpart,
> users may not be allowed to specify dtb explicitly, instead, the dtb
> presented by a boot loader is reused.

(Nit: "a boot loader" -> "the original boot loader")

> arch_kimage_kernel_post_load_cleanup() is responsible for freeing arm64-
> specific data allocated in load_other_segments().


> diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> index f9ebf54ca247..b3b9b1725d8a 100644
> --- a/arch/arm64/kernel/machine_kexec_file.c
> +++ b/arch/arm64/kernel/machine_kexec_file.c
> @@ -13,7 +13,26 @@
>  #include <linux/ioport.h>
>  #include <linux/kernel.h>
>  #include <linux/kexec.h>
> +#include <linux/libfdt.h>
>  #include <linux/memblock.h>
> +#include <linux/of_fdt.h>
> +#include <linux/types.h>
> +#include <asm/byteorder.h>
> +
> +static int __dt_root_addr_cells;
> +static int __dt_root_size_cells;

> @@ -55,3 +74,144 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf,
>  
>  	return ret;
>  }
> +
> +static int setup_dtb(struct kimage *image,
> +		unsigned long initrd_load_addr, unsigned long initrd_len,
> +		char *cmdline, unsigned long cmdline_len,
> +		char **dtb_buf, size_t *dtb_buf_len)
> +{
> +	char *buf = NULL;
> +	size_t buf_size;
> +	int nodeoffset;
> +	u64 value;
> +	int range_len;
> +	int ret;
> +
> +	/* duplicate dt blob */
> +	buf_size = fdt_totalsize(initial_boot_params);
> +	range_len = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);

These two cells values are 0 here. Did you want
arch_kexec_file_init() in patch 7 in this patch?

Ah, range_len isn't used, so, did you want the cells values and this range_len
thing in patch 7!?


> +
> +	if (initrd_load_addr)
> +		buf_size += fdt_prop_len("linux,initrd-start", sizeof(u64))
> +				+ fdt_prop_len("linux,initrd-end", sizeof(u64));
> +
> +	if (cmdline)
> +		buf_size += fdt_prop_len("bootargs", cmdline_len + 1);

I can't find where fdt_prop_len() .... oh, patch 7. fdt_prop_len() doesn't look
like the sort of thing that should be created here, but I agree there isn't an
existing API to do this.

(This must be why powerpc guesses that the fdt won't be more than double in size).


> +	buf = vmalloc(buf_size);
> +	if (!buf) {
> +		ret = -ENOMEM;
> +		goto out_err;
> +	}
> +
> +	ret = fdt_open_into(initial_boot_params, buf, buf_size);
> +	if (ret)
> +		goto out_err;
> +
> +	nodeoffset = fdt_path_offset(buf, "/chosen");
> +	if (nodeoffset < 0)
> +		goto out_err;
> +
> +	/* add bootargs */
> +	if (cmdline) {
> +		ret = fdt_setprop(buf, nodeoffset, "bootargs",
> +						cmdline, cmdline_len + 1);

fdt_setprop_string()?


> +		if (ret)
> +			goto out_err;
> +	}
> +
> +	/* add initrd-* */
> +	if (initrd_load_addr) {
> +		value = cpu_to_fdt64(initrd_load_addr);
> +		ret = fdt_setprop(buf, nodeoffset, "linux,initrd-start",
> +				&value, sizeof(value));

sizeof(value) was assumed to be the same as sizeof(u64) earlier.
fdt_setprop_u64()?


> +		if (ret)
> +			goto out_err;
> +
> +		value = cpu_to_fdt64(initrd_load_addr + initrd_len);
> +		ret = fdt_setprop(buf, nodeoffset, "linux,initrd-end",
> +				&value, sizeof(value));
> +		if (ret)
> +			goto out_err;
> +	}
> +
> +	/* trim a buffer */
> +	fdt_pack(buf);
> +	*dtb_buf = buf;
> +	*dtb_buf_len = fdt_totalsize(buf);
> +
> +	return 0;
> +
> +out_err:
> +	vfree(buf);
> +	return ret;
> +}

While powerpc has some similar code for updating the initrd and cmdline, it
makes different assumptions about the size of the dt, and has different behavior
for memreserve. (looks like we don't expect the initramfs to be memreserved).
Let's leave unifying that stuff where possible for the future.


> +int load_other_segments(struct kimage *image,
> +			char *initrd, unsigned long initrd_len,
> +			char *cmdline, unsigned long cmdline_len)
> +{
> +	struct kexec_segment *kern_seg;
> +	struct kexec_buf kbuf;
> +	unsigned long initrd_load_addr = 0;
> +	char *dtb = NULL;
> +	unsigned long dtb_len = 0;
> +	int ret = 0;
> +
> +	kern_seg = &image->segment[image->arch.kern_segment];
> +	kbuf.image = image;
> +	/* not allocate anything below the kernel */
> +	kbuf.buf_min = kern_seg->mem + kern_seg->memsz;

> +	/* load initrd */
> +	if (initrd) {
> +		kbuf.buffer = initrd;
> +		kbuf.bufsz = initrd_len;
> +		kbuf.memsz = initrd_len;

> +		kbuf.buf_align = 0;

I'm surprised the initrd has no alignment requirement, but kexec_add_buffer()
rounds this up to PAGE_SIZE.
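
A minimal sketch of that rounding, as I read kexec_add_buffer(): the segment
size is rounded up to a page multiple and any smaller alignment is raised to
a page, so buf_align = 0 ends up page-aligned anyway. The 4K page size, the
struct and the helper names here are assumptions for illustration.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4K pages */

/* Only the two fields that matter here; not the kernel's struct kexec_buf. */
struct kbuf_sketch {
	unsigned long memsz;
	unsigned long buf_align;
};

static unsigned long align_up(unsigned long x, unsigned long a)
{
	assert(a && (a & (a - 1)) == 0);	/* power of two only */
	return (x + a - 1) & ~(a - 1);
}

/* Enforce the minimum size granularity and alignment for a segment. */
static void normalise_kbuf(struct kbuf_sketch *kbuf)
{
	kbuf->memsz = align_up(kbuf->memsz, SKETCH_PAGE_SIZE);
	if (kbuf->buf_align < SKETCH_PAGE_SIZE)
		kbuf->buf_align = SKETCH_PAGE_SIZE;
}
```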


> +		/* within 1GB-aligned window of up to 32GB in size */
> +		kbuf.buf_max = round_down(kern_seg->mem, SZ_1G)
> +						+ (unsigned long)SZ_1G * 32;
> +		kbuf.top_down = false;
> +
> +		ret = kexec_add_buffer(&kbuf);
> +		if (ret)
> +			goto out_err;
> +		initrd_load_addr = kbuf.mem;
> +
> +		pr_debug("Loaded initrd at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
> +				initrd_load_addr, initrd_len, initrd_len);
> +	}
> +
> +	/* load dtb blob */
> +	ret = setup_dtb(image, initrd_load_addr, initrd_len,
> +				cmdline, cmdline_len, &dtb, &dtb_len);
> +	if (ret) {
> +		pr_err("Preparing for new dtb failed\n");
> +		goto out_err;
> +	}
> +
> +	kbuf.buffer = dtb;
> +	kbuf.bufsz = dtb_len;
> +	kbuf.memsz = dtb_len;
> +	/* not across 2MB boundary */
> +	kbuf.buf_align = SZ_2M;
> +	kbuf.buf_max = ULONG_MAX;
> +	kbuf.top_down = true;
> +
> +	ret = kexec_add_buffer(&kbuf);
> +	if (ret)
> +		goto out_err;
> +	image->arch.dtb_mem = kbuf.mem;
> +	image->arch.dtb_buf = dtb;
> +
> +	pr_debug("Loaded dtb at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
> +			kbuf.mem, dtb_len, dtb_len);
> +
> +	return 0;
> +
> +out_err:
> +	vfree(dtb);
> +	image->arch.dtb_buf = NULL;

Won't kimage_file_post_load_cleanup() always be called if we return an error
here? Why not leave the free()ing until then?
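
The single-owner pattern this hints at could look roughly like the following
user-space sketch: the buffer is recorded in the image as soon as it exists,
the error path frees nothing, and one cleanup function releases it. All names
are hypothetical, and malloc()/free() stand in for vmalloc()/vfree().

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the arch part of the image; illustration only. */
struct image_sketch {
	void *dtb_buf;
};

/* Hypothetical loader: hands the buffer to the image, then may fail. */
static int load_segments_sketch(struct image_sketch *image, int fail_late)
{
	void *dtb;

	assert(image);
	dtb = malloc(64);
	if (!dtb)
		return -1;

	image->dtb_buf = dtb;	/* the image owns the buffer from here on */

	if (fail_late)
		return -1;	/* no free here: cleanup will do it */
	return 0;
}

/* The only place that frees; safe to call more than once. */
static void post_load_cleanup_sketch(struct image_sketch *image)
{
	free(image->dtb_buf);	/* free(NULL) is a no-op */
	image->dtb_buf = NULL;
}
```

This only works if the cleanup hook really is called on every error return,
which is exactly the question above.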


> +	return ret;
> +}



Thanks,

James

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 07/11] arm64: kexec_file: add crash dump support
  2018-04-25  6:26   ` AKASHI Takahiro
  (?)
@ 2018-05-15 17:11     ` James Morse
  -1 siblings, 0 replies; 156+ messages in thread
From: James Morse @ 2018-05-15 17:11 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd, ard.biesheuvel, bhsharma, kexec,
	linux-arm-kernel, linux-kernel

Hi Akashi,

On 25/04/18 07:26, AKASHI Takahiro wrote:
> Enabling crash dump (kdump) includes
> * prepare contents of ELF header of a core dump file, /proc/vmcore,
>   using crash_prepare_elf64_headers(), and
> * add two device tree properties, "linux,usable-memory-range" and
>   "linux,elfcorehdr", which represent repsectively a memory range

(Nit: respectively)


>   to be used by crash dump kernel and the header's location

>  arch/arm64/include/asm/kexec.h         |   4 +
>  arch/arm64/kernel/kexec_image.c        |   9 +-
>  arch/arm64/kernel/machine_kexec_file.c | 202 +++++++++++++++++++++++++

In this patch, machine_kexec_file.c gains its own private fdt array encoder.


> diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> index 37c0a9dc2e47..ec674f4d267c 100644
> --- a/arch/arm64/kernel/machine_kexec_file.c
> +++ b/arch/arm64/kernel/machine_kexec_file.c
> @@ -76,6 +81,78 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf,
>  	return ret;
>  }
>  
> +static int __init arch_kexec_file_init(void)
> +{
> +	/* Those values are used later on loading the kernel */
> +	__dt_root_addr_cells = dt_root_addr_cells;
> +	__dt_root_size_cells = dt_root_size_cells;
> +
> +	return 0;
> +}
> +late_initcall(arch_kexec_file_init);

If we need these, is it worth taking them out of __initdata? I note they've been
'temporary' for quite a long time.


> +
> +#define FDT_ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))
> +#define FDT_TAGALIGN(x)	(FDT_ALIGN((x), FDT_TAGSIZE))
> +
> +static int fdt_prop_len(const char *prop_name, int len)
> +{
> +	return (strlen(prop_name) + 1) +
> +		sizeof(struct fdt_property) +
> +		FDT_TAGALIGN(len);
> +}

This stuff should really be in libfdt.h. Those macros come from
libfdt_internal.h, so we're probably doing something wrong here.
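
For reference, the size estimate above can be reproduced outside the kernel.
This is only a sketch: the SKETCH_* constants are stand-ins I am assuming for
libfdt's FDT_TAGSIZE (4 bytes) and sizeof(struct fdt_property) (12 bytes), and
the sketch_* names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for libfdt's FDT_TAGSIZE and sizeof(struct fdt_property). */
#define SKETCH_FDT_TAGSIZE	4u	/* sizeof(fdt32_t) */
#define SKETCH_FDT_PROP_HDR	12u	/* tag + len + nameoff */

/* Round x up to the next multiple of the 4-byte tag size. */
static size_t sketch_tagalign(size_t x)
{
	return (x + SKETCH_FDT_TAGSIZE - 1) & ~(size_t)(SKETCH_FDT_TAGSIZE - 1);
}

/*
 * Bytes reserved for one new property: the NUL-terminated name (strings
 * block), the property header and the value padded to the tag size
 * (structure block).
 */
static size_t sketch_prop_len(const char *name, size_t value_len)
{
	assert(name != NULL);
	return (strlen(name) + 1) + SKETCH_FDT_PROP_HDR +
		sketch_tagalign(value_len);
}
```

With two address cells and two size cells (a 16-byte value),
"linux,elfcorehdr" comes out at 17 + 12 + 16 = 45 bytes. The name is counted in
full, so the figure may overestimate if the strings block already holds it.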


> +static bool cells_size_fitted(unsigned long base, unsigned long size)
> +{
> +	/* if *_cells >= 2, cells can hold 64-bit values anyway */
> +	if ((__dt_root_addr_cells == 1) && (base >= (1ULL << 32)))
> +		return false;
> +
> +	if ((__dt_root_size_cells == 1) && (size >= (1ULL << 32)))
> +		return false;

Using '> U32_MAX' here may be more readable.
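
Something like the following is what I have in mind. It is a user-space sketch,
so UINT32_MAX stands in for the kernel's U32_MAX, and the cell counts are passed
as parameters instead of being read from __dt_root_addr_cells and
__dt_root_size_cells. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if base and size can be encoded with the given number of
 * 32-bit cells. Two or more cells can always hold a 64-bit value.
 */
static bool sketch_cells_size_fitted(uint64_t base, uint64_t size,
				     int addr_cells, int size_cells)
{
	assert(addr_cells >= 1 && size_cells >= 1);

	if (addr_cells == 1 && base > UINT32_MAX)
		return false;

	if (size_cells == 1 && size > UINT32_MAX)
		return false;

	return true;
}
```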


> +	return true;
> +}
> +
> +static void fill_property(void *buf, u64 val64, int cells)
> +{
> +	u32 val32;
> +
> +	if (cells == 1) {
> +		val32 = cpu_to_fdt32((u32)val64);
> +		memcpy(buf, &val32, sizeof(val32));
> +	} else {

> +		memset(buf, 0, cells * sizeof(u32) - sizeof(u64));
> +		buf += cells * sizeof(u32) - sizeof(u64);

Is this trying to clear the 'top' cells and shuffle the pointer to point at the
'bottom' 2? I'm pretty sure this isn't endian safe.

Do we really expect a system to have #address-cells > 2?
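
One way to make the byte order obvious is to write the value one cell at a
time, most significant cell first, using shifts only. The sketch below is a
user-space illustration under that assumption, not a proposed replacement;
sketch_fill_cells() is a made-up name. Cells above the low 64 bits come out as
zero, and with a single cell the upper 32 bits of the value are dropped, as in
the original.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Write val into buf as `cells` big-endian 32-bit cells.
 * The last cell receives the low 32 bits, the one before it the next
 * 32 bits, and any further leading cells are written as zero.
 */
static void sketch_fill_cells(uint8_t *buf, uint64_t val, int cells)
{
	assert(buf != NULL && cells >= 1);

	for (int i = cells - 1; i >= 0; i--) {
		uint32_t cell = (uint32_t)(val & 0xffffffffu);
		uint8_t *p = buf + (size_t)i * 4;

		/* Byte-wise stores do not depend on host endianness */
		p[0] = (uint8_t)(cell >> 24);
		p[1] = (uint8_t)(cell >> 16);
		p[2] = (uint8_t)(cell >> 8);
		p[3] = (uint8_t)cell;

		val >>= 32;
	}
}
```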


> +		val64 = cpu_to_fdt64(val64);
> +		memcpy(buf, &val64, sizeof(val64));
> +	}
> +}
> +
> +static int fdt_setprop_range(void *fdt, int nodeoffset, const char *name,
> +				unsigned long addr, unsigned long size)

(the device-tree spec describes a 'ranges' property, which had me confused. This
is encoding a prop-encoded-array)

> +{
> +	void *buf, *prop;
> +	size_t buf_size;
> +	int result;
> +
> +	buf_size = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
> +	prop = buf = vmalloc(buf_size);

virtual memory allocation for something less than PAGE_SIZE?
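
If both cell counts are capped at two, the whole value is at most 16 bytes and
could live on the stack. A rough user-space sketch under that assumption (the
cap and all sketch_* names are mine, not from the patch):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_RANGE_MAX_BYTES	16	/* 2 address cells + 2 size cells */

/* Store one big-endian 32-bit cell and return the next write position. */
static uint8_t *sketch_put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
	return p + 4;
}

/* Store val as one or two cells, high cell first. */
static uint8_t *sketch_put_cells(uint8_t *p, uint64_t val, int cells)
{
	if (cells == 2)
		p = sketch_put_be32(p, (uint32_t)(val >> 32));
	return sketch_put_be32(p, (uint32_t)val);
}

/*
 * Encode <addr size> into a caller-provided buffer.
 * Returns the encoded length in bytes, or -1 for unsupported cell counts.
 */
static int sketch_encode_range(uint8_t out[SKETCH_RANGE_MAX_BYTES],
			       uint64_t addr, uint64_t size,
			       int addr_cells, int size_cells)
{
	uint8_t *p = out;

	assert(out != NULL);
	if (addr_cells < 1 || addr_cells > 2 ||
	    size_cells < 1 || size_cells > 2)
		return -1;

	p = sketch_put_cells(p, addr, addr_cells);
	p = sketch_put_cells(p, size, size_cells);
	return (int)(p - out);
}
```

The caller would declare a uint8_t array of SKETCH_RANGE_MAX_BYTES locally and
pass it, with the returned length, to fdt_setprop(), so nothing needs freeing.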


> +	if (!buf)
> +		return -ENOMEM;
> +
> +	fill_property(prop, addr, __dt_root_addr_cells);
> +	prop += __dt_root_addr_cells * sizeof(u32);
> +
> +	fill_property(prop, size, __dt_root_size_cells);
> +
> +	result = fdt_setprop(fdt, nodeoffset, name, buf, buf_size);
> +
> +	vfree(buf);
> +
> +	return result;
> +}

Doesn't this stuff belong in libfdt? I guess there is no 'add array element' api
because this is the first time we've wanted to create a node with more than
key=fixed-size-value.

I don't think this belongs in arch C code. Do we have a plan for getting libfdt
to support encoding prop-arrays? Can we put it somewhere anyone else duplicating
this will find it, until we can (re)move it?

I have no idea how that happens... it looks like the devicetree list is the
place to ask.


>  static int setup_dtb(struct kimage *image,
>  		unsigned long initrd_load_addr, unsigned long initrd_len,
>  		char *cmdline, unsigned long cmdline_len,
> @@ -88,10 +165,26 @@ static int setup_dtb(struct kimage *image,
>  	int range_len;
>  	int ret;
>  
> +	/* check ranges against root's #address-cells and #size-cells */
> +	if (image->type == KEXEC_TYPE_CRASH &&
> +		(!cells_size_fitted(image->arch.elf_load_addr,
> +				image->arch.elf_headers_sz) ||
> +		 !cells_size_fitted(crashk_res.start,
> +				crashk_res.end - crashk_res.start + 1))) {
> +		pr_err("Crash memory region doesn't fit into DT's root cell sizes.\n");
> +		ret = -EINVAL;
> +		goto out_err;
> +	}

To check I've understood this properly: This can happen if the firmware provided
a DTB with 32bit address/size cells, but at least some of the memory requires 64
bit address/size cells. This could only happen on a UEFI system where the
firmware-DTB doesn't describe memory. ACPI-only systems would have the EFIstub DT.


>  	/* duplicate dt blob */
>  	buf_size = fdt_totalsize(initial_boot_params);
>  	range_len = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
>  
> +	if (image->type == KEXEC_TYPE_CRASH)
> +		buf_size += fdt_prop_len("linux,elfcorehdr", range_len)
> +				+ fdt_prop_len("linux,usable-memory-range",
> +								range_len);
> +
>  	if (initrd_load_addr)
>  		buf_size += fdt_prop_len("linux,initrd-start", sizeof(u64))
>  				+ fdt_prop_len("linux,initrd-end", sizeof(u64));
> @@ -113,6 +206,23 @@ static int setup_dtb(struct kimage *image,
>  	if (nodeoffset < 0)
>  		goto out_err;
>  
> +	if (image->type == KEXEC_TYPE_CRASH) {
> +		/* add linux,elfcorehdr */
> +		ret = fdt_setprop_range(buf, nodeoffset, "linux,elfcorehdr",
> +				image->arch.elf_load_addr,
> +				image->arch.elf_headers_sz);
> +		if (ret)
> +			goto out_err;
> +
> +		/* add linux,usable-memory-range */
> +		ret = fdt_setprop_range(buf, nodeoffset,
> +				"linux,usable-memory-range",
> +				crashk_res.start,
> +				crashk_res.end - crashk_res.start + 1);

Don't you need to add "linux,usable-memory-range" to the buf_size estimate?


> +		if (ret)
> +			goto out_err;
> +	}

> @@ -148,17 +258,109 @@ static int setup_dtb(struct kimage *image,

> +static struct crash_mem *get_crash_memory_ranges(void)
> +{
> +	unsigned int nr_ranges;
> +	struct crash_mem *cmem;
> +
> +	nr_ranges = 1; /* for exclusion of crashkernel region */
> +	walk_system_ram_res(0, -1, &nr_ranges, get_nr_ranges_callback);
> +
> +	cmem = vmalloc(sizeof(struct crash_mem) +
> +			sizeof(struct crash_mem_range) * nr_ranges);
> +	if (!cmem)
> +		return NULL;
> +
> +	cmem->max_nr_ranges = nr_ranges;
> +	cmem->nr_ranges = 0;
> +	walk_system_ram_res(0, -1, cmem, add_mem_range_callback);
> +
> +	/* Exclude crashkernel region */
> +	if (crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end)) {
> +		vfree(cmem);
> +		return NULL;
> +	}
> +
> +	return cmem;
> +}

Could this function be included in prepare_elf_headers() so that the alloc() and
free() occur together?
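
Roughly the shape I mean, as a user-space sketch. malloc()/free() stand in for
vmalloc()/vfree(), the consumer callback stands in for
crash_prepare_elf64_headers(), and sketch_exclude() is a simplified stand-in
for crash_exclude_mem_range(). All of these names are hypothetical. It also
shows why one spare entry is reserved: a hole in the middle of a range splits
it in two.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct sketch_range {
	uint64_t start, end;	/* inclusive bounds */
};

typedef int (*sketch_consumer_fn)(const struct sketch_range *r, size_t nr);

/*
 * Copy `in` to `out` with [hs, he] removed and return the entry count.
 * `out` needs room for nr + 1 entries.
 */
static size_t sketch_exclude(const struct sketch_range *in, size_t nr,
			     uint64_t hs, uint64_t he,
			     struct sketch_range *out)
{
	size_t n = 0;

	for (size_t i = 0; i < nr; i++) {
		uint64_t s = in[i].start, e = in[i].end;

		if (he < s || hs > e) {		/* no overlap: keep as is */
			out[n++] = in[i];
			continue;
		}
		if (hs > s)			/* part below the hole */
			out[n++] = (struct sketch_range){ s, hs - 1 };
		if (he < e)			/* part above the hole */
			out[n++] = (struct sketch_range){ he + 1, e };
	}
	return n;
}

/* Example consumer: just reports how many ranges it was given. */
static int sketch_count_ranges(const struct sketch_range *r, size_t nr)
{
	(void)r;
	return (int)nr;
}

/* Allocate, fill, consume and free in one place. */
static int sketch_prepare_headers(const struct sketch_range *ram, size_t nr,
				  uint64_t hs, uint64_t he,
				  sketch_consumer_fn consume)
{
	struct sketch_range *cmem;
	int ret;

	assert(ram != NULL && consume != NULL);

	cmem = malloc((nr + 1) * sizeof(*cmem));
	if (!cmem)
		return -1;

	ret = consume(cmem, sketch_exclude(ram, nr, hs, he, cmem));

	free(cmem);
	return ret;
}
```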


> +static int prepare_elf_headers(void **addr, unsigned long *sz)
> +{
> +	struct crash_mem *cmem;
> +	int ret = 0;
> +
> +	cmem = get_crash_memory_ranges();
> +	if (!cmem)
> +		return -ENOMEM;
> +
> +	ret =  crash_prepare_elf64_headers(cmem, true, addr, sz);
> +
> +	vfree(cmem);

> +	return ret;
> +}

All this is moving memory-range information from core-code's
walk_system_ram_res() into core-code's struct crash_mem, and excluding
crashk_res, which again is accessible to the core code.

It looks like this is duplicated in arch/x86 and arch/arm64 because arm64
doesn't have a second 'crashk_low_res' region, and always wants elf64, instead
of when IS_ENABLED(CONFIG_X86_64).
If we can abstract just those two, more of this could be moved to core code
where powerpc can make use of it if they want to support kdump with
kexec_file_load().

But it's getting late for cross-architecture dependencies, so let's put that on
the for-later list (assuming there isn't a powerpc-kdump series out there adding
a third copy of this).


Thanks,

James

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 07/11] arm64: kexec_file: add crash dump support
  2018-04-25  6:26   ` AKASHI Takahiro
  (?)
@ 2018-05-15 17:12     ` James Morse
  -1 siblings, 0 replies; 156+ messages in thread
From: James Morse @ 2018-05-15 17:12 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd, ard.biesheuvel, bhsharma, kexec,
	linux-arm-kernel, linux-kernel, devicetree, Rob Herring

Hi guys,

(CC: +RobH, devicetree list)

On 25/04/18 07:26, AKASHI Takahiro wrote:
> Enabling crash dump (kdump) includes
> * prepare contents of ELF header of a core dump file, /proc/vmcore,
>   using crash_prepare_elf64_headers(), and
> * add two device tree properties, "linux,usable-memory-range" and
>   "linux,elfcorehdr", which represent repsectively a memory range
>   to be used by crash dump kernel and the header's location

kexec_file_load() on arm64 needs to be able to add a prop-encoded-array to
the FDT, but there doesn't appear to be a libfdt helper to do this.

Akashi's code below adds fdt_setprop_range() to the arch code, and duplicates
bits of libfdt_internal.h to do the work.

How should this be done? I'm assuming this is something we need a new API in
libfdt.h for. How do these come about, and is there an interim step we can use
until then?

Thanks!

James

> diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> index 37c0a9dc2e47..ec674f4d267c 100644
> --- a/arch/arm64/kernel/machine_kexec_file.c
> +++ b/arch/arm64/kernel/machine_kexec_file.c
> @@ -76,6 +81,78 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf,
>  	return ret;
>  }
>  
> +static int __init arch_kexec_file_init(void)
> +{
> +	/* Those values are used later on loading the kernel */
> +	__dt_root_addr_cells = dt_root_addr_cells;
> +	__dt_root_size_cells = dt_root_size_cells;
> +
> +	return 0;
> +}
> +late_initcall(arch_kexec_file_init);
> +
> +#define FDT_ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))
> +#define FDT_TAGALIGN(x)	(FDT_ALIGN((x), FDT_TAGSIZE))
> +
> +static int fdt_prop_len(const char *prop_name, int len)
> +{
> +	return (strlen(prop_name) + 1) +
> +		sizeof(struct fdt_property) +
> +		FDT_TAGALIGN(len);
> +}
> +
> +static bool cells_size_fitted(unsigned long base, unsigned long size)
> +{
> +	/* if *_cells >= 2, cells can hold 64-bit values anyway */
> +	if ((__dt_root_addr_cells == 1) && (base >= (1ULL << 32)))
> +		return false;
> +
> +	if ((__dt_root_size_cells == 1) && (size >= (1ULL << 32)))
> +		return false;
> +
> +	return true;
> +}
> +
> +static void fill_property(void *buf, u64 val64, int cells)
> +{
> +	u32 val32;
> +
> +	if (cells == 1) {
> +		val32 = cpu_to_fdt32((u32)val64);
> +		memcpy(buf, &val32, sizeof(val32));
> +	} else {
> +		memset(buf, 0, cells * sizeof(u32) - sizeof(u64));
> +		buf += cells * sizeof(u32) - sizeof(u64);
> +
> +		val64 = cpu_to_fdt64(val64);
> +		memcpy(buf, &val64, sizeof(val64));
> +	}
> +}
> +
> +static int fdt_setprop_range(void *fdt, int nodeoffset, const char *name,
> +				unsigned long addr, unsigned long size)
> +{
> +	void *buf, *prop;
> +	size_t buf_size;
> +	int result;
> +
> +	buf_size = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
> +	prop = buf = vmalloc(buf_size);
> +	if (!buf)
> +		return -ENOMEM;
> +
> +	fill_property(prop, addr, __dt_root_addr_cells);
> +	prop += __dt_root_addr_cells * sizeof(u32);
> +
> +	fill_property(prop, size, __dt_root_size_cells);
> +
> +	result = fdt_setprop(fdt, nodeoffset, name, buf, buf_size);
> +
> +	vfree(buf);
> +
> +	return result;
> +}
> +
>  static int setup_dtb(struct kimage *image,
>  		unsigned long initrd_load_addr, unsigned long initrd_len,
>  		char *cmdline, unsigned long cmdline_len,
> @@ -88,10 +165,26 @@ static int setup_dtb(struct kimage *image,
>  	int range_len;
>  	int ret;
>  
> +	/* check ranges against root's #address-cells and #size-cells */
> +	if (image->type == KEXEC_TYPE_CRASH &&
> +		(!cells_size_fitted(image->arch.elf_load_addr,
> +				image->arch.elf_headers_sz) ||
> +		 !cells_size_fitted(crashk_res.start,
> +				crashk_res.end - crashk_res.start + 1))) {
> +		pr_err("Crash memory region doesn't fit into DT's root cell sizes.\n");
> +		ret = -EINVAL;
> +		goto out_err;
> +	}
> +
>  	/* duplicate dt blob */
>  	buf_size = fdt_totalsize(initial_boot_params);
>  	range_len = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
>  
> +	if (image->type == KEXEC_TYPE_CRASH)
> +		buf_size += fdt_prop_len("linux,elfcorehdr", range_len)
> +				+ fdt_prop_len("linux,usable-memory-range",
> +								range_len);
> +
>  	if (initrd_load_addr)
>  		buf_size += fdt_prop_len("linux,initrd-start", sizeof(u64))
>  				+ fdt_prop_len("linux,initrd-end", sizeof(u64));
> @@ -113,6 +206,23 @@ static int setup_dtb(struct kimage *image,
>  	if (nodeoffset < 0)
>  		goto out_err;
>  
> +	if (image->type == KEXEC_TYPE_CRASH) {
> +		/* add linux,elfcorehdr */
> +		ret = fdt_setprop_range(buf, nodeoffset, "linux,elfcorehdr",
> +				image->arch.elf_load_addr,
> +				image->arch.elf_headers_sz);
> +		if (ret)
> +			goto out_err;
> +
> +		/* add linux,usable-memory-range */
> +		ret = fdt_setprop_range(buf, nodeoffset,
> +				"linux,usable-memory-range",
> +				crashk_res.start,
> +				crashk_res.end - crashk_res.start + 1);
> +		if (ret)
> +			goto out_err;
> +	}
> +
>  	/* add bootargs */
>  	if (cmdline) {
>  		ret = fdt_setprop(buf, nodeoffset, "bootargs",

^ permalink raw reply	[flat|nested] 156+ messages in thread

* [PATCH v9 07/11] arm64: kexec_file: add crash dump support
@ 2018-05-15 17:12     ` James Morse
  0 siblings, 0 replies; 156+ messages in thread
From: James Morse @ 2018-05-15 17:12 UTC (permalink / raw)
  To: linux-arm-kernel

Hi guys,

(CC: +RobH, devicetree list)

On 25/04/18 07:26, AKASHI Takahiro wrote:
> Enabling crash dump (kdump) includes
> * prepare contents of ELF header of a core dump file, /proc/vmcore,
>   using crash_prepare_elf64_headers(), and
> * add two device tree properties, "linux,usable-memory-range" and
>   "linux,elfcorehdr", which represent repsectively a memory range
>   to be used by crash dump kernel and the header's location

kexec_file_load() on arm64 needs to be able to add a prop-encoded-array to
the FDT, but there doesn't appear to be a libfdt helper to do this.

Akashi's code below adds fdt_setprop_range() to the arch code, and duplicates
bits of libfdt_internal.h to do the work.

How should this be done? I'm assuming this is something we need a new API in
libfdt.h for. How do these come about, and is there an interim step we can use
until then?

Thanks!

James

> diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> index 37c0a9dc2e47..ec674f4d267c 100644
> --- a/arch/arm64/kernel/machine_kexec_file.c
> +++ b/arch/arm64/kernel/machine_kexec_file.c
> @@ -76,6 +81,78 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf,
>  	return ret;
>  }
>  
> +static int __init arch_kexec_file_init(void)
> +{
> +	/* Those values are used later on loading the kernel */
> +	__dt_root_addr_cells = dt_root_addr_cells;
> +	__dt_root_size_cells = dt_root_size_cells;
> +
> +	return 0;
> +}
> +late_initcall(arch_kexec_file_init);
> +
> +#define FDT_ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))
> +#define FDT_TAGALIGN(x)	(FDT_ALIGN((x), FDT_TAGSIZE))
> +
> +static int fdt_prop_len(const char *prop_name, int len)
> +{
> +	return (strlen(prop_name) + 1) +
> +		sizeof(struct fdt_property) +
> +		FDT_TAGALIGN(len);
> +}
> +
> +static bool cells_size_fitted(unsigned long base, unsigned long size)
> +{
> +	/* if *_cells >= 2, cells can hold 64-bit values anyway */
> +	if ((__dt_root_addr_cells == 1) && (base >= (1ULL << 32)))
> +		return false;
> +
> +	if ((__dt_root_size_cells == 1) && (size >= (1ULL << 32)))
> +		return false;
> +
> +	return true;
> +}
> +
> +static void fill_property(void *buf, u64 val64, int cells)
> +{
> +	u32 val32;
> +
> +	if (cells == 1) {
> +		val32 = cpu_to_fdt32((u32)val64);
> +		memcpy(buf, &val32, sizeof(val32));
> +	} else {
> +		memset(buf, 0, cells * sizeof(u32) - sizeof(u64));
> +		buf += cells * sizeof(u32) - sizeof(u64);
> +
> +		val64 = cpu_to_fdt64(val64);
> +		memcpy(buf, &val64, sizeof(val64));
> +	}
> +}
> +
> +static int fdt_setprop_range(void *fdt, int nodeoffset, const char *name,
> +				unsigned long addr, unsigned long size)
> +{
> +	void *buf, *prop;
> +	size_t buf_size;
> +	int result;
> +
> +	buf_size = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
> +	prop = buf = vmalloc(buf_size);
> +	if (!buf)
> +		return -ENOMEM;
> +
> +	fill_property(prop, addr, __dt_root_addr_cells);
> +	prop += __dt_root_addr_cells * sizeof(u32);
> +
> +	fill_property(prop, size, __dt_root_size_cells);
> +
> +	result = fdt_setprop(fdt, nodeoffset, name, buf, buf_size);
> +
> +	vfree(buf);
> +
> +	return result;
> +}
> +
>  static int setup_dtb(struct kimage *image,
>  		unsigned long initrd_load_addr, unsigned long initrd_len,
>  		char *cmdline, unsigned long cmdline_len,
> @@ -88,10 +165,26 @@ static int setup_dtb(struct kimage *image,
>  	int range_len;
>  	int ret;
>  
> +	/* check ranges against root's #address-cells and #size-cells */
> +	if (image->type == KEXEC_TYPE_CRASH &&
> +		(!cells_size_fitted(image->arch.elf_load_addr,
> +				image->arch.elf_headers_sz) ||
> +		 !cells_size_fitted(crashk_res.start,
> +				crashk_res.end - crashk_res.start + 1))) {
> +		pr_err("Crash memory region doesn't fit into DT's root cell sizes.\n");
> +		ret = -EINVAL;
> +		goto out_err;
> +	}
> +
>  	/* duplicate dt blob */
>  	buf_size = fdt_totalsize(initial_boot_params);
>  	range_len = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
>  
> +	if (image->type == KEXEC_TYPE_CRASH)
> +		buf_size += fdt_prop_len("linux,elfcorehdr", range_len)
> +				+ fdt_prop_len("linux,usable-memory-range",
> +								range_len);
> +
>  	if (initrd_load_addr)
>  		buf_size += fdt_prop_len("linux,initrd-start", sizeof(u64))
>  				+ fdt_prop_len("linux,initrd-end", sizeof(u64));
> @@ -113,6 +206,23 @@ static int setup_dtb(struct kimage *image,
>  	if (nodeoffset < 0)
>  		goto out_err;
>  
> +	if (image->type == KEXEC_TYPE_CRASH) {
> +		/* add linux,elfcorehdr */
> +		ret = fdt_setprop_range(buf, nodeoffset, "linux,elfcorehdr",
> +				image->arch.elf_load_addr,
> +				image->arch.elf_headers_sz);
> +		if (ret)
> +			goto out_err;
> +
> +		/* add linux,usable-memory-range */
> +		ret = fdt_setprop_range(buf, nodeoffset,
> +				"linux,usable-memory-range",
> +				crashk_res.start,
> +				crashk_res.end - crashk_res.start + 1);
> +		if (ret)
> +			goto out_err;
> +	}
> +
>  	/* add bootargs */
>  	if (cmdline) {
>  		ret = fdt_setprop(buf, nodeoffset, "bootargs",

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 06/11] arm64: kexec_file: allow for loading Image-format kernel
  2018-05-15  5:13           ` AKASHI Takahiro
  (?)
@ 2018-05-15 17:14             ` James Morse
  -1 siblings, 0 replies; 156+ messages in thread
From: James Morse @ 2018-05-15 17:14 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd, ard.biesheuvel, bhsharma, kexec,
	linux-arm-kernel, linux-kernel

Hi Akashi,

On 15/05/18 06:13, AKASHI Takahiro wrote:
> On Fri, May 11, 2018 at 06:07:06PM +0100, James Morse wrote:
>> On 07/05/18 08:21, AKASHI Takahiro wrote:
>>> On Tue, May 01, 2018 at 06:46:11PM +0100, James Morse wrote:
>>>> On 25/04/18 07:26, AKASHI Takahiro wrote:
>>>>> This patch provides kexec_file_ops for "Image"-format kernel. In this
>>>>> implementation, a binary is always loaded with a fixed offset identified
>>>>> in text_offset field of its header.
>>
>>>>> diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
>>>>> index e4de1223715f..3cba4161818a 100644
>>>>> --- a/arch/arm64/include/asm/kexec.h
>>>>> +++ b/arch/arm64/include/asm/kexec.h

>>>> Could we check branch_code is non-zero, and text-offset points within image-size?
>>>
>>> We could do it, but I don't think this check is very useful.
>>>
>>>>
>>>> We could check that this platform supports the page-size/endian config that this
>>>> Image was built with... We get a message from the EFI stub if the page-size
>>>> can't be supported, it would be nice to do the same here (as we can).
>>>
>>> There is no restriction on page-size or endianness for kexec.
>>
>> No, but it won't boot if the hardware doesn't support it. The kernel will spin
>> at a magic address that is, difficult, to debug without JTAG. The bug report
>> will be "it didn't boot".
> 
> OK.
> Added sanity checks for cpu features, endianness as well as page size.
> 
>>
>>> What will be the purpose of this check?
>>
>> These values are in the header so that the bootloader can check them, then print
>> a meaningful error. Here, kexec_file_load() is playing the part of the bootloader.

>> I'm assuming kexec_file_load() can only be used to kexec linux... unlike regular
>> kexec. Is this where I'm going wrong?

Trying to work this out for myself: we can't support any UEFI application as we
can't give it the boot-services environment, so I'm pretty sure
kexec_file_load() must be linux-specific.

Can we state somewhere that we only expect arm64 Linux to be booted with
kexec_file_load()? It's not clear from the Kconfig text, which refers to kexec,
which explicitly states it can boot other OSes. But for kexec_file_load() we're
following the kernel's booting.txt.
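
A rough sketch of the header checks discussed above, written as plain
userspace C. The field offsets, the "ARM\x64" magic and the flags bits follow
the image header described in booting.txt; struct hw_caps,
check_image_header() and the enum are hypothetical names, and how the caps
get filled in from the CPU ID registers is left out. The first instruction
word (code0 in booting.txt) is presumably what branch_code refers to.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IMAGE_HDR_SIZE	64	/* size of the header in booting.txt */

enum image_check {
	IMAGE_OK = 0,
	IMAGE_TOO_SHORT,
	IMAGE_BAD_MAGIC,
	IMAGE_BAD_CODE,
	IMAGE_BAD_OFFSET,
	IMAGE_BAD_ENDIAN,
	IMAGE_BAD_PAGE_SIZE,
};

/* Hypothetical description of what the running hardware supports. */
struct hw_caps {
	int big_endian_ok;	/* non-zero if a big-endian kernel can run */
	unsigned int page_mask;	/* bit n set: page-size code n is supported */
};

/* Header fields are little-endian whatever the kernel's endianness. */
static uint64_t get_le64(const uint8_t *p)
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

static enum image_check check_image_header(const uint8_t *img, size_t len,
					   const struct hw_caps *caps)
{
	uint64_t text_offset, image_size, flags;
	unsigned int page;

	if (len < IMAGE_HDR_SIZE)
		return IMAGE_TOO_SHORT;
	if (memcmp(img + 56, "ARM\x64", 4) != 0)	/* magic */
		return IMAGE_BAD_MAGIC;
	if (!(img[0] | img[1] | img[2] | img[3]))	/* code0 is zero */
		return IMAGE_BAD_CODE;

	text_offset = get_le64(img + 8);
	image_size = get_le64(img + 16);
	flags = get_le64(img + 24);

	/* Also rejects pre-v3.17 images, where image_size is zero. */
	if (text_offset >= image_size)
		return IMAGE_BAD_OFFSET;

	if ((flags & 1) && !caps->big_endian_ok)	/* bit 0: 1 = BE */
		return IMAGE_BAD_ENDIAN;

	page = (flags >> 1) & 3;	/* bits 1-2: 1 = 4K, 2 = 16K, 3 = 64K */
	if (page && !(caps->page_mask & (1u << page)))
		return IMAGE_BAD_PAGE_SIZE;

	return IMAGE_OK;
}
```

Returning a distinct code per failure is the point of the exercise: the
caller can print a meaningful message instead of the user only seeing
"it didn't boot".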


>>>>> diff --git a/arch/arm64/kernel/kexec_image.c b/arch/arm64/kernel/kexec_image.c
>>>>> new file mode 100644
>>>>> index 000000000000..4dd524ad6611
>>>>> --- /dev/null
>>>>> +++ b/arch/arm64/kernel/kexec_image.c
>>>>> @@ -0,0 +1,79 @@
>>>>
>>>>> +static void *image_load(struct kimage *image,
>>>>> +				char *kernel, unsigned long kernel_len,
>>>>> +				char *initrd, unsigned long initrd_len,
>>>>> +				char *cmdline, unsigned long cmdline_len)
>>>>> +{
>>>>> +	struct kexec_buf kbuf;
>>>>> +	struct arm64_image_header *h = (struct arm64_image_header *)kernel;
>>>>> +	unsigned long text_offset;
>>>>> +	int ret;
>>>>> +
>>>>> +	/* Load the kernel */
>>>>> +	kbuf.image = image;
>>>>> +	kbuf.buf_min = 0;
>>>>> +	kbuf.buf_max = ULONG_MAX;
>>>>> +	kbuf.top_down = false;
>>>>> +
>>>>> +	kbuf.buffer = kernel;
>>>>> +	kbuf.bufsz = kernel_len;
>>>>> +	kbuf.memsz = le64_to_cpu(h->image_size);
>>>>> +	text_offset = le64_to_cpu(h->text_offset);
>>>>> +	kbuf.buf_align = SZ_2M;
>>>>
>>>>> +	/* Adjust kernel segment with TEXT_OFFSET */
>>>>> +	kbuf.memsz += text_offset;
>>>>> +
>>>>> +	ret = kexec_add_buffer(&kbuf);
>>>>> +	if (ret)
>>>>> +		goto out;
>>>>> +
>>>>> +	image->arch.kern_segment = image->nr_segments - 1;
>>>>
>>>> You only seem to use kern_segment here, and in load_other_segments() called
>>>> below. Could it not be a local variable passed in? Instead of arch-specific data
>>>> we keep forever?
>>>
>>> No, kern_segment is also used in load_other_segments() in machine_kexec_file.c.
>>> To optimize memory hole allocation logic in locate_mem_hole_callback(),
>>> we need to know the exact range of kernel image (start and end).
>>
>> That's the second user. My badly-made point is one calls the other, but passes
>> the data via some until-kexec lifetime struct. (it's not important, just an
>> indicator this worked differently in the past and hasn't been cleaned up).
>> I meant something like [0].
> 
> OK, but instead of adding kern_seg, I want to change the interface to:
> 
> | extern int load_other_segments(struct kimage *image,
> |		unsigned long kernel_load_addr, unsigned long kernel_size,
> |		char *initrd, unsigned long initrd_len,
> |		char *cmdline, unsigned long cmdline_len);
> 
> This way, we will in future be able to address an issue I mentioned in
> my previous e-mail. (If we support vmlinux, the kernel occupies two segments
> for text and data, respectively.)

Aha, it's not from old-stuff, it's for future-stuff!


James

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 07/11] arm64: kexec_file: add crash dump support
  2018-05-15 17:11     ` James Morse
  (?)
@ 2018-05-16  8:34       ` James Morse
  -1 siblings, 0 replies; 156+ messages in thread
From: James Morse @ 2018-05-16  8:34 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd, ard.biesheuvel, bhsharma, kexec,
	linux-arm-kernel, linux-kernel

Hi Akashi,

On 15/05/18 18:11, James Morse wrote:
> On 25/04/18 07:26, AKASHI Takahiro wrote:
>> Enabling crash dump (kdump) includes
>> * prepare contents of ELF header of a core dump file, /proc/vmcore,
>>   using crash_prepare_elf64_headers(), and
>> * add two device tree properties, "linux,usable-memory-range" and
>>   "linux,elfcorehdr", which represent respectively a memory range
>>   to be used by crash dump kernel and the header's location

>> diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
>> index 37c0a9dc2e47..ec674f4d267c 100644
>> --- a/arch/arm64/kernel/machine_kexec_file.c
>> +++ b/arch/arm64/kernel/machine_kexec_file.c
>> @@ -76,6 +81,78 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf,

>> +static void fill_property(void *buf, u64 val64, int cells)
>> +{
>> +	u32 val32;
>> +
>> +	if (cells == 1) {
>> +		val32 = cpu_to_fdt32((u32)val64);
>> +		memcpy(buf, &val32, sizeof(val32));
>> +	} else {
> 
>> +		memset(buf, 0, cells * sizeof(u32) - sizeof(u64));
>> +		buf += cells * sizeof(u32) - sizeof(u64);
> 
> Is this trying to clear the 'top' cells and shuffle the pointer to point at the
> 'bottom' 2? I'm pretty sure this isn't endian safe.

It came to me at 2am: this only works for a big-endian layout, which is exactly
what you want, as that is the DT format.
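
A small userspace sketch of why that layout is right, assuming cells >= 2.
fill_cells() has the same shape as the quoted else-branch (zero the leading
cells, then store the value big-endian in the last two), and read_cells()
reads the cells back most-significant first, which is how a DT consumer
walks them. store_be64() is a portable stand-in for cpu_to_fdt64(); all
three names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store 'val' as 8 big-endian bytes, whatever the host endianness. */
static void store_be64(uint8_t *p, uint64_t val)
{
	int i;

	for (i = 7; i >= 0; i--) {
		p[i] = (uint8_t)(val & 0xff);
		val >>= 8;
	}
}

/* Zero the 'top' cells, then write the value into the 'bottom' two. */
static void fill_cells(uint8_t *buf, uint64_t val, int cells)
{
	size_t pad = (size_t)cells * sizeof(uint32_t) - sizeof(uint64_t);

	memset(buf, 0, pad);
	store_be64(buf + pad, val);
}

/* Read 'cells' big-endian cells, most significant cell first. */
static uint64_t read_cells(const uint8_t *p, int cells)
{
	uint64_t r = 0;
	int i;

	for (i = 0; i < cells; i++, p += 4)
		r = (r << 32) | (((uint64_t)p[0] << 24) |
				 ((uint64_t)p[1] << 16) |
				 ((uint64_t)p[2] << 8) | p[3]);
	return r;
}
```

With three cells, 0x1122334455667788 ends up as four zero bytes followed by
11 22 33 44 55 66 77 88, and read_cells() returns the original value.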


> Do we really expect a system to have #address-cells > 2?


Thanks,

James

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 07/11] arm64: kexec_file: add crash dump support
  2018-05-15 17:11     ` James Morse
  (?)
@ 2018-05-16 10:06       ` James Morse
  -1 siblings, 0 replies; 156+ messages in thread
From: James Morse @ 2018-05-16 10:06 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd, ard.biesheuvel, bhsharma, kexec,
	linux-arm-kernel, linux-kernel

Hi Akashi,

On 15/05/18 18:11, James Morse wrote:
> On 25/04/18 07:26, AKASHI Takahiro wrote:
>> Enabling crash dump (kdump) includes
>> * prepare contents of ELF header of a core dump file, /proc/vmcore,
>>   using crash_prepare_elf64_headers(), and
>> * add two device tree properties, "linux,usable-memory-range" and
>>   "linux,elfcorehdr", which represent respectively a memory range
>>   to be used by crash dump kernel and the header's location

>> diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
>> index 37c0a9dc2e47..ec674f4d267c 100644
>> --- a/arch/arm64/kernel/machine_kexec_file.c
>> +++ b/arch/arm64/kernel/machine_kexec_file.c

>> +static struct crash_mem *get_crash_memory_ranges(void)
>> +{
>> +	unsigned int nr_ranges;
>> +	struct crash_mem *cmem;
>> +
>> +	nr_ranges = 1; /* for exclusion of crashkernel region */
>> +	walk_system_ram_res(0, -1, &nr_ranges, get_nr_ranges_callback);
>> +
>> +	cmem = vmalloc(sizeof(struct crash_mem) +
>> +			sizeof(struct crash_mem_range) * nr_ranges);
>> +	if (!cmem)
>> +		return NULL;
>> +
>> +	cmem->max_nr_ranges = nr_ranges;
>> +	cmem->nr_ranges = 0;
>> +	walk_system_ram_res(0, -1, cmem, add_mem_range_callback);
>> +
>> +	/* Exclude crashkernel region */
>> +	if (crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end)) {
>> +		vfree(cmem);
>> +		return NULL;
>> +	}
>> +
>> +	return cmem;
>> +}
> 
> Could this function be included in prepare_elf_headers() so that the alloc() and
> free() occur together?
> 
> 
>> +static int prepare_elf_headers(void **addr, unsigned long *sz)
>> +{
>> +	struct crash_mem *cmem;
>> +	int ret = 0;
>> +
>> +	cmem = get_crash_memory_ranges();
>> +	if (!cmem)
>> +		return -ENOMEM;
>> +
>> +	ret =  crash_prepare_elf64_headers(cmem, true, addr, sz);
>> +
>> +	vfree(cmem);
> 
>> +	return ret;
>> +}
> 
> All this is moving memory-range information from core-code's
> walk_system_ram_res() into core-code's struct crash_mem, and excluding
> crashk_res, which again is accessible to the core code.
> 
> It looks like this is duplicated in arch/x86 and arch/arm64 because arm64
> doesn't have a second 'crashk_low_res' region, and always wants elf64, instead
> of when IS_ENABLED(CONFIG_X86_64).

Thinking about it some more: don't we want to walk memblock here, not
walk_system_ram_res()? What we want is a list of not-nomap regions that the
kernel may have been using, to form part of vmcore.
walk_system_ram_res() is becoming a murkier list of maybe-nomap, maybe-reserved.

I think we should walk the same list here as we do in patch 4.
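
To illustrate what such a walk would have to produce, here is a minimal
userspace sketch, not kernel code. It assumes a memblock-like array of
regions with a nomap flag; struct region, struct range and
build_dump_ranges() are hypothetical names invented for this example. It
keeps every not-nomap region and cuts the crashkernel range out of it,
which is roughly the list vmcore needs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; these are not the kernel's structures. */
struct range { uint64_t start, end; };            /* inclusive end */
struct region { uint64_t base, size; int nomap; };

/*
 * Copy every mapped (not nomap) region into out[], cutting the inclusive
 * range [ex_start, ex_end] out of any region that overlaps it.
 * Returns the number of ranges written, or -1 if out[] is too small.
 */
static int build_dump_ranges(const struct region *mem, size_t nr_mem,
			     uint64_t ex_start, uint64_t ex_end,
			     struct range *out, size_t max_out)
{
	size_t n = 0;

	assert(ex_start <= ex_end);

	for (size_t i = 0; i < nr_mem; i++) {
		uint64_t s, e;

		if (mem[i].nomap || mem[i].size == 0)
			continue;

		s = mem[i].base;
		e = mem[i].base + mem[i].size - 1;

		if (ex_end < s || ex_start > e) {
			/* No overlap: keep the whole region. */
			if (n == max_out)
				return -1;
			out[n++] = (struct range){ s, e };
			continue;
		}

		/* Piece below the excluded range, if any. */
		if (ex_start > s) {
			if (n == max_out)
				return -1;
			out[n++] = (struct range){ s, ex_start - 1 };
		}

		/* Piece above the excluded range, if any. */
		if (ex_end < e) {
			if (n == max_out)
				return -1;
			out[n++] = (struct range){ ex_end + 1, e };
		}
	}

	return (int)n;
}
```

Note that excluding a range from the middle of a region yields two ranges,
which is why the quoted patch starts counting at nr_ranges = 1.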


Thanks,

James

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 04/11] arm64: kexec_file: allocate memory walking through memblock list
  2018-05-07  5:59       ` AKASHI Takahiro
  (?)
@ 2018-05-17  2:10         ` Baoquan He
  -1 siblings, 0 replies; 156+ messages in thread
From: Baoquan He @ 2018-05-17  2:10 UTC (permalink / raw)
  To: AKASHI Takahiro, James Morse, catalin.marinas, will.deacon,
	dhowells, vgoyal, herbert, davem, dyoung, arnd, ard.biesheuvel,
	bhsharma, kexec, linux-arm-kernel, linux-kernel

On 05/07/18 at 02:59pm, AKASHI Takahiro wrote:
> James,
> 
> On Tue, May 01, 2018 at 06:46:09PM +0100, James Morse wrote:
> > Hi Akashi,
> > 
> > On 25/04/18 07:26, AKASHI Takahiro wrote:
> > > We need to prevent firmware-reserved memory regions, particularly EFI
> > > memory map as well as ACPI tables, from being corrupted by loading
> > > kernel/initrd (or other kexec buffers). We also want to support memory
> > > allocation in top-down manner in addition to default bottom-up.
> > > So let's have arm64 specific arch_kexec_walk_mem() which will search
> > > for available memory ranges in usable memblock list,
> > > i.e. !NOMAP & !reserved, 
> > 
> > > instead of system resource tree.
> > 
> > Didn't we try to fix the system-resource-tree in order to fix regular-kexec to
> > be safe in the EFI-memory-map/ACPI-tables case?
> > 
> > It would be good to avoid having two ways of doing this, and I would like to
> > avoid having extra arch code...
> 
> I know what you mean.
> /proc/iomem, or the system resource tree, is in my opinion not the best
> place to describe the kernel's memory usage; it rather describes the
> *physical* hardware layout. As we are still discussing "reserved" memory,
> I don't want to depend on it.
> With the memblock list, we will have more accurate control over memory
> usage.

In kexec-tools, we treat any usable memory as a candidate which can be used
to load the kexec kernel image, initrd, etc. However, kexec loading is only
preparation work: it just books those positions for the later jump into the
kexec kernel after "kexec -e". That is why we need kexec_buf to remember
them and to do the real copy of the kernel/initrd contents. Here you use
memblock to search for available memory; isn't that deviating too far from
the original design in kexec-tools? I assume kexec loading and kexec_file
loading should behave consistently, even though they are done in different
spaces, user space and kernel space.
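
As a rough illustration of the "booking" step, the following userspace
sketch picks a base address for one buffer inside one free range, either
bottom-up or top-down. It is only a model under stated assumptions:
struct buf_req and place_in_range() are hypothetical names, the range is
inclusive, and the alignment is assumed to be a power of two. Nothing is
copied here; only the position is chosen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical request: size and power-of-two alignment of one buffer. */
struct buf_req {
	uint64_t size;
	uint64_t align;
	bool top_down;
};

/*
 * Try to place req inside the inclusive free range [start, end].
 * On success, store the chosen base in *base and return true.
 */
static bool place_in_range(const struct buf_req *req,
			   uint64_t start, uint64_t end, uint64_t *base)
{
	uint64_t addr;

	assert(req->size > 0);
	assert(req->align != 0 && (req->align & (req->align - 1)) == 0);
	assert(start <= end);

	if (end - start < req->size - 1)
		return false;	/* range is smaller than the buffer */

	if (req->top_down) {
		/* Highest aligned base with base + size - 1 <= end. */
		addr = (end - req->size + 1) & ~(req->align - 1);
		if (addr < start)
			return false;
	} else {
		/* Lowest aligned base that is >= start. */
		addr = (start + req->align - 1) & ~(req->align - 1);
		if (addr < start)	/* rounding up wrapped around */
			return false;
		if (addr > end || end - addr < req->size - 1)
			return false;
	}

	*base = addr;
	return true;
}
```

A walker, whether it is driven by the resource tree or by memblock, would
call something like this once per candidate range and stop at the first
success, so the source of the ranges and the placement policy stay
independent of each other.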

I didn't follow the earlier posts, so I may be missing something.

Thanks
Baoquan

> 
> > 
> > > diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> > > new file mode 100644
> > > index 000000000000..f9ebf54ca247
> > > --- /dev/null
> > > +++ b/arch/arm64/kernel/machine_kexec_file.c
> > > @@ -0,0 +1,57 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +/*
> > > + * kexec_file for arm64
> > > + *
> > > + * Copyright (C) 2018 Linaro Limited
> > > + * Author: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > > + *
> > 
> > > + * Most code is derived from arm64 port of kexec-tools
> > 
> > How does kexec-tools walk memblock?
> 
> Will remove this comment from this patch.
> Obviously, this comment is for the rest of the code which will be
> added to succeeding patches (patch #5 and #7).
> 
> 
> > 
> > > + */
> > > +
> > > +#define pr_fmt(fmt) "kexec_file: " fmt
> > > +
> > > +#include <linux/ioport.h>
> > > +#include <linux/kernel.h>
> > > +#include <linux/kexec.h>
> > > +#include <linux/memblock.h>
> > > +
> > > +int arch_kexec_walk_mem(struct kexec_buf *kbuf,
> > > +				int (*func)(struct resource *, void *))
> > > +{
> > > +	phys_addr_t start, end;
> > > +	struct resource res;
> > > +	u64 i;
> > > +	int ret = 0;
> > > +
> > > +	if (kbuf->image->type == KEXEC_TYPE_CRASH)
> > > +		return func(&crashk_res, kbuf);
> > > +
> > > +	if (kbuf->top_down)
> > > +		for_each_mem_range_rev(i, &memblock.memory, &memblock.reserved,
> > > +				NUMA_NO_NODE, MEMBLOCK_NONE,
> > > +				&start, &end, NULL) {
> > 
> > for_each_free_mem_range_reverse() is a more readable version of this helper.
> 
> OK. I used to use my own limited list of reserved memory instead of
> memblock.reserved here to exclude verbose ranges.
> 
> 
> > > +			if (!memblock_is_map_memory(start))
> > > +				continue;
> > 
> > Passing MEMBLOCK_NONE means this walk will never find MEMBLOCK_NOMAP memory.
> 
> Sure, I confirmed it.
> 
> > 
> > > +			res.start = start;
> > > +			res.end = end;
> > > +			ret = func(&res, kbuf);
> > > +			if (ret)
> > > +				break;
> > > +		}
> > > +	else
> > > +		for_each_mem_range(i, &memblock.memory, &memblock.reserved,
> > > +				NUMA_NO_NODE, MEMBLOCK_NONE,
> > > +				&start, &end, NULL) {
> > 
> > for_each_free_mem_range()?
> 
> OK.
> 
> > > +			if (!memblock_is_map_memory(start))
> > > +				continue;
> > > +
> > > +			res.start = start;
> > > +			res.end = end;
> > > +			ret = func(&res, kbuf);
> > > +			if (ret)
> > > +				break;
> > > +		}
> > > +
> > > +	return ret;
> > > +}
> > > 
> > 
> > With these changes, what we have is almost:
> > arch/powerpc/kernel/machine_kexec_file_64.c::arch_kexec_walk_mem() !
> > (the difference being powerpc doesn't yet support crash-kernels here)
> > 
> > If the argument is walking memblock gives a better answer than the stringy
> > walk_system_ram_res() thing, is there any mileage in moving this code into
> > kexec_file.c, and using it if !IS_ENABLED(CONFIG_ARCH_DISCARD_MEMBLOCK)?
> > 
> > This would save arm64/powerpc having near-identical implementations.
> > 32bit arm keeps memblock if it has kexec, so it may be useful there too if
> > kexec_file_load() support is added.
> 
> Thanks. I had forgotten about ppc.
> 
> -Takahiro AKASHI
> 
> 
> > 
> > Thanks,
> > 
> > James
> 

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 04/11] arm64: kexec_file: allocate memory walking through memblock list
  2018-05-17  2:10         ` Baoquan He
  (?)
@ 2018-05-17  2:15           ` Baoquan He
  -1 siblings, 0 replies; 156+ messages in thread
From: Baoquan He @ 2018-05-17  2:15 UTC (permalink / raw)
  To: AKASHI Takahiro, James Morse, catalin.marinas, will.deacon,
	dhowells, vgoyal, herbert, davem, dyoung, arnd, ard.biesheuvel,
	bhsharma, kexec, linux-arm-kernel, linux-kernel

On 05/17/18 at 10:10am, Baoquan He wrote:
> On 05/07/18 at 02:59pm, AKASHI Takahiro wrote:
> > James,
> > 
> > On Tue, May 01, 2018 at 06:46:09PM +0100, James Morse wrote:
> > > Hi Akashi,
> > > 
> > > On 25/04/18 07:26, AKASHI Takahiro wrote:
> > > > We need to prevent firmware-reserved memory regions, particularly EFI
> > > > memory map as well as ACPI tables, from being corrupted by loading
> > > > kernel/initrd (or other kexec buffers). We also want to support memory
> > > > allocation in top-down manner in addition to default bottom-up.
> > > > So let's have arm64 specific arch_kexec_walk_mem() which will search
> > > > for available memory ranges in usable memblock list,
> > > > i.e. !NOMAP & !reserved, 
> > > 
> > > > instead of system resource tree.
> > > 
> > > Didn't we try to fix the system-resource-tree in order to fix regular-kexec to
> > > be safe in the EFI-memory-map/ACPI-tables case?
> > > 
> > > It would be good to avoid having two ways of doing this, and I would like to
> > > avoid having extra arch code...
> > 
> > I know what you mean.
> > /proc/iomem, or the system resource tree, is in my opinion not the best
> > place to describe the kernel's memory usage; it rather describes the
> > *physical* hardware layout. As we are still discussing "reserved" memory,
> > I don't want to depend on it.
> > With the memblock list, we will have more accurate control over memory
> > usage.
> 
> In kexec-tools, we treat any usable memory as a candidate which can be used

Here I said 'any', which is not accurate. Memory that needs to be passed
to the 2nd kernel for its own use must be excluded, just as we have done
in kexec-tools.
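
The exclusion can be pictured as "memory minus an exclusion list". The
userspace sketch below walks the parts of a memory list that are not
covered by a second list, in ascending order. It is an illustration only:
struct span, walk_free_spans() and collect_span() are hypothetical names,
ranges are half-open, and both lists are assumed to be sorted and
non-overlapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open range [start, end). */
struct span { uint64_t start, end; };

typedef int (*span_fn)(uint64_t start, uint64_t end, void *arg);

/*
 * Call fn() for each part of mem[] that is not covered by resv[], in
 * ascending order. Stops early and returns fn()'s value if it is non-zero.
 */
static int walk_free_spans(const struct span *mem, size_t nr_mem,
			   const struct span *resv, size_t nr_resv,
			   span_fn fn, void *arg)
{
	size_t r = 0;

	for (size_t m = 0; m < nr_mem; m++) {
		uint64_t cur = mem[m].start;

		assert(mem[m].start <= mem[m].end);

		/* Skip excluded spans that end before this memory span. */
		while (r < nr_resv && resv[r].end <= cur)
			r++;

		for (size_t k = r; k < nr_resv && resv[k].start < mem[m].end; k++) {
			if (resv[k].start > cur) {
				int ret = fn(cur, resv[k].start, arg);

				if (ret)
					return ret;
			}
			if (resv[k].end > cur)
				cur = resv[k].end;
		}

		/* Tail of the memory span after the last excluded span. */
		if (cur < mem[m].end) {
			int ret = fn(cur, mem[m].end, arg);

			if (ret)
				return ret;
		}
	}

	return 0;
}

/* Example callback: record each free span in a small fixed-size list. */
struct span_list { struct span v[8]; size_t n; };

static int collect_span(uint64_t start, uint64_t end, void *arg)
{
	struct span_list *l = arg;

	if (l->n == 8)
		return -1;
	l->v[l->n++] = (struct span){ start, end };
	return 0;
}
```

Whether the exclusion list comes from /proc/iomem in user space or from
memblock.reserved in the kernel, the candidates offered to the loader are
the spans this kind of walk reports.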

> to load the kexec kernel image, initrd, etc. However, kexec loading is only
> preparation work: it just books those positions for the later jump into the
> kexec kernel after "kexec -e". That is why we need kexec_buf to remember
> them and to do the real copy of the kernel/initrd contents. Here you use
> memblock to search for available memory; isn't that deviating too far from
> the original design in kexec-tools? I assume kexec loading and kexec_file
> loading should behave consistently, even though they are done in different
> spaces, user space and kernel space.
> 
> I didn't follow the earlier posts, so I may be missing something.
> 
> Thanks
> Baoquan
> 
> > 
> > > 
> > > > diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> > > > new file mode 100644
> > > > index 000000000000..f9ebf54ca247
> > > > --- /dev/null
> > > > +++ b/arch/arm64/kernel/machine_kexec_file.c
> > > > @@ -0,0 +1,57 @@
> > > > +// SPDX-License-Identifier: GPL-2.0
> > > > +/*
> > > > + * kexec_file for arm64
> > > > + *
> > > > + * Copyright (C) 2018 Linaro Limited
> > > > + * Author: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > > > + *
> > > 
> > > > + * Most code is derived from arm64 port of kexec-tools
> > > 
> > > How does kexec-tools walk memblock?
> > 
> > Will remove this comment from this patch.
> > Obviously, this comment is for the rest of the code which will be
> > added to succeeding patches (patch #5 and #7).
> > 
> > 
> > > 
> > > > + */
> > > > +
> > > > +#define pr_fmt(fmt) "kexec_file: " fmt
> > > > +
> > > > +#include <linux/ioport.h>
> > > > +#include <linux/kernel.h>
> > > > +#include <linux/kexec.h>
> > > > +#include <linux/memblock.h>
> > > > +
> > > > +int arch_kexec_walk_mem(struct kexec_buf *kbuf,
> > > > +				int (*func)(struct resource *, void *))
> > > > +{
> > > > +	phys_addr_t start, end;
> > > > +	struct resource res;
> > > > +	u64 i;
> > > > +	int ret = 0;
> > > > +
> > > > +	if (kbuf->image->type == KEXEC_TYPE_CRASH)
> > > > +		return func(&crashk_res, kbuf);
> > > > +
> > > > +	if (kbuf->top_down)
> > > > +		for_each_mem_range_rev(i, &memblock.memory, &memblock.reserved,
> > > > +				NUMA_NO_NODE, MEMBLOCK_NONE,
> > > > +				&start, &end, NULL) {
> > > 
> > > for_each_free_mem_range_reverse() is a more readable version of this helper.
> > 
> > OK. I used to use my own limited list of reserved memory instead of
> > memblock.reserved here to exclude verbose ranges.
> > 
> > 
> > > > +			if (!memblock_is_map_memory(start))
> > > > +				continue;
> > > 
> > > Passing MEMBLOCK_NONE means this walk will never find MEMBLOCK_NOMAP memory.
> > 
> > Sure, I confirmed it.
> > 
> > > 
> > > > +			res.start = start;
> > > > +			res.end = end;
> > > > +			ret = func(&res, kbuf);
> > > > +			if (ret)
> > > > +				break;
> > > > +		}
> > > > +	else
> > > > +		for_each_mem_range(i, &memblock.memory, &memblock.reserved,
> > > > +				NUMA_NO_NODE, MEMBLOCK_NONE,
> > > > +				&start, &end, NULL) {
> > > 
> > > for_each_free_mem_range()?
> > 
> > OK.
> > 
> > > > +			if (!memblock_is_map_memory(start))
> > > > +				continue;
> > > > +
> > > > +			res.start = start;
> > > > +			res.end = end;
> > > > +			ret = func(&res, kbuf);
> > > > +			if (ret)
> > > > +				break;
> > > > +		}
> > > > +
> > > > +	return ret;
> > > > +}
> > > > 
> > > 
> > > With these changes, what we have is almost:
> > > arch/powerpc/kernel/machine_kexec_file_64.c::arch_kexec_walk_mem() !
> > > (the difference being powerpc doesn't yet support crash-kernels here)
> > > 
> > > If the argument is walking memblock gives a better answer than the stringy
> > > walk_system_ram_res() thing, is there any mileage in moving this code into
> > > kexec_file.c, and using it if !IS_ENABLED(CONFIG_ARCH_DISCARD_MEMBLOCK)?
> > > 
> > > This would save arm64/powerpc having near-identical implementations.
> > > 32bit arm keeps memblock if it has kexec, so it may be useful there too if
> > > kexec_file_load() support is added.
> > 
> > Thanks. I had forgotten about ppc.
> > 
> > -Takahiro AKASHI
> > 
> > 
> > > 
> > > Thanks,
> > > 
> > > James
> > 

^ permalink raw reply	[flat|nested] 156+ messages in thread

* [PATCH v9 04/11] arm64: kexec_file: allocate memory walking through memblock list
@ 2018-05-17  2:15           ` Baoquan He
  0 siblings, 0 replies; 156+ messages in thread
From: Baoquan He @ 2018-05-17  2:15 UTC (permalink / raw)
  To: linux-arm-kernel

On 05/17/18 at 10:10am, Baoquan He wrote:
> On 05/07/18 at 02:59pm, AKASHI Takahiro wrote:
> > James,
> > 
> > On Tue, May 01, 2018 at 06:46:09PM +0100, James Morse wrote:
> > > Hi Akashi,
> > > 
> > > On 25/04/18 07:26, AKASHI Takahiro wrote:
> > > > We need to prevent firmware-reserved memory regions, particularly EFI
> > > > memory map as well as ACPI tables, from being corrupted by loading
> > > > kernel/initrd (or other kexec buffers). We also want to support memory
> > > > allocation in top-down manner in addition to default bottom-up.
> > > > So let's have arm64 specific arch_kexec_walk_mem() which will search
> > > > for available memory ranges in usable memblock list,
> > > > i.e. !NOMAP & !reserved, 
> > > 
> > > > instead of system resource tree.
> > > 
> > > Didn't we try to fix the system-resource-tree in order to fix regular-kexec to
> > > be safe in the EFI-memory-map/ACPI-tables case?
> > > 
> > > It would be good to avoid having two ways of doing this, and I would like to
> > > avoid having extra arch code...
> > 
> > I know what you mean.
> > /proc/iomem or system resource is, in my opinion, not the best place to
> > describe memory usage of kernel but rather to describe *physical* hardware
> > layout. As we are still discussing about "reserved" memory, I don't want
> > to depend on it.
> > Along with memblock list, we will have more accurate control over memory
> > usage.
> 
> In kexec-tools, we see any usable memory as candidate which can be used

Saying 'any' here was not accurate. Memory that needs to be passed to the
2nd kernel for its use must be excluded, just as we have done in
kexec-tools.

> to load kexec kernel image/initrd etc. However kexec loading is a
> preparation work, it just books those position for later kexec kernel
> jumping after "kexec -e", that is why we need kexec_buf to remember
> them and do the real content copy of kernel/initrd. Here you use
> memblock to search available memory, isn't it deviating too far away
> from the original design in kexec-tools. Assume kexec loading and
> kexec_file loading should be consistent on loading even though they are
> done in different space, kernel space and user space.
> 
> I didn't follow the earlier post, may miss something.
> 
> Thanks
> Baoquan
> 
> > 
> > > 
> > > > diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> > > > new file mode 100644
> > > > index 000000000000..f9ebf54ca247
> > > > --- /dev/null
> > > > +++ b/arch/arm64/kernel/machine_kexec_file.c
> > > > @@ -0,0 +1,57 @@
> > > > +// SPDX-License-Identifier: GPL-2.0
> > > > +/*
> > > > + * kexec_file for arm64
> > > > + *
> > > > + * Copyright (C) 2018 Linaro Limited
> > > > + * Author: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > > > + *
> > > 
> > > > + * Most code is derived from arm64 port of kexec-tools
> > > 
> > > How does kexec-tools walk memblock?
> > 
> > Will remove this comment from this patch.
> > Obviously, this comment is for the rest of the code which will be
> > added to succeeding patches (patch #5 and #7).
> > 
> > 
> > > 
> > > > + */
> > > > +
> > > > +#define pr_fmt(fmt) "kexec_file: " fmt
> > > > +
> > > > +#include <linux/ioport.h>
> > > > +#include <linux/kernel.h>
> > > > +#include <linux/kexec.h>
> > > > +#include <linux/memblock.h>
> > > > +
> > > > +int arch_kexec_walk_mem(struct kexec_buf *kbuf,
> > > > +				int (*func)(struct resource *, void *))
> > > > +{
> > > > +	phys_addr_t start, end;
> > > > +	struct resource res;
> > > > +	u64 i;
> > > > +	int ret = 0;
> > > > +
> > > > +	if (kbuf->image->type == KEXEC_TYPE_CRASH)
> > > > +		return func(&crashk_res, kbuf);
> > > > +
> > > > +	if (kbuf->top_down)
> > > > +		for_each_mem_range_rev(i, &memblock.memory, &memblock.reserved,
> > > > +				NUMA_NO_NODE, MEMBLOCK_NONE,
> > > > +				&start, &end, NULL) {
> > > 
> > > for_each_free_mem_range_reverse() is a more readable version of this helper.
> > 
> > OK. I used to use my own limited list of reserved memory instead of
> > memblock.reserved here to exclude verbose ranges.
> > 
> > 
> > > > +			if (!memblock_is_map_memory(start))
> > > > +				continue;
> > > 
> > > Passing MEMBLOCK_NONE means this walk will never find MEMBLOCK_NOMAP memory.
> > 
> > Sure, I confirmed it.
> > 
> > > 
> > > > +			res.start = start;
> > > > +			res.end = end;
> > > > +			ret = func(&res, kbuf);
> > > > +			if (ret)
> > > > +				break;
> > > > +		}
> > > > +	else
> > > > +		for_each_mem_range(i, &memblock.memory, &memblock.reserved,
> > > > +				NUMA_NO_NODE, MEMBLOCK_NONE,
> > > > +				&start, &end, NULL) {
> > > 
> > > for_each_free_mem_range()?
> > 
> > OK.
> > 
> > > > +			if (!memblock_is_map_memory(start))
> > > > +				continue;
> > > > +
> > > > +			res.start = start;
> > > > +			res.end = end;
> > > > +			ret = func(&res, kbuf);
> > > > +			if (ret)
> > > > +				break;
> > > > +		}
> > > > +
> > > > +	return ret;
> > > > +}
> > > > 
> > > 
> > > With these changes, what we have is almost:
> > > arch/powerpc/kernel/machine_kexec_file_64.c::arch_kexec_walk_mem() !
> > > (the difference being powerpc doesn't yet support crash-kernels here)
> > > 
> > > If the argument is walking memblock gives a better answer than the stringy
> > > walk_system_ram_res() thing, is there any mileage in moving this code into
> > > kexec_file.c, and using it if !IS_ENABLED(CONFIG_ARCH_DISCARD_MEMBLOCK)?
> > > 
> > > This would save arm64/powerpc having near-identical implementations.
> > > 32bit arm keeps memblock if it has kexec, so it may be useful there too if
> > > kexec_file_load() support is added.
> > 
> > Thanks. I've forgot ppc.
> > 
> > -Takahiro AKASHI
> > 
> > 
> > > 
> > > Thanks,
> > > 
> > > James

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 04/11] arm64: kexec_file: allocate memory walking through memblock list
  2018-05-17  2:15           ` Baoquan He
  (?)
@ 2018-05-17 18:04             ` James Morse
  -1 siblings, 0 replies; 156+ messages in thread
From: James Morse @ 2018-05-17 18:04 UTC (permalink / raw)
  To: Baoquan He, AKASHI Takahiro, catalin.marinas, will.deacon,
	dhowells, vgoyal, herbert, davem, dyoung, arnd, ard.biesheuvel,
	bhsharma, kexec, linux-arm-kernel, linux-kernel

Hi Baoquan,

On 17/05/18 03:15, Baoquan He wrote:
> On 05/17/18 at 10:10am, Baoquan He wrote:
>> On 05/07/18 at 02:59pm, AKASHI Takahiro wrote:
>>> On Tue, May 01, 2018 at 06:46:09PM +0100, James Morse wrote:
>>>> On 25/04/18 07:26, AKASHI Takahiro wrote:
>>>>> We need to prevent firmware-reserved memory regions, particularly EFI
>>>>> memory map as well as ACPI tables, from being corrupted by loading
>>>>> kernel/initrd (or other kexec buffers). We also want to support memory
>>>>> allocation in top-down manner in addition to default bottom-up.
>>>>> So let's have arm64 specific arch_kexec_walk_mem() which will search
>>>>> for available memory ranges in usable memblock list,
>>>>> i.e. !NOMAP & !reserved, 
>>>>
>>>>> instead of system resource tree.
>>>>
>>>> Didn't we try to fix the system-resource-tree in order to fix regular-kexec to
>>>> be safe in the EFI-memory-map/ACPI-tables case?
>>>>
>>>> It would be good to avoid having two ways of doing this, and I would like to
>>>> avoid having extra arch code...
>>>
>>> I know what you mean.
>>> /proc/iomem or system resource is, in my opinion, not the best place to
>>> describe memory usage of kernel but rather to describe *physical* hardware
>>> layout. As we are still discussing about "reserved" memory, I don't want
>>> to depend on it.
>>> Along with memblock list, we will have more accurate control over memory
>>> usage.
>>
>> In kexec-tools, we see any usable memory as candidate which can be used
> 
> Here I said 'any', it's not accurate. Those memory which need be passed
> to 2nd kernel for use need be excluded, just as we have done in
> kexec-tools.
> 
>> to load kexec kernel image/initrd etc. However kexec loading is a
>> preparation work, it just books those position for later kexec kernel
>> jumping after "kexec -e", that is why we need kexec_buf to remember
>> them and do the real content copy of kernel/initrd.

The problem we have on arm64 is that /proc/iomem is being used for two things:
1) Kexec's "this is memory I can book for the new kernel".
2) Kdump's "this is memory I must describe for vmcore".

We get the memory map from UEFI via the EFI stub, and leave it in
memblock_reserved() memory. A new kexec kernel needs this to boot, so kexec
loading mustn't overwrite it. The same goes for the ACPI tables: they could be
reclaimed and used as memory, but the new kexec kernel needs them to boot, so
they are memblock_reserved() too.

If we knock all memblock_reserved() regions out of /proc/iomem then kdump
doesn't work, because /proc/iomem is only generated once. It's a snapshot. The
initcode/data is an example of memory we release from memblock_reserve() after
this, which then gets used for data we need in the vmcore.

Ideally we would describe all this in /proc/iomem with:
| 8001e80000-83ff186fff : System RAM
|   8002080000-8002feffff : [Data you really need to boot]

kexec-tools should not overwrite 'data you really need to boot' unless it knows
what it is, and that the system will never need it again. (examples: overwrite
the ACPI tables when booting a non-acpi kernel, overwrite the UEFI memory map if
the DT has been regenerated for a non-uefi kernel)

But kexec-tools doesn't parse those second-level entries properly. We have a
bug in user-space, and a bug in the kernel.

Because /proc/iomem is being used for two things, and kexec-tools only parses
one level, I don't think we can fix this in the kernel without breaking one of
the use-cases. I think Akashi's "fix user-space too" approach is the most
pragmatic one.
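
To illustrate the nesting described above, here is a minimal user-space sketch
(not kexec-tools code) of what honouring second-level /proc/iomem entries could
look like. It assumes the "start-end : name" line layout with two spaces of
indentation per nesting level. The struct, the function names and the child
label passed in by the caller are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed /proc/iomem line (hypothetical helper type). */
struct iomem_entry {
	int level;			/* 0 = top level, 1 = child, ... */
	unsigned long long start;
	unsigned long long end;		/* inclusive, as printed */
	char name[64];
};

/* Parse "  start-end : name"; returns 0 on success, -1 on a malformed line. */
static int parse_iomem_line(const char *line, struct iomem_entry *e)
{
	int indent = 0;

	assert(line != NULL && e != NULL);

	while (line[indent] == ' ')
		indent++;
	e->level = indent / 2;	/* assumed: two spaces per nesting level */

	if (sscanf(line + indent, "%llx-%llx : %63[^\n]",
		   &e->start, &e->end, e->name) != 3)
		return -1;
	return 0;
}

/*
 * Collect first-level children named 'child' that sit under a top-level
 * "System RAM" entry. Children of other top-level entries are ignored.
 * Returns the number of entries stored in out[].
 */
static size_t collect_ram_children(const char *const *lines, size_t nr_lines,
				   const char *child,
				   struct iomem_entry *out, size_t max_out)
{
	size_t n = 0;
	int in_ram = 0;

	for (size_t i = 0; i < nr_lines; i++) {
		struct iomem_entry e;

		if (parse_iomem_line(lines[i], &e))
			continue;	/* skip lines we cannot parse */

		if (e.level == 0) {
			/* A new top-level entry opens or closes the RAM scope. */
			in_ram = strcmp(e.name, "System RAM") == 0;
			continue;
		}
		if (in_ram && e.level == 1 && n < max_out &&
		    strcmp(e.name, child) == 0)
			out[n++] = e;
	}
	return n;
}
```

A loader built on this would subtract every collected child range from its
parent System RAM range before choosing a load address, instead of treating
the whole top-level range as free.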


>> Here you use
>> memblock to search available memory, isn't it deviating too far away
>> from the original design in kexec-tools. Assume kexec loading and
>> kexec_file loading should be consistent on loading even though they are
>> done in different space, kernel space and user space.

It's much easier for us to parse memblock in the kernel, as the helpers step over
the regions we know we don't want. For the resource list we would need
strcmp(), plus a bunch of handling for the second-level entries.
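
As a rough user-space model of why walking memblock is convenient, the sketch
below subtracts a reserved list from a memory list and hands each remaining
free range to a callback, bottom-up or top-down. It only mimics the idea behind
for_each_free_mem_range() and for_each_free_mem_range_reverse(); it is not the
kernel implementation. struct range, free_ranges(), walk_free() and try_place()
are hypothetical names, and both input arrays are assumed to be sorted by start
address and non-overlapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open physical range [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

typedef int (*range_fn)(const struct range *r, void *arg);

/*
 * Store in out[] the parts of mem[] that no rsv[] entry covers.
 * Returns the number of free ranges, or -1 if out[] is too small.
 */
static int free_ranges(const struct range *mem, size_t nr_mem,
		       const struct range *rsv, size_t nr_rsv,
		       struct range *out, size_t max_out)
{
	size_t n = 0;

	assert(out != NULL);

	for (size_t i = 0; i < nr_mem; i++) {
		uint64_t cur = mem[i].start;

		for (size_t j = 0; j < nr_rsv && cur < mem[i].end; j++) {
			if (rsv[j].end <= cur)
				continue;	/* entirely below the cursor */
			if (rsv[j].start >= mem[i].end)
				break;		/* entirely above this bank */
			if (rsv[j].start > cur) {
				if (n == max_out)
					return -1;
				out[n].start = cur;
				out[n].end = rsv[j].start;
				n++;
			}
			cur = rsv[j].end;	/* step over the reserved part */
		}
		if (cur < mem[i].end) {
			if (n == max_out)
				return -1;
			out[n].start = cur;
			out[n].end = mem[i].end;
			n++;
		}
	}
	return (int)n;
}

/* Call fn on each free range; a non-zero return from fn stops the walk. */
static int walk_free(const struct range *free_list, int nr, int top_down,
		     range_fn fn, void *arg)
{
	int ret = 0;

	assert(fn != NULL);

	for (int k = 0; k < nr; k++) {
		int idx = top_down ? nr - 1 - k : k;

		ret = fn(&free_list[idx], arg);
		if (ret)
			break;
	}
	return ret;
}

/* Example callback state: place a buffer of 'size' bytes. */
struct request {
	uint64_t size;
	int top_down;
	uint64_t addr;	/* result, valid only when the walk returns 1 */
};

static int try_place(const struct range *r, void *arg)
{
	struct request *req = arg;

	if (r->end - r->start < req->size)
		return 0;	/* too small, keep walking */
	req->addr = req->top_down ? r->end - req->size : r->start;
	return 1;		/* placed, stop the walk */
}
```

The callback never sees a reserved region, so it needs no name matching at
all; the resource-tree equivalent would have to recognise and skip those
regions itself.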


Thanks,

James

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 04/11] arm64: kexec_file: allocate memory walking through memblock list
  2018-05-17 18:04             ` James Morse
  (?)
@ 2018-05-18  1:37               ` Baoquan He
  -1 siblings, 0 replies; 156+ messages in thread
From: Baoquan He @ 2018-05-18  1:37 UTC (permalink / raw)
  To: James Morse
  Cc: AKASHI Takahiro, catalin.marinas, will.deacon, dhowells, vgoyal,
	herbert, davem, dyoung, arnd, ard.biesheuvel, bhsharma, kexec,
	linux-arm-kernel, linux-kernel

On 05/17/18 at 07:04pm, James Morse wrote:
> Hi Baoquan,
> 
> On 17/05/18 03:15, Baoquan He wrote:
> > On 05/17/18 at 10:10am, Baoquan He wrote:
> >> On 05/07/18 at 02:59pm, AKASHI Takahiro wrote:
> >>> On Tue, May 01, 2018 at 06:46:09PM +0100, James Morse wrote:
> >>>> On 25/04/18 07:26, AKASHI Takahiro wrote:
> >>>>> We need to prevent firmware-reserved memory regions, particularly EFI
> >>>>> memory map as well as ACPI tables, from being corrupted by loading
> >>>>> kernel/initrd (or other kexec buffers). We also want to support memory
> >>>>> allocation in top-down manner in addition to default bottom-up.
> >>>>> So let's have arm64 specific arch_kexec_walk_mem() which will search
> >>>>> for available memory ranges in usable memblock list,
> >>>>> i.e. !NOMAP & !reserved, 
> >>>>
> >>>>> instead of system resource tree.
> >>>>
> >>>> Didn't we try to fix the system-resource-tree in order to fix regular-kexec to
> >>>> be safe in the EFI-memory-map/ACPI-tables case?
> >>>>
> >>>> It would be good to avoid having two ways of doing this, and I would like to
> >>>> avoid having extra arch code...
> >>>
> >>> I know what you mean.
> >>> /proc/iomem or system resource is, in my opinion, not the best place to
> >>> describe memory usage of kernel but rather to describe *physical* hardware
> >>> layout. As we are still discussing about "reserved" memory, I don't want
> >>> to depend on it.
> >>> Along with memblock list, we will have more accurate control over memory
> >>> usage.
> >>
> >> In kexec-tools, we see any usable memory as candidate which can be used
> > 
> > Here I said 'any', it's not accurate. Those memory which need be passed
> > to 2nd kernel for use need be excluded, just as we have done in
> > kexec-tools.
> > 
> >> to load kexec kernel image/initrd etc. However kexec loading is a
> >> preparation work, it just books those position for later kexec kernel
> >> jumping after "kexec -e", that is why we need kexec_buf to remember
> >> them and do the real content copy of kernel/initrd.
> 
> The problem we have on arm64 is /proc/iomem is being used for two things.
> 1) Kexec's this is memory I can book for the new kernel.
> 2) Kdump's this is memory I must describe for vmcore.
> 
> We get the memory map from UEFI via the EFI stub, and leave it in
> memblock_reserved() memory. A new kexec kernel needs this to boot: it mustn't
> overwrite it. The same goes for the ACPI tables, they could be reclaimed and
> used as memory, but the new kexec kernel needs them to boot, they are
> memblock_reserved() too.

Thanks for these details. It seems arm64 is different. On x86_64, memblock
is used as the bootmem allocator and is released when buddy takes over.
My main concern is that using memblock may cause the kexec kernel to
jump to a non-fixed position. This has an unexpected effect similar to
KASLR, namely that the kernel could be put at a random position. As we
know, kexec was invented for fast kernel dev testing by bypassing
firmware reset, and is now also used in production to reboot huge servers
with thousands of devices and large memory. This extra,
unexpected KASLR-like effect may cause annoyance even when people have
disabled KASLR explicitly for a specific testing purpose.

Besides, while dropping the /proc/iomem scanning in favor of memblock
in kernel space works for kexec loading for the time being, the flaw in
/proc/iomem still exists and causes problems for user-space kexec-tools,
as pointed out. Do we have a plan for that?

> 
> If we knock all memblock_reserved() regions out of /proc/iomem then kdump
> doesn't work, because /proc/iomem is only generated once. Its a snapshot. The
> initcode/data is an example of memory we release from memblock_reserve() after
> this, then gets used for data we need in the vmcore.

Hmm, I'm a little confused here. We have defined different iores types
for different memory regions. If the ACPI tables need to be reused by
kdump/kexec, we can change the code to not reclaim them, and add them to
/proc/iomem in order to notify the components which rely on them.


enum {  
        IORES_DESC_NONE                         = 0,
        IORES_DESC_CRASH_KERNEL                 = 1,
        IORES_DESC_ACPI_TABLES                  = 2,
        IORES_DESC_ACPI_NV_STORAGE              = 3,
        IORES_DESC_PERSISTENT_MEMORY            = 4,
        IORES_DESC_PERSISTENT_MEMORY_LEGACY     = 5,
        IORES_DESC_DEVICE_PRIVATE_MEMORY        = 6,
        IORES_DESC_DEVICE_PUBLIC_MEMORY         = 7,
};


I'm just thinking out loud here. Limited by my poor arm64 knowledge, I
may not have a complete view. If it's not as I think, I will stop, and
can come back when I have more background knowledge.

Thanks
Baoquan

> 
> Ideally we would describe all this in /proc/iomem with:
> | 8001e80000-83ff186fff : System RAM
> |   8002080000-8002feffff : [Data you really need to boot]
> 
> kexec-tools should not overwrite 'data you really need to boot' unless it knows
> what it is, and that the system will never need it again. (examples: overwrite
> the ACPI tables when booting a non-acpi kernel, overwrite the UEFI memory map if
> the DT has been regenerated for a non-uefi kernel)
> 
> But, kexec-tools doesn't parse those second level entries properly. We have a
> bug in user-space, and a bug in the kernel.
> 
> Because /proc/iomem is being used for two things, and kexec-tools only parses
> one level, I don't think we can fix this in the kernel without breaking one of
> the use-cases. I think Akashi's fix user-space too approach is the most
> pragmatic approach.
> 
> 
> >> Here you use
> >> memblock to search available memory, isn't it deviating too far away
> >> from the original design in kexec-tools. Assume kexec loading and
> >> kexec_file loading should be consistent on loading even though they are
> >> done in different space, kernel space and user space.
> 
> Its much easier for us to parse memblock in the kernel as the helpers step over
> the regions we know we don't want. For the resource list we would need to
> strcmp(), and a bunch of handling for the second level entries.
> 
> 
> Thanks,
> 
> James

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 04/11] arm64: kexec_file: allocate memory walking through memblock list
  2018-05-18  1:37               ` Baoquan He
  (?)
@ 2018-05-18  5:07                 ` AKASHI Takahiro
  -1 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-05-18  5:07 UTC (permalink / raw)
  To: Baoquan He
  Cc: James Morse, catalin.marinas, will.deacon, dhowells, vgoyal,
	herbert, davem, dyoung, arnd, ard.biesheuvel, bhsharma, kexec,
	linux-arm-kernel, linux-kernel

Baoquan,

On Fri, May 18, 2018 at 09:37:35AM +0800, Baoquan He wrote:
> On 05/17/18 at 07:04pm, James Morse wrote:
> > Hi Baoquan,
> > 
> > On 17/05/18 03:15, Baoquan He wrote:
> > > On 05/17/18 at 10:10am, Baoquan He wrote:
> > >> On 05/07/18 at 02:59pm, AKASHI Takahiro wrote:
> > >>> On Tue, May 01, 2018 at 06:46:09PM +0100, James Morse wrote:
> > >>>> On 25/04/18 07:26, AKASHI Takahiro wrote:
> > >>>>> We need to prevent firmware-reserved memory regions, particularly EFI
> > >>>>> memory map as well as ACPI tables, from being corrupted by loading
> > >>>>> kernel/initrd (or other kexec buffers). We also want to support memory
> > >>>>> allocation in top-down manner in addition to default bottom-up.
> > >>>>> So let's have arm64 specific arch_kexec_walk_mem() which will search
> > >>>>> for available memory ranges in usable memblock list,
> > >>>>> i.e. !NOMAP & !reserved, 
> > >>>>
> > >>>>> instead of system resource tree.
> > >>>>
> > >>>> Didn't we try to fix the system-resource-tree in order to fix regular-kexec to
> > >>>> be safe in the EFI-memory-map/ACPI-tables case?
> > >>>>
> > >>>> It would be good to avoid having two ways of doing this, and I would like to
> > >>>> avoid having extra arch code...
> > >>>
> > >>> I know what you mean.
> > >>> /proc/iomem or system resource is, in my opinion, not the best place to
> > >>> describe memory usage of kernel but rather to describe *physical* hardware
> > >>> layout. As we are still discussing about "reserved" memory, I don't want
> > >>> to depend on it.
> > >>> Along with memblock list, we will have more accurate control over memory
> > >>> usage.
> > >>
> > >> In kexec-tools, we see any usable memory as candidate which can be used
> > > 
> > > Here I said 'any', it's not accurate. Those memory which need be passed
> > > to 2nd kernel for use need be excluded, just as we have done in
> > > kexec-tools.
> > > 
> > >> to load kexec kernel image/initrd etc. However kexec loading is a
> > >> preparation work, it just books those position for later kexec kernel
> > >> jumping after "kexec -e", that is why we need kexec_buf to remember
> > >> them and do the real content copy of kernel/initrd.
> > 
> > The problem we have on arm64 is /proc/iomem is being used for two things.
> > 1) Kexec's this is memory I can book for the new kernel.
> > 2) Kdump's this is memory I must describe for vmcore.
> > 
> > We get the memory map from UEFI via the EFI stub, and leave it in
> > memblock_reserved() memory. A new kexec kernel needs this to boot: it mustn't
> > overwrite it. The same goes for the ACPI tables, they could be reclaimed and
> > used as memory, but the new kexec kernel needs them to boot, they are
> > memblock_reserved() too.
> 
> Thanks for these details. Seems arm64 is different. In x86 64 memblock

Thanks to James from me, too.

> is used as bootmem allocator and will be released when buddy takes over.
> Mainly, using memblock may bring concern that kexec kernel
> will jump to an unfixed position. This creates an unexpected effect as
> KASLR is doing, namely kernel could be put at a random position. As we

I don't think that this would be a problem on arm64.

> know, kexec was invented for fast kernel dev testing by bypassing
> firmware reset, and has been taken to reboot those huge server with
> thousands of devices and large memory for business currently. This extra
> unexpected KASLR effect may cause annoyance even though people have
> disabled KASLR explicitly for a specific testing purpose.
> 
> Besides, discarding the /proc/iomem scanning but taking memblock instead
> in kernel space works for kexec loading for the time being, the flaw of
> /proc/iomem still exists and cause problem for user space kexec-tools,
> as pointed out. Do we have a plan for that?

This was the difference between James' standpoint and mine (at least
initially). James didn't want to require userspace changes to fix the
issue, but the reality is that, without modifying userspace, we can't
support both kexec and kdump properly, as James explained in his email.

> > 
> > If we knock all memblock_reserved() regions out of /proc/iomem then kdump
> > doesn't work, because /proc/iomem is only generated once. It's a snapshot. The
> > initcode/data is an example of memory we release from memblock_reserve() after
> > this, then gets used for data we need in the vmcore.
> 
> Hmm, I'm a little confused here. We have defined different iores type
> for different memory region. If acpi need be reused by kdump/kexec, we
> can change to not reclaim it, and add them into /proc/iomem in order to
> notify components which rely on them to process.
> 
> 
> enum {  
>         IORES_DESC_NONE                         = 0,
>         IORES_DESC_CRASH_KERNEL                 = 1,
>         IORES_DESC_ACPI_TABLES                  = 2,
>         IORES_DESC_ACPI_NV_STORAGE              = 3,
>         IORES_DESC_PERSISTENT_MEMORY            = 4,
>         IORES_DESC_PERSISTENT_MEMORY_LEGACY     = 5,
>         IORES_DESC_DEVICE_PRIVATE_MEMORY        = 6,
>         IORES_DESC_DEVICE_PUBLIC_MEMORY         = 7,
> };

I don't think that is the point.
Let me give you an analogy: x86 has e820 and handles memory layout in
kexec/kdump with *x86-specific* code in kexec-tools, right? We want to do
something similar without introducing e820-like data.
In the current implementation on arm64, however, kexec-tools only
recognizes top-level entries in /proc/iomem and ignores entries at any
deeper level (except kernel text & data).
So adding an extra level of hierarchy to /proc/iomem will break
compatibility in any case.
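
To make the compatibility concern concrete, here is a rough user-space
sketch of a /proc/iomem line parser that derives the nesting depth from
the leading spaces (two per level). It is not the kexec-tools code:
struct iomem_entry, parse_iomem_line() and iomem_contains() are made-up
names for illustration. A tool that only acts on depth 0 sees the whole
"System RAM" range as usable and never notices an entry nested under it.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one /proc/iomem line. */
struct iomem_entry {
	int depth;		/* 0 = top level, 1 = first nested level, ... */
	uint64_t start;
	uint64_t end;		/* inclusive, as printed in /proc/iomem */
	char name[64];
};

/*
 * Parse one line of the form "<indent><start>-<end> : <name>".
 * Nesting depth is taken from the indentation, two spaces per level.
 * Returns false if the line does not match that form.
 */
static bool parse_iomem_line(const char *line, struct iomem_entry *e)
{
	size_t indent = strspn(line, " ");

	e->depth = (int)(indent / 2);
	return sscanf(line + indent, "%" SCNx64 "-%" SCNx64 " : %63[^\n]",
		      &e->start, &e->end, e->name) == 3;
}

/* True if the range of 'inner' lies entirely within the range of 'outer'. */
static bool iomem_contains(const struct iomem_entry *outer,
			   const struct iomem_entry *inner)
{
	return inner->start >= outer->start && inner->end <= outer->end;
}
```

A parser that wants to honor second-level entries would have to carve
every depth-1 range out of its depth-0 parent before picking a load
address; existing binaries that stop at depth 0 would keep ignoring them.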

The main reason that I insist on memblock in my kexec_file patch
is that we still seem to be far from reaching agreement on a
final solution for the kexec (as opposed to kexec_file) case.

Thanks,
-Takahiro AKASHI


> 
> Just walk around and talk about it, limited by poor arm64 knowledge, I
> may not have a complete view. If it's not like what I think about, I
> will stop, and can come back when I get more background knowledge.
> 
> Thanks
> Baoquan
> 
> > 
> > Ideally we would describe all this in /proc/iomem with:
> > | 8001e80000-83ff186fff : System RAM
> > |   8002080000-8002feffff : [Data you really need to boot]
> > 
> > kexec-tools should not overwrite 'data you really need to boot' unless it knows
> > what it is, and that the system will never need it again. (examples: overwrite
> > the ACPI tables when booting a non-acpi kernel, overwrite the UEFI memory map if
> > the DT has been regenerated for a non-uefi kernel)
> > 
> > But, kexec-tools doesn't parse those second level entries properly. We have a
> > bug in user-space, and a bug in the kernel.
> > 
> > Because /proc/iomem is being used for two things, and kexec-tools only parses
> > one level, I don't think we can fix this in the kernel without breaking one of
> > the use-cases. I think Akashi's 'fix user-space too' approach is the most
> > pragmatic approach.
> > 
> > 
> > >> Here you use
> > >> memblock to search available memory, isn't it deviating too far away
> > >> from the original design in kexec-tools. Assume kexec loading and
> > >> kexec_file loading should be consistent on loading even though they are
> > >> done in different space, kernel space and user space.
> > 
> > It's much easier for us to parse memblock in the kernel as the helpers step over
> > the regions we know we don't want. For the resource list we would need to
> > strcmp(), and a bunch of handling for the second level entries.
> > 
> > 
> > Thanks,
> > 
> > James

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 04/11] arm64: kexec_file: allocate memory walking through memblock list
@ 2018-05-18  5:07                 ` AKASHI Takahiro
  0 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-05-18  5:07 UTC (permalink / raw)
  To: Baoquan He
  Cc: herbert, arnd, ard.biesheuvel, catalin.marinas, bhsharma,
	will.deacon, linux-kernel, dhowells, James Morse,
	linux-arm-kernel, kexec, dyoung, davem, vgoyal

Baoquan,

On Fri, May 18, 2018 at 09:37:35AM +0800, Baoquan He wrote:
> On 05/17/18 at 07:04pm, James Morse wrote:
> > Hi Baoquan,
> > 
> > On 17/05/18 03:15, Baoquan He wrote:
> > > On 05/17/18 at 10:10am, Baoquan He wrote:
> > >> On 05/07/18 at 02:59pm, AKASHI Takahiro wrote:
> > >>> On Tue, May 01, 2018 at 06:46:09PM +0100, James Morse wrote:
> > >>>> On 25/04/18 07:26, AKASHI Takahiro wrote:
> > >>>>> We need to prevent firmware-reserved memory regions, particularly EFI
> > >>>>> memory map as well as ACPI tables, from being corrupted by loading
> > >>>>> kernel/initrd (or other kexec buffers). We also want to support memory
> > >>>>> allocation in top-down manner in addition to default bottom-up.
> > >>>>> So let's have arm64 specific arch_kexec_walk_mem() which will search
> > >>>>> for available memory ranges in usable memblock list,
> > >>>>> i.e. !NOMAP & !reserved, 
> > >>>>
> > >>>>> instead of system resource tree.
> > >>>>
> > >>>> Didn't we try to fix the system-resource-tree in order to fix regular-kexec to
> > >>>> be safe in the EFI-memory-map/ACPI-tables case?
> > >>>>
> > >>>> It would be good to avoid having two ways of doing this, and I would like to
> > >>>> avoid having extra arch code...
> > >>>
> > >>> I know what you mean.
> > >>> /proc/iomem or system resource is, in my opinion, not the best place to
> > >>> describe memory usage of kernel but rather to describe *physical* hardware
> > >>> layout. As we are still discussing about "reserved" memory, I don't want
> > >>> to depend on it.
> > >>> Along with memblock list, we will have more accurate control over memory
> > >>> usage.
> > >>
> > >> In kexec-tools, we see any usable memory as candidate which can be used
> > > 
> > > Here I said 'any', it's not accurate. Those memory which need be passed
> > > to 2nd kernel for use need be excluded, just as we have done in
> > > kexec-tools.
> > > 
> > >> to load kexec kernel image/initrd etc. However kexec loading is a
> > >> preparation work, it just books those position for later kexec kernel
> > >> jumping after "kexec -e", that is why we need kexec_buf to remember
> > >> them and do the real content copy of kernel/initrd.
> > 
> > The problem we have on arm64 is /proc/iomem is being used for two things.
> > 1) Kexec's this is memory I can book for the new kernel.
> > 2) Kdump's this is memory I must describe for vmcore.
> > 
> > We get the memory map from UEFI via the EFI stub, and leave it in
> > memblock_reserved() memory. A new kexec kernel needs this to boot: it mustn't
> > overwrite it. The same goes for the ACPI tables, they could be reclaimed and
> > used as memory, but the new kexec kernel needs them to boot, they are
> > memblock_reserved() too.
> 
> Thanks for these details. Seems arm64 is different. In x86 64 memblock

Thanks to James from me, too.

> is used as bootmem allocator and will be released when buddy takes over.
> Mainly, using memblock may bring concern that kexec kernel
> will jump to an unfixed position. This creates an unexpected effect as
> KASLR is doing, namely kernel could be put at a random position. As we

I don't think that this would be a problem on arm64.

> know, kexec was invented for fast kernel dev testing by bypassing
> firmware reset, and is now used in production to reboot huge servers with
> thousands of devices and large memory. This extra,
> unexpected KASLR effect may cause annoyance even though people have
> disabled KASLR explicitly for a specific testing purpose.
> 
> Besides, discarding the /proc/iomem scanning but taking memblock instead
> in kernel space works for kexec loading for the time being, the flaw of
> /proc/iomem still exists and cause problem for user space kexec-tools,
> as pointed out. Do we have a plan for that?

This was the difference between my standpoint and James' (at least initially).
James didn't want to require userspace changes to fix the issue, but
the reality is that, without modifying it, we can't support kexec and kdump
perfectly as James explained in his email.

> > 
> > If we knock all memblock_reserved() regions out of /proc/iomem then kdump
> > doesn't work, because /proc/iomem is only generated once. It's a snapshot. The
> > initcode/data is an example of memory we release from memblock_reserve() after
> > this, then gets used for data we need in the vmcore.
> 
> Hmm, I'm a little confused here. We have defined different iores type
> for different memory region. If acpi need be reused by kdump/kexec, we
> can change to not reclaim it, and add them into /proc/iomem in order to
> notify components which rely on them to process.
> 
> 
> enum {  
>         IORES_DESC_NONE                         = 0,
>         IORES_DESC_CRASH_KERNEL                 = 1,
>         IORES_DESC_ACPI_TABLES                  = 2,
>         IORES_DESC_ACPI_NV_STORAGE              = 3,
>         IORES_DESC_PERSISTENT_MEMORY            = 4,
>         IORES_DESC_PERSISTENT_MEMORY_LEGACY     = 5,
>         IORES_DESC_DEVICE_PRIVATE_MEMORY        = 6,
>         IORES_DESC_DEVICE_PUBLIC_MEMORY         = 7,
> };

I don't think that is the point.
Let me give you an analogy: x86 has e820 and handles the memory layout for
kexec/kdump with *x86-specific* code in kexec-tools, right? We want to do
something similar without introducing e820-like data.
In the current implementation on arm64, however, kexec-tools only
recognizes top-level entries in /proc/iomem and ignores entries at any
lower level (except kernel text & data).
So adding an extra level of hierarchy to /proc/iomem will break
compatibility in any case.

The main reason that I insist on memblock in my kexec_file patch
is that we still seem to be far from reaching agreement on a
final solution in the kexec (as opposed to kexec_file) case.

Thanks,
-Takahiro AKASHI
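
For readers following the thread, the idea of searching only usable memory
("!NOMAP & !reserved") in either a bottom-up or a top-down manner can be
sketched in user-space C. This is an illustration only: struct range,
find_free() and PAGE_SZ are hypothetical names, the scan is deliberately
naive, and none of this is the kernel's memblock API or the code in the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 0x1000ULL	/* assumed 4KB pages, for illustration */

/* Hypothetical half-open physical range [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/* True if [start, start + size) overlaps any reserved range. */
static bool overlaps_reserved(uint64_t start, uint64_t size,
			      const struct range *rsv, size_t nr_rsv)
{
	for (size_t i = 0; i < nr_rsv; i++)
		if (start < rsv[i].end && rsv[i].start < start + size)
			return true;
	return false;
}

/*
 * Find a page-aligned hole of 'size' bytes inside one usable range that
 * avoids every reserved range. The page-by-page scan is slow but easy to
 * check. Returns 0 when nothing fits (so address 0 is never handed out).
 */
static uint64_t find_free(const struct range *mem,
			  const struct range *rsv, size_t nr_rsv,
			  uint64_t size, bool top_down)
{
	uint64_t lo = (mem->start + PAGE_SZ - 1) & ~(PAGE_SZ - 1);
	uint64_t addr;

	assert(size > 0);
	if (mem->end < lo || mem->end - lo < size)
		return 0;

	if (!top_down) {
		/* bottom-up: lowest suitable address wins */
		for (addr = lo; addr + size <= mem->end; addr += PAGE_SZ)
			if (!overlaps_reserved(addr, size, rsv, nr_rsv))
				return addr;
		return 0;
	}

	/* top-down: highest suitable address wins */
	for (addr = (mem->end - size) & ~(PAGE_SZ - 1); ; addr -= PAGE_SZ) {
		if (!overlaps_reserved(addr, size, rsv, nr_rsv))
			return addr;
		if (addr == lo)
			break;
	}
	return 0;
}
```

With one usable range and two reserved ranges at its edges, the bottom-up
search returns the first page-aligned address past the lower reservation,
and the top-down search returns the highest address whose buffer ends at or
below the upper reservation.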


> 
> Just walk around and talk about it, limited by poor arm64 knowledge, I
> may not have a complete view. If it's not like what I think about, I
> will stop, and can come back when I get more background knowledge.
> 
> Thanks
> Baoquan
> 
> > 
> > Ideally we would describe all this in /proc/iomem with:
> > | 8001e80000-83ff186fff : System RAM
> > |   8002080000-8002feffff : [Data you really need to boot]
> > 
> > kexec-tools should not overwrite 'data you really need to boot' unless it knows
> > what it is, and that the system will never need it again. (examples: overwrite
> > the ACPI tables when booting a non-acpi kernel, overwrite the UEFI memory map if
> > the DT has been regenerated for a non-uefi kernel)
> > 
> > But, kexec-tools doesn't parse those second level entries properly. We have a
> > bug in user-space, and a bug in the kernel.
> > 
> > Because /proc/iomem is being used for two things, and kexec-tools only parses
> > one level, I don't think we can fix this in the kernel without breaking one of
> > the use-cases. I think Akashi's "fix user-space too" approach is the most
> > pragmatic approach.
> > 
> > 
> > >> Here you use
> > >> memblock to search available memory, isn't it deviating too far away
> > >> from the original design in kexec-tools. Assume kexec loading and
> > >> kexec_file loading should be consistent on loading even though they are
> > >> done in different space, kernel space and user space.
> > 
> > It's much easier for us to parse memblock in the kernel as the helpers step over
> > the regions we know we don't want. For the resource list we would need to
> > strcmp(), and a bunch of handling for the second level entries.
> > 
> > 
> > Thanks,
> > 
> > James
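
To make the "second level entries" point concrete, here is a rough
user-space sketch of what parsing nested /proc/iomem lines could look like.
It assumes two spaces of indentation per nesting level; struct iomem_entry
and parse_iomem_line() are hypothetical names and are not taken from
kexec-tools.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed line; 'end' is inclusive, as printed in /proc/iomem. */
struct iomem_entry {
	int level;			/* 0 = top level, 1 = child, ... */
	unsigned long long start;
	unsigned long long end;
	char name[64];
};

/* Parse "<indent><start>-<end> : <name>"; returns 0 on success, -1 on error. */
static int parse_iomem_line(const char *line, struct iomem_entry *e)
{
	int indent = 0;
	int consumed = 0;

	assert(line && e);

	while (line[indent] == ' ')
		indent++;

	/* %n stays 0 if the " : " separator is missing */
	if (sscanf(line + indent, "%llx-%llx : %n",
		   &e->start, &e->end, &consumed) < 2 || consumed == 0)
		return -1;

	snprintf(e->name, sizeof(e->name), "%s", line + indent + consumed);
	e->name[strcspn(e->name, "\n")] = '\0';	/* drop trailing newline */
	e->level = indent / 2;
	return 0;
}
```

A caller that wants to book memory for a new kernel would then treat any
entry with level > 0 under "System RAM" as off-limits unless it recognizes
the name, which is the behaviour James describes as missing.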

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 03/11] arm64: kexec_file: invoke the kernel without purgatory
  2018-05-15 16:15             ` James Morse
  (?)
@ 2018-05-18  6:22               ` AKASHI Takahiro
  -1 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-05-18  6:22 UTC (permalink / raw)
  To: James Morse
  Cc: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd, ard.biesheuvel, bhsharma, kexec,
	linux-arm-kernel, linux-kernel

James,

On Tue, May 15, 2018 at 05:15:52PM +0100, James Morse wrote:
> Hi Akashi,
> 
> On 15/05/18 05:45, AKASHI Takahiro wrote:
> > On Fri, May 11, 2018 at 06:03:49PM +0100, James Morse wrote:
> >> On 07/05/18 06:22, AKASHI Takahiro wrote:
> >>> On Tue, May 01, 2018 at 06:46:06PM +0100, James Morse wrote:
> >>>> On 25/04/18 07:26, AKASHI Takahiro wrote:
> >>>>> diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
> >>>>> index f76ea92dff91..f7dbba00be10 100644
> >>>>> --- a/arch/arm64/kernel/machine_kexec.c
> >>>>> +++ b/arch/arm64/kernel/machine_kexec.c
> >>>>> @@ -205,10 +205,17 @@ void machine_kexec(struct kimage *kimage)
> >>
> >>>>>  	cpu_soft_restart(kimage != kexec_crash_image,
> >>>>> -		reboot_code_buffer_phys, kimage->head, kimage->start, 0);
> >>>>> +		reboot_code_buffer_phys, kimage->head, kimage->start,
> >>>>> +#ifdef CONFIG_KEXEC_FILE
> >>>>> +				kimage->purgatory_info.purgatory_buf ?
> >>>>> +						0 : kimage->arch.dtb_mem);
> >>>>> +#else
> >>>>> +				0);
> >>>>> +#endif
> >>
> >>
> >>>> purgatory_buf seems to only be set in kexec_purgatory_setup_kbuf(), called from
> >>>> kexec_load_purgatory(), which we don't use. How does this get a value?
> >>>>
> >>>> Would it be better to always use kimage->arch.dtb_mem, and ensure that is 0 for
> >>>> regular kexec (as we can't know where the dtb is)? (image_arg may then be a
> >>>> better name).
> >>>
> >>> The problem is arch.dtb_mem is currently defined only if CONFIG_KEXEC_FILE.
> >>
> >> I thought it was ARCH_HAS_KIMAGE_ARCH, which we can define all the time if
> >> that's what we want.
> >>
> >>
> >>> So I would like to
> >>> - merge this patch with patch#8
> >>> - change the condition
> >>>         #ifdef CONFIG_KEXEC_FILE
> >>>        				kimage->file_mode ? kimage->arch.dtb_mem : 0);

We don't need "kimage->file_mode ?" since arch.dtb_mem is 0 if !file_mode.

> >>>         #else
> >>>         			0);
> >>>         #endif
> >>
> >> If we can avoid even this #ifdef by always having kimage->arch, I'd prefer that.
> >> If we do that 'dtb_mem' would need some thing that indicates its for kexec_file,
> >> as kexec has a DTB too, we just don't know where it is...
> > 
> > OK, but I want to have a minimum of kexec.arch always exist.
> 
> I'm curious, why? It's 32 bytes that is allocated a maximum of twice.

I believe that I'm a stingy minimalist :)


> (my questions on what needs to go in there were because it looked like a third
> user was missing...)
> 
> 
> > How about this?
> >
> > | struct kimage_arch {
> > | 	phys_addr_t dtb_mem;
> > | #ifdef CONFIG_KEXEC_FILE
> 
> #ifdef in structs just breeds more #ifdefs, as the code that accesses those
> members has to be behind the same set of conditions.
> 
> Given this, I prefer the #ifdefs around cpu_soft_restart() as it doesn't force
> us to add more #ifdefs later.

OK

> For either option without purgatory_info:
> Reviewed-by: James Morse <james.morse@arm.com>

Thanks,
-Takahiro AKASHI

> 
> Thanks,
> 
> James
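
The remark that the "kimage->file_mode ?" test is unnecessary can be shown
with a small stand-alone mock. This is only a sketch of the "always have
kimage->arch" option discussed above, not the code that was merged: the
struct layouts are cut down to the two fields that matter, phys_addr_t is a
stand-in typedef, and restart_arg() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;	/* stand-in for the kernel type */

/* Cut-down mock: dtb_mem exists unconditionally, no #ifdef in the struct. */
struct kimage_arch {
	phys_addr_t dtb_mem;	/* 0 unless set by the kexec_file path */
};

struct kimage {
	bool file_mode;
	struct kimage_arch arch;
};

/* Value that would be passed as the last argument of cpu_soft_restart(). */
static phys_addr_t restart_arg(const struct kimage *image)
{
	assert(image);
	/* invariant relied upon: regular kexec never sets dtb_mem */
	assert(image->file_mode || image->arch.dtb_mem == 0);
	return image->arch.dtb_mem;
}
```

Because the field is zero for regular kexec, the caller needs neither a
file_mode test nor a purgatory_info check to pick the argument.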

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 05/11] arm64: kexec_file: load initrd and device-tree
  2018-05-15 16:20     ` James Morse
  (?)
@ 2018-05-18  7:11       ` AKASHI Takahiro
  -1 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-05-18  7:11 UTC (permalink / raw)
  To: James Morse
  Cc: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd, ard.biesheuvel, bhsharma, kexec,
	linux-arm-kernel, linux-kernel

James,

On Tue, May 15, 2018 at 05:20:00PM +0100, James Morse wrote:
> Hi Akashi,
> 
> On 25/04/18 07:26, AKASHI Takahiro wrote:
> > load_other_segments() is expected to allocate and place all the necessary
> > memory segments other than kernel, including initrd and device-tree
> > blob (and elf core header for crash).
> > While most of the code was borrowed from kexec-tools' counterpart,
> > users may not be allowed to specify dtb explicitly, instead, the dtb
> > presented by a boot loader is reused.
> 
> (Nit: "a boot loader" -> "the original boot loader")

OK

> > arch_kimage_kernel_post_load_cleanup() is responsible for freeing arm64-
> > specific data allocated in load_other_segments().
> 
> 
> > diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> > index f9ebf54ca247..b3b9b1725d8a 100644
> > --- a/arch/arm64/kernel/machine_kexec_file.c
> > +++ b/arch/arm64/kernel/machine_kexec_file.c
> > @@ -13,7 +13,26 @@
> >  #include <linux/ioport.h>
> >  #include <linux/kernel.h>
> >  #include <linux/kexec.h>
> > +#include <linux/libfdt.h>
> >  #include <linux/memblock.h>
> > +#include <linux/of_fdt.h>
> > +#include <linux/types.h>
> > +#include <asm/byteorder.h>
> > +
> > +static int __dt_root_addr_cells;
> > +static int __dt_root_size_cells;
> 
> > @@ -55,3 +74,144 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf,
> >  
> >  	return ret;
> >  }
> > +
> > +static int setup_dtb(struct kimage *image,
> > +		unsigned long initrd_load_addr, unsigned long initrd_len,
> > +		char *cmdline, unsigned long cmdline_len,
> > +		char **dtb_buf, size_t *dtb_buf_len)
> > +{
> > +	char *buf = NULL;
> > +	size_t buf_size;
> > +	int nodeoffset;
> > +	u64 value;
> > +	int range_len;
> > +	int ret;
> > +
> > +	/* duplicate dt blob */
> > +	buf_size = fdt_totalsize(initial_boot_params);
> > +	range_len = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
> 
> These two cells values are 0 here. Did you want
> arch_kexec_file_init() in patch 7 in this patch?
> 
> Ah, range_len isn't used, so, did you want the cells values and this range_len
> thing in patch 7!?

Umm, this problem has existed since my v1 :)
I had better rethink the patch order.

> 
> > +
> > +	if (initrd_load_addr)
> > +		buf_size += fdt_prop_len("linux,initrd-start", sizeof(u64))
> > +				+ fdt_prop_len("linux,initrd-end", sizeof(u64));
> > +
> > +	if (cmdline)
> > +		buf_size += fdt_prop_len("bootargs", cmdline_len + 1);
> 
> I can't find where fdt_prop_len() .... oh, patch 7. fdt_prop_len() doesn't look
> like the sort of thing that should be created here, but I agree there isn't an
> existing API to do this.

Will take care of it.
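
For context, the space one extra property can take in a flattened device
tree follows from the FDT layout: a 4-byte FDT_PROP token, a 4-byte length,
a 4-byte name offset, the value padded to 4 bytes, plus the NUL-terminated
name in the strings block. The sketch below computes that upper bound; the
name fdt_prop_space() is hypothetical and this is not claimed to be the
fdt_prop_len() from patch 7.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FDT_TAGSIZE 4u	/* FDT tokens and cells are 32 bits wide */

/*
 * Upper bound on the bytes needed to add one property to an FDT blob.
 * It is an upper bound because the name may already be present in the
 * strings block, in which case it is not stored a second time.
 */
static size_t fdt_prop_space(const char *name, size_t value_len)
{
	/* FDT_PROP token + len + nameoff */
	size_t header = 3 * FDT_TAGSIZE;
	/* value is padded up to the next 4-byte boundary */
	size_t value = (value_len + FDT_TAGSIZE - 1) &
		       ~(size_t)(FDT_TAGSIZE - 1);

	assert(name);
	return header + value + strlen(name) + 1;
}
```

Summing this for "linux,initrd-start", "linux,initrd-end" and "bootargs"
gives the amount by which the duplicated blob has to grow before
fdt_open_into() is called.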


> (This must be why powerpc guesses that the fdt won't be more than double in size).
> 
> 
> > +	buf = vmalloc(buf_size);
> > +	if (!buf) {
> > +		ret = -ENOMEM;
> > +		goto out_err;
> > +	}
> > +
> > +	ret = fdt_open_into(initial_boot_params, buf, buf_size);
> > +	if (ret)
> > +		goto out_err;
> > +
> > +	nodeoffset = fdt_path_offset(buf, "/chosen");
> > +	if (nodeoffset < 0)
> > +		goto out_err;
> > +
> > +	/* add bootargs */
> > +	if (cmdline) {
> > +		ret = fdt_setprop(buf, nodeoffset, "bootargs",
> > +						cmdline, cmdline_len + 1);
> 
> fdt_setprop_string()?

OK

> 
> > +		if (ret)
> > +			goto out_err;
> > +	}
> > +
> > +	/* add initrd-* */
> > +	if (initrd_load_addr) {
> > +		value = cpu_to_fdt64(initrd_load_addr);
> > +		ret = fdt_setprop(buf, nodeoffset, "linux,initrd-start",
> > +				&value, sizeof(value));
> 
> sizeof(value) was assumed to be the same as sizeof(u64) earlier.
> fdt_setprop_u64()?

OK

> 
> > +		if (ret)
> > +			goto out_err;
> > +
> > +		value = cpu_to_fdt64(initrd_load_addr + initrd_len);
> > +		ret = fdt_setprop(buf, nodeoffset, "linux,initrd-end",
> > +				&value, sizeof(value));
> > +		if (ret)
> > +			goto out_err;
> > +	}
> > +
> > +	/* trim a buffer */
> > +	fdt_pack(buf);
> > +	*dtb_buf = buf;
> > +	*dtb_buf_len = fdt_totalsize(buf);
> > +
> > +	return 0;
> > +
> > +out_err:
> > +	vfree(buf);
> > +	return ret;
> > +}
> 
> While powerpc has some similar code for updating the initrd and cmdline, it
> makes different assumptions about the size of the dt, and has different behavior
> for memreserve. (looks like we don't expect the initramfs to be memreserved).
> Let's leave unifying that stuff where possible for the future.

Sure

> > +int load_other_segments(struct kimage *image,
> > +			char *initrd, unsigned long initrd_len,
> > +			char *cmdline, unsigned long cmdline_len)
> > +{
> > +	struct kexec_segment *kern_seg;
> > +	struct kexec_buf kbuf;
> > +	unsigned long initrd_load_addr = 0;
> > +	char *dtb = NULL;
> > +	unsigned long dtb_len = 0;
> > +	int ret = 0;
> > +
> > +	kern_seg = &image->segment[image->arch.kern_segment];
> > +	kbuf.image = image;
> > +	/* not allocate anything below the kernel */
> > +	kbuf.buf_min = kern_seg->mem + kern_seg->memsz;
> 
> > +	/* load initrd */
> > +	if (initrd) {
> > +		kbuf.buffer = initrd;
> > +		kbuf.bufsz = initrd_len;
> > +		kbuf.memsz = initrd_len;
> 
> > +		kbuf.buf_align = 0;
> 
> I'm surprised the initrd has no alignment requirement,

MeToo.

> but kexec_add_buffer()
> rounds this up to PAGE_SIZE.

It seems that kimage_load_segment() requires this, but I'm not sure.

> 
> > +		/* within 1GB-aligned window of up to 32GB in size */
> > +		kbuf.buf_max = round_down(kern_seg->mem, SZ_1G)
> > +						+ (unsigned long)SZ_1G * 32;
> > +		kbuf.top_down = false;
> > +
> > +		ret = kexec_add_buffer(&kbuf);
> > +		if (ret)
> > +			goto out_err;
> > +		initrd_load_addr = kbuf.mem;
> > +
> > +		pr_debug("Loaded initrd at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
> > +				initrd_load_addr, initrd_len, initrd_len);
> > +	}
> > +
> > +	/* load dtb blob */
> > +	ret = setup_dtb(image, initrd_load_addr, initrd_len,
> > +				cmdline, cmdline_len, &dtb, &dtb_len);
> > +	if (ret) {
> > +		pr_err("Preparing for new dtb failed\n");
> > +		goto out_err;
> > +	}
> > +
> > +	kbuf.buffer = dtb;
> > +	kbuf.bufsz = dtb_len;
> > +	kbuf.memsz = dtb_len;
> > +	/* not across 2MB boundary */
> > +	kbuf.buf_align = SZ_2M;
> > +	kbuf.buf_max = ULONG_MAX;
> > +	kbuf.top_down = true;
> > +
> > +	ret = kexec_add_buffer(&kbuf);
> > +	if (ret)
> > +		goto out_err;
> > +	image->arch.dtb_mem = kbuf.mem;
> > +	image->arch.dtb_buf = dtb;
> > +
> > +	pr_debug("Loaded dtb at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
> > +			kbuf.mem, dtb_len, dtb_len);
> > +
> > +	return 0;
> > +
> > +out_err:
> > +	vfree(dtb);
> > +	image->arch.dtb_buf = NULL;
> 
> Won't kimage_file_post_load_cleanup() always be called if we return an error
> here? Why not leave the free()ing until then?

Right.
The reason I left the code here was that it is better to clean up locally
whatever was allocated locally, when we need to and trivially can
do so.

As it's redundant, I will remove it.

Thanks,
-Takahiro AKASHI

> 
> > +	return ret;
> > +}
> 
> 
> 
> Thanks,
> 
> James
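
The placement limits used for the initrd in the quoted hunk (nothing below
the end of the kernel, and nothing beyond a 1GB-aligned window of 32GB
around the kernel) amount to the arithmetic below. This is a stand-alone
sketch: struct window and initrd_window() are hypothetical names, and SZ_1G
is redefined locally rather than taken from the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_1G 0x40000000ULL	/* local stand-in for the kernel constant */

/* Allowed placement window for the initrd: [min, max). */
struct window {
	uint64_t min;
	uint64_t max;
};

/*
 * kern_mem:   physical load address of the kernel segment
 * kern_memsz: size of the kernel segment in memory
 */
static struct window initrd_window(uint64_t kern_mem, uint64_t kern_memsz)
{
	struct window w;

	assert(kern_memsz > 0);

	/* do not allocate anything below the end of the kernel */
	w.min = kern_mem + kern_memsz;
	/* round the kernel base down to 1GB, then allow 32GB from there */
	w.max = (kern_mem & ~(SZ_1G - 1)) + 32 * SZ_1G;
	return w;
}
```

The two values correspond to kbuf.buf_min and kbuf.buf_max in the hunk
above; the dtb, by contrast, is placed top-down with buf_max = ULONG_MAX.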

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 05/11] arm64: kexec_file: load initrd and device-tree
@ 2018-05-18  7:11       ` AKASHI Takahiro
  0 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-05-18  7:11 UTC (permalink / raw)
  To: James Morse
  Cc: herbert, bhe, ard.biesheuvel, catalin.marinas, bhsharma,
	will.deacon, linux-kernel, dhowells, arnd, linux-arm-kernel,
	kexec, dyoung, davem, vgoyal

James,

On Tue, May 15, 2018 at 05:20:00PM +0100, James Morse wrote:
> Hi Akashi,
> 
> On 25/04/18 07:26, AKASHI Takahiro wrote:
> > load_other_segments() is expected to allocate and place all the necessary
> > memory segments other than kernel, including initrd and device-tree
> > blob (and elf core header for crash).
> > While most of the code was borrowed from kexec-tools' counterpart,
> > users may not be allowed to specify dtb explicitly, instead, the dtb
> > presented by a boot loader is reused.
> 
> (Nit: "a boot loader" -> "the original boot loader")

OK

> > arch_kimage_kernel_post_load_cleanup() is responsible for freeing arm64-
> > specific data allocated in load_other_segments().
> 
> 
> > diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> > index f9ebf54ca247..b3b9b1725d8a 100644
> > --- a/arch/arm64/kernel/machine_kexec_file.c
> > +++ b/arch/arm64/kernel/machine_kexec_file.c
> > @@ -13,7 +13,26 @@
> >  #include <linux/ioport.h>
> >  #include <linux/kernel.h>
> >  #include <linux/kexec.h>
> > +#include <linux/libfdt.h>
> >  #include <linux/memblock.h>
> > +#include <linux/of_fdt.h>
> > +#include <linux/types.h>
> > +#include <asm/byteorder.h>
> > +
> > +static int __dt_root_addr_cells;
> > +static int __dt_root_size_cells;
> 
> > @@ -55,3 +74,144 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf,
> >  
> >  	return ret;
> >  }
> > +
> > +static int setup_dtb(struct kimage *image,
> > +		unsigned long initrd_load_addr, unsigned long initrd_len,
> > +		char *cmdline, unsigned long cmdline_len,
> > +		char **dtb_buf, size_t *dtb_buf_len)
> > +{
> > +	char *buf = NULL;
> > +	size_t buf_size;
> > +	int nodeoffset;
> > +	u64 value;
> > +	int range_len;
> > +	int ret;
> > +
> > +	/* duplicate dt blob */
> > +	buf_size = fdt_totalsize(initial_boot_params);
> > +	range_len = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
> 
> These two cells values are 0 here. Did you want
> arch_kexec_file_init() in patch 7 in this patch?
> 
> Ah, range_len isn't used, so, did you want the cells values and this range_len
> thing in patch 7!?

Umm, this problem has existed since my v1 :)
I had better rethink the patch order.
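
To make the point about the zero cells values concrete, here is a minimal user-space sketch (not kernel code). The helper name range_len_bytes() is hypothetical; it only mirrors the arithmetic in the quoted line, where each cell is one 32-bit word.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the quoted arithmetic: the byte length of
 * one (address, size) pair, where each cell is a 32-bit word.
 */
static size_t range_len_bytes(int addr_cells, int size_cells)
{
	return (size_t)(addr_cells + size_cells) * sizeof(uint32_t);
}
```

With both statics still zero, range_len_bytes(0, 0) is 0, which is the case James points out. With two cells each, range_len_bytes(2, 2) is 16.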

> 
> > +
> > +	if (initrd_load_addr)
> > +		buf_size += fdt_prop_len("linux,initrd-start", sizeof(u64))
> > +				+ fdt_prop_len("linux,initrd-end", sizeof(u64));
> > +
> > +	if (cmdline)
> > +		buf_size += fdt_prop_len("bootargs", cmdline_len + 1);
> 
> I can't find where fdt_prop_len() .... oh, patch 7. fdt_prop_len() doesn't look
> like the sort of thing that should be created here, but I agree there isn't an
> existing API to do this.

Will take care of it.
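
For reference, a rough user-space sketch of what such a size estimate has to cover. This is not the fdt_prop_len() from patch 7; prop_space_estimate() and tag_align() are hypothetical names, and the estimate assumes the usual flattened device tree layout: a property record of three 32-bit cells (token, length, name offset), the value padded to a cell boundary, and the property name stored in the strings block.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FDT_TAG_SIZE 4u	/* the structure block is made of 32-bit cells */

/* Round x up to the next multiple of the 4-byte tag size. */
static size_t tag_align(size_t x)
{
	return (x + FDT_TAG_SIZE - 1) & ~(size_t)(FDT_TAG_SIZE - 1);
}

/*
 * Hypothetical estimate of the extra space one new property may need:
 * three cells of property record, the value padded to a cell boundary,
 * and the NUL-terminated name in the strings block (worst case, i.e. the
 * name is not already present there).
 */
static size_t prop_space_estimate(const char *name, size_t value_len)
{
	return 3 * FDT_TAG_SIZE + tag_align(value_len) + strlen(name) + 1;
}
```

Under these assumptions an 8-byte "linux,initrd-start" value needs at most 39 bytes: 12 for the record, 8 for the value and 19 for the name.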


> (This must be why powerpc guesses that the fdt won't be more than double in size).
> 
> 
> > +	buf = vmalloc(buf_size);
> > +	if (!buf) {
> > +		ret = -ENOMEM;
> > +		goto out_err;
> > +	}
> > +
> > +	ret = fdt_open_into(initial_boot_params, buf, buf_size);
> > +	if (ret)
> > +		goto out_err;
> > +
> > +	nodeoffset = fdt_path_offset(buf, "/chosen");
> > +	if (nodeoffset < 0)
> > +		goto out_err;
> > +
> > +	/* add bootargs */
> > +	if (cmdline) {
> > +		ret = fdt_setprop(buf, nodeoffset, "bootargs",
> > +						cmdline, cmdline_len + 1);
> 
> fdt_setprop_string()?

OK

> 
> > +		if (ret)
> > +			goto out_err;
> > +	}
> > +
> > +	/* add initrd-* */
> > +	if (initrd_load_addr) {
> > +		value = cpu_to_fdt64(initrd_load_addr);
> > +		ret = fdt_setprop(buf, nodeoffset, "linux,initrd-start",
> > +				&value, sizeof(value));
> 
> sizeof(value) was assumed to be the same as sizeof(u64) earlier.
> fdt_setprop_u64()?

OK
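
As a side note on what the explicit cpu_to_fdt64() step does: device tree property values are stored big-endian, and fdt_setprop_u64() performs that conversion itself before calling fdt_setprop(). A minimal user-space stand-in for the conversion, with put_be64() being a hypothetical name:

```c
#include <stdint.h>

/*
 * Hypothetical user-space stand-in: store a 64-bit value in big-endian
 * byte order, the order in which device tree property values are kept.
 */
static void put_be64(uint8_t out[8], uint64_t value)
{
	for (int i = 0; i < 8; i++)
		out[i] = (uint8_t)(value >> (56 - 8 * i));
}
```

The most significant byte lands in out[0] regardless of the host's byte order, which is why the result can be written into the blob as-is.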

> 
> > +		if (ret)
> > +			goto out_err;
> > +
> > +		value = cpu_to_fdt64(initrd_load_addr + initrd_len);
> > +		ret = fdt_setprop(buf, nodeoffset, "linux,initrd-end",
> > +				&value, sizeof(value));
> > +		if (ret)
> > +			goto out_err;
> > +	}
> > +
> > +	/* trim a buffer */
> > +	fdt_pack(buf);
> > +	*dtb_buf = buf;
> > +	*dtb_buf_len = fdt_totalsize(buf);
> > +
> > +	return 0;
> > +
> > +out_err:
> > +	vfree(buf);
> > +	return ret;
> > +}
> 
> While powerpc has some similar code for updating the initrd and cmdline, it
> makes different assumptions about the size of the dt, and has different behavior
> for memreserve. (looks like we don't expect the initramfs to be memreserved).
> Let's leave unifying that stuff where possible for the future.

Sure

> > +int load_other_segments(struct kimage *image,
> > +			char *initrd, unsigned long initrd_len,
> > +			char *cmdline, unsigned long cmdline_len)
> > +{
> > +	struct kexec_segment *kern_seg;
> > +	struct kexec_buf kbuf;
> > +	unsigned long initrd_load_addr = 0;
> > +	char *dtb = NULL;
> > +	unsigned long dtb_len = 0;
> > +	int ret = 0;
> > +
> > +	kern_seg = &image->segment[image->arch.kern_segment];
> > +	kbuf.image = image;
> > +	/* not allocate anything below the kernel */
> > +	kbuf.buf_min = kern_seg->mem + kern_seg->memsz;
> 
> > +	/* load initrd */
> > +	if (initrd) {
> > +		kbuf.buffer = initrd;
> > +		kbuf.bufsz = initrd_len;
> > +		kbuf.memsz = initrd_len;
> 
> > +		kbuf.buf_align = 0;
> 
> I'm surprised the initrd has no alignment requirement,

Me too.

> but kexec_add_buffer()
> rounds this up to PAGE_SIZE.

It seems that kimage_load_segment() requires this, but I'm not sure.
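
To illustrate the rounding James mentions, a small user-space sketch. The 4 KiB page size is an assumption made only for the example (arm64 can also be built with larger pages), and align_up() and effective_align() are hypothetical names, not kernel helpers.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumption: 4 KiB pages */

/* Round x up to the next multiple of a power-of-two alignment a. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Effective alignment when a caller asks for less than one page. */
static uint64_t effective_align(uint64_t requested)
{
	return requested > SKETCH_PAGE_SIZE ? requested : SKETCH_PAGE_SIZE;
}
```

With these assumptions, a requested alignment of 0 ends up as one page, while the 2MB alignment used for the dtb is kept unchanged.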

> 
> > +		/* within 1GB-aligned window of up to 32GB in size */
> > +		kbuf.buf_max = round_down(kern_seg->mem, SZ_1G)
> > +						+ (unsigned long)SZ_1G * 32;
> > +		kbuf.top_down = false;
> > +
> > +		ret = kexec_add_buffer(&kbuf);
> > +		if (ret)
> > +			goto out_err;
> > +		initrd_load_addr = kbuf.mem;
> > +
> > +		pr_debug("Loaded initrd at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
> > +				initrd_load_addr, initrd_len, initrd_len);
> > +	}
> > +
> > +	/* load dtb blob */
> > +	ret = setup_dtb(image, initrd_load_addr, initrd_len,
> > +				cmdline, cmdline_len, &dtb, &dtb_len);
> > +	if (ret) {
> > +		pr_err("Preparing for new dtb failed\n");
> > +		goto out_err;
> > +	}
> > +
> > +	kbuf.buffer = dtb;
> > +	kbuf.bufsz = dtb_len;
> > +	kbuf.memsz = dtb_len;
> > +	/* not across 2MB boundary */
> > +	kbuf.buf_align = SZ_2M;
> > +	kbuf.buf_max = ULONG_MAX;
> > +	kbuf.top_down = true;
> > +
> > +	ret = kexec_add_buffer(&kbuf);
> > +	if (ret)
> > +		goto out_err;
> > +	image->arch.dtb_mem = kbuf.mem;
> > +	image->arch.dtb_buf = dtb;
> > +
> > +	pr_debug("Loaded dtb at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
> > +			kbuf.mem, dtb_len, dtb_len);
> > +
> > +	return 0;
> > +
> > +out_err:
> > +	vfree(dtb);
> > +	image->arch.dtb_buf = NULL;
> 
> Won't kimage_file_post_load_cleanup() always be called if we return an error
> here? Why not leave the free()ing until then?

Right.
I left the code here because I think we had better locally clean up
everything that was locally allocated if we trivially need to (and can)
do so.

As it's redundant, I will remove it.

Thanks,
-Takahiro AKASHI

> 
> > +	return ret;
> > +}
> 
> 
> 
> Thanks,
> 
> James

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 05/11] arm64: kexec_file: load initrd and device-tree
  2018-05-18  7:11       ` AKASHI Takahiro
  (?)
@ 2018-05-18  7:42         ` AKASHI Takahiro
  -1 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-05-18  7:42 UTC (permalink / raw)
  To: James Morse, catalin.marinas, will.deacon, dhowells, vgoyal,
	herbert, davem, dyoung, bhe, arnd, ard.biesheuvel, bhsharma,
	kexec, linux-arm-kernel, linux-kernel

On Fri, May 18, 2018 at 04:11:35PM +0900, AKASHI Takahiro wrote:
> James,
> 
> On Tue, May 15, 2018 at 05:20:00PM +0100, James Morse wrote:
> > Hi Akashi,
> > 
> > On 25/04/18 07:26, AKASHI Takahiro wrote:
> > > load_other_segments() is expected to allocate and place all the necessary
> > > memory segments other than kernel, including initrd and device-tree
> > > blob (and elf core header for crash).
> > > While most of the code was borrowed from kexec-tools' counterpart,
> > > users may not be allowed to specify dtb explicitly, instead, the dtb
> > > presented by a boot loader is reused.
> > 
> > (Nit: "a boot loader" -> "the original boot loader")
> 
> OK
> 
> > > arch_kimage_kernel_post_load_cleanup() is responsible for freeing arm64-
> > > specific data allocated in load_other_segments().
> > 
> > 
> > > diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> > > index f9ebf54ca247..b3b9b1725d8a 100644
> > > --- a/arch/arm64/kernel/machine_kexec_file.c
> > > +++ b/arch/arm64/kernel/machine_kexec_file.c
> > > @@ -13,7 +13,26 @@
> > >  #include <linux/ioport.h>
> > >  #include <linux/kernel.h>
> > >  #include <linux/kexec.h>
> > > +#include <linux/libfdt.h>
> > >  #include <linux/memblock.h>
> > > +#include <linux/of_fdt.h>
> > > +#include <linux/types.h>
> > > +#include <asm/byteorder.h>
> > > +
> > > +static int __dt_root_addr_cells;
> > > +static int __dt_root_size_cells;
> > 
> > > @@ -55,3 +74,144 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf,
> > >  
> > >  	return ret;
> > >  }
> > > +
> > > +static int setup_dtb(struct kimage *image,
> > > +		unsigned long initrd_load_addr, unsigned long initrd_len,
> > > +		char *cmdline, unsigned long cmdline_len,
> > > +		char **dtb_buf, size_t *dtb_buf_len)
> > > +{
> > > +	char *buf = NULL;
> > > +	size_t buf_size;
> > > +	int nodeoffset;
> > > +	u64 value;
> > > +	int range_len;
> > > +	int ret;
> > > +
> > > +	/* duplicate dt blob */
> > > +	buf_size = fdt_totalsize(initial_boot_params);
> > > +	range_len = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
> > 
> > These two cells values are 0 here. Did you want
> > arch_kexec_file_init() in patch 7 in this patch?
> > 
> > Ah, range_len isn't used, so, did you want the cells values and this range_len
> > thing in patch 7!?
> 
> Umm, this problem has existed since my v1 :)
> I had better rethink the patch order.
> 
> > 
> > > +
> > > +	if (initrd_load_addr)
> > > +		buf_size += fdt_prop_len("linux,initrd-start", sizeof(u64))
> > > +				+ fdt_prop_len("linux,initrd-end", sizeof(u64));
> > > +
> > > +	if (cmdline)
> > > +		buf_size += fdt_prop_len("bootargs", cmdline_len + 1);
> > 
> > I can't find where fdt_prop_len() .... oh, patch 7. fdt_prop_len() doesn't look
> > like the sort of thing that should be created here, but I agree there isn't an
> > existing API to do this.
> 
> Will take care of it.
> 
> 
> > (This must be why powerpc guesses that the fdt won't be more than double in size).
> > 
> > 
> > > +	buf = vmalloc(buf_size);
> > > +	if (!buf) {
> > > +		ret = -ENOMEM;
> > > +		goto out_err;
> > > +	}
> > > +
> > > +	ret = fdt_open_into(initial_boot_params, buf, buf_size);
> > > +	if (ret)
> > > +		goto out_err;
> > > +
> > > +	nodeoffset = fdt_path_offset(buf, "/chosen");
> > > +	if (nodeoffset < 0)
> > > +		goto out_err;
> > > +
> > > +	/* add bootargs */
> > > +	if (cmdline) {
> > > +		ret = fdt_setprop(buf, nodeoffset, "bootargs",
> > > +						cmdline, cmdline_len + 1);
> > 
> > fdt_setprop_string()?
> 
> OK

cmdline_len is passed in via the kexec_file_load() system call, so we
can't assume that cmdline is always terminated with '\0'.
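
The concern can be sketched in user space as follows. fdt_setprop_string() takes the length from strlen(), so it is only safe when a terminator is known to lie inside the buffer. The helper below is hypothetical (is_terminated() is not a kernel or libfdt function) and simply checks that condition for a length-delimited buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical check: true only if a NUL byte occurs within the first
 * len bytes of buf, so that strlen(buf) cannot read past the buffer.
 */
static bool is_terminated(const char *buf, size_t len)
{
	return len > 0 && memchr(buf, '\0', len) != NULL;
}
```

When such a check cannot be relied upon, passing the explicit length to fdt_setprop() avoids depending on the terminator at all.
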
> 
> > 
> > > +		if (ret)
> > > +			goto out_err;
> > > +	}
> > > +
> > > +	/* add initrd-* */
> > > +	if (initrd_load_addr) {
> > > +		value = cpu_to_fdt64(initrd_load_addr);
> > > +		ret = fdt_setprop(buf, nodeoffset, "linux,initrd-start",
> > > +				&value, sizeof(value));
> > 
> > sizeof(value) was assumed to be the same as sizeof(u64) earlier.
> > fdt_setprop_u64()?
> 
> OK
> 
> > 
> > > +		if (ret)
> > > +			goto out_err;
> > > +
> > > +		value = cpu_to_fdt64(initrd_load_addr + initrd_len);
> > > +		ret = fdt_setprop(buf, nodeoffset, "linux,initrd-end",
> > > +				&value, sizeof(value));
> > > +		if (ret)
> > > +			goto out_err;
> > > +	}
> > > +
> > > +	/* trim a buffer */
> > > +	fdt_pack(buf);
> > > +	*dtb_buf = buf;
> > > +	*dtb_buf_len = fdt_totalsize(buf);
> > > +
> > > +	return 0;
> > > +
> > > +out_err:
> > > +	vfree(buf);
> > > +	return ret;
> > > +}
> > 
> > While powerpc has some similar code for updating the initrd and cmdline, it
> > makes different assumptions about the size of the dt, and has different behavior
> > for memreserve. (looks like we don't expect the initramfs to be memreserved).
> > Let's leave unifying that stuff where possible for the future.
> 
> Sure
> 
> > > +int load_other_segments(struct kimage *image,
> > > +			char *initrd, unsigned long initrd_len,
> > > +			char *cmdline, unsigned long cmdline_len)
> > > +{
> > > +	struct kexec_segment *kern_seg;
> > > +	struct kexec_buf kbuf;
> > > +	unsigned long initrd_load_addr = 0;
> > > +	char *dtb = NULL;
> > > +	unsigned long dtb_len = 0;
> > > +	int ret = 0;
> > > +
> > > +	kern_seg = &image->segment[image->arch.kern_segment];
> > > +	kbuf.image = image;
> > > +	/* not allocate anything below the kernel */
> > > +	kbuf.buf_min = kern_seg->mem + kern_seg->memsz;
> > 
> > > +	/* load initrd */
> > > +	if (initrd) {
> > > +		kbuf.buffer = initrd;
> > > +		kbuf.bufsz = initrd_len;
> > > +		kbuf.memsz = initrd_len;
> > 
> > > +		kbuf.buf_align = 0;
> > 
> > I'm surprised the initrd has no alignment requirement,
> 
> Me too.
> 
> > but kexec_add_buffer()
> > rounds this up to PAGE_SIZE.
> 
> It seems that kimage_load_segment() requires this, but I'm not sure.
> 
> > 
> > > +		/* within 1GB-aligned window of up to 32GB in size */
> > > +		kbuf.buf_max = round_down(kern_seg->mem, SZ_1G)
> > > +						+ (unsigned long)SZ_1G * 32;
> > > +		kbuf.top_down = false;
> > > +
> > > +		ret = kexec_add_buffer(&kbuf);
> > > +		if (ret)
> > > +			goto out_err;
> > > +		initrd_load_addr = kbuf.mem;
> > > +
> > > +		pr_debug("Loaded initrd at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
> > > +				initrd_load_addr, initrd_len, initrd_len);
> > > +	}
> > > +
> > > +	/* load dtb blob */
> > > +	ret = setup_dtb(image, initrd_load_addr, initrd_len,
> > > +				cmdline, cmdline_len, &dtb, &dtb_len);
> > > +	if (ret) {
> > > +		pr_err("Preparing for new dtb failed\n");
> > > +		goto out_err;
> > > +	}
> > > +
> > > +	kbuf.buffer = dtb;
> > > +	kbuf.bufsz = dtb_len;
> > > +	kbuf.memsz = dtb_len;
> > > +	/* not across 2MB boundary */
> > > +	kbuf.buf_align = SZ_2M;
> > > +	kbuf.buf_max = ULONG_MAX;
> > > +	kbuf.top_down = true;
> > > +
> > > +	ret = kexec_add_buffer(&kbuf);
> > > +	if (ret)
> > > +		goto out_err;
> > > +	image->arch.dtb_mem = kbuf.mem;
> > > +	image->arch.dtb_buf = dtb;
> > > +
> > > +	pr_debug("Loaded dtb at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
> > > +			kbuf.mem, dtb_len, dtb_len);
> > > +
> > > +	return 0;
> > > +
> > > +out_err:
> > > +	vfree(dtb);
> > > +	image->arch.dtb_buf = NULL;
> > 
> > Won't kimage_file_post_load_cleanup() always be called if we return an error
> > here? Why not leave the free()ing until then?
> 
> Right.
> I left the code here because I think we had better locally clean up
> everything that was locally allocated if we trivially need to (and can)
> do so.
> 
> As it's redundant, I will remove it.

I will remove only "image->arch.dtb_buf = NULL;".
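
One reading of this correction, based on the quoted code: image->arch.dtb_buf is assigned only after kexec_add_buffer() has succeeded, so on the error path it was never set and resetting it is redundant, while the local buffer is not yet visible to any later cleanup and still has to be freed locally. A user-space sketch of that ownership pattern, where struct image_sketch, load_dtb_sketch() and the place_ok flag are all hypothetical stand-ins:

```c
#include <stdlib.h>

struct image_sketch {		/* hypothetical stand-in for struct kimage */
	char *dtb_buf;
};

/*
 * place_ok simulates whether placing the buffer succeeds. The image takes
 * ownership only on success, so on failure the local buffer must be freed
 * here, and image->dtb_buf needs no reset because it was never assigned.
 */
static int load_dtb_sketch(struct image_sketch *image, int place_ok)
{
	char *dtb = malloc(64);

	if (!dtb)
		return -1;
	if (!place_ok) {
		free(dtb);	/* still owned locally */
		return -1;
	}
	image->dtb_buf = dtb;	/* ownership handed over to the image */
	return 0;
}
```

After a failed call the image is untouched; after a successful one, freeing the buffer is the job of whoever tears down the image.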

> Thanks,
> -Takahiro AKASHI
> 
> > 
> > > +	return ret;
> > > +}
> > 
> > 
> > 
> > Thanks,
> > 
> > James

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 07/11] arm64: kexec_file: add crash dump support
  2018-05-16 10:06       ` James Morse
  (?)
@ 2018-05-18  9:50         ` AKASHI Takahiro
  -1 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-05-18  9:50 UTC (permalink / raw)
  To: James Morse
  Cc: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd, ard.biesheuvel, bhsharma, kexec,
	linux-arm-kernel, linux-kernel

On Wed, May 16, 2018 at 11:06:02AM +0100, James Morse wrote:
> Hi Akashi,
> 
> On 15/05/18 18:11, James Morse wrote:
> > On 25/04/18 07:26, AKASHI Takahiro wrote:
> >> Enabling crash dump (kdump) includes
> >> * prepare contents of ELF header of a core dump file, /proc/vmcore,
> >>   using crash_prepare_elf64_headers(), and
> >> * add two device tree properties, "linux,usable-memory-range" and
> >>   "linux,elfcorehdr", which represent, respectively, a memory range
> >>   to be used by crash dump kernel and the header's location
> 
> >> diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> >> index 37c0a9dc2e47..ec674f4d267c 100644
> >> --- a/arch/arm64/kernel/machine_kexec_file.c
> >> +++ b/arch/arm64/kernel/machine_kexec_file.c
> 
> >> +static struct crash_mem *get_crash_memory_ranges(void)
> >> +{
> >> +	unsigned int nr_ranges;
> >> +	struct crash_mem *cmem;
> >> +
> >> +	nr_ranges = 1; /* for exclusion of crashkernel region */
> >> +	walk_system_ram_res(0, -1, &nr_ranges, get_nr_ranges_callback);
> >> +
> >> +	cmem = vmalloc(sizeof(struct crash_mem) +
> >> +			sizeof(struct crash_mem_range) * nr_ranges);
> >> +	if (!cmem)
> >> +		return NULL;
> >> +
> >> +	cmem->max_nr_ranges = nr_ranges;
> >> +	cmem->nr_ranges = 0;
> >> +	walk_system_ram_res(0, -1, cmem, add_mem_range_callback);
> >> +
> >> +	/* Exclude crashkernel region */
> >> +	if (crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end)) {
> >> +		vfree(cmem);
> >> +		return NULL;
> >> +	}
> >> +
> >> +	return cmem;
> >> +}
> > 
> > Could this function be included in prepare_elf_headers() so that the alloc() and
> > free() occur together.
> > 
> > 
> >> +static int prepare_elf_headers(void **addr, unsigned long *sz)
> >> +{
> >> +	struct crash_mem *cmem;
> >> +	int ret = 0;
> >> +
> >> +	cmem = get_crash_memory_ranges();
> >> +	if (!cmem)
> >> +		return -ENOMEM;
> >> +
> >> +	ret =  crash_prepare_elf64_headers(cmem, true, addr, sz);
> >> +
> >> +	vfree(cmem);
> > 
> >> +	return ret;
> >> +}
> > 
> > All this is moving memory-range information from core-code's
> > walk_system_ram_res() into core-code's struct crash_mem, and excluding
> > crashk_res, which again is accessible to the core code.
> > 
> > It looks like this is duplicated in arch/x86 and arch/arm64 because arm64
> > doesn't have a second 'crashk_low_res' region, and always wants elf64, instead
> > of when IS_ENABLED(CONFIG_X86_64).
> 
> Thinking about it some more: don't we want to walk memblock here, not
> walk_system_ram_res()? What we want is a list of not-nomap regions that the
> kernel may have been using, to form part of vmcore.
> walk_system_ram_res() is becoming a murkier list of maybe-nomap, maybe-reserved.
> 
> I think we should walk the same list here as we do in patch 4.

For consistency, yes.
I missed that.
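
Whichever list ends up being walked, the "nr_ranges = 1" reservation in the
quoted get_crash_memory_ranges() is there because cutting the crashkernel
region out of the middle of a RAM range turns one entry into two. The block
below is a simplified user-space model of that splitting, not the kernel's
crash_exclude_mem_range(); the struct and function names are hypothetical, and
it assumes a sorted list in which the excluded range touches at most one entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins for the kernel's crash_mem types. */
struct mem_range {
	uint64_t start;
	uint64_t end;	/* inclusive, like struct resource */
};

struct mem_list {
	size_t max_nr_ranges;	/* must not exceed the array size below */
	size_t nr_ranges;
	struct mem_range ranges[8];
};

/*
 * Remove [start, end] from the first entry it overlaps.
 * Returns 0 on success, -1 if a split needs a slot that is not available.
 */
static int exclude_range(struct mem_list *m, uint64_t start, uint64_t end)
{
	size_t i, j;

	assert(start <= end);

	for (i = 0; i < m->nr_ranges; i++) {
		struct mem_range *r = &m->ranges[i];

		if (end < r->start || start > r->end)
			continue;	/* no overlap with this entry */

		if (start <= r->start && end >= r->end) {
			/* entry fully covered: drop it */
			for (j = i; j + 1 < m->nr_ranges; j++)
				m->ranges[j] = m->ranges[j + 1];
			m->nr_ranges--;
		} else if (start <= r->start) {
			/* overlap at the front: trim the start */
			r->start = end + 1;
		} else if (end >= r->end) {
			/* overlap at the tail: trim the end */
			r->end = start - 1;
		} else {
			/* hole in the middle: one entry becomes two */
			if (m->nr_ranges >= m->max_nr_ranges)
				return -1;
			for (j = m->nr_ranges; j > i + 1; j--)
				m->ranges[j] = m->ranges[j - 1];
			m->ranges[i + 1].start = end + 1;
			m->ranges[i + 1].end = r->end;
			r->end = start - 1;
			m->nr_ranges++;
		}
		return 0;
	}
	return 0;	/* nothing overlapped */
}
```

With max_nr_ranges equal to the number of RAM ranges, the middle-split case
would fail for lack of a slot, which is what the extra entry avoids.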

-Takahiro AKASHI

> 
> 
> Thanks,
> 
> James

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 07/11] arm64: kexec_file: add crash dump support
  2018-05-16  8:34       ` James Morse
  (?)
@ 2018-05-18  9:58         ` AKASHI Takahiro
  -1 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-05-18  9:58 UTC (permalink / raw)
  To: James Morse
  Cc: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd, ard.biesheuvel, bhsharma, kexec,
	linux-arm-kernel, linux-kernel

On Wed, May 16, 2018 at 09:34:41AM +0100, James Morse wrote:
> Hi Akashi,
> 
> On 15/05/18 18:11, James Morse wrote:
> > On 25/04/18 07:26, AKASHI Takahiro wrote:
> >> Enabling crash dump (kdump) includes
> >> * prepare contents of ELF header of a core dump file, /proc/vmcore,
> >>   using crash_prepare_elf64_headers(), and
> >> * add two device tree properties, "linux,usable-memory-range" and
> >>   "linux,elfcorehdr", which represent repsectively a memory range
> >>   to be used by crash dump kernel and the header's location
> 
> >> diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> >> index 37c0a9dc2e47..ec674f4d267c 100644
> >> --- a/arch/arm64/kernel/machine_kexec_file.c
> >> +++ b/arch/arm64/kernel/machine_kexec_file.c
> >> @@ -76,6 +81,78 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf,
> 
> >> +static void fill_property(void *buf, u64 val64, int cells)
> >> +{
> >> +	u32 val32;
> >> +
> >> +	if (cells == 1) {
> >> +		val32 = cpu_to_fdt32((u32)val64);
> >> +		memcpy(buf, &val32, sizeof(val32));
> >> +	} else {
> > 
> >> +		memset(buf, 0, cells * sizeof(u32) - sizeof(u64));
> >> +		buf += cells * sizeof(u32) - sizeof(u64);
> > 
> > Is this trying to clear the 'top' cells and shuffle the pointer to point at the
> > 'bottom' 2? I'm pretty sure this isn't endian safe.
> 
> It came to me at 2am: this only works on big-endian, which is exactly what you
> want as that is the DT format.

Oops, I was almost tricked as I haven't tested kexec on BE
for a long time :)
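
To make James's point concrete: a DT value is stored most significant cell
first, so zeroing the leading cells and writing the 64-bit value big-endian
into the last two cells gives the right layout on any host. The block below is
a user-space sketch of the quoted fill_property() under that reading; put_be32(),
put_be64() and fill_cells() are illustrative names, and the byte-wise stores
stand in for cpu_to_fdt32()/cpu_to_fdt64().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store v big-endian byte by byte, independent of host endianness. */
static void put_be32(uint8_t *p, uint32_t v)
{
	int i;

	for (i = 0; i < 4; i++)
		p[i] = (uint8_t)(v >> (24 - 8 * i));
}

static void put_be64(uint8_t *p, uint64_t v)
{
	int i;

	for (i = 0; i < 8; i++)
		p[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Write val into `cells` 32-bit FDT cells starting at buf. */
static void fill_cells(uint8_t *buf, uint64_t val, int cells)
{
	assert(cells >= 1);

	if (cells == 1) {
		/* caller is expected to have checked that val fits in 32 bits */
		put_be32(buf, (uint32_t)val);
	} else {
		/* leading (most significant) cells are zero padding */
		size_t pad = (size_t)cells * 4 - 8;

		memset(buf, 0, pad);
		put_be64(buf + pad, val);
	}
}
```

For cells == 3 and val == 0x880000000 this yields the cells
<0x0 0x8 0x80000000>, which is the big-endian layout the DT format expects.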

Thanks,
-Takahiro AKASHI

> 
> > Do we really expect a system to have #address-cells > 2?
> 
> 
> Thanks,
> 
> James

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 07/11] arm64: kexec_file: add crash dump support
  2018-05-15 17:11     ` James Morse
  (?)
@ 2018-05-18 10:39       ` AKASHI Takahiro
  -1 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-05-18 10:39 UTC (permalink / raw)
  To: James Morse
  Cc: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd, ard.biesheuvel, bhsharma, kexec,
	linux-arm-kernel, linux-kernel

On Tue, May 15, 2018 at 06:11:15PM +0100, James Morse wrote:
> Hi Akashi,
> 
> On 25/04/18 07:26, AKASHI Takahiro wrote:
> > Enabling crash dump (kdump) includes
> > * prepare contents of ELF header of a core dump file, /proc/vmcore,
> >   using crash_prepare_elf64_headers(), and
> > * add two device tree properties, "linux,usable-memory-range" and
> >   "linux,elfcorehdr", which represent repsectively a memory range
> 
> (Nit: respectively)

Will fix.

> 
> >   to be used by crash dump kernel and the header's location
> 
> >  arch/arm64/include/asm/kexec.h         |   4 +
> >  arch/arm64/kernel/kexec_image.c        |   9 +-
> >  arch/arm64/kernel/machine_kexec_file.c | 202 +++++++++++++++++++++++++
> 
> In this patch, machine_kexec_file.c gains its own private fdt array encoder.

See below.

> 
> > diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> > index 37c0a9dc2e47..ec674f4d267c 100644
> > --- a/arch/arm64/kernel/machine_kexec_file.c
> > +++ b/arch/arm64/kernel/machine_kexec_file.c
> > @@ -76,6 +81,78 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf,
> >  	return ret;
> >  }
> >  
> > +static int __init arch_kexec_file_init(void)
> > +{
> > +	/* Those values are used later on loading the kernel */
> > +	__dt_root_addr_cells = dt_root_addr_cells;
> > +	__dt_root_size_cells = dt_root_size_cells;
> > +
> > +	return 0;
> > +}
> > +late_initcall(arch_kexec_file_init);
> 
> If we need these is it worth taking them out of __initdata? I note they've been
> 'temporary' for quite a long time.

I think I had some reason for not doing that, but I don't remember it now.
If it causes no problem, I will take your suggestion.

> 
> > +
> > +#define FDT_ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))
> > +#define FDT_TAGALIGN(x)	(FDT_ALIGN((x), FDT_TAGSIZE))
> > +
> > +static int fdt_prop_len(const char *prop_name, int len)
> > +{
> > +	return (strlen(prop_name) + 1) +
> > +		sizeof(struct fdt_property) +
> > +		FDT_TAGALIGN(len);
> > +}
> 
> This stuff should really be in libfdt.h  Those macros come from
> libfdt_internal.h, so we're probably doing something wrong here.
> 
> 
> > +static bool cells_size_fitted(unsigned long base, unsigned long size)
> > +{
> > +	/* if *_cells >= 2, cells can hold 64-bit values anyway */
> > +	if ((__dt_root_addr_cells == 1) && (base >= (1ULL << 32)))
> > +		return false;
> > +
> > +	if ((__dt_root_size_cells == 1) && (size >= (1ULL << 32)))
> > +		return false;
> 
> Using '> U32_MAX' here may be more readable.

OK
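
A sketch of how the check might read with that change, written as user-space
C so it can be tried in isolation. UINT32_MAX stands in for the kernel's
U32_MAX, and the cell counts are passed as parameters here (an assumption of
this sketch) rather than read from __dt_root_addr_cells and
__dt_root_size_cells.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if base and size can be encoded with the given numbers of
 * 32-bit cells. With two or more cells any 64-bit value fits.
 */
static bool cells_fit(uint64_t base, uint64_t size,
		      int addr_cells, int size_cells)
{
	assert(addr_cells >= 1 && size_cells >= 1);

	if (addr_cells == 1 && base > UINT32_MAX)
		return false;

	if (size_cells == 1 && size > UINT32_MAX)
		return false;

	return true;
}
```

The comparison "> UINT32_MAX" is equivalent to the original ">= (1ULL << 32)"
for 64-bit operands.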

> 
> > +	return true;
> > +}
> > +
> > +static void fill_property(void *buf, u64 val64, int cells)
> > +{
> > +	u32 val32;
> > +
> > +	if (cells == 1) {
> > +		val32 = cpu_to_fdt32((u32)val64);
> > +		memcpy(buf, &val32, sizeof(val32));
> > +	} else {
> 
> > +		memset(buf, 0, cells * sizeof(u32) - sizeof(u64));
> > +		buf += cells * sizeof(u32) - sizeof(u64);
> 
> Is this trying to clear the 'top' cells and shuffle the pointer to point at the
> 'bottom' 2? I'm pretty sure this isn't endian safe.
> 
> Do we really expect a system to have #address-cells > 2?

I don't know, but just for safety.

> 
> > +		val64 = cpu_to_fdt64(val64);
> > +		memcpy(buf, &val64, sizeof(val64));
> > +	}
> > +}
> > +
> > +static int fdt_setprop_range(void *fdt, int nodeoffset, const char *name,
> > +				unsigned long addr, unsigned long size)
> 
> (the device-tree spec describes a 'ranges' property, which had me confused. This
> is encoding a prop-encoded-array)

Should we rename it to, say, fdt_setprop_reg()?


> > +{
> > +	void *buf, *prop;
> > +	size_t buf_size;
> > +	int result;
> > +
> > +	buf_size = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
> > +	prop = buf = vmalloc(buf_size);
> 
> virtual memory allocation for something less than PAGE_SIZE?

I've never cared about that. Let me think again.
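
One possible direction, given that the property is only a handful of cells:
encode it into a small fixed-size buffer and drop the allocation entirely.
The block below is a user-space sketch of that idea only. MAX_CELLS is an
assumed bound that does not come from the patch, and put_cells() and
encode_reg() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CELLS 4	/* assumed upper bound for this sketch */

/* Write val as `cells` big-endian 32-bit cells, most significant first. */
static void put_cells(uint8_t *p, uint64_t val, int cells)
{
	int i;

	for (i = cells - 1; i >= 0; i--) {
		uint32_t cell = (uint32_t)val;

		p[4 * i + 0] = (uint8_t)(cell >> 24);
		p[4 * i + 1] = (uint8_t)(cell >> 16);
		p[4 * i + 2] = (uint8_t)(cell >> 8);
		p[4 * i + 3] = (uint8_t)cell;
		val >>= 32;	/* cells beyond the second end up zero */
	}
}

/*
 * Encode an <address size> pair into out, which must hold at least
 * 2 * MAX_CELLS * 4 bytes. Returns the number of bytes used, or 0 if a
 * cell count is outside the supported range.
 */
static size_t encode_reg(uint8_t *out, uint64_t addr, uint64_t size,
			 int addr_cells, int size_cells)
{
	assert(out != NULL);

	if (addr_cells < 1 || size_cells < 1 ||
	    addr_cells > MAX_CELLS || size_cells > MAX_CELLS)
		return 0;

	put_cells(out, addr, addr_cells);
	put_cells(out + 4 * addr_cells, size, size_cells);

	return 4 * (size_t)(addr_cells + size_cells);
}
```

A caller would declare "uint8_t buf[2 * MAX_CELLS * 4];" on the stack and pass
the returned length on to fdt_setprop().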

> 
> > +	if (!buf)
> > +		return -ENOMEM;
> > +
> > +	fill_property(prop, addr, __dt_root_addr_cells);
> > +	prop += __dt_root_addr_cells * sizeof(u32);
> > +
> > +	fill_property(prop, size, __dt_root_size_cells);
> > +
> > +	result = fdt_setprop(fdt, nodeoffset, name, buf, buf_size);
> > +
> > +	vfree(buf);
> > +
> > +	return result;
> > +}
> 
> Doesn't this stuff belong in libfdt? I guess there is no 'add array element' api
> because this the first time we've wanted to create a node with more than
> key=fixed-size-value.
> 
> I don't think this belongs in arch C code. Do we have a plan for getting libfdt
> to support encoding prop-arrays? Can we put it somewhere anyone else duplicating
> this will find it, until we can (re)move it?

I will temporarily move all fdt-related stuff to a separate file, but

> I have no idea how that happens... it looks like the devicetree list is the
> place to ask.

should we always sync with the original dtc/libfdt repository?

> 
> >  static int setup_dtb(struct kimage *image,
> >  		unsigned long initrd_load_addr, unsigned long initrd_len,
> >  		char *cmdline, unsigned long cmdline_len,
> > @@ -88,10 +165,26 @@ static int setup_dtb(struct kimage *image,
> >  	int range_len;
> >  	int ret;
> >  
> > +	/* check ranges against root's #address-cells and #size-cells */
> > +	if (image->type == KEXEC_TYPE_CRASH &&
> > +		(!cells_size_fitted(image->arch.elf_load_addr,
> > +				image->arch.elf_headers_sz) ||
> > +		 !cells_size_fitted(crashk_res.start,
> > +				crashk_res.end - crashk_res.start + 1))) {
> > +		pr_err("Crash memory region doesn't fit into DT's root cell sizes.\n");
> > +		ret = -EINVAL;
> > +		goto out_err;
> > +	}
> 
> To check I've understood this properly: This can happen if the firmware provided
> a DTB with 32bit address/size cells, but at least some of the memory requires 64
> bit address/size cells. This could only happen on a UEFI system where the
> firmware-DTB doesn't describe memory. ACPI-only systems would have the EFIstub DT.

Probably, yes. I had in mind the case where #address-cells and #size-cells
are simply missing from the fdt.

> 
> >  	/* duplicate dt blob */
> >  	buf_size = fdt_totalsize(initial_boot_params);
> >  	range_len = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
> >  
> > +	if (image->type == KEXEC_TYPE_CRASH)
> > +		buf_size += fdt_prop_len("linux,elfcorehdr", range_len)
> > +				+ fdt_prop_len("linux,usable-memory-range",
> > +								range_len);

                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

> > +
> >  	if (initrd_load_addr)
> >  		buf_size += fdt_prop_len("linux,initrd-start", sizeof(u64))
> >  				+ fdt_prop_len("linux,initrd-end", sizeof(u64));
> > @@ -113,6 +206,23 @@ static int setup_dtb(struct kimage *image,
> >  	if (nodeoffset < 0)
> >  		goto out_err;
> >  
> > +	if (image->type == KEXEC_TYPE_CRASH) {
> > +		/* add linux,elfcorehdr */
> > +		ret = fdt_setprop_range(buf, nodeoffset, "linux,elfcorehdr",
> > +				image->arch.elf_load_addr,
> > +				image->arch.elf_headers_sz);
> > +		if (ret)
> > +			goto out_err;
> > +
> > +		/* add linux,usable-memory-range */
> > +		ret = fdt_setprop_range(buf, nodeoffset,
> > +				"linux,usable-memory-range",
> > +				crashk_res.start,
> > +				crashk_res.end - crashk_res.start + 1);
> 
> Don't you need to add "linux,usable-memory-range" to the buf_size estimate?

I think the code exists. See above.
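
For reference, the arithmetic behind that estimate can be sketched in
user-space C as follows. The 12-byte header is assumed to match libfdt's
struct fdt_property (tag, len and nameoff, four bytes each); prop_len() and
crash_props_len() are illustrative names that mirror the quoted fdt_prop_len()
and the buf_size addition in setup_dtb().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FDT_TAGSIZE	4	/* one 32-bit cell */
#define FDT_PROP_HDR	12	/* tag + len + nameoff, assumed 4 bytes each */
#define FDT_ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))
#define FDT_TAGALIGN(x)	(FDT_ALIGN((x), FDT_TAGSIZE))

/* Worst-case bytes added to the blob by one property of `len` data bytes. */
static size_t prop_len(const char *name, size_t len)
{
	assert(name != NULL);

	return (strlen(name) + 1) +	/* name in the strings block */
		FDT_PROP_HDR +		/* property header */
		FDT_TAGALIGN(len);	/* value, padded to a cell boundary */
}

/* Extra space for the two kdump properties, each an <address size> pair. */
static size_t crash_props_len(int addr_cells, int size_cells)
{
	size_t range_len = (size_t)(addr_cells + size_cells) * 4;

	return prop_len("linux,elfcorehdr", range_len) +
		prop_len("linux,usable-memory-range", range_len);
}
```

With two address cells and two size cells this comes to 45 + 54 = 99 bytes.
It should be an upper bound, on the assumption that the name string is only
added to the strings block when it is not already present.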

> 
> > +		if (ret)
> > +			goto out_err;
> > +	}
> 
> > @@ -148,17 +258,109 @@ static int setup_dtb(struct kimage *image,
> 
> > +static struct crash_mem *get_crash_memory_ranges(void)
> > +{
> > +	unsigned int nr_ranges;
> > +	struct crash_mem *cmem;
> > +
> > +	nr_ranges = 1; /* for exclusion of crashkernel region */
> > +	walk_system_ram_res(0, -1, &nr_ranges, get_nr_ranges_callback);
> > +
> > +	cmem = vmalloc(sizeof(struct crash_mem) +
> > +			sizeof(struct crash_mem_range) * nr_ranges);
> > +	if (!cmem)
> > +		return NULL;
> > +
> > +	cmem->max_nr_ranges = nr_ranges;
> > +	cmem->nr_ranges = 0;
> > +	walk_system_ram_res(0, -1, cmem, add_mem_range_callback);
> > +
> > +	/* Exclude crashkernel region */
> > +	if (crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end)) {
> > +		vfree(cmem);
> > +		return NULL;
> > +	}
> > +
> > +	return cmem;
> > +}
> 
> Could this function be included in prepare_elf_headers() so that the alloc() and
> free() occur together.


Or should we aim for arm64 and x86 to have similar-looking code?

> 
> > +static int prepare_elf_headers(void **addr, unsigned long *sz)
> > +{
> > +	struct crash_mem *cmem;
> > +	int ret = 0;
> > +
> > +	cmem = get_crash_memory_ranges();
> > +	if (!cmem)
> > +		return -ENOMEM;
> > +
> > +	ret =  crash_prepare_elf64_headers(cmem, true, addr, sz);
> > +
> > +	vfree(cmem);
> 
> > +	return ret;
> > +}
> 
> All this is moving memory-range information from core-code's
> walk_system_ram_res() into core-code's struct crash_mem, and excluding
> crashk_res, which again is accessible to the core code.
> 
> It looks like this is duplicated in arch/x86 and arch/arm64 because arm64
> doesn't have a second 'crashk_low_res' region, and always wants elf64, instead
> of when IS_ENABLED(CONFIG_X86_64).
> If we can abstract just those two, more of this could be moved to core code
> where powerpc can make use of it if they want to support kdump with
> kexec_file_load().
> 
> But, its getting late for cross-architecture dependencies, lets put that on the
> for-later list. (assuming there isn't a powerpc-kdump series out there adding a
> third copy of this)

Sure. The x86 code has so many special cases :)

Thanks,
-Takahiro AKASHI


> 
> Thanks,
> 
> James

^ permalink raw reply	[flat|nested] 156+ messages in thread

* [PATCH v9 07/11] arm64: kexec_file: add crash dump support
@ 2018-05-18 10:39       ` AKASHI Takahiro
  0 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-05-18 10:39 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, May 15, 2018 at 06:11:15PM +0100, James Morse wrote:
> Hi Akashi,
> 
> On 25/04/18 07:26, AKASHI Takahiro wrote:
> > Enabling crash dump (kdump) includes
> > * prepare contents of ELF header of a core dump file, /proc/vmcore,
> >   using crash_prepare_elf64_headers(), and
> > * add two device tree properties, "linux,usable-memory-range" and
> >   "linux,elfcorehdr", which represent repsectively a memory range
> 
> (Nit: respectively)

Will fix.

> 
> >   to be used by crash dump kernel and the header's location
> 
> >  arch/arm64/include/asm/kexec.h         |   4 +
> >  arch/arm64/kernel/kexec_image.c        |   9 +-
> >  arch/arm64/kernel/machine_kexec_file.c | 202 +++++++++++++++++++++++++
> 
> In this patch, machine_kexec_file.c gains its own private fdt array encoder.

See below.

> 
> > diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> > index 37c0a9dc2e47..ec674f4d267c 100644
> > --- a/arch/arm64/kernel/machine_kexec_file.c
> > +++ b/arch/arm64/kernel/machine_kexec_file.c
> > @@ -76,6 +81,78 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf,
> >  	return ret;
> >  }
> >  
> > +static int __init arch_kexec_file_init(void)
> > +{
> > +	/* Those values are used later on loading the kernel */
> > +	__dt_root_addr_cells = dt_root_addr_cells;
> > +	__dt_root_size_cells = dt_root_size_cells;
> > +
> > +	return 0;
> > +}
> > +late_initcall(arch_kexec_file_init);
> 
> If we need these is it worth taking them out of __initdata? I note they've been
> 'temporary' for quite a long time.

I think I had some reason for not doing that, but I don't remember it now.
If it causes no problem, I will take your suggestion.

> 
> > +
> > +#define FDT_ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))
> > +#define FDT_TAGALIGN(x)	(FDT_ALIGN((x), FDT_TAGSIZE))
> > +
> > +static int fdt_prop_len(const char *prop_name, int len)
> > +{
> > +	return (strlen(prop_name) + 1) +
> > +		sizeof(struct fdt_property) +
> > +		FDT_TAGALIGN(len);
> > +}
> 
> This stuff should really be in libfdt.h  Those macros come from
> libfdt_internal.h, so we're probably doing something wrong here.
> 
> 
> > +static bool cells_size_fitted(unsigned long base, unsigned long size)
> > +{
> > +	/* if *_cells >= 2, cells can hold 64-bit values anyway */
> > +	if ((__dt_root_addr_cells == 1) && (base >= (1ULL << 32)))
> > +		return false;
> > +
> > +	if ((__dt_root_size_cells == 1) && (size >= (1ULL << 32)))
> > +		return false;
> 
> Using '> U32_MAX' here may be more readable.

OK

> 
> > +	return true;
> > +}
> > +
> > +static void fill_property(void *buf, u64 val64, int cells)
> > +{
> > +	u32 val32;
> > +
> > +	if (cells == 1) {
> > +		val32 = cpu_to_fdt32((u32)val64);
> > +		memcpy(buf, &val32, sizeof(val32));
> > +	} else {
> 
> > +		memset(buf, 0, cells * sizeof(u32) - sizeof(u64));
> > +		buf += cells * sizeof(u32) - sizeof(u64);
> 
> Is this trying to clear the 'top' cells and shuffle the pointer to point at the
> 'bottom' 2? I'm pretty sure this isn't endian safe.
> 
> Do we really expect a system to have #address-cells > 2?

I don't know, but just for safety.

> 
> > +		val64 = cpu_to_fdt64(val64);
> > +		memcpy(buf, &val64, sizeof(val64));
> > +	}
> > +}
> > +
> > +static int fdt_setprop_range(void *fdt, int nodeoffset, const char *name,
> > +				unsigned long addr, unsigned long size)
> 
> (the device-tree spec describes a 'ranges' property, which had me confused. This
> is encoding a prop-encoded-array)

Should we rename it to, say, fdt_setprop_reg()?


> > +{
> > +	void *buf, *prop;
> > +	size_t buf_size;
> > +	int result;
> > +
> > +	buf_size = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
> > +	prop = buf = vmalloc(buf_size);
> 
> virtual memory allocation for something less than PAGE_SIZE?

I've never cared about that. Let me think again.

> 
> > +	if (!buf)
> > +		return -ENOMEM;
> > +
> > +	fill_property(prop, addr, __dt_root_addr_cells);
> > +	prop += __dt_root_addr_cells * sizeof(u32);
> > +
> > +	fill_property(prop, size, __dt_root_size_cells);
> > +
> > +	result = fdt_setprop(fdt, nodeoffset, name, buf, buf_size);
> > +
> > +	vfree(buf);
> > +
> > +	return result;
> > +}
> 
> Doesn't this stuff belong in libfdt? I guess there is no 'add array element' api
> because this the first time we've wanted to create a node with more than
> key=fixed-size-value.
> 
> I don't think this belongs in arch C code. Do we have a plan for getting libfdt
> to support encoding prop-arrays? Can we put it somewhere anyone else duplicating
> this will find it, until we can (re)move it?

I will temporarily move all fdt-related stuff to a separate file, but

> I have no idea how that happens... it looks like the devicetree list is the
> place to ask.

should we always sync with the original dtc/libfdt repository?

> 
> >  static int setup_dtb(struct kimage *image,
> >  		unsigned long initrd_load_addr, unsigned long initrd_len,
> >  		char *cmdline, unsigned long cmdline_len,
> > @@ -88,10 +165,26 @@ static int setup_dtb(struct kimage *image,
> >  	int range_len;
> >  	int ret;
> >  
> > +	/* check ranges against root's #address-cells and #size-cells */
> > +	if (image->type == KEXEC_TYPE_CRASH &&
> > +		(!cells_size_fitted(image->arch.elf_load_addr,
> > +				image->arch.elf_headers_sz) ||
> > +		 !cells_size_fitted(crashk_res.start,
> > +				crashk_res.end - crashk_res.start + 1))) {
> > +		pr_err("Crash memory region doesn't fit into DT's root cell sizes.\n");
> > +		ret = -EINVAL;
> > +		goto out_err;
> > +	}
> 
> To check I've understood this properly: This can happen if the firmware provided
> a DTB with 32bit address/size cells, but at least some of the memory requires 64
> bit address/size cells. This could only happen on a UEFI system where the
> firmware-DTB doesn't describe memory. ACPI-only systems would have the EFIstub DT.

Probably, yes. I had in mind the case where #address-cells and #size-cells
are simply missing from the fdt.

> 
> >  	/* duplicate dt blob */
> >  	buf_size = fdt_totalsize(initial_boot_params);
> >  	range_len = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
> >  
> > +	if (image->type == KEXEC_TYPE_CRASH)
> > +		buf_size += fdt_prop_len("linux,elfcorehdr", range_len)
> > +				+ fdt_prop_len("linux,usable-memory-range",
> > +								range_len);

                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

> > +
> >  	if (initrd_load_addr)
> >  		buf_size += fdt_prop_len("linux,initrd-start", sizeof(u64))
> >  				+ fdt_prop_len("linux,initrd-end", sizeof(u64));
> > @@ -113,6 +206,23 @@ static int setup_dtb(struct kimage *image,
> >  	if (nodeoffset < 0)
> >  		goto out_err;
> >  
> > +	if (image->type == KEXEC_TYPE_CRASH) {
> > +		/* add linux,elfcorehdr */
> > +		ret = fdt_setprop_range(buf, nodeoffset, "linux,elfcorehdr",
> > +				image->arch.elf_load_addr,
> > +				image->arch.elf_headers_sz);
> > +		if (ret)
> > +			goto out_err;
> > +
> > +		/* add linux,usable-memory-range */
> > +		ret = fdt_setprop_range(buf, nodeoffset,
> > +				"linux,usable-memory-range",
> > +				crashk_res.start,
> > +				crashk_res.end - crashk_res.start + 1);
> 
> Don't you need to add "linux,usable-memory-range" to the buf_size estimate?

I think the code exists. See above.
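
For reference, the estimate pointed at above can be reproduced outside the
kernel. This is only a sketch: the sketch_* names are made up for
illustration, and the 12-byte header size assumes libfdt's struct
fdt_property layout (tag, len and nameoff as three 32-bit words).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: struct fdt_property header is three 32-bit words. */
#define SKETCH_FDT_PROP_HDR_SIZE	12u
#define SKETCH_FDT_TAGSIZE		4u

/* Round x up to the next multiple of the 4-byte tag size. */
static size_t sketch_tagalign(size_t x)
{
	return (x + SKETCH_FDT_TAGSIZE - 1) & ~(size_t)(SKETCH_FDT_TAGSIZE - 1);
}

/* Estimated space one new property adds: name string, header, padded value. */
static size_t sketch_prop_len(const char *name, size_t value_len)
{
	return (strlen(name) + 1) + SKETCH_FDT_PROP_HDR_SIZE +
		sketch_tagalign(value_len);
}

/* Extra bytes reserved for the two kdump properties. */
static size_t sketch_crash_extra(unsigned int addr_cells,
				 unsigned int size_cells)
{
	size_t range_len = (addr_cells + size_cells) * sizeof(uint32_t);

	return sketch_prop_len("linux,elfcorehdr", range_len) +
		sketch_prop_len("linux,usable-memory-range", range_len);
}
```

With two address cells and two size cells, range_len is 16 bytes, so the
two properties add 45 + 54 = 99 bytes to the estimate.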

> 
> > +		if (ret)
> > +			goto out_err;
> > +	}
> 
> > @@ -148,17 +258,109 @@ static int setup_dtb(struct kimage *image,
> 
> > +static struct crash_mem *get_crash_memory_ranges(void)
> > +{
> > +	unsigned int nr_ranges;
> > +	struct crash_mem *cmem;
> > +
> > +	nr_ranges = 1; /* for exclusion of crashkernel region */
> > +	walk_system_ram_res(0, -1, &nr_ranges, get_nr_ranges_callback);
> > +
> > +	cmem = vmalloc(sizeof(struct crash_mem) +
> > +			sizeof(struct crash_mem_range) * nr_ranges);
> > +	if (!cmem)
> > +		return NULL;
> > +
> > +	cmem->max_nr_ranges = nr_ranges;
> > +	cmem->nr_ranges = 0;
> > +	walk_system_ram_res(0, -1, cmem, add_mem_range_callback);
> > +
> > +	/* Exclude crashkernel region */
> > +	if (crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end)) {
> > +		vfree(cmem);
> > +		return NULL;
> > +	}
> > +
> > +	return cmem;
> > +}
> 
> Could this function be included in prepare_elf_headers() so that the alloc() and
> free() occur together?


Or should we aim for arm64 and x86 to have similar-looking code?

> 
> > +static int prepare_elf_headers(void **addr, unsigned long *sz)
> > +{
> > +	struct crash_mem *cmem;
> > +	int ret = 0;
> > +
> > +	cmem = get_crash_memory_ranges();
> > +	if (!cmem)
> > +		return -ENOMEM;
> > +
> > +	ret =  crash_prepare_elf64_headers(cmem, true, addr, sz);
> > +
> > +	vfree(cmem);
> 
> > +	return ret;
> > +}
> 
> All this is moving memory-range information from core-code's
> walk_system_ram_res() into core-code's struct crash_mem, and excluding
> crashk_res, which again is accessible to the core code.
> 
> It looks like this is duplicated in arch/x86 and arch/arm64 because arm64
> doesn't have a second 'crashk_low_res' region, and always wants elf64, instead
> of when IS_ENABLED(CONFIG_X86_64).
> If we can abstract just those two, more of this could be moved to core code
> where powerpc can make use of it if they want to support kdump with
> kexec_file_load().
> 
> But, it's getting late for cross-architecture dependencies, let's put that on the
> for-later list. (assuming there isn't a powerpc-kdump series out there adding a
> third copy of this)

Sure. X86 code has so many exceptional lines in the code :)
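
As a rough illustration of the bookkeeping being discussed, here is a
user-space sketch of excluding one region from a list of inclusive memory
ranges. It is not the kernel's crash_exclude_mem_range(); the sketch_*
types and the fixed 8-entry array are assumptions made for the example.
It shows why one spare slot is reserved up front: a hole in the middle of
a range turns one range into two.

```c
#include <stdint.h>

struct sketch_range {
	uint64_t start, end;	/* end is inclusive */
};

struct sketch_mem {
	unsigned int max_nr_ranges;	/* must not exceed 8 in this sketch */
	unsigned int nr_ranges;
	struct sketch_range ranges[8];
};

/*
 * Remove [start, end] from the first range that fully contains it.
 * Returns 0 on success, -1 if a split would overflow the array.
 */
static int sketch_exclude(struct sketch_mem *mem, uint64_t start, uint64_t end)
{
	unsigned int i, j;

	for (i = 0; i < mem->nr_ranges; i++) {
		struct sketch_range *r = &mem->ranges[i];

		if (start < r->start || end > r->end)
			continue;	/* not fully inside this range */

		if (start == r->start && end == r->end) {
			/* the whole range goes away: close the gap */
			for (j = i; j + 1 < mem->nr_ranges; j++)
				mem->ranges[j] = mem->ranges[j + 1];
			mem->nr_ranges--;
		} else if (start == r->start) {
			r->start = end + 1;
		} else if (end == r->end) {
			r->end = start - 1;
		} else {
			/* hole in the middle: one range becomes two */
			if (mem->nr_ranges == mem->max_nr_ranges)
				return -1;
			for (j = mem->nr_ranges; j > i + 1; j--)
				mem->ranges[j] = mem->ranges[j - 1];
			mem->ranges[i + 1].start = end + 1;
			mem->ranges[i + 1].end = r->end;
			r->end = start - 1;
			mem->nr_ranges++;
		}
		return 0;
	}

	return 0;	/* nothing to exclude */
}
```

The size written to "linux,usable-memory-range" follows the same
inclusive-end convention, hence crashk_res.end - crashk_res.start + 1.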

Thanks,
-Takahiro AKASHI


> 
> Thanks,
> 
> James

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 07/11] arm64: kexec_file: add crash dump support
  2018-05-15 17:12     ` James Morse
  (?)
@ 2018-05-18 15:35       ` Rob Herring
  -1 siblings, 0 replies; 156+ messages in thread
From: Rob Herring @ 2018-05-18 15:35 UTC (permalink / raw)
  To: James Morse
  Cc: AKASHI Takahiro, catalin.marinas, will.deacon, dhowells, vgoyal,
	herbert, davem, dyoung, bhe, arnd, ard.biesheuvel, bhsharma,
	kexec, linux-arm-kernel, linux-kernel, devicetree

On Tue, May 15, 2018 at 06:12:59PM +0100, James Morse wrote:
> Hi guys,
> 
> (CC: +RobH, devicetree list)

Thanks.

> On 25/04/18 07:26, AKASHI Takahiro wrote:
> > Enabling crash dump (kdump) includes
> > * prepare contents of ELF header of a core dump file, /proc/vmcore,
> >   using crash_prepare_elf64_headers(), and
> > * add two device tree properties, "linux,usable-memory-range" and
> >   "linux,elfcorehdr", which represent repsectively a memory range
> >   to be used by crash dump kernel and the header's location

BTW, I intend to move the existing parsing of these out of the arch code.
Please don't add more DT handling to arch/ unless it is *really* arch
specific. I'd assume that the next arch to add kexec support will use
these bindings instead of the powerpc way.

> kexec_file_load() on arm64 needs to be able to create a prop encoded array to
> the FDT, but there doesn't appear to be a libfdt helper to do this.
> 
> Akashi's code below adds fdt_setprop_range() to the arch code, and duplicates
> bits of libfdt_internal.h to do the work.
> 
> How should this be done? I'm assuming this is something we need a new API in
> libfdt.h for. How do these come about, and is there an interim step we can use
> until then?

Submit patches to upstream dtc and then we can pull it in. Ahead of that 
you can add it to drivers/of/fdt.c (or maybe fdt_address.c because 
that's really what this is dealing with).

libfdt has only recently gained the beginnings of address handling.

> 
> Thanks!
> 
> James
> 
> > diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> > index 37c0a9dc2e47..ec674f4d267c 100644
> > --- a/arch/arm64/kernel/machine_kexec_file.c
> > +++ b/arch/arm64/kernel/machine_kexec_file.c
> > @@ -76,6 +81,78 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf,
> >  	return ret;
> >  }
> >  
> > +static int __init arch_kexec_file_init(void)
> > +{
> > +	/* Those values are used later on loading the kernel */
> > +	__dt_root_addr_cells = dt_root_addr_cells;
> > +	__dt_root_size_cells = dt_root_size_cells;

I intend to make dt_root_*_cells private, so don't add another user 
outside of drivers/of/.

> > +
> > +	return 0;
> > +}
> > +late_initcall(arch_kexec_file_init);
> > +
> > +#define FDT_ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))
> > +#define FDT_TAGALIGN(x)	(FDT_ALIGN((x), FDT_TAGSIZE))
> > +
> > +static int fdt_prop_len(const char *prop_name, int len)
> > +{
> > +	return (strlen(prop_name) + 1) +
> > +		sizeof(struct fdt_property) +
> > +		FDT_TAGALIGN(len);
> > +}
> > +
> > +static bool cells_size_fitted(unsigned long base, unsigned long size)

I can't imagine this would happen. However, when this is moved to 
drivers/of/ or dtc, these need to be u64 types to work on 32-bit.
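
A minimal sketch of what the check could look like with 64-bit
parameters, written as plain user-space C. The sketch_* names are
hypothetical, and UINT32_MAX stands in for the kernel's U32_MAX; the cell
counts are passed in rather than read from globals.

```c
#include <stdbool.h>
#include <stdint.h>

/* A single 32-bit cell cannot hold values above UINT32_MAX. */
static bool sketch_fits_cells(uint64_t val, int cells)
{
	if (cells == 1 && val > UINT32_MAX)
		return false;

	/* two or more cells can hold any 64-bit value */
	return true;
}

/* Check a <base size> pair against the root node's cell counts. */
static bool sketch_range_fits(uint64_t base, uint64_t size,
			      int addr_cells, int size_cells)
{
	return sketch_fits_cells(base, addr_cells) &&
		sketch_fits_cells(size, size_cells);
}
```

Because both parameters are uint64_t, the comparison keeps working on a
32-bit build, where unsigned long could never exceed UINT32_MAX.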

> > +{
> > +	/* if *_cells >= 2, cells can hold 64-bit values anyway */
> > +	if ((__dt_root_addr_cells == 1) && (base >= (1ULL << 32)))
> > +		return false;
> > +
> > +	if ((__dt_root_size_cells == 1) && (size >= (1ULL << 32)))
> > +		return false;
> > +
> > +	return true;
> > +}
> > +
> > +static void fill_property(void *buf, u64 val64, int cells)
> > +{
> > +	u32 val32;

This should be a __be32 or fdt32 type. So should buf.

> > +
> > +	if (cells == 1) {
> > +		val32 = cpu_to_fdt32((u32)val64);
> > +		memcpy(buf, &val32, sizeof(val32));
> > +	} else {
> > +		memset(buf, 0, cells * sizeof(u32) - sizeof(u64));
> > +		buf += cells * sizeof(u32) - sizeof(u64);
> > +
> > +		val64 = cpu_to_fdt64(val64);
> > +		memcpy(buf, &val64, sizeof(val64));

Look how of_read_number() is implemented. You should be able to do 
something similar here looping and avoiding the if/else.
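
To illustrate, a loop-based writer in the spirit of of_read_number()
might look like the following. It is a user-space sketch with
hypothetical sketch_* names; it stores bytes explicitly instead of using
__be32 and cpu_to_fdt32(), so it does not depend on host endianness.

```c
#include <stdint.h>

/* Store one 32-bit value in big-endian byte order. */
static void sketch_put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/*
 * Write val into `cells` consecutive big-endian cells, most significant
 * cell first. Cells beyond the second are filled with zero.
 */
static void sketch_write_number(uint8_t *buf, uint64_t val, int cells)
{
	int i;

	for (i = cells - 1; i >= 0; i--) {
		sketch_put_be32(buf + 4 * i, (uint32_t)val);
		val >>= 32;
	}
}
```

The loop fills the last cell with the low 32 bits and works backwards, so
one, two or more cells are all handled by the same code path.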

> > +	}
> > +}
> > +
> > +static int fdt_setprop_range(void *fdt, int nodeoffset, const char *name,
> > +				unsigned long addr, unsigned long size)

A very generic sounding function, but really only works on addresses in 
children of the root node.

> > +{
> > +	void *buf, *prop;
> > +	size_t buf_size;
> > +	int result;
> > +
> > +	buf_size = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
> > +	prop = buf = vmalloc(buf_size);

This can go on the stack instead (and would be required to work in
libfdt).
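
One possible shape for a stack-based version, as a user-space sketch: the
caller provides a small fixed-size array and gets back the encoded
length. SKETCH_MAX_CELLS and the sketch_* names are assumptions for the
example, not existing libfdt or kernel symbols.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: upper bound on #address-cells and #size-cells. */
#define SKETCH_MAX_CELLS	4
#define SKETCH_RANGE_BYTES	(2 * SKETCH_MAX_CELLS * 4)

/* Write val big-endian into 4 * cells bytes, zero-filling the top. */
static void sketch_put_cells(uint8_t *p, uint64_t val, int cells)
{
	int i;

	for (i = 4 * cells - 1; i >= 0; i--) {
		p[i] = (uint8_t)val;
		val >>= 8;
	}
}

/*
 * Encode <addr size> into out[] and return the encoded length in bytes,
 * or 0 if a cell count is outside the supported range.
 */
static size_t sketch_encode_range(uint8_t out[SKETCH_RANGE_BYTES],
				  uint64_t addr, int addr_cells,
				  uint64_t size, int size_cells)
{
	if (addr_cells < 1 || addr_cells > SKETCH_MAX_CELLS ||
	    size_cells < 1 || size_cells > SKETCH_MAX_CELLS)
		return 0;

	sketch_put_cells(out, addr, addr_cells);
	sketch_put_cells(out + 4 * addr_cells, size, size_cells);

	return 4 * (size_t)(addr_cells + size_cells);
}
```

A caller would declare uint8_t buf[SKETCH_RANGE_BYTES] as a local
variable and pass buf plus the returned length to fdt_setprop(), with no
vmalloc()/vfree() pair and no -ENOMEM path.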

> > +	if (!buf)
> > +		return -ENOMEM;
> > +
> > +	fill_property(prop, addr, __dt_root_addr_cells);
> > +	prop += __dt_root_addr_cells * sizeof(u32);
> > +
> > +	fill_property(prop, size, __dt_root_size_cells);
> > +
> > +	result = fdt_setprop(fdt, nodeoffset, name, buf, buf_size);
> > +
> > +	vfree(buf);
> > +
> > +	return result;
> > +}
> > +
> >  static int setup_dtb(struct kimage *image,
> >  		unsigned long initrd_load_addr, unsigned long initrd_len,
> >  		char *cmdline, unsigned long cmdline_len,
> > @@ -88,10 +165,26 @@ static int setup_dtb(struct kimage *image,
> >  	int range_len;
> >  	int ret;
> >  
> > +	/* check ranges against root's #address-cells and #size-cells */
> > +	if (image->type == KEXEC_TYPE_CRASH &&
> > +		(!cells_size_fitted(image->arch.elf_load_addr,
> > +				image->arch.elf_headers_sz) ||
> > +		 !cells_size_fitted(crashk_res.start,
> > +				crashk_res.end - crashk_res.start + 1))) {
> > +		pr_err("Crash memory region doesn't fit into DT's root cell sizes.\n");
> > +		ret = -EINVAL;
> > +		goto out_err;
> > +	}
> > +
> >  	/* duplicate dt blob */
> >  	buf_size = fdt_totalsize(initial_boot_params);
> >  	range_len = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
> >  
> > +	if (image->type == KEXEC_TYPE_CRASH)
> > +		buf_size += fdt_prop_len("linux,elfcorehdr", range_len)
> > +				+ fdt_prop_len("linux,usable-memory-range",
> > +								range_len);
> > +
> >  	if (initrd_load_addr)
> >  		buf_size += fdt_prop_len("linux,initrd-start", sizeof(u64))
> >  				+ fdt_prop_len("linux,initrd-end", sizeof(u64));
> > @@ -113,6 +206,23 @@ static int setup_dtb(struct kimage *image,
> >  	if (nodeoffset < 0)
> >  		goto out_err;
> >  
> > +	if (image->type == KEXEC_TYPE_CRASH) {
> > +		/* add linux,elfcorehdr */
> > +		ret = fdt_setprop_range(buf, nodeoffset, "linux,elfcorehdr",
> > +				image->arch.elf_load_addr,
> > +				image->arch.elf_headers_sz);
> > +		if (ret)
> > +			goto out_err;
> > +
> > +		/* add linux,usable-memory-range */
> > +		ret = fdt_setprop_range(buf, nodeoffset,
> > +				"linux,usable-memory-range",
> > +				crashk_res.start,
> > +				crashk_res.end - crashk_res.start + 1);
> > +		if (ret)
> > +			goto out_err;
> > +	}
> > +
> >  	/* add bootargs */
> >  	if (cmdline) {
> >  		ret = fdt_setprop(buf, nodeoffset, "bootargs",
> 

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 05/11] arm64: kexec_file: load initrd and device-tree
  2018-05-18  7:42         ` AKASHI Takahiro
  (?)
@ 2018-05-18 15:59           ` James Morse
  -1 siblings, 0 replies; 156+ messages in thread
From: James Morse @ 2018-05-18 15:59 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd, ard.biesheuvel, bhsharma, kexec,
	linux-arm-kernel, linux-kernel

Hi Akashi,

On 18/05/18 08:42, AKASHI Takahiro wrote:
> On Fri, May 18, 2018 at 04:11:35PM +0900, AKASHI Takahiro wrote:
>> On Tue, May 15, 2018 at 05:20:00PM +0100, James Morse wrote:
>>> On 25/04/18 07:26, AKASHI Takahiro wrote:
>>>> diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
>>>> index f9ebf54ca247..b3b9b1725d8a 100644
>>>> --- a/arch/arm64/kernel/machine_kexec_file.c
>>>> +++ b/arch/arm64/kernel/machine_kexec_file.c

>>>> @@ -55,3 +74,144 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf,

>>>> +	buf = vmalloc(buf_size);
>>>> +	if (!buf) {
>>>> +		ret = -ENOMEM;
>>>> +		goto out_err;
>>>> +	}
>>>> +
>>>> +	ret = fdt_open_into(initial_boot_params, buf, buf_size);
>>>> +	if (ret)
>>>> +		goto out_err;
>>>> +
>>>> +	nodeoffset = fdt_path_offset(buf, "/chosen");
>>>> +	if (nodeoffset < 0)
>>>> +		goto out_err;
>>>> +
>>>> +	/* add bootargs */
>>>> +	if (cmdline) {
>>>> +		ret = fdt_setprop(buf, nodeoffset, "bootargs",
>>>> +						cmdline, cmdline_len + 1);
>>>
>>> fdt_setprop_string()?
>>
>> OK
> 
> cmdline_len is passed in by the kexec_file_load() system call, which means
> that we can't assume cmdline is always terminated with '\0'.

Yuck, we expect user-space to tell us how long the string is. It may be worth a
comment that it isn't necessarily null-terminated, as that is surprising!

(I assume the DT's property length is enough to make that safe for the new
kernel to read).
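
To illustrate the point about a (pointer, length) pair that may or may not
include a terminator, here is a small userspace sketch. dup_terminated() is
a hypothetical helper, not something from the patch, and it does not try to
handle '\0' bytes embedded in the middle of the buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy a (pointer, length) buffer into a freshly allocated string that is
 * guaranteed to end in '\0'. *out_len receives the length including the
 * terminator. Returns NULL on allocation failure.
 */
char *dup_terminated(const char *src, size_t len, size_t *out_len)
{
	/* If the caller already included a terminator, do not add a second. */
	size_t body = (len > 0 && src[len - 1] == '\0') ? len - 1 : len;
	char *dst;

	assert(out_len != NULL);

	dst = malloc(body + 1);
	if (!dst)
		return NULL;

	if (body > 0)
		memcpy(dst, src, body);
	dst[body] = '\0';
	*out_len = body + 1;
	return dst;
}
```

The returned length is what would be handed to a length-taking setter such as
fdt_setprop(), which is why the property stays safe to read even when the
original buffer was not terminated.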


>>>> +		/* within 1GB-aligned window of up to 32GB in size */
>>>> +		kbuf.buf_max = round_down(kern_seg->mem, SZ_1G)
>>>> +						+ (unsigned long)SZ_1G * 32;
>>>> +		kbuf.top_down = false;
>>>> +
>>>> +		ret = kexec_add_buffer(&kbuf);
>>>> +		if (ret)
>>>> +			goto out_err;
>>>> +		initrd_load_addr = kbuf.mem;
>>>> +
>>>> +		pr_debug("Loaded initrd at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
>>>> +				initrd_load_addr, initrd_len, initrd_len);
>>>> +	}
>>>> +
>>>> +	/* load dtb blob */
>>>> +	ret = setup_dtb(image, initrd_load_addr, initrd_len,
>>>> +				cmdline, cmdline_len, &dtb, &dtb_len);
>>>> +	if (ret) {
>>>> +		pr_err("Preparing for new dtb failed\n");
>>>> +		goto out_err;
>>>> +	}
>>>> +
>>>> +	kbuf.buffer = dtb;
>>>> +	kbuf.bufsz = dtb_len;
>>>> +	kbuf.memsz = dtb_len;
>>>> +	/* not across 2MB boundary */
>>>> +	kbuf.buf_align = SZ_2M;
>>>> +	kbuf.buf_max = ULONG_MAX;
>>>> +	kbuf.top_down = true;
>>>> +
>>>> +	ret = kexec_add_buffer(&kbuf);
>>>> +	if (ret)
>>>> +		goto out_err;
>>>> +	image->arch.dtb_mem = kbuf.mem;
>>>> +	image->arch.dtb_buf = dtb;
>>>> +
>>>> +	pr_debug("Loaded dtb at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
>>>> +			kbuf.mem, dtb_len, dtb_len);
>>>> +
>>>> +	return 0;
>>>> +
>>>> +out_err:
>>>> +	vfree(dtb);
>>>> +	image->arch.dtb_buf = NULL;
>>>
>>> Won't kimage_file_post_load_cleanup() always be called if we return an error
>>> here? Why not leave the free()ing until then?
>>
>> Right.
>> The reason I left the code here was that we'd better locally clean up
>> everything that was locally allocated if we trivially need to (and can)
>> do so.
>>
>> As it's redundant, I will remove it.
> 
> will remove only "image->arch.dtb_buf = NULL."

Ah, because you haven't set the arch.dtb_buf pointer yet.

What about in patch 7 where you expect kimage_file_prepare_segments() to call
arch_kimage_file_post_load_cleanup() to free the arch.elf_headers? I'd expect
the free()ing to always happen in one place.
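
The "free in one place" pattern could be sketched as below. This is plain
userspace C with made-up names (struct arch_bufs, arch_bufs_load(),
arch_bufs_cleanup()), standing in for the kimage arch fields discussed above.
The idea is that each pointer is stored in the owning structure as soon as it
is allocated, so error paths can simply return and leave the freeing to the
one cleanup function.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the arch-specific part of struct kimage. */
struct arch_bufs {
	void *dtb_buf;
	void *elf_headers;
};

/* The single place where the buffers are released; safe to call twice. */
void arch_bufs_cleanup(struct arch_bufs *a)
{
	free(a->dtb_buf);
	a->dtb_buf = NULL;
	free(a->elf_headers);
	a->elf_headers = NULL;
}

/*
 * Allocate both buffers. On failure, return -1 and leave whatever was
 * already allocated in *a for arch_bufs_cleanup() to release.
 */
int arch_bufs_load(struct arch_bufs *a, size_t dtb_len, size_t elf_len)
{
	assert(a->dtb_buf == NULL && a->elf_headers == NULL);

	a->elf_headers = malloc(elf_len);
	if (!a->elf_headers)
		return -1;

	a->dtb_buf = malloc(dtb_len);
	if (!a->dtb_buf)
		return -1;	/* elf_headers is still owned by *a */

	return 0;
}
```

The caller invokes arch_bufs_cleanup() unconditionally after a failed load,
which only works if no error path frees a buffer on its own.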


Thanks,

James

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 07/11] arm64: kexec_file: add crash dump support
  2018-05-18 10:39       ` AKASHI Takahiro
  (?)
@ 2018-05-18 16:00         ` James Morse
  -1 siblings, 0 replies; 156+ messages in thread
From: James Morse @ 2018-05-18 16:00 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd, ard.biesheuvel, bhsharma, kexec,
	linux-arm-kernel, linux-kernel

Hi Akashi,

On 18/05/18 11:39, AKASHI Takahiro wrote:
> On Tue, May 15, 2018 at 06:11:15PM +0100, James Morse wrote:
>> On 25/04/18 07:26, AKASHI Takahiro wrote:
>>> Enabling crash dump (kdump) includes
>>> * prepare contents of ELF header of a core dump file, /proc/vmcore,
>>>   using crash_prepare_elf64_headers(), and
>>> * add two device tree properties, "linux,usable-memory-range" and
>>>   "linux,elfcorehdr", which represent respectively a memory range

>>> diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
>>> index 37c0a9dc2e47..ec674f4d267c 100644
>>> --- a/arch/arm64/kernel/machine_kexec_file.c
>>> +++ b/arch/arm64/kernel/machine_kexec_file.c

>>> +static void fill_property(void *buf, u64 val64, int cells)
>>> +{
>>> +	u32 val32;
>>> +
>>> +	if (cells == 1) {
>>> +		val32 = cpu_to_fdt32((u32)val64);
>>> +		memcpy(buf, &val32, sizeof(val32));
>>> +	} else {
>>
>>> +		memset(buf, 0, cells * sizeof(u32) - sizeof(u64));
>>> +		buf += cells * sizeof(u32) - sizeof(u64);
>>
>> Is this trying to clear the 'top' cells and shuffle the pointer to point at the
>> 'bottom' 2? I'm pretty sure this isn't endian safe.
>>
>> Do we really expect a system to have #address-cells > 2?
> 
> I don't know, but just for safety.

Okay, so this is aiming to be a cover-all-cases library function.


>>> +		val64 = cpu_to_fdt64(val64);
>>> +		memcpy(buf, &val64, sizeof(val64));
>>> +	}
>>> +}
>>> +
>>> +static int fdt_setprop_range(void *fdt, int nodeoffset, const char *name,
>>> +				unsigned long addr, unsigned long size)
>>
>> (the device-tree spec describes a 'ranges' property, which had me confused. This
>> is encoding a prop-encoded-array)
> 
> Should we rename it to, say, fdt_setprop_reg()?

Sure, but I'd really like this code to come from libfdt. I'm hoping for some
temporary workaround, let's see what the DT folk say.


>>> +	if (!buf)
>>> +		return -ENOMEM;
>>> +
>>> +	fill_property(prop, addr, __dt_root_addr_cells);
>>> +	prop += __dt_root_addr_cells * sizeof(u32);
>>> +
>>> +	fill_property(prop, size, __dt_root_size_cells);
>>> +
>>> +	result = fdt_setprop(fdt, nodeoffset, name, buf, buf_size);
>>> +
>>> +	vfree(buf);
>>> +
>>> +	return result;
>>> +}
>>
>> Doesn't this stuff belong in libfdt? I guess there is no 'add array element' api
>> because this is the first time we've wanted to create a node with more than
>> key=fixed-size-value.
>>
>> I don't think this belongs in arch C code. Do we have a plan for getting libfdt
>> to support encoding prop-arrays? Can we put it somewhere anyone else duplicating
>> this will find it, until we can (re)move it?
> 
> I will temporarily move all fdt-related stuff to a separate file, but
> 
>> I have no idea how that happens... it looks like the devicetree list is the
>> place to ask.
> 
> should we always sync with the original dtc/libfdt repository?

I thought so, libfdt is one of those external libraries that the kernel
consumes, like acpica. For acpica at least the rule is changes go upstream, then
get sync'd back.


>>>  static int setup_dtb(struct kimage *image,
>>>  		unsigned long initrd_load_addr, unsigned long initrd_len,
>>>  		char *cmdline, unsigned long cmdline_len,
>>> @@ -88,10 +165,26 @@ static int setup_dtb(struct kimage *image,
>>>  	int range_len;
>>>  	int ret;
>>>  
>>> +	/* check ranges against root's #address-cells and #size-cells */
>>> +	if (image->type == KEXEC_TYPE_CRASH &&
>>> +		(!cells_size_fitted(image->arch.elf_load_addr,
>>> +				image->arch.elf_headers_sz) ||
>>> +		 !cells_size_fitted(crashk_res.start,
>>> +				crashk_res.end - crashk_res.start + 1))) {
>>> +		pr_err("Crash memory region doesn't fit into DT's root cell sizes.\n");
>>> +		ret = -EINVAL;
>>> +		goto out_err;
>>> +	}
>>
>> To check I've understood this properly: This can happen if the firmware provided
>> a DTB with 32bit address/size cells, but at least some of the memory requires 64
>> bit address/size cells. This could only happen on a UEFI system where the
>> firmware-DTB doesn't describe memory. ACPI-only systems would have the EFIstub DT.
> 
> Probably, yes. I assumed the case where #address-cells and #size-cells
> were just missing in fdt.

Ah, that's another one. I just wanted to check we could boot on a system where
this can happen.
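
For readers without the full patch at hand: cells_size_fitted() is not shown
in this part of the thread, so the following is only a guess at the kind of
check it performs. fits_in_cells() and region_fits() are hypothetical names,
and the cell counts are passed in rather than read from the root node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if val can be represented in `cells` 32-bit cells. */
bool fits_in_cells(uint64_t val, int cells)
{
	assert(cells >= 1);

	/* Two or more cells can always hold a 64-bit value. */
	return cells >= 2 || val <= UINT32_MAX;
}

/*
 * True if both the base address and the size of a region can be encoded
 * with the given #address-cells and #size-cells.
 */
bool region_fits(uint64_t base, uint64_t size, int addr_cells, int size_cells)
{
	return fits_in_cells(base, addr_cells) &&
	       fits_in_cells(size, size_cells);
}
```

With one address cell, a region starting at or above 4GB is rejected, which
matches the 32-bit-cells case described above.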


>>>  	/* duplicate dt blob */
>>>  	buf_size = fdt_totalsize(initial_boot_params);
>>>  	range_len = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
>>>  
>>> +	if (image->type == KEXEC_TYPE_CRASH)
>>> +		buf_size += fdt_prop_len("linux,elfcorehdr", range_len)
>>> +				+ fdt_prop_len("linux,usable-memory-range",
>>> +								range_len);

>                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[...]

>> Don't you need to add "linux,usable-memory-range" to the buf_size estimate?
> 
> I think the code exists. See above.

Sorry, turns out I can't read!
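
As background for the buf_size estimate: fdt_prop_len() comes from an earlier
patch and is not shown here, so this sketch is an assumption about what such
an estimate has to cover rather than a copy of it. In the flattened format a
property costs a 4-byte token, an 8-byte header (len and nameoff), the value
padded to 4 bytes, and, if the name is new, the name plus '\0' in the strings
block. align4() and prop_size_estimate() are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FDT_TAG_SIZE	4	/* FDT_PROP token */
#define FDT_PROP_HDR	8	/* len + nameoff fields that follow the token */

/* Round n up to the next multiple of 4, as the structure block requires. */
size_t align4(size_t n)
{
	return (n + 3) & ~(size_t)3;
}

/*
 * Upper bound on the bytes a new property adds to a flattened tree,
 * assuming its name is not already present in the strings block.
 */
size_t prop_size_estimate(const char *name, size_t value_len)
{
	assert(name != NULL);

	return FDT_TAG_SIZE + FDT_PROP_HDR + align4(value_len) +
	       strlen(name) + 1;
}
```

For two-cell addresses and sizes, range_len is 16, so "linux,elfcorehdr"
would need at most 12 + 16 + 17 = 45 extra bytes under these assumptions.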


>>> +		if (ret)
>>> +			goto out_err;
>>> +	}
>>
>>> @@ -148,17 +258,109 @@ static int setup_dtb(struct kimage *image,
>>
>>> +static struct crash_mem *get_crash_memory_ranges(void)
>>> +{
>>> +	unsigned int nr_ranges;
>>> +	struct crash_mem *cmem;
>>> +
>>> +	nr_ranges = 1; /* for exclusion of crashkernel region */
>>> +	walk_system_ram_res(0, -1, &nr_ranges, get_nr_ranges_callback);
>>> +
>>> +	cmem = vmalloc(sizeof(struct crash_mem) +
>>> +			sizeof(struct crash_mem_range) * nr_ranges);
>>> +	if (!cmem)
>>> +		return NULL;
>>> +
>>> +	cmem->max_nr_ranges = nr_ranges;
>>> +	cmem->nr_ranges = 0;
>>> +	walk_system_ram_res(0, -1, cmem, add_mem_range_callback);
>>> +
>>> +	/* Exclude crashkernel region */
>>> +	if (crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end)) {
>>> +		vfree(cmem);
>>> +		return NULL;
>>> +	}
>>> +
>>> +	return cmem;
>>> +}
>>
>> Could this function be included in prepare_elf_headers() so that the alloc() and
>> free() occur together?
> 
> Or should we aim for arm64 and x86 to have similar-looking code?

What's the advantage in things looking the same? If they are the same, it
probably shouldn't be in per-arch code. If they aren't, it should be as simple
as possible, otherwise we can't spot the bugs/leaks.

But I think walking memblock here will remove all the 'looks the same' properties.
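
On the `nr_ranges = 1` in the quoted get_crash_memory_ranges(): the extra slot
is presumably there because cutting a hole out of the middle of one RAM range
turns it into two. The sketch below shows that effect in userspace C. It is
deliberately simplified (the excluded region must sit inside a single entry),
and struct mem_range and exclude_range() are hypothetical names, not the
kernel's crash_exclude_mem_range().

```c
#include <assert.h>
#include <stdint.h>

struct mem_range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/*
 * Remove [ex_start, ex_end] from the sorted, non-overlapping array r[0..*n).
 * Simplified: only acts if the region lies inside a single entry.
 * Returns 0 on success, -1 if a split is needed but the array is full.
 */
int exclude_range(struct mem_range *r, int *n, int max,
		  uint64_t ex_start, uint64_t ex_end)
{
	assert(ex_start <= ex_end);

	for (int i = 0; i < *n; i++) {
		if (ex_start < r[i].start || ex_end > r[i].end)
			continue;	/* not contained in this entry */

		if (ex_start == r[i].start && ex_end == r[i].end) {
			/* whole entry goes away: close the gap */
			for (int j = i; j < *n - 1; j++)
				r[j] = r[j + 1];
			(*n)--;
		} else if (ex_start == r[i].start) {
			r[i].start = ex_end + 1;	/* trim the front */
		} else if (ex_end == r[i].end) {
			r[i].end = ex_start - 1;	/* trim the back */
		} else {
			/* hole in the middle: one entry becomes two */
			if (*n == max)
				return -1;
			for (int j = *n; j > i + 1; j--)
				r[j] = r[j - 1];
			r[i + 1].start = ex_end + 1;
			r[i + 1].end = r[i].end;
			r[i].end = ex_start - 1;
			(*n)++;
		}
		return 0;
	}
	return 0;	/* nothing to exclude */
}
```

Sizing the array as "number of RAM ranges plus one" is what keeps the split
case from failing when a single region is excluded.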


>>> +static int prepare_elf_headers(void **addr, unsigned long *sz)
>>> +{
>>> +	struct crash_mem *cmem;
>>> +	int ret = 0;
>>> +
>>> +	cmem = get_crash_memory_ranges();
>>> +	if (!cmem)
>>> +		return -ENOMEM;
>>> +
>>> +	ret =  crash_prepare_elf64_headers(cmem, true, addr, sz);
>>> +
>>> +	vfree(cmem);
>>
>>> +	return ret;
>>> +}
>>
>> All this is moving memory-range information from core-code's
>> walk_system_ram_res() into core-code's struct crash_mem, and excluding
>> crashk_res, which again is accessible to the core code.
>>
>> It looks like this is duplicated in arch/x86 and arch/arm64 because arm64
>> doesn't have a second 'crashk_low_res' region, and always wants elf64, instead
>> of when IS_ENABLED(CONFIG_X86_64).
>> If we can abstract just those two, more of this could be moved to core code
>> where powerpc can make use of it if they want to support kdump with
>> kexec_file_load().
>>
>> But it's getting late for cross-architecture dependencies, let's put that on the
>> for-later list. (assuming there isn't a powerpc-kdump series out there adding a
>> third copy of this)
> 
> Sure. X86 code has so many exceptional lines in the code :)

They also pass the e820 'usable-memory' map on the cmdline...


Thanks,

James

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 07/11] arm64: kexec_file: add crash dump support
@ 2018-05-18 16:00         ` James Morse
  0 siblings, 0 replies; 156+ messages in thread
From: James Morse @ 2018-05-18 16:00 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: herbert, bhe, ard.biesheuvel, catalin.marinas, bhsharma,
	will.deacon, linux-kernel, dhowells, arnd, linux-arm-kernel,
	kexec, dyoung, davem, vgoyal

Hi Akashi,

On 18/05/18 11:39, AKASHI Takahiro wrote:
> On Tue, May 15, 2018 at 06:11:15PM +0100, James Morse wrote:
>> On 25/04/18 07:26, AKASHI Takahiro wrote:
>>> Enabling crash dump (kdump) includes
>>> * prepare contents of ELF header of a core dump file, /proc/vmcore,
>>>   using crash_prepare_elf64_headers(), and
>>> * add two device tree properties, "linux,usable-memory-range" and
>>>   "linux,elfcorehdr", which represent respectively a memory range

>>> diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
>>> index 37c0a9dc2e47..ec674f4d267c 100644
>>> --- a/arch/arm64/kernel/machine_kexec_file.c
>>> +++ b/arch/arm64/kernel/machine_kexec_file.c

>>> +static void fill_property(void *buf, u64 val64, int cells)
>>> +{
>>> +	u32 val32;
>>> +
>>> +	if (cells == 1) {
>>> +		val32 = cpu_to_fdt32((u32)val64);
>>> +		memcpy(buf, &val32, sizeof(val32));
>>> +	} else {
>>
>>> +		memset(buf, 0, cells * sizeof(u32) - sizeof(u64));
>>> +		buf += cells * sizeof(u32) - sizeof(u64);
>>
>> Is this trying to clear the 'top' cells and shuffle the pointer to point at the
>> 'bottom' 2? I'm pretty sure this isn't endian safe.
>>
>> Do we really expect a system to have #address-cells > 2?
> 
> I don't know, but just for safety.

Okay, so this is aiming to be a cover-all-cases library function.


>>> +		val64 = cpu_to_fdt64(val64);
>>> +		memcpy(buf, &val64, sizeof(val64));
>>> +	}
>>> +}
>>> +
>>> +static int fdt_setprop_range(void *fdt, int nodeoffset, const char *name,
>>> +				unsigned long addr, unsigned long size)
>>
>> (the device-tree spec describes a 'ranges' property, which had me confused. This
>> is encoding a prop-encoded-array)
> 
> Should we rename it to, say, fdt_setprop_reg()?

Sure, but I'd really like this code to come from libfdt. I'm hoping for some
temporary workaround, let's see what the DT folk say.


>>> +	if (!buf)
>>> +		return -ENOMEM;
>>> +
>>> +	fill_property(prop, addr, __dt_root_addr_cells);
>>> +	prop += __dt_root_addr_cells * sizeof(u32);
>>> +
>>> +	fill_property(prop, size, __dt_root_size_cells);
>>> +
>>> +	result = fdt_setprop(fdt, nodeoffset, name, buf, buf_size);
>>> +
>>> +	vfree(buf);
>>> +
>>> +	return result;
>>> +}
>>
>> Doesn't this stuff belong in libfdt? I guess there is no 'add array element' api
>> because this is the first time we've wanted to create a node with more than
>> key=fixed-size-value.
>>
>> I don't think this belongs in arch C code. Do we have a plan for getting libfdt
>> to support encoding prop-arrays? Can we put it somewhere anyone else duplicating
>> this will find it, until we can (re)move it?
> 
> I will temporarily move all fdt-related stuff to a separate file, but
> 
>> I have no idea how that happens... it looks like the devicetree list is the
>> place to ask.
> 
> should we always sync with the original dtc/libfdt repository?

I thought so, libfdt is one of those external libraries that the kernel
consumes, like acpica. For acpica at least the rule is changes go upstream, then
get sync'd back.


>>>  static int setup_dtb(struct kimage *image,
>>>  		unsigned long initrd_load_addr, unsigned long initrd_len,
>>>  		char *cmdline, unsigned long cmdline_len,
>>> @@ -88,10 +165,26 @@ static int setup_dtb(struct kimage *image,
>>>  	int range_len;
>>>  	int ret;
>>>  
>>> +	/* check ranges against root's #address-cells and #size-cells */
>>> +	if (image->type == KEXEC_TYPE_CRASH &&
>>> +		(!cells_size_fitted(image->arch.elf_load_addr,
>>> +				image->arch.elf_headers_sz) ||
>>> +		 !cells_size_fitted(crashk_res.start,
>>> +				crashk_res.end - crashk_res.start + 1))) {
>>> +		pr_err("Crash memory region doesn't fit into DT's root cell sizes.\n");
>>> +		ret = -EINVAL;
>>> +		goto out_err;
>>> +	}
>>
>> To check I've understood this properly: This can happen if the firmware provided
>> a DTB with 32bit address/size cells, but at least some of the memory requires 64
>> bit address/size cells. This could only happen on a UEFI system where the
>> firmware-DTB doesn't describe memory. ACPI-only systems would have the EFIstub DT.
> 
> Probably, yes. I assumed the case where #address-cells and #size-cells
> were just missing in fdt.

Ah, that's another one. I just wanted to check we could boot on a system where
this can happen.
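
To make the failure case concrete, here is a small sketch of the kind of
check being discussed. It is not the cells_size_fitted() from the patch,
whose body isn't quoted here; fits_in_cells() and range_fits() are
hypothetical names, and the sketch assumes that one cell holds 32 bits and
that two or more cells can hold any 64-bit value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Can val be represented in `cells` 32-bit cells? */
static bool fits_in_cells(uint64_t val, int cells)
{
	if (cells < 1)
		return false;
	if (cells >= 2)
		return true;	/* 64 bits or more are available */
	return val <= UINT32_MAX;
}

/* Can a (base, size) pair be encoded with the given root cell counts? */
static bool range_fits(uint64_t base, uint64_t size,
		       int addr_cells, int size_cells)
{
	return fits_in_cells(base, addr_cells) &&
	       fits_in_cells(size, size_cells);
}
```

With one address cell, a crash kernel region based at 0x880000000 can't be
described, which is the situation the check above is meant to reject.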


>>>  	/* duplicate dt blob */
>>>  	buf_size = fdt_totalsize(initial_boot_params);
>>>  	range_len = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
>>>  
>>> +	if (image->type == KEXEC_TYPE_CRASH)
>>> +		buf_size += fdt_prop_len("linux,elfcorehdr", range_len)
>>> +				+ fdt_prop_len("linux,usable-memory-range",
>>> +								range_len);

>                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[...]

>> Don't you need to add "linux,usable-memory-range" to the buf_size estimate?
> 
> I think the code exists. See above.

Sorry, turns out I can't read!
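
As a rough illustration of what such an estimate has to cover: the helper
below is hypothetical (prop_space_estimate() is not the fdt_prop_len() from
the patch). It assumes the flattened layout in which each property costs a
4-byte FDT_PROP token, a 4-byte length, a 4-byte name offset and the value
padded to a 4-byte boundary, plus the NUL-terminated name in the strings
block if that name is not already present.

```c
#include <stddef.h>
#include <string.h>

/* Round x up to the next multiple of 4, the structure block alignment. */
#define CELL_ALIGN(x)	(((x) + (size_t)3) & ~(size_t)3)

/* Worst-case bytes needed to add one property to a flattened tree. */
static size_t prop_space_estimate(const char *name, size_t value_len)
{
	const size_t header = 12;	/* token + len + nameoff */

	return header + CELL_ALIGN(value_len) + strlen(name) + 1;
}
```

For two address cells and two size cells the value is 16 bytes, so each of
the two crash dump properties needs a few dozen bytes on top of
fdt_totalsize().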


>>> +		if (ret)
>>> +			goto out_err;
>>> +	}
>>
>>> @@ -148,17 +258,109 @@ static int setup_dtb(struct kimage *image,
>>
>>> +static struct crash_mem *get_crash_memory_ranges(void)
>>> +{
>>> +	unsigned int nr_ranges;
>>> +	struct crash_mem *cmem;
>>> +
>>> +	nr_ranges = 1; /* for exclusion of crashkernel region */
>>> +	walk_system_ram_res(0, -1, &nr_ranges, get_nr_ranges_callback);
>>> +
>>> +	cmem = vmalloc(sizeof(struct crash_mem) +
>>> +			sizeof(struct crash_mem_range) * nr_ranges);
>>> +	if (!cmem)
>>> +		return NULL;
>>> +
>>> +	cmem->max_nr_ranges = nr_ranges;
>>> +	cmem->nr_ranges = 0;
>>> +	walk_system_ram_res(0, -1, cmem, add_mem_range_callback);
>>> +
>>> +	/* Exclude crashkernel region */
>>> +	if (crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end)) {
>>> +		vfree(cmem);
>>> +		return NULL;
>>> +	}
>>> +
>>> +	return cmem;
>>> +}
>>
>> Could this function be included in prepare_elf_headers() so that the alloc() and
>> free() occur together.
> 
> Or aiming that arm64 and x86 have similar-look code?

What's the advantage in things looking the same? If they are the same, it
probably shouldn't be in per-arch code. Otherwise it should be as simple as
possible, otherwise we can't spot the bugs/leaks.

But I think walking memblock here will remove all the 'looks the same' properties.


>>> +static int prepare_elf_headers(void **addr, unsigned long *sz)
>>> +{
>>> +	struct crash_mem *cmem;
>>> +	int ret = 0;
>>> +
>>> +	cmem = get_crash_memory_ranges();
>>> +	if (!cmem)
>>> +		return -ENOMEM;
>>> +
>>> +	ret =  crash_prepare_elf64_headers(cmem, true, addr, sz);
>>> +
>>> +	vfree(cmem);
>>
>>> +	return ret;
>>> +}
>>
>> All this is moving memory-range information from core-code's
>> walk_system_ram_res() into core-code's struct crash_mem, and excluding
>> crashk_res, which again is accessible to the core code.
>>
>> It looks like this is duplicated in arch/x86 and arch/arm64 because arm64
>> doesn't have a second 'crashk_low_res' region, and always wants elf64, instead
>> of when IS_ENABLED(CONFIG_X86_64).
>> If we can abstract just those two, more of this could be moved to core code
>> where powerpc can make use of it if they want to support kdump with
>> kexec_file_load().
>>
>> But, it's getting late for cross-architecture dependencies, let's put that on the
>> for-later list. (assuming there isn't a powerpc-kdump series out there adding a
>> third copy of this)
> 
> Sure. X86 code has so many exceptional lines in the code :)

They also pass the e820 'usable-memory' map on the cmdline...


Thanks,

James

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 06/11] arm64: kexec_file: allow for loading Image-format kernel
  2018-05-15 17:14             ` James Morse
  (?)
@ 2018-05-21  9:32               ` AKASHI Takahiro
  -1 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-05-21  9:32 UTC (permalink / raw)
  To: James Morse
  Cc: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd, ard.biesheuvel, bhsharma, kexec,
	linux-arm-kernel, linux-kernel

James,

I haven't commented on this email.

On Tue, May 15, 2018 at 06:14:37PM +0100, James Morse wrote:
> Hi Akashi,
> 
> On 15/05/18 06:13, AKASHI Takahiro wrote:
> > On Fri, May 11, 2018 at 06:07:06PM +0100, James Morse wrote:
> >> On 07/05/18 08:21, AKASHI Takahiro wrote:
> >>> On Tue, May 01, 2018 at 06:46:11PM +0100, James Morse wrote:
> >>>> On 25/04/18 07:26, AKASHI Takahiro wrote:
> >>>>> This patch provides kexec_file_ops for "Image"-format kernel. In this
> >>>>> implementation, a binary is always loaded with a fixed offset identified
> >>>>> in text_offset field of its header.
> >>
> >>>>> diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
> >>>>> index e4de1223715f..3cba4161818a 100644
> >>>>> --- a/arch/arm64/include/asm/kexec.h
> >>>>> +++ b/arch/arm64/include/asm/kexec.h
> 
> >>>> Could we check branch_code is non-zero, and text-offset points within image-size?
> >>>
> >>> We could do it, but I don't think this check is very useful.
> >>>
> >>>>
> >>>> We could check that this platform supports the page-size/endian config that this
> >>>> Image was built with... We get a message from the EFI stub if the page-size
> >>>> can't be supported, it would be nice to do the same here (as we can).
> >>>
> >>> There is no restriction on page-size or endianness for kexec.
> >>
> >> No, but it won't boot if the hardware doesn't support it. The kernel will spin
> >> at a magic address that is, difficult, to debug without JTAG. The bug report
> >> will be "it didn't boot".
> > 
> > OK.
> > Added sanity checks for cpu features, endianness as well as page size.
> > 
> >>
> >>> What will be the purpose of this check?
> >>
> >> These values are in the header so that the bootloader can check them, then print
> >> a meaningful error. Here, kexec_file_load() is playing the part of the bootloader.
> 
> >> I'm assuming kexec_file_load() can only be used to kexec linux... unlike regular
> >> kexec. Is this where I'm going wrong?
> 
> Trying to work this out for myself: we can't support any UEFI application as we
> can't give it the boot-services environment, so I'm pretty sure
> kexec_file_load() must be linux-specific.
> 
> Can we state somewhere that we only expect arm64 linux to be booted with
> kexec_file_load()? It's not clear from the Kconfig text, which refers to kexec,
> which explicitly states it can boot other OS. But for kexec_file_load() we're
> following the kernel's booting.txt.

While I don't know the requirements for booting other OSes, or whether we
can boot them even with kexec, I agree that kexec_file_load() is a more
limited boot mechanism. I will add a statement to that effect in Kconfig.
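
For illustration, a user-space sketch of reading the header fields such
sanity checks would look at. It is not the code added to the series;
parse_image_header() and struct image_info are hypothetical names. The
offsets and flag bits are taken from the Image header layout described in
the kernel's booting.txt: little-endian fields, magic "ARM\x64" at offset
56, flags bit 0 for endianness and bits 1-2 for page size.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct image_info {
	uint64_t text_offset;
	uint64_t image_size;
	bool big_endian;
	unsigned int page_size_field;	/* 0: unspecified, 1: 4K, 2: 16K, 3: 64K */
};

/* Read a little-endian 64-bit value regardless of host byte order. */
static uint64_t get_le64(const uint8_t *p)
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Returns 0 and fills *info if buf starts with a plausible Image header. */
static int parse_image_header(const uint8_t *buf, size_t len,
			      struct image_info *info)
{
	static const uint8_t magic[4] = { 'A', 'R', 'M', 0x64 };
	uint64_t flags;

	if (len < 64 || memcmp(buf + 56, magic, sizeof(magic)) != 0)
		return -1;

	info->text_offset = get_le64(buf + 8);
	info->image_size = get_le64(buf + 16);
	flags = get_le64(buf + 24);
	info->big_endian = (flags & 1) != 0;
	info->page_size_field = (unsigned int)((flags >> 1) & 3);
	return 0;
}
```

A loader could then compare big_endian and page_size_field against what the
running CPU supports and refuse the image with a meaningful error instead of
a silent hang.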

> >>>>> diff --git a/arch/arm64/kernel/kexec_image.c b/arch/arm64/kernel/kexec_image.c
> >>>>> new file mode 100644
> >>>>> index 000000000000..4dd524ad6611
> >>>>> --- /dev/null
> >>>>> +++ b/arch/arm64/kernel/kexec_image.c
> >>>>> @@ -0,0 +1,79 @@
> >>>>
> >>>>> +static void *image_load(struct kimage *image,
> >>>>> +				char *kernel, unsigned long kernel_len,
> >>>>> +				char *initrd, unsigned long initrd_len,
> >>>>> +				char *cmdline, unsigned long cmdline_len)
> >>>>> +{
> >>>>> +	struct kexec_buf kbuf;
> >>>>> +	struct arm64_image_header *h = (struct arm64_image_header *)kernel;
> >>>>> +	unsigned long text_offset;
> >>>>> +	int ret;
> >>>>> +
> >>>>> +	/* Load the kernel */
> >>>>> +	kbuf.image = image;
> >>>>> +	kbuf.buf_min = 0;
> >>>>> +	kbuf.buf_max = ULONG_MAX;
> >>>>> +	kbuf.top_down = false;
> >>>>> +
> >>>>> +	kbuf.buffer = kernel;
> >>>>> +	kbuf.bufsz = kernel_len;
> >>>>> +	kbuf.memsz = le64_to_cpu(h->image_size);
> >>>>> +	text_offset = le64_to_cpu(h->text_offset);
> >>>>> +	kbuf.buf_align = SZ_2M;
> >>>>
> >>>>> +	/* Adjust kernel segment with TEXT_OFFSET */
> >>>>> +	kbuf.memsz += text_offset;
> >>>>> +
> >>>>> +	ret = kexec_add_buffer(&kbuf);
> >>>>> +	if (ret)
> >>>>> +		goto out;
> >>>>> +
> >>>>> +	image->arch.kern_segment = image->nr_segments - 1;
> >>>>
> >>>> You only seem to use kern_segment here, and in load_other_segments() called
> >>>> below. Could it not be a local variable passed in? Instead of arch-specific data
> >>>> we keep forever?
> >>>
> >>> No, kern_segment is also used in load_other_segments() in machine_kexec_file.c.
> >>> To optimize memory hole allocation logic in locate_mem_hole_callback(),
> >>> we need to know the exact range of kernel image (start and end).
> >>
> >> That's the second user. My badly-made point is one calls the other, but passes
> >> the data via some until-kexec lifetime struct. (it's not important, just an
> >> indicator this worked differently in the past and hasn't been cleaned up).
> >> I meant something like [0].
> > 
> > OK, but instead of adding kern_seg, I want to change the interface to:
> > 
> > | extern int load_other_segments(struct kimage *image,
> > |		unsigned long kernel_load_addr, unsigned long kernel_size,
> > |		char *initrd, unsigned long initrd_len,
> > |		char *cmdline, unsigned long cmdline_len);
> > 
> > This way, we will in future be able to address an issue I mentioned in
> > my previous e-mail. (If we support vmlinux, the kernel occupies two segments
> > for text and data, respectively.)
> 
> Aha, it's not from old-stuff, it's for future-stuff!

I have a vmlinux patch, but I am very unlikely to submit it :)

Thanks,
-Takahiro AKASHI

> 
> James

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 07/11] arm64: kexec_file: add crash dump support
  2018-05-18 16:00         ` James Morse
  (?)
@ 2018-05-21  9:46           ` AKASHI Takahiro
  -1 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-05-21  9:46 UTC (permalink / raw)
  To: James Morse
  Cc: catalin.marinas, will.deacon, dhowells, vgoyal, herbert, davem,
	dyoung, bhe, arnd, ard.biesheuvel, bhsharma, kexec,
	linux-arm-kernel, linux-kernel

James,

On Fri, May 18, 2018 at 05:00:55PM +0100, James Morse wrote:
> Hi Akashi,
> 
> On 18/05/18 11:39, AKASHI Takahiro wrote:
> > On Tue, May 15, 2018 at 06:11:15PM +0100, James Morse wrote:
> >> On 25/04/18 07:26, AKASHI Takahiro wrote:
> >>> Enabling crash dump (kdump) includes
> >>> * prepare contents of ELF header of a core dump file, /proc/vmcore,
> >>>   using crash_prepare_elf64_headers(), and
> >>> * add two device tree properties, "linux,usable-memory-range" and
> >>>   "linux,elfcorehdr", which respectively represent a memory range
> 
> >>> diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> >>> index 37c0a9dc2e47..ec674f4d267c 100644
> >>> --- a/arch/arm64/kernel/machine_kexec_file.c
> >>> +++ b/arch/arm64/kernel/machine_kexec_file.c
> 
> >>> +static void fill_property(void *buf, u64 val64, int cells)
> >>> +{
> >>> +	u32 val32;
> >>> +
> >>> +	if (cells == 1) {
> >>> +		val32 = cpu_to_fdt32((u32)val64);
> >>> +		memcpy(buf, &val32, sizeof(val32));
> >>> +	} else {
> >>
> >>> +		memset(buf, 0, cells * sizeof(u32) - sizeof(u64));
> >>> +		buf += cells * sizeof(u32) - sizeof(u64);
> >>
> >> Is this trying to clear the 'top' cells and shuffle the pointer to point at the
> >> 'bottom' 2? I'm pretty sure this isn't endian safe.
> >>
> >> Do we really expect a system to have #address-cells > 2?
> > 
> > I don't know, but just for safety.
> 
> Okay, so this is aiming to be a cover-all-cases library function.
> 
> 
> >>> +		val64 = cpu_to_fdt64(val64);
> >>> +		memcpy(buf, &val64, sizeof(val64));
> >>> +	}
> >>> +}
> >>> +
> >>> +static int fdt_setprop_range(void *fdt, int nodeoffset, const char *name,
> >>> +				unsigned long addr, unsigned long size)
> >>
> >> (the device-tree spec describes a 'ranges' property, which had me confused. This
> >> is encoding a prop-encoded-array)
> > 
> > Should we rename it to, say, fdt_setprop_reg()?
> 
> Sure, but I'd really like this code to come from libfdt. I'm hoping for some
> temporary workaround, let's see what the DT folk say.

OK, I will follow Rob's suggestion.

> >>> +	if (!buf)
> >>> +		return -ENOMEM;
> >>> +
> >>> +	fill_property(prop, addr, __dt_root_addr_cells);
> >>> +	prop += __dt_root_addr_cells * sizeof(u32);
> >>> +
> >>> +	fill_property(prop, size, __dt_root_size_cells);
> >>> +
> >>> +	result = fdt_setprop(fdt, nodeoffset, name, buf, buf_size);
> >>> +
> >>> +	vfree(buf);
> >>> +
> >>> +	return result;
> >>> +}
> >>
> >> Doesn't this stuff belong in libfdt? I guess there is no 'add array element' api
> >> because this is the first time we've wanted to create a node with more than
> >> key=fixed-size-value.
> >>
> >> I don't think this belongs in arch C code. Do we have a plan for getting libfdt
> >> to support encoding prop-arrays? Can we put it somewhere anyone else duplicating
> >> this will find it, until we can (re)move it?
> > 
> > I will temporarily move all fdt-related stuff to a separate file, but
> > 
> >> I have no idea how that happens... it looks like the devicetree list is the
> >> place to ask.
> > 
> > should we always sync with the original dtc/libfdt repository?
> 
> I thought so, libfdt is one of those external libraries that the kernel
> consumes, like acpica. For acpica at least the rule is changes go upstream, then
> get sync'd back.

Same as above.

> >>>  static int setup_dtb(struct kimage *image,
> >>>  		unsigned long initrd_load_addr, unsigned long initrd_len,
> >>>  		char *cmdline, unsigned long cmdline_len,
> >>> @@ -88,10 +165,26 @@ static int setup_dtb(struct kimage *image,
> >>>  	int range_len;
> >>>  	int ret;
> >>>  
> >>> +	/* check ranges against root's #address-cells and #size-cells */
> >>> +	if (image->type == KEXEC_TYPE_CRASH &&
> >>> +		(!cells_size_fitted(image->arch.elf_load_addr,
> >>> +				image->arch.elf_headers_sz) ||
> >>> +		 !cells_size_fitted(crashk_res.start,
> >>> +				crashk_res.end - crashk_res.start + 1))) {
> >>> +		pr_err("Crash memory region doesn't fit into DT's root cell sizes.\n");
> >>> +		ret = -EINVAL;
> >>> +		goto out_err;
> >>> +	}
> >>
> >> To check I've understood this properly: This can happen if the firmware provided
> >> a DTB with 32bit address/size cells, but at least some of the memory requires 64
> >> bit address/size cells. This could only happen on a UEFI system where the
> >> firmware-DTB doesn't describe memory. ACPI-only systems would have the EFIstub DT.
> > 
> > Probably, yes. I assumed the case where #address-cells and #size-cells
> > were just missing in fdt.
> 
> Ah, that's another one. I just wanted to check we could boot on a system where
> this can happen.
> 
> 
> >>>  	/* duplicate dt blob */
> >>>  	buf_size = fdt_totalsize(initial_boot_params);
> >>>  	range_len = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
> >>>  
> >>> +	if (image->type == KEXEC_TYPE_CRASH)
> >>> +		buf_size += fdt_prop_len("linux,elfcorehdr", range_len)
> >>> +				+ fdt_prop_len("linux,usable-memory-range",
> >>> +								range_len);
> 
> >                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> [...]
> 
> >> Don't you need to add "linux,usable-memory-range" to the buf_size estimate?
> > 
> > I think the code exists. See above.
> 
> Sorry, turns out I can't read!
> 
> 
> >>> +		if (ret)
> >>> +			goto out_err;
> >>> +	}
> >>
> >>> @@ -148,17 +258,109 @@ static int setup_dtb(struct kimage *image,
> >>
> >>> +static struct crash_mem *get_crash_memory_ranges(void)
> >>> +{
> >>> +	unsigned int nr_ranges;
> >>> +	struct crash_mem *cmem;
> >>> +
> >>> +	nr_ranges = 1; /* for exclusion of crashkernel region */
> >>> +	walk_system_ram_res(0, -1, &nr_ranges, get_nr_ranges_callback);
> >>> +
> >>> +	cmem = vmalloc(sizeof(struct crash_mem) +
> >>> +			sizeof(struct crash_mem_range) * nr_ranges);
> >>> +	if (!cmem)
> >>> +		return NULL;
> >>> +
> >>> +	cmem->max_nr_ranges = nr_ranges;
> >>> +	cmem->nr_ranges = 0;
> >>> +	walk_system_ram_res(0, -1, cmem, add_mem_range_callback);
> >>> +
> >>> +	/* Exclude crashkernel region */
> >>> +	if (crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end)) {
> >>> +		vfree(cmem);
> >>> +		return NULL;
> >>> +	}
> >>> +
> >>> +	return cmem;
> >>> +}
> >>
> >> Could this function be included in prepare_elf_headers() so that the alloc() and
> >> free() occur together.
> > 
> > Or aiming that arm64 and x86 have similar-look code?
> 
> What's the advantage in things looking the same? If they are the same, it
> probably shouldn't be in per-arch code. Otherwise it should be as simple as
> possible, otherwise we can't spot the bugs/leaks.
> 
> But I think walking memblock here will remove all the 'looks the same' properties.

OK, I will fold the function into prepare_elf_headers().
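
A user-space sketch of that shape, with the allocation and the free in one
function, might look like the following. All names here (struct range,
exclude_range(), dumpable_bytes()) are hypothetical stand-ins, not the
kernel's struct crash_mem or crash_exclude_mem_range(). The sketch assumes
the input ranges do not overlap, so removing one region can split at most
one range and a single spare slot is enough.

```c
#include <stdint.h>
#include <stdlib.h>

struct range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/* Append [s, e] to out; returns the new count, or -1 if out is full. */
static int push(struct range *out, int cnt, int cap, uint64_t s, uint64_t e)
{
	if (cnt >= cap)
		return -1;
	out[cnt].start = s;
	out[cnt].end = e;
	return cnt + 1;
}

/* Copy in[] to out[] with [ex_start, ex_end] cut out of every range. */
static int exclude_range(const struct range *in, int n,
			 uint64_t ex_start, uint64_t ex_end,
			 struct range *out, int cap)
{
	int i, cnt = 0;

	for (i = 0; i < n && cnt >= 0; i++) {
		uint64_t s = in[i].start, e = in[i].end;

		if (ex_end < s || ex_start > e) {	/* no overlap */
			cnt = push(out, cnt, cap, s, e);
			continue;
		}
		if (s < ex_start)			/* part below survives */
			cnt = push(out, cnt, cap, s, ex_start - 1);
		if (cnt >= 0 && e > ex_end)		/* part above survives */
			cnt = push(out, cnt, cap, ex_end + 1, e);
	}
	return cnt;
}

/* Allocate, use and free the temporary list in a single function. */
static int64_t dumpable_bytes(const struct range *ram, int n,
			      uint64_t ex_start, uint64_t ex_end)
{
	struct range *tmp = malloc(((size_t)n + 1) * sizeof(*tmp));
	uint64_t total = 0;
	int i, cnt;

	if (!tmp)
		return -1;

	cnt = exclude_range(ram, n, ex_start, ex_end, tmp, n + 1);
	for (i = 0; i < cnt; i++)
		total += tmp[i].end - tmp[i].start + 1;

	free(tmp);
	return cnt < 0 ? -1 : (int64_t)total;
}
```

Keeping the buffer's lifetime inside one function means every exit path can
be checked for a matching free() at a glance.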

> 
> >>> +static int prepare_elf_headers(void **addr, unsigned long *sz)
> >>> +{
> >>> +	struct crash_mem *cmem;
> >>> +	int ret = 0;
> >>> +
> >>> +	cmem = get_crash_memory_ranges();
> >>> +	if (!cmem)
> >>> +		return -ENOMEM;
> >>> +
> >>> +	ret =  crash_prepare_elf64_headers(cmem, true, addr, sz);
> >>> +
> >>> +	vfree(cmem);
> >>
> >>> +	return ret;
> >>> +}
> >>
> >> All this is moving memory-range information from core-code's
> >> walk_system_ram_res() into core-code's struct crash_mem, and excluding
> >> crashk_res, which again is accessible to the core code.
> >>
> >> It looks like this is duplicated in arch/x86 and arch/arm64 because arm64
> >> doesn't have a second 'crashk_low_res' region, and always wants elf64, instead
> >> of when IS_ENABLED(CONFIG_X86_64).
> >> If we can abstract just those two, more of this could be moved to core code
> >> where powerpc can make use of it if they want to support kdump with
> >> kexec_file_load().
> >>
> >> But, it's getting late for cross-architecture dependencies, let's put that on the
> >> for-later list. (assuming there isn't a powerpc-kdump series out there adding a
> >> third copy of this)
> > 
> > Sure. X86 code has so many exceptional lines in the code :)
> 
> They also pass the e820 'usable-memory' map on the cmdline...

Well, according to a past comment from Dave (Red Hat), this type of kernel
parameter is old-style, and x86 now passes a dedicated memory region for
this purpose.

Thanks,
-Takahiro AKASHI

> 
> Thanks,
> 
> James

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 07/11] arm64: kexec_file: add crash dump support
@ 2018-05-21  9:46           ` AKASHI Takahiro
  0 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-05-21  9:46 UTC (permalink / raw)
  To: James Morse
  Cc: herbert, bhe, ard.biesheuvel, catalin.marinas, bhsharma,
	will.deacon, linux-kernel, dhowells, arnd, linux-arm-kernel,
	kexec, dyoung, davem, vgoyal

James,

On Fri, May 18, 2018 at 05:00:55PM +0100, James Morse wrote:
> Hi Akashi,
> 
> On 18/05/18 11:39, AKASHI Takahiro wrote:
> > On Tue, May 15, 2018 at 06:11:15PM +0100, James Morse wrote:
> >> On 25/04/18 07:26, AKASHI Takahiro wrote:
> >>> Enabling crash dump (kdump) includes
> >>> * prepare contents of ELF header of a core dump file, /proc/vmcore,
> >>>   using crash_prepare_elf64_headers(), and
> >>> * add two device tree properties, "linux,usable-memory-range" and
> >>>   "linux,elfcorehdr", which represent respectively a memory range
> 
> >>> diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> >>> index 37c0a9dc2e47..ec674f4d267c 100644
> >>> --- a/arch/arm64/kernel/machine_kexec_file.c
> >>> +++ b/arch/arm64/kernel/machine_kexec_file.c
> 
> >>> +static void fill_property(void *buf, u64 val64, int cells)
> >>> +{
> >>> +	u32 val32;
> >>> +
> >>> +	if (cells == 1) {
> >>> +		val32 = cpu_to_fdt32((u32)val64);
> >>> +		memcpy(buf, &val32, sizeof(val32));
> >>> +	} else {
> >>
> >>> +		memset(buf, 0, cells * sizeof(u32) - sizeof(u64));
> >>> +		buf += cells * sizeof(u32) - sizeof(u64);
> >>
> >> Is this trying to clear the 'top' cells and shuffle the pointer to point at the
> >> 'bottom' 2? I'm pretty sure this isn't endian safe.
> >>
> >> Do we really expect a system to have #address-cells > 2?
> > 
> > I don't know, but just for safety.
> 
> Okay, so this is aiming to be a cover-all-cases library function.
> 
> 
> >>> +		val64 = cpu_to_fdt64(val64);
> >>> +		memcpy(buf, &val64, sizeof(val64));
> >>> +	}
> >>> +}
> >>> +
> >>> +static int fdt_setprop_range(void *fdt, int nodeoffset, const char *name,
> >>> +				unsigned long addr, unsigned long size)
> >>
> >> (the device-tree spec describes a 'ranges' property, which had me confused. This
> >> is encoding a prop-encoded-array)
> > 
> > Should we rename it to, say, fdt_setprop_reg()?
> 
> Sure, but I'd really like this code to come from libfdt. I'm hoping for some
> temporary workaround, let's see what the DT folk say.

OK, I will follow Rob's suggestion.

> >>> +	if (!buf)
> >>> +		return -ENOMEM;
> >>> +
> >>> +	fill_property(prop, addr, __dt_root_addr_cells);
> >>> +	prop += __dt_root_addr_cells * sizeof(u32);
> >>> +
> >>> +	fill_property(prop, size, __dt_root_size_cells);
> >>> +
> >>> +	result = fdt_setprop(fdt, nodeoffset, name, buf, buf_size);
> >>> +
> >>> +	vfree(buf);
> >>> +
> >>> +	return result;
> >>> +}
> >>
> >> Doesn't this stuff belong in libfdt? I guess there is no 'add array element' api
> >> because this is the first time we've wanted to create a node with more than
> >> key=fixed-size-value.
> >>
> >> I don't think this belongs in arch C code. Do we have a plan for getting libfdt
> >> to support encoding prop-arrays? Can we put it somewhere anyone else duplicating
> >> this will find it, until we can (re)move it?
> > 
> > I will temporarily move all fdt-related stuff to a separate file, but
> > 
> >> I have no idea how that happens... it looks like the devicetree list is the
> >> place to ask.
> > 
> > should we always sync with the original dtc/libfdt repository?
> 
> I thought so, libfdt is one of those external libraries that the kernel
> consumes, like acpica. For acpica at least the rule is changes go upstream, then
> get sync'd back.

Same as above.

> >>>  static int setup_dtb(struct kimage *image,
> >>>  		unsigned long initrd_load_addr, unsigned long initrd_len,
> >>>  		char *cmdline, unsigned long cmdline_len,
> >>> @@ -88,10 +165,26 @@ static int setup_dtb(struct kimage *image,
> >>>  	int range_len;
> >>>  	int ret;
> >>>  
> >>> +	/* check ranges against root's #address-cells and #size-cells */
> >>> +	if (image->type == KEXEC_TYPE_CRASH &&
> >>> +		(!cells_size_fitted(image->arch.elf_load_addr,
> >>> +				image->arch.elf_headers_sz) ||
> >>> +		 !cells_size_fitted(crashk_res.start,
> >>> +				crashk_res.end - crashk_res.start + 1))) {
> >>> +		pr_err("Crash memory region doesn't fit into DT's root cell sizes.\n");
> >>> +		ret = -EINVAL;
> >>> +		goto out_err;
> >>> +	}
> >>
> >> To check I've understood this properly: This can happen if the firmware provided
> >> a DTB with 32bit address/size cells, but at least some of the memory requires 64
> >> bit address/size cells. This could only happen on a UEFI system where the
> >> firmware-DTB doesn't describe memory. ACPI-only systems would have the EFIstub DT.
> > 
> > Probably, yes. I assumed the case where #address-cells and #size-cells
> > were just missing in fdt.
> 
> Ah, that's another one. I just wanted to check we could boot on a system where
> this can happen.
> 
> 
> >>>  	/* duplicate dt blob */
> >>>  	buf_size = fdt_totalsize(initial_boot_params);
> >>>  	range_len = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
> >>>  
> >>> +	if (image->type == KEXEC_TYPE_CRASH)
> >>> +		buf_size += fdt_prop_len("linux,elfcorehdr", range_len)
> >>> +				+ fdt_prop_len("linux,usable-memory-range",
> >>> +								range_len);
> 
> >                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> [...]
> 
> >> Don't you need to add "linux,usable-memory-range" to the buf_size estimate?
> > 
> > I think the code exists. See above.
> 
> Sorry, turns out I can't read!
> 
> 
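For reference, the buf_size estimate above can be reproduced in user space.
The following is only a sketch: prop_header is a local stand-in for the
fixed part of libfdt's struct fdt_property (tag, len, nameoff), and
tag_align() and prop_len_estimate() are hypothetical names that mirror
FDT_TAGALIGN() and fdt_prop_len() from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TAG_SIZE 4u	/* assumed size of one FDT tag/cell in bytes */

/* Local stand-in for the fixed part of a property record. */
struct prop_header {
	uint32_t tag;
	uint32_t len;
	uint32_t nameoff;
};

/* Round x up to the next multiple of TAG_SIZE. */
static size_t tag_align(size_t x)
{
	return (x + TAG_SIZE - 1) & ~(size_t)(TAG_SIZE - 1);
}

/*
 * Space one property may add to the blob: the name (with its NUL) in
 * the strings block, the record header, and the tag-aligned value.
 */
static size_t prop_len_estimate(const char *name, size_t value_len)
{
	return (strlen(name) + 1) +
		sizeof(struct prop_header) +
		tag_align(value_len);
}
```

Assuming two address cells and two size cells, range_len is 16 bytes, so
this gives 45 bytes for "linux,elfcorehdr" and 54 bytes for
"linux,usable-memory-range". I believe this is an upper bound, since a
name already present in the strings block should not need to be added again.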
> >>> +		if (ret)
> >>> +			goto out_err;
> >>> +	}
> >>
> >>> @@ -148,17 +258,109 @@ static int setup_dtb(struct kimage *image,
> >>
> >>> +static struct crash_mem *get_crash_memory_ranges(void)
> >>> +{
> >>> +	unsigned int nr_ranges;
> >>> +	struct crash_mem *cmem;
> >>> +
> >>> +	nr_ranges = 1; /* for exclusion of crashkernel region */
> >>> +	walk_system_ram_res(0, -1, &nr_ranges, get_nr_ranges_callback);
> >>> +
> >>> +	cmem = vmalloc(sizeof(struct crash_mem) +
> >>> +			sizeof(struct crash_mem_range) * nr_ranges);
> >>> +	if (!cmem)
> >>> +		return NULL;
> >>> +
> >>> +	cmem->max_nr_ranges = nr_ranges;
> >>> +	cmem->nr_ranges = 0;
> >>> +	walk_system_ram_res(0, -1, cmem, add_mem_range_callback);
> >>> +
> >>> +	/* Exclude crashkernel region */
> >>> +	if (crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end)) {
> >>> +		vfree(cmem);
> >>> +		return NULL;
> >>> +	}
> >>> +
> >>> +	return cmem;
> >>> +}
> >>
> >> Could this function be included in prepare_elf_headers() so that the alloc() and
> >> free() occur together.
> > 
> > Or should we aim for arm64 and x86 having similar-looking code?
> 
> What's the advantage in things looking the same? If they are the same, it
> probably shouldn't be in per-arch code. Otherwise it should be as simple as
> possible, otherwise we can't spot the bugs/leaks.
> 
> But I think walking memblock here will remove all 'looks the same' properties here.

OK, I will fold the function into prepare_elf_headers().

> 
> >>> +static int prepare_elf_headers(void **addr, unsigned long *sz)
> >>> +{
> >>> +	struct crash_mem *cmem;
> >>> +	int ret = 0;
> >>> +
> >>> +	cmem = get_crash_memory_ranges();
> >>> +	if (!cmem)
> >>> +		return -ENOMEM;
> >>> +
> >>> +	ret =  crash_prepare_elf64_headers(cmem, true, addr, sz);
> >>> +
> >>> +	vfree(cmem);
> >>
> >>> +	return ret;
> >>> +}
> >>
> >> All this is moving memory-range information from core-code's
> >> walk_system_ram_res() into core-code's struct crash_mem, and excluding
> >> crashk_res, which again is accessible to the core code.
> >>
> >> It looks like this is duplicated in arch/x86 and arch/arm64 because arm64
> >> doesn't have a second 'crashk_low_res' region, and always wants elf64, instead
> >> of when IS_ENABLED(CONFIG_X86_64).
> >> If we can abstract just those two, more of this could be moved to core code
> >> where powerpc can make use of it if they want to support kdump with
> >> kexec_file_load().
> >>
> >> But it's getting late for cross-architecture dependencies, let's put that on the
> >> for-later list. (assuming there isn't a powerpc-kdump series out there adding a
> >> third copy of this)
> > 
> > Sure. X86 code has so many exceptional lines in the code :)
> 
> They also pass the e820 'usable-memory' map on the cmdline...

Well, according to a past comment from Dave (Red Hat), this type of kernel
parameter is old style, and x86 now passes a dedicated memory region
for this purpose.

Thanks,
-Takahiro AKASHI

> 
> Thanks,
> 
> James

^ permalink raw reply	[flat|nested] 156+ messages in thread

* Re: [PATCH v9 07/11] arm64: kexec_file: add crash dump support
  2018-05-18 15:35       ` Rob Herring
  (?)
@ 2018-05-21 10:14         ` AKASHI Takahiro
  -1 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-05-21 10:14 UTC (permalink / raw)
  To: Rob Herring
  Cc: James Morse, catalin.marinas, will.deacon, dhowells, vgoyal,
	herbert, davem, dyoung, bhe, arnd, ard.biesheuvel, bhsharma,
	kexec, linux-arm-kernel, linux-kernel, devicetree

Hi Rob,

On Fri, May 18, 2018 at 10:35:52AM -0500, Rob Herring wrote:
> On Tue, May 15, 2018 at 06:12:59PM +0100, James Morse wrote:
> > Hi guys,
> > 
> > (CC: +RobH, devicetree list)
> 
> Thanks.
> 
> > On 25/04/18 07:26, AKASHI Takahiro wrote:
> > > Enabling crash dump (kdump) includes
> > > * prepare contents of ELF header of a core dump file, /proc/vmcore,
> > >   using crash_prepare_elf64_headers(), and
> > > * add two device tree properties, "linux,usable-memory-range" and
> > >   "linux,elfcorehdr", which represent respectively a memory range
> > >   to be used by crash dump kernel and the header's location
> 
> BTW, I intend to move the existing parsing of these out of the arch code.
> Please don't add more DT handling to arch/ unless it is *really* arch 
> specific. I'd assume that the next arch to add kexec support will use 
> these bindings instead of the powerpc way.

So do you expect all the fdt-related stuff in my current implementation
for arm64 to be put into libfdt, or at least drivers/of, from the beginning?

I'm not sure how arch-specific the properties here are. For instance,
it is only arm64 that uses "linux,usable-memory-range" right now but
if some other arch follows, it is no longer arch-specific.
# I remember that you didn't like this property :)

> > kexec_file_load() on arm64 needs to be able to create a prop encoded array to
> > the FDT, but there doesn't appear to be a libfdt helper to do this.
> > 
> > Akashi's code below adds fdt_setprop_range() to the arch code, and duplicates
> > bits of libfdt_internal.h to do the work.
> > 
> > How should this be done? I'm assuming this is something we need a new API in
> > libfdt.h for. How do these come about, and is there an interim step we can use
> > until then?
> 
> Submit patches to upstream dtc and then we can pull it in. Ahead of that 
> you can add it to drivers/of/fdt.c (or maybe fdt_address.c because 
> that's really what this is dealing with).

OK, I'm going to try to follow your suggestion.

> libfdt has only recently gained the beginnings of address handling.
> 
> > 
> > Thanks!
> > 
> > James
> > 
> > > diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> > > index 37c0a9dc2e47..ec674f4d267c 100644
> > > --- a/arch/arm64/kernel/machine_kexec_file.c
> > > +++ b/arch/arm64/kernel/machine_kexec_file.c
> > > @@ -76,6 +81,78 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf,
> > >  	return ret;
> > >  }
> > >  
> > > +static int __init arch_kexec_file_init(void)
> > > +{
> > > +	/* Those values are used later on loading the kernel */
> > > +	__dt_root_addr_cells = dt_root_addr_cells;
> > > +	__dt_root_size_cells = dt_root_size_cells;
> 
> I intend to make dt_root_*_cells private, so don't add another user 
> outside of drivers/of/.

Once cells_size_fitted() moves to drivers/of, there will be no users.

> > > +
> > > +	return 0;
> > > +}
> > > +late_initcall(arch_kexec_file_init);
> > > +
> > > +#define FDT_ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))
> > > +#define FDT_TAGALIGN(x)	(FDT_ALIGN((x), FDT_TAGSIZE))
> > > +
> > > +static int fdt_prop_len(const char *prop_name, int len)
> > > +{
> > > +	return (strlen(prop_name) + 1) +
> > > +		sizeof(struct fdt_property) +
> > > +		FDT_TAGALIGN(len);
> > > +}
> > > +
> > > +static bool cells_size_fitted(unsigned long base, unsigned long size)
> 
> I can't imagine this would happen. However, when this is moved to 
> drivers/of/ or dtc, these need to be u64 types to work on 32-bit.

OK.
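On a 32-bit build unsigned long is only 32 bits wide, so a comparison
against (1ULL << 32) can never be true there. A minimal user-space sketch
of the check with 64-bit arguments follows; range_fits_cells() is a
hypothetical name, and the cell counts are passed as parameters here
instead of being read from the __dt_root_*_cells globals.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if base and size can be encoded with the given numbers
 * of 32-bit cells. One cell holds 32 bits; two or more cells can hold
 * any 64-bit value.
 */
static bool range_fits_cells(uint64_t base, uint64_t size,
			     int addr_cells, int size_cells)
{
	if (addr_cells == 1 && base > UINT32_MAX)
		return false;

	if (size_cells == 1 && size > UINT32_MAX)
		return false;

	return true;
}
```

With one address cell, a base of 0x80000000 fits while 0x100000000 does
not; with two address cells both fit.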

> > > +	/* if *_cells >= 2, cells can hold 64-bit values anyway */
> > > +	if ((__dt_root_addr_cells == 1) && (base >= (1ULL << 32)))
> > > +		return false;
> > > +
> > > +	if ((__dt_root_size_cells == 1) && (size >= (1ULL << 32)))
> > > +		return false;
> > > +
> > > +	return true;
> > > +}
> > > +
> > > +static void fill_property(void *buf, u64 val64, int cells)
> > > +{
> > > +	u32 val32;
> 
> This should be a __be32 or fdt32 type. So should buf.

OK for val32, but buf is a local pointer address.

> > > +
> > > +	if (cells == 1) {
> > > +		val32 = cpu_to_fdt32((u32)val64);
> > > +		memcpy(buf, &val32, sizeof(val32));
> > > +	} else {
> > > +		memset(buf, 0, cells * sizeof(u32) - sizeof(u64));
> > > +		buf += cells * sizeof(u32) - sizeof(u64);
> > > +
> > > +		val64 = cpu_to_fdt64(val64);
> > > +		memcpy(buf, &val64, sizeof(val64));
> 
> Look how of_read_number() is implemented. You should be able to do 
> something similar here looping and avoiding the if/else.

Ah, excellent!
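of_read_number() builds a value by shifting in one big-endian cell per
iteration; the write side could do the reverse. Below is a minimal
user-space sketch of such a loop, not the final code. fill_cells() is a
hypothetical name, and it stores bytes explicitly so that it does not
depend on host endianness (kernel code would presumably use cpu_to_be32()
on a __be32 pointer instead).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Store 'val' into 'cells' consecutive 32-bit big-endian cells at 'buf'.
 * The least significant 32 bits land in the last cell. For cells > 2
 * the leading cells become zero, because 'val' has been shifted out
 * completely by then. For cells == 1 the upper 32 bits are dropped.
 */
static void fill_cells(uint8_t *buf, uint64_t val, int cells)
{
	int i;

	for (i = cells - 1; i >= 0; i--) {
		uint8_t *cell = buf + (size_t)i * 4;

		/* One cell, most significant byte first. */
		cell[0] = (uint8_t)(val >> 24);
		cell[1] = (uint8_t)(val >> 16);
		cell[2] = (uint8_t)(val >> 8);
		cell[3] = (uint8_t)val;
		val >>= 32;
	}
}
```

This covers the one-cell, two-cell and more-than-two-cell cases with the
same code path, so no if/else or separate memset() is needed.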

> > > +	}
> > > +}
> > > +
> > > +static int fdt_setprop_range(void *fdt, int nodeoffset, const char *name,
> > > +				unsigned long addr, unsigned long size)
> 
> A very generic sounding function, but really only works on addresses in 
> children of the root node.
> 
> > > +{
> > > +	void *buf, *prop;
> > > +	size_t buf_size;
> > > +	int result;
> > > +
> > > +	buf_size = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
> > > +	prop = buf = vmalloc(buf_size);
> 
> This can go on the stack instead (and would be required to work in 
> libfdt).

Well, I can't agree with you here since, as far as I understand, there is
an ongoing effort to purge all variable-length arrays on the stack from
the kernel code.
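One possible middle ground, sketched here under assumptions, is a
fixed-size stack buffer bounded by a maximum cell count, which needs
neither vmalloc() nor a variable-length array. MAX_CELLS, put_cells() and
encode_range() are hypothetical names for illustration, and the bound of
four cells per value is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_CELLS	4	/* assumed upper bound on cells per value */
#define CELL_BYTES	4

/* Store val as 'cells' big-endian 32-bit cells, leading cells zeroed. */
static void put_cells(uint8_t *buf, uint64_t val, int cells)
{
	int i;

	for (i = cells - 1; i >= 0; i--, val >>= 32) {
		buf[i * CELL_BYTES + 0] = (uint8_t)(val >> 24);
		buf[i * CELL_BYTES + 1] = (uint8_t)(val >> 16);
		buf[i * CELL_BYTES + 2] = (uint8_t)(val >> 8);
		buf[i * CELL_BYTES + 3] = (uint8_t)val;
	}
}

/*
 * Encode (addr, size) into 'out'. Returns the number of bytes written,
 * or -1 if a cell count is out of range or 'out' is too small.
 */
static int encode_range(uint8_t *out, size_t out_len,
			uint64_t addr, uint64_t size,
			int addr_cells, int size_cells)
{
	size_t need;

	if (addr_cells < 1 || addr_cells > MAX_CELLS ||
	    size_cells < 1 || size_cells > MAX_CELLS)
		return -1;

	need = (size_t)(addr_cells + size_cells) * CELL_BYTES;
	if (need > out_len)
		return -1;

	put_cells(out, addr, addr_cells);
	put_cells(out + (size_t)addr_cells * CELL_BYTES, size, size_cells);

	return (int)need;
}
```

A caller would declare uint8_t buf[2 * MAX_CELLS * CELL_BYTES] on its
stack and hand the returned length to fdt_setprop().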

Thank you for your review.
-Takahiro AKASHI

> > > +	if (!buf)
> > > +		return -ENOMEM;
> > > +
> > > +	fill_property(prop, addr, __dt_root_addr_cells);
> > > +	prop += __dt_root_addr_cells * sizeof(u32);
> > > +
> > > +	fill_property(prop, size, __dt_root_size_cells);
> > > +
> > > +	result = fdt_setprop(fdt, nodeoffset, name, buf, buf_size);
> > > +
> > > +	vfree(buf);
> > > +
> > > +	return result;
> > > +}
> > > +
> > >  static int setup_dtb(struct kimage *image,
> > >  		unsigned long initrd_load_addr, unsigned long initrd_len,
> > >  		char *cmdline, unsigned long cmdline_len,
> > > @@ -88,10 +165,26 @@ static int setup_dtb(struct kimage *image,
> > >  	int range_len;
> > >  	int ret;
> > >  
> > > +	/* check ranges against root's #address-cells and #size-cells */
> > > +	if (image->type == KEXEC_TYPE_CRASH &&
> > > +		(!cells_size_fitted(image->arch.elf_load_addr,
> > > +				image->arch.elf_headers_sz) ||
> > > +		 !cells_size_fitted(crashk_res.start,
> > > +				crashk_res.end - crashk_res.start + 1))) {
> > > +		pr_err("Crash memory region doesn't fit into DT's root cell sizes.\n");
> > > +		ret = -EINVAL;
> > > +		goto out_err;
> > > +	}
> > > +
> > >  	/* duplicate dt blob */
> > >  	buf_size = fdt_totalsize(initial_boot_params);
> > >  	range_len = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
> > >  
> > > +	if (image->type == KEXEC_TYPE_CRASH)
> > > +		buf_size += fdt_prop_len("linux,elfcorehdr", range_len)
> > > +				+ fdt_prop_len("linux,usable-memory-range",
> > > +								range_len);
> > > +
> > >  	if (initrd_load_addr)
> > >  		buf_size += fdt_prop_len("linux,initrd-start", sizeof(u64))
> > >  				+ fdt_prop_len("linux,initrd-end", sizeof(u64));
> > > @@ -113,6 +206,23 @@ static int setup_dtb(struct kimage *image,
> > >  	if (nodeoffset < 0)
> > >  		goto out_err;
> > >  
> > > +	if (image->type == KEXEC_TYPE_CRASH) {
> > > +		/* add linux,elfcorehdr */
> > > +		ret = fdt_setprop_range(buf, nodeoffset, "linux,elfcorehdr",
> > > +				image->arch.elf_load_addr,
> > > +				image->arch.elf_headers_sz);
> > > +		if (ret)
> > > +			goto out_err;
> > > +
> > > +		/* add linux,usable-memory-range */
> > > +		ret = fdt_setprop_range(buf, nodeoffset,
> > > +				"linux,usable-memory-range",
> > > +				crashk_res.start,
> > > +				crashk_res.end - crashk_res.start + 1);
> > > +		if (ret)
> > > +			goto out_err;
> > > +	}
> > > +
> > >  	/* add bootargs */
> > >  	if (cmdline) {
> > >  		ret = fdt_setprop(buf, nodeoffset, "bootargs",
> > 

^ permalink raw reply	[flat|nested] 156+ messages in thread

* [PATCH v9 07/11] arm64: kexec_file: add crash dump support
@ 2018-05-21 10:14         ` AKASHI Takahiro
  0 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-05-21 10:14 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Rob,

On Fri, May 18, 2018 at 10:35:52AM -0500, Rob Herring wrote:
> On Tue, May 15, 2018 at 06:12:59PM +0100, James Morse wrote:
> > Hi guys,
> > 
> > (CC: +RobH, devicetree list)
> 
> Thanks.
> 
> > On 25/04/18 07:26, AKASHI Takahiro wrote:
> > > Enabling crash dump (kdump) includes
> > > * prepare contents of ELF header of a core dump file, /proc/vmcore,
> > >   using crash_prepare_elf64_headers(), and
> > > * add two device tree properties, "linux,usable-memory-range" and
> > >   "linux,elfcorehdr", which represent repsectively a memory range
> > >   to be used by crash dump kernel and the header's location
> 
> BTW, I intend to move existing parsing these out of the arch code. 
> Please don't add more DT handling to arch/ unless it is *really* arch 
> specific. I'd assume that the next arch to add kexec support will use 
> these bindings instead of the powerpc way.

So do you expect all the fdt-related stuff in my current implementation
for arm64 to be put into libfdt, or at least drivers/of, from the beginning?

I'm not sure how arch-specific the properties here are. For instance,
it is only arm64 that uses "linux,usable-memory-range" right now but
if some other arch follows, it is no more arch-specific.
# I remember that you didn't like this property :)

> > kexec_file_load() on arm64 needs to be able to create a prop encoded array to
> > the FDT, but there doesn't appear to be a libfdt helper to do this.
> > 
> > Akashi's code below adds fdt_setprop_range() to the arch code, and duplicates
> > bits of libfdt_internal.h to do the work.
> > 
> > How should this be done? I'm assuming this is something we need a new API in
> > libfdt.h for. How do these come about, and is there an interim step we can use
> > until then?
> 
> Submit patches to upstream dtc and then we can pull it in. Ahead of that 
> you can add it to drivers/of/fdt.c (or maybe fdt_address.c because 
> that's really what this is dealing with).

OK, I'm going to try to follow your suggestion.

> libfdt has only recently gained the beginnings of address handling.
> 
> > 
> > Thanks!
> > 
> > James
> > 
> > > diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> > > index 37c0a9dc2e47..ec674f4d267c 100644
> > > --- a/arch/arm64/kernel/machine_kexec_file.c
> > > +++ b/arch/arm64/kernel/machine_kexec_file.c
> > > @@ -76,6 +81,78 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf,
> > >  	return ret;
> > >  }
> > >  
> > > +static int __init arch_kexec_file_init(void)
> > > +{
> > > +	/* Those values are used later on loading the kernel */
> > > +	__dt_root_addr_cells = dt_root_addr_cells;
> > > +	__dt_root_size_cells = dt_root_size_cells;
> 
> I intend to make dt_root_*_cells private, so don't add another user 
> outside of drivers/of/.

Once cells_size_fitted() moves to drivers/of, there will be no users.

> > > +
> > > +	return 0;
> > > +}
> > > +late_initcall(arch_kexec_file_init);
> > > +
> > > +#define FDT_ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))
> > > +#define FDT_TAGALIGN(x)	(FDT_ALIGN((x), FDT_TAGSIZE))
> > > +
> > > +static int fdt_prop_len(const char *prop_name, int len)
> > > +{
> > > +	return (strlen(prop_name) + 1) +
> > > +		sizeof(struct fdt_property) +
> > > +		FDT_TAGALIGN(len);
> > > +}
> > > +
> > > +static bool cells_size_fitted(unsigned long base, unsigned long size)
> 
> I can't imagine this would happen. However, when this is moved to 
> drivers/of/ or dtc, these need to be u64 types to work on 32-bit.

OK.

> > > +	/* if *_cells >= 2, cells can hold 64-bit values anyway */
> > > +	if ((__dt_root_addr_cells == 1) && (base >= (1ULL << 32)))
> > > +		return false;
> > > +
> > > +	if ((__dt_root_size_cells == 1) && (size >= (1ULL << 32)))
> > > +		return false;
> > > +
> > > +	return true;
> > > +}
> > > +
> > > +static void fill_property(void *buf, u64 val64, int cells)
> > > +{
> > > +	u32 val32;
> 
> This should be a __be32 or fdt32 type. So should buf.

OK for val32, but buf is a local pointer address.

> > > +
> > > +	if (cells == 1) {
> > > +		val32 = cpu_to_fdt32((u32)val64);
> > > +		memcpy(buf, &val32, sizeof(val32));
> > > +	} else {
> > > +		memset(buf, 0, cells * sizeof(u32) - sizeof(u64));
> > > +		buf += cells * sizeof(u32) - sizeof(u64);
> > > +
> > > +		val64 = cpu_to_fdt64(val64);
> > > +		memcpy(buf, &val64, sizeof(val64));
> 
> Look how of_read_number() is implemented. You should be able to do 
> something similar here looping and avoiding the if/else.

Ah, excellent!

> > > +	}
> > > +}
> > > +
> > > +static int fdt_setprop_range(void *fdt, int nodeoffset, const char *name,
> > > +				unsigned long addr, unsigned long size)
> 
> A very generic sounding function, but really only works on addresses in 
> children of the root node.
> 
> > > +{
> > > +	void *buf, *prop;
> > > +	size_t buf_size;
> > > +	int result;
> > > +
> > > +	buf_size = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
> > > +	prop = buf = vmalloc(buf_size);
> 
> This can go on the stack instead (and would be required to to work in 
> libfdt).

Well, I can't agree with you here since we are now in effort, as far as
I correctly understand, of purging all the variable-sized arrays on a local
stack out of the kernel code.

Thank you for your review.
-Takahiro AKASHI

> > > +	if (!buf)
> > > +		return -ENOMEM;
> > > +
> > > +	fill_property(prop, addr, __dt_root_addr_cells);
> > > +	prop += __dt_root_addr_cells * sizeof(u32);
> > > +
> > > +	fill_property(prop, size, __dt_root_size_cells);
> > > +
> > > +	result = fdt_setprop(fdt, nodeoffset, name, buf, buf_size);
> > > +
> > > +	vfree(buf);
> > > +
> > > +	return result;
> > > +}
> > > +
> > >  static int setup_dtb(struct kimage *image,
> > >  		unsigned long initrd_load_addr, unsigned long initrd_len,
> > >  		char *cmdline, unsigned long cmdline_len,
> > > @@ -88,10 +165,26 @@ static int setup_dtb(struct kimage *image,
> > >  	int range_len;
> > >  	int ret;
> > >  
> > > +	/* check ranges against root's #address-cells and #size-cells */
> > > +	if (image->type == KEXEC_TYPE_CRASH &&
> > > +		(!cells_size_fitted(image->arch.elf_load_addr,
> > > +				image->arch.elf_headers_sz) ||
> > > +		 !cells_size_fitted(crashk_res.start,
> > > +				crashk_res.end - crashk_res.start + 1))) {
> > > +		pr_err("Crash memory region doesn't fit into DT's root cell sizes.\n");
> > > +		ret = -EINVAL;
> > > +		goto out_err;
> > > +	}
> > > +
> > >  	/* duplicate dt blob */
> > >  	buf_size = fdt_totalsize(initial_boot_params);
> > >  	range_len = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
> > >  
> > > +	if (image->type == KEXEC_TYPE_CRASH)
> > > +		buf_size += fdt_prop_len("linux,elfcorehdr", range_len)
> > > +				+ fdt_prop_len("linux,usable-memory-range",
> > > +								range_len);
> > > +
> > >  	if (initrd_load_addr)
> > >  		buf_size += fdt_prop_len("linux,initrd-start", sizeof(u64))
> > >  				+ fdt_prop_len("linux,initrd-end", sizeof(u64));
> > > @@ -113,6 +206,23 @@ static int setup_dtb(struct kimage *image,
> > >  	if (nodeoffset < 0)
> > >  		goto out_err;
> > >  
> > > +	if (image->type == KEXEC_TYPE_CRASH) {
> > > +		/* add linux,elfcorehdr */
> > > +		ret = fdt_setprop_range(buf, nodeoffset, "linux,elfcorehdr",
> > > +				image->arch.elf_load_addr,
> > > +				image->arch.elf_headers_sz);
> > > +		if (ret)
> > > +			goto out_err;
> > > +
> > > +		/* add linux,usable-memory-range */
> > > +		ret = fdt_setprop_range(buf, nodeoffset,
> > > +				"linux,usable-memory-range",
> > > +				crashk_res.start,
> > > +				crashk_res.end - crashk_res.start + 1);
> > > +		if (ret)
> > > +			goto out_err;
> > > +	}
> > > +
> > >  	/* add bootargs */
> > >  	if (cmdline) {
> > >  		ret = fdt_setprop(buf, nodeoffset, "bootargs",
> > 

* Re: [PATCH v9 07/11] arm64: kexec_file: add crash dump support
@ 2018-05-21 10:14         ` AKASHI Takahiro
  0 siblings, 0 replies; 156+ messages in thread
From: AKASHI Takahiro @ 2018-05-21 10:14 UTC (permalink / raw)
  To: Rob Herring
  Cc: devicetree, herbert, bhe, ard.biesheuvel, catalin.marinas,
	bhsharma, will.deacon, linux-kernel, arnd, dhowells, James Morse,
	linux-arm-kernel, kexec, dyoung, davem, vgoyal

Hi Rob,

On Fri, May 18, 2018 at 10:35:52AM -0500, Rob Herring wrote:
> On Tue, May 15, 2018 at 06:12:59PM +0100, James Morse wrote:
> > Hi guys,
> > 
> > (CC: +RobH, devicetree list)
> 
> Thanks.
> 
> > On 25/04/18 07:26, AKASHI Takahiro wrote:
> > > Enabling crash dump (kdump) includes
> > > * prepare contents of ELF header of a core dump file, /proc/vmcore,
> > >   using crash_prepare_elf64_headers(), and
> > > * add two device tree properties, "linux,usable-memory-range" and
> > >   "linux,elfcorehdr", which represent, respectively, a memory range
> > >   to be used by the crash dump kernel and the header's location
> 
> BTW, I intend to move the existing parsing of these out of the arch code.
> Please don't add more DT handling to arch/ unless it is *really* arch 
> specific. I'd assume that the next arch to add kexec support will use 
> these bindings instead of the powerpc way.

So do you expect all the fdt-related stuff in my current implementation
for arm64 to be put into libfdt, or at least drivers/of, from the beginning?

I'm not sure how arch-specific the properties here are. For instance,
it is only arm64 that uses "linux,usable-memory-range" right now but
if some other arch follows, it is no longer arch-specific.
# I remember that you didn't like this property :)

> > kexec_file_load() on arm64 needs to be able to add a prop-encoded-array to
> > the FDT, but there doesn't appear to be a libfdt helper to do this.
> > 
> > Akashi's code below adds fdt_setprop_range() to the arch code, and duplicates
> > bits of libfdt_internal.h to do the work.
> > 
> > How should this be done? I'm assuming this is something we need a new API in
> > libfdt.h for. How do these come about, and is there an interim step we can use
> > until then?
> 
> Submit patches to upstream dtc and then we can pull it in. Ahead of that 
> you can add it to drivers/of/fdt.c (or maybe fdt_address.c because 
> that's really what this is dealing with).

OK, I'm going to try to follow your suggestion.

> libfdt has only recently gained the beginnings of address handling.
> 
> > 
> > Thanks!
> > 
> > James
> > 
> > > diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> > > index 37c0a9dc2e47..ec674f4d267c 100644
> > > --- a/arch/arm64/kernel/machine_kexec_file.c
> > > +++ b/arch/arm64/kernel/machine_kexec_file.c
> > > @@ -76,6 +81,78 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf,
> > >  	return ret;
> > >  }
> > >  
> > > +static int __init arch_kexec_file_init(void)
> > > +{
> > > +	/* Those values are used later on loading the kernel */
> > > +	__dt_root_addr_cells = dt_root_addr_cells;
> > > +	__dt_root_size_cells = dt_root_size_cells;
> 
> I intend to make dt_root_*_cells private, so don't add another user 
> outside of drivers/of/.

Once cells_size_fitted() moves to drivers/of, there will be no users
outside of drivers/of.

> > > +
> > > +	return 0;
> > > +}
> > > +late_initcall(arch_kexec_file_init);
> > > +
> > > +#define FDT_ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))
> > > +#define FDT_TAGALIGN(x)	(FDT_ALIGN((x), FDT_TAGSIZE))
> > > +
> > > +static int fdt_prop_len(const char *prop_name, int len)
> > > +{
> > > +	return (strlen(prop_name) + 1) +
> > > +		sizeof(struct fdt_property) +
> > > +		FDT_TAGALIGN(len);
> > > +}
> > > +
> > > +static bool cells_size_fitted(unsigned long base, unsigned long size)
> 
> I can't imagine this would happen. However, when this is moved to 
> drivers/of/ or dtc, these need to be u64 types to work on 32-bit.

OK.
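
For illustration, a minimal userspace sketch of what a u64-based check could
look like. The names fits_in_cells() and range_fits() are hypothetical and
are not part of the patch; the cell counts are passed in as parameters here
instead of being read from __dt_root_addr_cells / __dt_root_size_cells, and,
unlike the original, a cell count of zero is rejected. With unsigned long
arguments on a 32-bit build, a comparison against (1ULL << 32) could never
be true, which is why the wider type matters.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: can val be stored in `cells` 32-bit cells? */
static bool fits_in_cells(uint64_t val, int cells)
{
	/* two or more cells can hold any 64-bit value */
	if (cells >= 2)
		return true;

	/* a single cell holds at most 32 bits; zero cells hold nothing */
	return cells == 1 && val <= UINT32_MAX;
}

/* Hypothetical u64 variant of cells_size_fitted() from the patch. */
static bool range_fits(uint64_t base, uint64_t size,
		       int addr_cells, int size_cells)
{
	return fits_in_cells(base, addr_cells) &&
	       fits_in_cells(size, size_cells);
}
```

Taking uint64_t for both arguments keeps the result independent of the
width of unsigned long on the build target.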

> > > +	/* if *_cells >= 2, cells can hold 64-bit values anyway */
> > > +	if ((__dt_root_addr_cells == 1) && (base >= (1ULL << 32)))
> > > +		return false;
> > > +
> > > +	if ((__dt_root_size_cells == 1) && (size >= (1ULL << 32)))
> > > +		return false;
> > > +
> > > +	return true;
> > > +}
> > > +
> > > +static void fill_property(void *buf, u64 val64, int cells)
> > > +{
> > > +	u32 val32;
> 
> This should be a __be32 or fdt32 type. So should buf.

OK for val32, but buf is a local pointer address.

> > > +
> > > +	if (cells == 1) {
> > > +		val32 = cpu_to_fdt32((u32)val64);
> > > +		memcpy(buf, &val32, sizeof(val32));
> > > +	} else {
> > > +		memset(buf, 0, cells * sizeof(u32) - sizeof(u64));
> > > +		buf += cells * sizeof(u32) - sizeof(u64);
> > > +
> > > +		val64 = cpu_to_fdt64(val64);
> > > +		memcpy(buf, &val64, sizeof(val64));
> 
> Look how of_read_number() is implemented. You should be able to do 
> something similar here looping and avoiding the if/else.

Ah, excellent!
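
As a rough sketch of that idea: of_read_number() folds 32-bit big-endian
cells into a u64, most significant cell first, so the writer can be the
mirror image, emitting the least significant cell last. The code below is a
userspace approximation, not the eventual kernel change: put_be32(),
fill_cells() and read_cells() are hypothetical names, and put_be32() stands
in for cpu_to_fdt32() plus memcpy().

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for cpu_to_fdt32(): store v as four big-endian bytes. */
static void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/*
 * Hypothetical loop-based fill_property(): write val as `cells`
 * big-endian cells, filling from the last cell backwards. Cells above
 * the low two come out as zero because val has been shifted away.
 */
static void fill_cells(uint8_t *buf, uint64_t val, int cells)
{
	assert(cells > 0);
	while (cells--) {
		put_be32(buf + 4 * cells, (uint32_t)val);
		val >>= 32;
	}
}

/* Reader in the style of of_read_number(), used to check the writer. */
static uint64_t read_cells(const uint8_t *buf, int cells)
{
	uint64_t r = 0;

	while (cells--) {
		uint32_t c = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
			     ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];

		r = (r << 32) | c;
		buf += 4;
	}
	return r;
}
```

A single loop covers the one-cell and two-cell cases alike, so the if/else
in fill_property() and the separate memset() are no longer needed.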

> > > +	}
> > > +}
> > > +
> > > +static int fdt_setprop_range(void *fdt, int nodeoffset, const char *name,
> > > +				unsigned long addr, unsigned long size)
> 
> A very generic sounding function, but really only works on addresses in 
> children of the root node.
> 
> > > +{
> > > +	void *buf, *prop;
> > > +	size_t buf_size;
> > > +	int result;
> > > +
> > > +	buf_size = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
> > > +	prop = buf = vmalloc(buf_size);
> 
> This can go on the stack instead (and would be required to work in
> libfdt).

Well, I can't agree with you here since, as far as I understand, there is
an ongoing effort to purge all variable-sized arrays on the local stack
from the kernel code.

Thank you for your review.
-Takahiro AKASHI

> > > +	if (!buf)
> > > +		return -ENOMEM;
> > > +
> > > +	fill_property(prop, addr, __dt_root_addr_cells);
> > > +	prop += __dt_root_addr_cells * sizeof(u32);
> > > +
> > > +	fill_property(prop, size, __dt_root_size_cells);
> > > +
> > > +	result = fdt_setprop(fdt, nodeoffset, name, buf, buf_size);
> > > +
> > > +	vfree(buf);
> > > +
> > > +	return result;
> > > +}
> > > +
> > >  static int setup_dtb(struct kimage *image,
> > >  		unsigned long initrd_load_addr, unsigned long initrd_len,
> > >  		char *cmdline, unsigned long cmdline_len,
> > > @@ -88,10 +165,26 @@ static int setup_dtb(struct kimage *image,
> > >  	int range_len;
> > >  	int ret;
> > >  
> > > +	/* check ranges against root's #address-cells and #size-cells */
> > > +	if (image->type == KEXEC_TYPE_CRASH &&
> > > +		(!cells_size_fitted(image->arch.elf_load_addr,
> > > +				image->arch.elf_headers_sz) ||
> > > +		 !cells_size_fitted(crashk_res.start,
> > > +				crashk_res.end - crashk_res.start + 1))) {
> > > +		pr_err("Crash memory region doesn't fit into DT's root cell sizes.\n");
> > > +		ret = -EINVAL;
> > > +		goto out_err;
> > > +	}
> > > +
> > >  	/* duplicate dt blob */
> > >  	buf_size = fdt_totalsize(initial_boot_params);
> > >  	range_len = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
> > >  
> > > +	if (image->type == KEXEC_TYPE_CRASH)
> > > +		buf_size += fdt_prop_len("linux,elfcorehdr", range_len)
> > > +				+ fdt_prop_len("linux,usable-memory-range",
> > > +								range_len);
> > > +
> > >  	if (initrd_load_addr)
> > >  		buf_size += fdt_prop_len("linux,initrd-start", sizeof(u64))
> > >  				+ fdt_prop_len("linux,initrd-end", sizeof(u64));
> > > @@ -113,6 +206,23 @@ static int setup_dtb(struct kimage *image,
> > >  	if (nodeoffset < 0)
> > >  		goto out_err;
> > >  
> > > +	if (image->type == KEXEC_TYPE_CRASH) {
> > > +		/* add linux,elfcorehdr */
> > > +		ret = fdt_setprop_range(buf, nodeoffset, "linux,elfcorehdr",
> > > +				image->arch.elf_load_addr,
> > > +				image->arch.elf_headers_sz);
> > > +		if (ret)
> > > +			goto out_err;
> > > +
> > > +		/* add linux,usable-memory-range */
> > > +		ret = fdt_setprop_range(buf, nodeoffset,
> > > +				"linux,usable-memory-range",
> > > +				crashk_res.start,
> > > +				crashk_res.end - crashk_res.start + 1);
> > > +		if (ret)
> > > +			goto out_err;
> > > +	}
> > > +
> > >  	/* add bootargs */
> > >  	if (cmdline) {
> > >  		ret = fdt_setprop(buf, nodeoffset, "bootargs",
> > 

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

* Re: [PATCH v9 07/11] arm64: kexec_file: add crash dump support
  2018-05-21 10:14         ` AKASHI Takahiro
  (?)
@ 2018-05-24 14:25           ` Rob Herring
  -1 siblings, 0 replies; 156+ messages in thread
From: Rob Herring @ 2018-05-24 14:25 UTC (permalink / raw)
  To: AKASHI Takahiro, Rob Herring, James Morse, Catalin Marinas,
	Will Deacon, David Howells, Vivek Goyal, Herbert Xu,
	David Miller, dyoung, Baoquan He, Arnd Bergmann, Ard Biesheuvel,
	bhsharma, kexec,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	linux-kernel, devicetree

On Mon, May 21, 2018 at 5:14 AM, AKASHI Takahiro
<takahiro.akashi@linaro.org> wrote:
> Hi Rob,
>
> On Fri, May 18, 2018 at 10:35:52AM -0500, Rob Herring wrote:
>> On Tue, May 15, 2018 at 06:12:59PM +0100, James Morse wrote:
>> > Hi guys,
>> >
>> > (CC: +RobH, devicetree list)
>>
>> Thanks.
>>
>> > On 25/04/18 07:26, AKASHI Takahiro wrote:
>> > > Enabling crash dump (kdump) includes
>> > > * prepare contents of ELF header of a core dump file, /proc/vmcore,
>> > >   using crash_prepare_elf64_headers(), and
>> > > * add two device tree properties, "linux,usable-memory-range" and
>> > >   "linux,elfcorehdr", which represent, respectively, a memory range
>> > >   to be used by the crash dump kernel and the header's location
>>
>> BTW, I intend to move the existing parsing of these out of the arch code.
>> Please don't add more DT handling to arch/ unless it is *really* arch
>> specific. I'd assume that the next arch to add kexec support will use
>> these bindings instead of the powerpc way.
>
> So do you expect all the fdt-related stuff in my current implementation
> for arm64 to be put into libfdt, or at least drivers/of, from the beginning?

Yes.

> I'm not sure how arch-specific the properties here are. For instance,
> it is only arm64 that uses "linux,usable-memory-range" right now but
> if some other arch follows, it is no longer arch-specific.
> # I remember that you didn't like this property :)

The question, I guess, is what the next arch will use. I don't think any
other DT based arch supports crashdump or kexec yet.

>> > > +{
>> > > + void *buf, *prop;
>> > > + size_t buf_size;
>> > > + int result;
>> > > +
>> > > + buf_size = (__dt_root_addr_cells + __dt_root_size_cells) * sizeof(u32);
>> > > + prop = buf = vmalloc(buf_size);
>>
>> This can go on the stack instead (and would be required to work in
>> libfdt).
>
> Well, I can't agree with you here since, as far as I understand, there is
> an ongoing effort to purge all variable-sized arrays on the local stack
> from the kernel code.

You don't need a variable sized array. The array size just needs to
be the maximum size (16 bytes).

Rob
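
One possible reading of the fixed-size suggestion, as a userspace sketch:
with at most two cells each for the address and the size, the encoded range
never exceeds 4 * 4 = 16 bytes, so the caller can declare a 16-byte array on
the stack and no vmalloc() or variable-sized array is needed. MAX_ROOT_CELLS,
MAX_RANGE_BYTES, put_cells() and encode_range() are hypothetical names, and
the two-cell limit is an assumption taken from the 16-byte figure above.

```c
#include <stdint.h>

/* Assumption: #address-cells and #size-cells of the root are each <= 2. */
#define MAX_ROOT_CELLS	2
#define MAX_RANGE_BYTES	(2 * MAX_ROOT_CELLS * sizeof(uint32_t))	/* 16 */

/* Write val as `cells` big-endian 32-bit cells, last cell first. */
static void put_cells(uint8_t *buf, uint64_t val, int cells)
{
	while (cells--) {
		uint8_t *p = buf + 4 * cells;

		p[0] = (uint8_t)(val >> 24);
		p[1] = (uint8_t)(val >> 16);
		p[2] = (uint8_t)(val >> 8);
		p[3] = (uint8_t)val;
		val >>= 32;
	}
}

/*
 * Encode <addr size> into out, which must hold MAX_RANGE_BYTES bytes.
 * Returns the number of bytes written, or -1 if a cell count is
 * outside the assumed 1..MAX_ROOT_CELLS range.
 */
static int encode_range(uint8_t *out, uint64_t addr, uint64_t size,
			int addr_cells, int size_cells)
{
	if (addr_cells < 1 || addr_cells > MAX_ROOT_CELLS ||
	    size_cells < 1 || size_cells > MAX_ROOT_CELLS)
		return -1;

	put_cells(out, addr, addr_cells);
	put_cells(out + 4 * addr_cells, size, size_cells);
	return 4 * (addr_cells + size_cells);
}
```

A caller would declare uint8_t buf[MAX_RANGE_BYTES] locally and, if
encode_range() succeeds, pass buf and the returned length to fdt_setprop().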

end of thread

Thread overview: 156+ messages
2018-04-25  6:26 [PATCH v9 00/11] arm64: kexec: add kexec_file_load() support AKASHI Takahiro
2018-04-25  6:26 ` [PATCH v9 01/11] asm-generic: add kexec_file_load system call to unistd.h AKASHI Takahiro
2018-04-25  6:26 ` [PATCH v9 02/11] kexec_file: make kexec_image_post_load_cleanup_default() global AKASHI Takahiro
2018-04-28  9:45   ` Dave Young
2018-05-01 17:46   ` James Morse
2018-05-07  4:40     ` AKASHI Takahiro
2018-04-25  6:26 ` [PATCH v9 03/11] arm64: kexec_file: invoke the kernel without purgatory AKASHI Takahiro
2018-05-01 17:46   ` James Morse
2018-05-07  5:22     ` AKASHI Takahiro
2018-05-11 17:03       ` James Morse
2018-05-15  4:45         ` AKASHI Takahiro
2018-05-15 16:15           ` James Morse
2018-05-18  6:22             ` AKASHI Takahiro
2018-04-25  6:26 ` [PATCH v9 04/11] arm64: kexec_file: allocate memory walking through memblock list AKASHI Takahiro
2018-05-01 17:46   ` James Morse
2018-05-07  5:59     ` AKASHI Takahiro
2018-05-15  4:35       ` AKASHI Takahiro
2018-05-15 16:17         ` James Morse
2018-05-17  2:10       ` Baoquan He
2018-05-17  2:15         ` Baoquan He
2018-05-17 18:04           ` James Morse
2018-05-18  1:37             ` Baoquan He
2018-05-18  5:07               ` AKASHI Takahiro
2018-04-25  6:26 ` [PATCH v9 05/11] arm64: kexec_file: load initrd and device-tree AKASHI Takahiro
2018-05-15 16:20   ` James Morse
2018-05-18  7:11     ` AKASHI Takahiro
2018-05-18  7:42       ` AKASHI Takahiro
2018-05-18 15:59         ` James Morse
2018-04-25  6:26 ` [PATCH v9 06/11] arm64: kexec_file: allow for loading Image-format kernel AKASHI Takahiro
2018-05-01 17:46   ` James Morse
2018-05-07  7:21     ` AKASHI Takahiro
2018-05-11 17:07       ` James Morse
2018-05-15  5:13         ` AKASHI Takahiro
2018-05-15 17:14           ` James Morse
2018-05-21  9:32             ` AKASHI Takahiro
2018-04-25  6:26 ` [PATCH v9 07/11] arm64: kexec_file: add crash dump support AKASHI Takahiro
2018-05-15 17:11   ` James Morse
2018-05-16  8:34     ` James Morse
2018-05-18  9:58       ` AKASHI Takahiro
2018-05-16 10:06     ` James Morse
2018-05-18  9:50       ` AKASHI Takahiro
2018-05-18 10:39     ` AKASHI Takahiro
2018-05-18 16:00       ` James Morse
2018-05-21  9:46         ` AKASHI Takahiro
2018-05-15 17:12   ` James Morse
2018-05-18 15:35     ` Rob Herring
2018-05-21 10:14       ` AKASHI Takahiro
2018-05-24 14:25         ` Rob Herring
2018-04-25  6:26 ` [PATCH v9 08/11] arm64: enable KEXEC_FILE config AKASHI Takahiro
2018-04-25  6:26 ` [PATCH v9 09/11] include: pe.h: remove message[] from mz header definition AKASHI Takahiro
2018-04-25  6:26 ` [PATCH v9 10/11] arm64: kexec_file: add kernel signature verification support AKASHI Takahiro
2018-04-25  6:26 ` [PATCH v9 11/11] arm64: kexec_file: add kaslr support AKASHI Takahiro