From: Anthony Yznaga <anthony.yznaga@oracle.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com,
	luto@kernel.org, peterz@infradead.org, rppt@kernel.org,
	akpm@linux-foundation.org, ebiederm@xmission.com,
	keescook@chromium.org, graf@amazon.com, jason.zeng@intel.com,
	lei.l.li@intel.com, steven.sistare@oracle.com,
	fam.zheng@bytedance.com, mgalaxy@akamai.com,
	kexec@lists.infradead.org
Subject: [RFC v3 00/21] Preserved-over-Kexec RAM
Date: Wed, 26 Apr 2023 17:08:36 -0700
Message-ID: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com>

Sending out this RFC in part to gauge community interest.
This patchset implements preserved-over-kexec memory storage or PKRAM as a
method for saving memory pages of the currently executing kernel so that
they may be restored after kexec into a new kernel. The patches are adapted
from an RFC patchset sent out in 2013 by Vladimir Davydov [1]. They
introduce the PKRAM kernel API.

One use case for PKRAM is preserving guest memory and/or auxiliary
supporting data (e.g. iommu data) across kexec to support reboot of the
host with minimal disruption to the guest. PKRAM provides a flexible way
of doing this without requiring that the memory used be a fixed-size
region created a priori. Another use case is for databases to preserve
their block caches in shared memory across reboot.

Changes since RFC v2
  - Rebased onto 6.3
  - Updated API to save/load folios rather than file pages
  - Omitted previous patches for implementing and optimizing preservation
    and restoration of shmem files to reduce the number of patches and
    focus on core functionality.

Changes since RFC v1
  - Rebased onto 5.12-rc4
  - Refined the API to reduce the number of calls
    and better support multithreading.
  - Allow preserving byte data of arbitrary length
    (was previously limited to one page).
  - Build a new memblock reserved list with the
    preserved ranges and then substitute it for
    the existing one. (Mike Rapoport)
  - Use mem_avoid_overlap() to avoid kaslr stepping
    on preserved ranges. (Kees Cook)

-- Implementation details --

 * To aid in quickly finding contiguous ranges of memory containing
   preserved pages, a pseudo physical mapping pagetable is populated
   with pages as they are preserved.

 * If a page to be preserved is found to be in a range of memory that was
   previously reserved during early boot or in a range of memory where the
   kernel will be loaded on kexec, the page will be copied to a page
   outside of those ranges and the new page will be preserved. A compound
   page will be copied to and preserved as individual base pages.
   Note that this means that a page that cannot be moved (e.g. pinned for
   DMA) currently cannot safely be preserved. This could be addressed by
   adding functionality to kexec to reconfigure the destination addresses
   for the sections of an already-loaded kexec kernel.

 * A single page is allocated for the PKRAM super block. For the next kernel
   to find the preserved memory metadata after kexec, the pfn of the PKRAM
   super block, which is exported via /sys/kernel/pkram, is passed in the
   'pkram' boot option.

 * In the newly booted kernel, PKRAM adds all preserved pages to the memblock
   reserved list during early boot so that they will not be recycled.

 * Since kexec may load the new kernel code to any memory region, it could
   destroy preserved memory. When the kernel selects the memory region
   (kexec_file_load syscall), kexec will avoid preserved pages. When the
   user selects the kexec memory region to use (kexec_load syscall), the
   load will fail if there is a conflict with preserved pages. Pages preserved
   after a kexec kernel is loaded will be relocated if they conflict with
   the selected memory region.
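
To make the range tracking and the kexec conflict check described above a
bit more concrete, here is a minimal userspace sketch in C. It is not code
from the patchset: struct pfn_range, coalesce_pfns() and
range_has_preserved() are hypothetical names chosen for illustration, and a
sorted array of pfns stands in for the pseudo physical mapping pagetable.
It only shows the general idea of folding preserved pfns into contiguous
ranges and then testing a candidate kexec segment against them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type for this sketch: a half-open range of pfns. */
struct pfn_range {
	uint64_t start;	/* first preserved pfn */
	uint64_t end;	/* one past the last preserved pfn */
};

/*
 * Fold a sorted array of preserved pfns into contiguous ranges.
 * Returns the number of ranges written to 'out'. If more than 'max'
 * ranges would be needed, the list is truncated at 'max' entries.
 */
size_t coalesce_pfns(const uint64_t *pfns, size_t n,
		     struct pfn_range *out, size_t max)
{
	size_t nr = 0;

	for (size_t i = 0; i < n; i++) {
		/* Skip a pfn already covered by the current range. */
		if (nr > 0 && pfns[i] < out[nr - 1].end)
			continue;
		/* Extend the current range if this pfn is adjacent to it. */
		if (nr > 0 && pfns[i] == out[nr - 1].end) {
			out[nr - 1].end++;
			continue;
		}
		if (nr == max)
			break;
		/* Otherwise start a new single-page range. */
		out[nr].start = pfns[i];
		out[nr].end = pfns[i] + 1;
		nr++;
	}
	return nr;
}

/*
 * Return true if the candidate segment [start, end) overlaps any
 * preserved range, i.e. loading there would clobber preserved pages.
 */
bool range_has_preserved(const struct pfn_range *ranges, size_t nr,
			 uint64_t start, uint64_t end)
{
	assert(start <= end);

	for (size_t i = 0; i < nr; i++) {
		if (start < ranges[i].end && ranges[i].start < end)
			return true;
	}
	return false;
}
```

For example, preserved pfns 10, 11, 12, 20, 21 and 40 fold into three
ranges, and a candidate segment covering pfns [12, 15) conflicts with the
first of them while [13, 20) does not. In this sketch a caller would
reject the first segment, or pick a different one, in the same spirit as
the kexec_load and kexec_file_load behavior described above.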

[1] https://lkml.org/lkml/2013/7/1/211

Anthony Yznaga (21):
  mm: add PKRAM API stubs and Kconfig
  mm: PKRAM: implement node load and save functions
  mm: PKRAM: implement object load and save functions
  mm: PKRAM: implement folio stream operations
  mm: PKRAM: implement byte stream operations
  mm: PKRAM: link nodes by pfn before reboot
  mm: PKRAM: introduce super block
  PKRAM: track preserved pages in a physical mapping pagetable
  PKRAM: pass a list of preserved ranges to the next kernel
  PKRAM: prepare for adding preserved ranges to memblock reserved
  mm: PKRAM: reserve preserved memory at boot
  PKRAM: free the preserved ranges list
  PKRAM: prevent inadvertent use of a stale superblock
  PKRAM: provide a way to ban pages from use by PKRAM
  kexec: PKRAM: prevent kexec clobbering preserved pages in some cases
  PKRAM: provide a way to check if a memory range has preserved pages
  kexec: PKRAM: avoid clobbering already preserved pages
  mm: PKRAM: allow preserved memory to be freed from userspace
  PKRAM: disable feature when running the kdump kernel
  x86/KASLR: PKRAM: support physical kaslr
  x86/boot/compressed/64: use 1GB pages for mappings

 arch/x86/boot/compressed/Makefile       |    3 +
 arch/x86/boot/compressed/ident_map_64.c |    9 +-
 arch/x86/boot/compressed/kaslr.c        |   10 +-
 arch/x86/boot/compressed/misc.h         |   10 +
 arch/x86/boot/compressed/pkram.c        |  110 ++
 arch/x86/kernel/setup.c                 |    3 +
 arch/x86/mm/init_64.c                   |    3 +
 include/linux/pkram.h                   |  116 ++
 kernel/kexec.c                          |    9 +
 kernel/kexec_core.c                     |    3 +
 kernel/kexec_file.c                     |   15 +
 mm/Kconfig                              |    9 +
 mm/Makefile                             |    2 +
 mm/pkram.c                              | 1753 +++++++++++++++++++++++++++++++
 mm/pkram_pagetable.c                    |  375 +++++++
 15 files changed, 2424 insertions(+), 6 deletions(-)
 create mode 100644 arch/x86/boot/compressed/pkram.c
 create mode 100644 include/linux/pkram.h
 create mode 100644 mm/pkram.c
 create mode 100644 mm/pkram_pagetable.c

-- 
1.9.4


