linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Pavel Tatashin <pasha.tatashin@soleen.com>
To: Anthony Yznaga <anthony.yznaga@oracle.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com,
	dave.hansen@linux.intel.com, luto@kernel.org,
	peterz@infradead.org, rppt@kernel.org, akpm@linux-foundation.org,
	hughd@google.com, ebiederm@xmission.com, keescook@chromium.org,
	ardb@kernel.org, nivedita@alum.mit.edu, jroedel@suse.de,
	masahiroy@kernel.org, nathan@kernel.org, terrelln@fb.com,
	vincenzo.frascino@arm.com, martin.b.radev@gmail.com,
	andreyknvl@google.com, daniel.kiper@oracle.com,
	rafael.j.wysocki@intel.com, dan.j.williams@intel.com,
	Jonathan.Cameron@huawei.com, bhe@redhat.com, rminnich@gmail.com,
	ashish.kalra@amd.com, guro@fb.com, hannes@cmpxchg.org,
	mhocko@kernel.org, iamjoonsoo.kim@lge.com, vbabka@suse.cz,
	alex.shi@linux.alibaba.com, david@redhat.com,
	richard.weiyang@gmail.com, vdavydov.dev@gmail.com,
	graf@amazon.com, jason.zeng@intel.com, lei.l.li@intel.com,
	daniel.m.jordan@oracle.com, steven.sistare@oracle.com,
	linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org,
	kexec@lists.infradead.org
Subject: Re: [RFC v2 00/43] PKRAM: Preserved-over-Kexec RAM
Date: Sat, 5 Jun 2021 09:39:04 -0400	[thread overview]
Message-ID: <6e74451b-6a29-d0fc-cf26-b3700a099a09@soleen.com> (raw)
In-Reply-To: <1617140178-8773-1-git-send-email-anthony.yznaga@oracle.com>



On 3/30/21 5:35 PM, Anthony Yznaga wrote:
> This patchset implements preserved-over-kexec memory storage or PKRAM as a
> method for saving memory pages of the currently executing kernel so that
> they may be restored after kexec into a new kernel. The patches are adapted
> from an RFC patchset sent out in 2013 by Vladimir Davydov [1]. They
> introduce the PKRAM kernel API and implement its use within tmpfs, allowing
> tmpfs files to be preserved across kexec.
> 
> One use case for PKRAM is preserving guest memory and/or auxillary supporting
> data (e.g. iommu data) across kexec in support of VMM Fast Restart[2].
> VMM Fast Restart is currently using PKRAM to support preserving "Keep Alive
> State" across reboot[3].  PKRAM provides a flexible way for doing this
> without requiring that the amount of memory used by a fixed size created
> a priori.  Another use case is for databases to preserve their block caches
> in shared memory across reboot.

Hi Anthony,

I have several concerns about preserving arbitrary not prereserved segments across reboot.

1. PKRAM does not work across firmware reboots
With emulated persistent memory it is possible to do reboot through firmware and not loose the preserved-memory. The firmware can be modified to mark the required ranges pages as PRAM, and Linux will treat them as such. The benefit of this is that it works for both cases kexec and reboot through firmware. The disadvantage is that you have to know in advance how much memory needs to be preserved. However, with the ability to hot-plug/hot-remove the PMEM, the second point becomes moot as it is possible to mark a large chunk of memory as PMEM if needed. I have designed something like this for one of our projects, and it is already been used in the fleet. Reboot through firmware, allows us to service firmware in addition to kernel.

2. Boot failures due to memory fragmentation
We also considered using PRAM instead of PMEM. PRAM was one of the previous attempts to do the persistent memory thing via tmpfs flag: mount -t tmpfs -o pram=mytmpfs none /mnt/crdump"; that project was never upstreamed. However, we gave up with that idea because in addition to loosing possibility to reboot through the firmware, it also adds memory fragmentation. For example, if the new kernel require larger contiguous memory chunks to be allocated during boot than the previous kernel (i.e. the next kernel has new drivers, or some debug feature enabled), the boot might simply fail because of the extra memory ranges being reserved.

3. New intra-kernel dependencies
Kexec reboot is when one Linux kernel works as a bootloader for the next one. Currently, there is very little information that is passed from the old kernel to the next kernel. Adding more information that two independent kernels must know about each other is not a very good thing from architectural point of view. It limits the flexibility of kexec.

However, we do need PKRAM and ability to preserve kernel memory across reboot for fast hypervisor updates or such. User pages can already be preserved across reboot on emulated or real persistent memory. The easiest way is via DAXFS placed on that memory.
Kernel cannot preserve its memory on  PMEM across the reboot. However, functionality can be extended so kernel memory can be preserved on both emulated persistent memory or on real persistent memory. PKRAM could provide an interface to save kernel data to a file, and that file could be placed on any filesystem including DAXFS. When placed on DAXFS, that file can be used as iommu data, as it is actually located in physical memory and not moving anywhere. It is preserved across firmware/kexec reboot with having the devices survive the reboot state intact. During boot, have the device drivers that use PKRAM preserve functionality map saved files from DAXFS in order to have IOMMU functionality working again.

Thank you,
Pasha


      parent reply	other threads:[~2021-06-05 13:39 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-30 21:35 [RFC v2 00/43] PKRAM: Preserved-over-Kexec RAM Anthony Yznaga
2021-03-30 21:35 ` [RFC v2 01/43] mm: add PKRAM API stubs and Kconfig Anthony Yznaga
2021-03-31 18:43   ` Randy Dunlap
2021-03-31 20:28     ` Anthony Yznaga
2021-03-30 21:35 ` [RFC v2 02/43] mm: PKRAM: implement node load and save functions Anthony Yznaga
2021-03-30 21:35 ` [RFC v2 03/43] mm: PKRAM: implement object " Anthony Yznaga
2021-03-30 21:35 ` [RFC v2 04/43] mm: PKRAM: implement page stream operations Anthony Yznaga
2021-03-30 21:35 ` [RFC v2 05/43] mm: PKRAM: support preserving transparent hugepages Anthony Yznaga
2021-03-30 21:35 ` [RFC v2 06/43] mm: PKRAM: implement byte stream operations Anthony Yznaga
2021-03-30 21:35 ` [RFC v2 07/43] mm: PKRAM: link nodes by pfn before reboot Anthony Yznaga
2021-03-30 21:35 ` [RFC v2 08/43] mm: PKRAM: introduce super block Anthony Yznaga
2021-03-30 21:35 ` [RFC v2 09/43] PKRAM: track preserved pages in a physical mapping pagetable Anthony Yznaga
2021-03-30 21:35 ` [RFC v2 10/43] PKRAM: pass a list of preserved ranges to the next kernel Anthony Yznaga
2021-03-30 21:35 ` [RFC v2 11/43] PKRAM: prepare for adding preserved ranges to memblock reserved Anthony Yznaga
2021-03-30 21:35 ` [RFC v2 12/43] mm: PKRAM: reserve preserved memory at boot Anthony Yznaga
2021-03-30 21:35 ` [RFC v2 13/43] PKRAM: free the preserved ranges list Anthony Yznaga
2021-03-30 21:35 ` [RFC v2 14/43] PKRAM: prevent inadvertent use of a stale superblock Anthony Yznaga
2021-03-30 21:35 ` [RFC v2 15/43] PKRAM: provide a way to ban pages from use by PKRAM Anthony Yznaga
2021-03-30 21:35 ` [RFC v2 16/43] kexec: PKRAM: prevent kexec clobbering preserved pages in some cases Anthony Yznaga
2021-03-30 21:35 ` [RFC v2 17/43] PKRAM: provide a way to check if a memory range has preserved pages Anthony Yznaga
2021-03-30 21:35 ` [RFC v2 18/43] kexec: PKRAM: avoid clobbering already " Anthony Yznaga
2021-03-30 21:35 ` [RFC v2 19/43] mm: PKRAM: allow preserved memory to be freed from userspace Anthony Yznaga
2021-03-30 21:35 ` [RFC v2 20/43] PKRAM: disable feature when running the kdump kernel Anthony Yznaga
2021-03-30 21:35 ` [RFC v2 21/43] x86/KASLR: PKRAM: support physical kaslr Anthony Yznaga
2021-03-30 21:35 ` [RFC v2 22/43] x86/boot/compressed/64: use 1GB pages for mappings Anthony Yznaga
2021-03-30 21:35 ` [RFC v2 23/43] mm: shmem: introduce shmem_insert_page Anthony Yznaga
2021-03-30 21:35 ` [RFC v2 24/43] mm: shmem: enable saving to PKRAM Anthony Yznaga
2021-03-30 21:36 ` [RFC v2 25/43] mm: shmem: prevent swapping of PKRAM-enabled tmpfs pages Anthony Yznaga
2021-03-30 21:36 ` [RFC v2 26/43] mm: shmem: specify the mm to use when inserting pages Anthony Yznaga
2021-03-30 21:36 ` [RFC v2 27/43] mm: shmem: when inserting, handle pages already charged to a memcg Anthony Yznaga
2021-03-30 21:36 ` [RFC v2 28/43] x86/mm/numa: add numa_isolate_memblocks() Anthony Yznaga
2021-03-30 21:36 ` [RFC v2 29/43] PKRAM: ensure memblocks with preserved pages init'd for numa Anthony Yznaga
2021-03-30 21:36 ` [RFC v2 30/43] memblock: PKRAM: mark memblocks that contain preserved pages Anthony Yznaga
2021-03-30 21:36 ` [RFC v2 31/43] memblock, mm: defer initialization of " Anthony Yznaga
2021-03-30 21:36 ` [RFC v2 32/43] shmem: preserve shmem files a chunk at a time Anthony Yznaga
2021-03-30 21:36 ` [RFC v2 33/43] PKRAM: atomically add and remove link pages Anthony Yznaga
2021-03-30 21:36 ` [RFC v2 34/43] shmem: PKRAM: multithread preserving and restoring shmem pages Anthony Yznaga
2021-03-30 21:36 ` [RFC v2 35/43] shmem: introduce shmem_insert_pages() Anthony Yznaga
2021-03-30 21:36 ` [RFC v2 36/43] PKRAM: add support for loading pages in bulk Anthony Yznaga
2021-03-30 21:36 ` [RFC v2 37/43] shmem: PKRAM: enable bulk loading of preserved pages into shmem Anthony Yznaga
2021-03-30 21:36 ` [RFC v2 38/43] mm: implement splicing a list of pages to the LRU Anthony Yznaga
2021-03-30 21:36 ` [RFC v2 39/43] shmem: optimize adding pages to the LRU in shmem_insert_pages() Anthony Yznaga
2021-03-30 21:36 ` [RFC v2 40/43] shmem: initial support for adding multiple pages to pagecache Anthony Yznaga
2021-03-30 21:36 ` [RFC v2 41/43] XArray: add xas_export_node() and xas_import_node() Anthony Yznaga
2021-03-30 21:36 ` [RFC v2 42/43] shmem: reduce time holding xa_lock when inserting pages Anthony Yznaga
2021-03-30 21:36 ` [RFC v2 43/43] PKRAM: improve index alignment of pkram_link entries Anthony Yznaga
2021-06-05 13:39 ` Pavel Tatashin [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6e74451b-6a29-d0fc-cf26-b3700a099a09@soleen.com \
    --to=pasha.tatashin@soleen.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex.shi@linux.alibaba.com \
    --cc=andreyknvl@google.com \
    --cc=anthony.yznaga@oracle.com \
    --cc=ardb@kernel.org \
    --cc=ashish.kalra@amd.com \
    --cc=bhe@redhat.com \
    --cc=bp@alien8.de \
    --cc=corbet@lwn.net \
    --cc=dan.j.williams@intel.com \
    --cc=daniel.kiper@oracle.com \
    --cc=daniel.m.jordan@oracle.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=ebiederm@xmission.com \
    --cc=graf@amazon.com \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=hpa@zytor.com \
    --cc=hughd@google.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=jason.zeng@intel.com \
    --cc=jroedel@suse.de \
    --cc=keescook@chromium.org \
    --cc=kexec@lists.infradead.org \
    --cc=lei.l.li@intel.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=martin.b.radev@gmail.com \
    --cc=masahiroy@kernel.org \
    --cc=mhocko@kernel.org \
    --cc=mingo@redhat.com \
    --cc=nathan@kernel.org \
    --cc=nivedita@alum.mit.edu \
    --cc=peterz@infradead.org \
    --cc=rafael.j.wysocki@intel.com \
    --cc=richard.weiyang@gmail.com \
    --cc=rminnich@gmail.com \
    --cc=rppt@kernel.org \
    --cc=steven.sistare@oracle.com \
    --cc=terrelln@fb.com \
    --cc=tglx@linutronix.de \
    --cc=vbabka@suse.cz \
    --cc=vdavydov.dev@gmail.com \
    --cc=vincenzo.frascino@arm.com \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).