Subject: Re: [RFC v2 00/43] PKRAM: Preserved-over-Kexec RAM
To: Anthony Yznaga, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: willy@infradead.org, corbet@lwn.net, tglx@linutronix.de,
 mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com,
 dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
 rppt@kernel.org, akpm@linux-foundation.org, hughd@google.com,
 ebiederm@xmission.com, keescook@chromium.org, ardb@kernel.org,
 nivedita@alum.mit.edu, jroedel@suse.de, masahiroy@kernel.org,
 nathan@kernel.org, terrelln@fb.com, vincenzo.frascino@arm.com,
 martin.b.radev@gmail.com, andreyknvl@google.com, daniel.kiper@oracle.com,
 rafael.j.wysocki@intel.com, dan.j.williams@intel.com,
 Jonathan.Cameron@huawei.com, bhe@redhat.com, rminnich@gmail.com,
 ashish.kalra@amd.com, guro@fb.com, hannes@cmpxchg.org, mhocko@kernel.org,
 iamjoonsoo.kim@lge.com, vbabka@suse.cz, alex.shi@linux.alibaba.com,
 david@redhat.com, richard.weiyang@gmail.com, vdavydov.dev@gmail.com,
 graf@amazon.com, jason.zeng@intel.com, lei.l.li@intel.com,
 daniel.m.jordan@oracle.com, steven.sistare@oracle.com,
 linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org,
 kexec@lists.infradead.org
References: <1617140178-8773-1-git-send-email-anthony.yznaga@oracle.com>
In-Reply-To: <1617140178-8773-1-git-send-email-anthony.yznaga@oracle.com>
From: Pavel Tatashin
Message-ID: <6e74451b-6a29-d0fc-cf26-b3700a099a09@soleen.com>
Date: Sat, 5 Jun 2021 09:39:04 -0400

On 3/30/21 5:35 PM, Anthony Yznaga wrote:
> This patchset implements preserved-over-kexec memory storage or PKRAM as a
> method for saving memory pages of the currently executing kernel so that
> they may be restored after kexec into a new kernel. The patches are adapted
> from an RFC patchset sent out in 2013 by Vladimir Davydov [1]. They
> introduce the PKRAM kernel API and implement its use within tmpfs, allowing
> tmpfs files to be preserved across kexec.
>
> One use case for PKRAM is preserving guest memory and/or auxillary supporting
> data (e.g. iommu data) across kexec in support of VMM Fast Restart[2].
> VMM Fast Restart is currently using PKRAM to support preserving "Keep Alive
> State" across reboot[3]. PKRAM provides a flexible way for doing this
> without requiring that the amount of memory used by a fixed size created
> a priori. Another use case is for databases to preserve their block caches
> in shared memory across reboot.

Hi Anthony,

I have several concerns about preserving arbitrary, non-prereserved
segments across reboot.

1. PKRAM does not work across firmware reboots

With emulated persistent memory it is possible to reboot through firmware
and not lose the preserved memory. The firmware can be modified to mark
the required ranges of pages as PRAM, and Linux will treat them as such.
The benefit of this is that it works in both cases: kexec and reboot
through firmware. The disadvantage is that you have to know in advance how
much memory needs to be preserved.
However, with the ability to hot-plug/hot-remove PMEM, that disadvantage
becomes moot, as it is possible to mark a large chunk of memory as PMEM if
needed. I have designed something like this for one of our projects, and
it is already being used in the fleet. Rebooting through firmware allows
us to service firmware in addition to the kernel.

2. Boot failures due to memory fragmentation

We also considered using PRAM instead of PMEM. PRAM was one of the
previous attempts to do the persistent memory thing via a tmpfs flag:

    mount -t tmpfs -o pram=mytmpfs none /mnt/crdump

That project was never upstreamed. However, we gave up on that idea
because, in addition to losing the ability to reboot through firmware, it
also adds memory fragmentation. For example, if the new kernel requires
larger contiguous memory chunks to be allocated during boot than the
previous kernel (e.g. the next kernel has new drivers, or some debug
feature enabled), the boot might simply fail because of the extra memory
ranges being reserved.

3. New inter-kernel dependencies

Kexec reboot is when one Linux kernel works as a bootloader for the next
one. Currently, very little information is passed from the old kernel to
the next kernel. Adding more information that two independent kernels must
know about each other is not a good thing from an architectural point of
view. It limits the flexibility of kexec.

However, we do need PKRAM and the ability to preserve kernel memory across
reboot for fast hypervisor updates and similar use cases. User pages can
already be preserved across reboot on emulated or real persistent memory.
The easiest way is via DAXFS placed on that memory. The kernel cannot
currently preserve its own memory on PMEM across reboot. However, the
functionality can be extended so that kernel memory can be preserved on
either emulated or real persistent memory. PKRAM could provide an
interface to save kernel data to a file, and that file could be placed on
any filesystem, including DAXFS.
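As a rough userspace illustration of the "user pages preserved in a file"
idea above, here is a minimal Python sketch. It is only an analogy under
stated assumptions, not the PKRAM API: the record layout (`MAGIC`,
`HEADER`), the names `save_state`/`load_state`, and the 4 KiB region size
are all made up for the example. On a real system the path would point at
a file on a DAX-mounted filesystem backed by emulated or real persistent
memory; on any other filesystem the same code simply goes through the page
cache.

```python
import mmap
import os
import struct
from typing import Optional

# Hypothetical on-file layout: 8-byte magic, 8-byte payload length, payload.
MAGIC = b"PRESERVE"
HEADER = struct.Struct("<8sQ")
REGION_SIZE = 4096  # fixed size chosen up front (an assumption of this sketch)


def save_state(path: str, payload: bytes) -> None:
    """Store payload in a fixed-size file through a shared mapping."""
    if HEADER.size + len(payload) > REGION_SIZE:
        raise ValueError("payload does not fit in the preserved region")
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, REGION_SIZE)
        with mmap.mmap(fd, REGION_SIZE) as region:
            region[:HEADER.size] = HEADER.pack(MAGIC, len(payload))
            end = HEADER.size + len(payload)
            region[HEADER.size:end] = payload
            region.flush()  # push the mapped pages out to the backing file
    finally:
        os.close(fd)


def load_state(path: str) -> Optional[bytes]:
    """Return the saved payload, or None if no valid record is present."""
    if not os.path.exists(path) or os.path.getsize(path) < HEADER.size:
        return None
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as region:
            magic, length = HEADER.unpack_from(region, 0)
            if magic != MAGIC or HEADER.size + length > len(region):
                return None
            return bytes(region[HEADER.size:HEADER.size + length])
```

A caller would run `save_state(path, data)` before the reboot and
`load_state(path)` afterwards; the fixed `REGION_SIZE` mirrors the
"know in advance how much memory is needed" trade-off discussed in
point 1.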
When placed on DAXFS, such a file can be used as IOMMU data, as it is
actually located in physical memory and does not move. It is preserved
across firmware/kexec reboot, so the devices can survive the reboot with
their state intact. During boot, the device drivers that use the PKRAM
preserve functionality would map the saved files from DAXFS in order to
get IOMMU functionality working again.

Thank you,
Pasha