* [RFC PATCH v1 01/10] mm/prmem: Allocate memory during boot for storing persistent data
From: madvenka @ 2023-10-16 23:32 UTC
To: gregkh, pbonzini, rppt, jgowans, graf, arnd, keescook,
stanislav.kinsburskii, anthony.yznaga, linux-mm, linux-kernel,
madvenka, jamorris
From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
Introduce the "Persistent-Across-Kexec memory (prmem)" feature that allows
user and kernel data to be persisted across kexecs.
The first step is to set aside some memory for storing persistent data.
Introduce a new kernel command line parameter for this:
prmem=size[KMG]
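A minimal userspace sketch of how a `prmem=size[KMG]` value could be parsed and validated, mirroring the acceptance rules in `prmem_size_parse()` (something was parsed, the size is non-zero, and it is page-aligned). `sketch_memparse()`, `sketch_prmem_size_parse()` and `SKETCH_PAGE_SIZE` are hypothetical stand-ins for the kernel's `memparse()`, the patch's parser and `PAGE_SIZE`, assuming 4 KiB pages; the kernel's `memparse()` also accepts T/P/E suffixes, which are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Assumption: 4 KiB pages (stand-in for the kernel's PAGE_SIZE). */
#define SKETCH_PAGE_SIZE 4096ULL

/*
 * Userspace approximation of memparse(): a number in any C base
 * (strtoull base 0, so "0x..." works) followed by an optional K/M/G suffix.
 */
static unsigned long long sketch_memparse(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;	/* consume the suffix */
		break;
	default:
		break;
	}
	return v;
}

/* Same checks as prmem_size_parse(): parsed, non-zero, page-aligned. */
static bool sketch_prmem_size_parse(const char *arg, unsigned long long *out)
{
	char *end;
	unsigned long long size;

	assert(out != NULL);
	if (!arg)
		return false;
	size = sketch_memparse(arg, &end);
	if (end == arg || size == 0 || (size & (SKETCH_PAGE_SIZE - 1)))
		return false;
	*out = size;
	return true;
}
```

For example, `prmem=64M` yields 64 MiB, while `prmem=4097` is rejected because it is not a multiple of the page size.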
Allocate this memory from memblock during boot. Make sure that the
allocation is done late enough that it does not interfere with any
fixed-range allocations.
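The ordering concern can be illustrated with a toy, userspace model of memblock. Every name below is hypothetical; the model only assumes that memblock allocates top-down by default and that, like `memblock_phys_alloc()`, it returns 0 on failure. Once prmem's range has been handed out, a fixed-address reservation made afterwards may collide with it, which is consistent with `prmem_reserve()` being called after `reserve_crashkernel()` in `setup_arch()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model of memblock: a small table of reserved ranges. */
#define TOY_MAX_RESERVED 16

struct toy_range {
	uint64_t base;
	uint64_t size;
};

struct toy_memblock {
	uint64_t mem_end;	/* usable memory is [0, mem_end) */
	struct toy_range reserved[TOY_MAX_RESERVED];
	size_t nr_reserved;
};

static bool toy_overlaps(const struct toy_memblock *mb, uint64_t base, uint64_t size)
{
	for (size_t i = 0; i < mb->nr_reserved; i++) {
		const struct toy_range *r = &mb->reserved[i];

		if (base < r->base + r->size && r->base < base + size)
			return true;
	}
	return false;
}

/* Reserve an exact range, as a fixed-address reservation would. */
static bool toy_reserve(struct toy_memblock *mb, uint64_t base, uint64_t size)
{
	if (mb->nr_reserved == TOY_MAX_RESERVED || base + size > mb->mem_end ||
	    toy_overlaps(mb, base, size))
		return false;
	mb->reserved[mb->nr_reserved++] = (struct toy_range){ base, size };
	return true;
}

/*
 * Top-down search: try the highest aligned base first and walk down.
 * Address 0 doubles as the failure value, so it is never handed out.
 */
static uint64_t toy_phys_alloc(struct toy_memblock *mb, uint64_t size, uint64_t align)
{
	assert(align && (align & (align - 1)) == 0);
	if (size == 0 || size > mb->mem_end)
		return 0;
	for (uint64_t base = (mb->mem_end - size) & ~(align - 1); base >= align;
	     base -= align) {
		if (toy_reserve(mb, base, size))
			return base;
	}
	return 0;
}
```

With 64 MiB of memory and a fixed reservation already covering the top 16 MiB, a 16 MiB allocation lands at 32 MiB; a fixed reservation attempted afterwards inside that range is refused.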
Define a "prmem_region" structure to store the range that is allocated. The
region structure will be used to manage the memory.
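Because the region descriptor is stored at the base of the region it describes (as `prmem_add_region()` does via `__va(pa)`), no separate allocator is needed to track regions. A userspace sketch of that layout follows, with a minimal stand-in for the kernel's `struct list_head`; the heap buffer standing in for physical memory and the `sketch_*` helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct prmem_region {
	struct list_head node;
	unsigned long pa;
	size_t size;
};

/* Hypothetical "physical memory": pa is an offset into this heap buffer. */
#define FAKE_PHYS_SIZE (64UL * 1024)
static unsigned char *fake_phys;

/* Stand-in for __va(): translate a fake physical address to a pointer. */
static void *sketch_va(unsigned long pa)
{
	if (!fake_phys)
		fake_phys = aligned_alloc(4096, FAKE_PHYS_SIZE);
	assert(fake_phys && pa + sizeof(struct prmem_region) <= FAKE_PHYS_SIZE);
	return fake_phys + pa;
}

/* Same idea as prmem_add_region(): the descriptor lives at the region's base. */
static struct prmem_region *sketch_add_region(struct list_head *regions,
					      unsigned long pa, size_t size)
{
	struct prmem_region *region = sketch_va(pa);

	region->pa = pa;
	region->size = size;
	list_add_tail(&region->node, regions);
	return region;
}

/* Sum the sizes of all regions on the list. */
static size_t sketch_total_size(struct list_head *regions)
{
	size_t total = 0;

	for (struct list_head *p = regions->next; p != regions; p = p->next)
		total += container_of(p, struct prmem_region, node)->size;
	return total;
}
```

One consequence of this layout is that the first `sizeof(struct prmem_region)` bytes of every region hold the descriptor and are not available for persistent data.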
Define a "prmem" structure for storing persistence metadata.
Allocate a metadata page to contain the metadata structure. Initialize the
metadata. Add the initial region to a region list in the metadata.
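In `prmem_init()`, a zero `metadata` field is what identifies a cold boot. A simplified userspace sketch of that check, with the region list reduced to a counter and all `sketch_*` names hypothetical; `_Static_assert` plays the role of the patch's `BUILD_BUG_ON(sizeof(*prmem) > PAGE_SIZE)`, assuming 4 KiB pages. In this patch `prmem_reserve()` always zeroes the page, so only the cold-boot branch is taken; the second call below assumes a later kexec that finds the same page with its contents intact.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096

/* Simplified metadata: the region list is replaced by a region count. */
struct sketch_prmem {
	unsigned long metadata;	/* physical address of this page; 0 => not initialized */
	size_t size;		/* size of the initial prmem allocation */
	unsigned int nr_regions;
};

/* Compile-time equivalent of BUILD_BUG_ON(sizeof(*prmem) > PAGE_SIZE). */
_Static_assert(sizeof(struct sketch_prmem) <= SKETCH_PAGE_SIZE,
	       "prmem metadata must fit in one page");

/*
 * Initialize the metadata on a cold boot (metadata == 0) and leave it
 * untouched otherwise. Returns true if this was a cold boot.
 */
static bool sketch_prmem_init(struct sketch_prmem *p, unsigned long metadata_pa,
			      size_t size)
{
	assert(p != NULL);
	if (p->metadata)
		return false;
	p->metadata = metadata_pa;
	p->size = size;
	p->nr_regions = 1;	/* the initial region */
	return true;
}
```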
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
arch/x86/kernel/setup.c | 2 +
include/linux/prmem.h | 76 ++++++++++++++++++++++++++++++++++++
kernel/Makefile | 1 +
kernel/prmem/Makefile | 3 ++
kernel/prmem/prmem_init.c | 27 +++++++++++++
kernel/prmem/prmem_parse.c | 33 ++++++++++++++++
kernel/prmem/prmem_region.c | 21 ++++++++++
kernel/prmem/prmem_reserve.c | 56 ++++++++++++++++++++++++++
mm/mm_init.c | 2 +
9 files changed, 221 insertions(+)
create mode 100644 include/linux/prmem.h
create mode 100644 kernel/prmem/Makefile
create mode 100644 kernel/prmem/prmem_init.c
create mode 100644 kernel/prmem/prmem_parse.c
create mode 100644 kernel/prmem/prmem_region.c
create mode 100644 kernel/prmem/prmem_reserve.c
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index fd975a4a5200..f2b13b3d3ead 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -25,6 +25,7 @@
#include <linux/static_call.h>
#include <linux/swiotlb.h>
#include <linux/random.h>
+#include <linux/prmem.h>
#include <uapi/linux/mount.h>
@@ -1231,6 +1232,7 @@ void __init setup_arch(char **cmdline_p)
* won't consume hotpluggable memory.
*/
reserve_crashkernel();
+ prmem_reserve();
memblock_find_dma_reserve();
diff --git a/include/linux/prmem.h b/include/linux/prmem.h
new file mode 100644
index 000000000000..7f22016c4ad2
--- /dev/null
+++ b/include/linux/prmem.h
@@ -0,0 +1,76 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Persistent-Across-Kexec memory (prmem) - Definitions.
+ *
+ * Copyright (C) 2023 Microsoft Corporation
+ * Author: Madhavan T. Venkataraman (madvenka@linux.microsoft.com)
+ */
+#ifndef _LINUX_PRMEM_H
+#define _LINUX_PRMEM_H
+/*
+ * The prmem feature can be used to persist kernel and user data across kexec
+ * reboots in memory for various uses. E.g.,
+ *
+ * - Saving cached data. E.g., database caches.
+ * - Saving state. E.g., KVM guest states.
+ * - Saving historical information since the last cold boot such as
+ * events, logs and journals.
+ * - Saving measurements for integrity checks on the next boot.
+ * - Saving driver data.
+ * - Saving IOMMU mappings.
+ * - Saving MMIO config information.
+ *
+ * This is useful on systems where there is no non-volatile storage or
+ * non-volatile storage is too slow.
+ */
+#include <linux/types.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/memblock.h>
+#include <linux/printk.h>
+
+#include <linux/errno.h>
+#include <asm/page.h>
+#include <asm/setup.h>
+/*
+ * A prmem region supplies the memory for storing persistent data.
+ *
+ * node List node.
+ * pa Physical address of the region.
+ * size Size of the region in bytes.
+ */
+struct prmem_region {
+ struct list_head node;
+ unsigned long pa;
+ size_t size;
+};
+
+/*
+ * PRMEM metadata.
+ *
+ * metadata Physical address of the metadata page.
+ * size Size of initial memory allocated to prmem.
+ *
+ * regions List of memory regions.
+ */
+struct prmem {
+ unsigned long metadata;
+ size_t size;
+
+ /* Persistent Regions. */
+ struct list_head regions;
+};
+
+extern struct prmem *prmem;
+extern unsigned long prmem_metadata;
+extern unsigned long prmem_pa;
+extern size_t prmem_size;
+
+/* Kernel API. */
+void prmem_reserve(void);
+void prmem_init(void);
+
+/* Internal functions. */
+struct prmem_region *prmem_add_region(unsigned long pa, size_t size);
+
+#endif /* _LINUX_PRMEM_H */
diff --git a/kernel/Makefile b/kernel/Makefile
index 3947122d618b..43b485b0467a 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -50,6 +50,7 @@ obj-y += rcu/
obj-y += livepatch/
obj-y += dma/
obj-y += entry/
+obj-y += prmem/
obj-$(CONFIG_MODULES) += module/
obj-$(CONFIG_KCMP) += kcmp.o
diff --git a/kernel/prmem/Makefile b/kernel/prmem/Makefile
new file mode 100644
index 000000000000..11a53d49312a
--- /dev/null
+++ b/kernel/prmem/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-y += prmem_parse.o prmem_reserve.o prmem_init.o prmem_region.o
diff --git a/kernel/prmem/prmem_init.c b/kernel/prmem/prmem_init.c
new file mode 100644
index 000000000000..97b550252028
--- /dev/null
+++ b/kernel/prmem/prmem_init.c
@@ -0,0 +1,27 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Persistent-Across-Kexec memory (prmem) - Initialization.
+ *
+ * Copyright (C) 2023 Microsoft Corporation
+ * Author: Madhavan T. Venkataraman (madvenka@linux.microsoft.com)
+ */
+#include <linux/prmem.h>
+
+bool prmem_inited;
+
+void __init prmem_init(void)
+{
+ if (!prmem)
+ return;
+
+ if (!prmem->metadata) {
+ /* Cold boot. */
+ prmem->metadata = prmem_metadata;
+ prmem->size = prmem_size;
+ INIT_LIST_HEAD(&prmem->regions);
+
+ if (!prmem_add_region(prmem_pa, prmem_size))
+ return;
+ }
+ prmem_inited = true;
+}
diff --git a/kernel/prmem/prmem_parse.c b/kernel/prmem/prmem_parse.c
new file mode 100644
index 000000000000..191655b53545
--- /dev/null
+++ b/kernel/prmem/prmem_parse.c
@@ -0,0 +1,33 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Persistent-Across-Kexec memory (prmem) - Process prmem cmdline parameter.
+ *
+ * Copyright (C) 2023 Microsoft Corporation
+ * Author: Madhavan T. Venkataraman (madvenka@linux.microsoft.com)
+ */
+#include <linux/prmem.h>
+
+/*
+ * Syntax: prmem=size[KMG]
+ *
+ * Specifies the size of the initial memory to be allocated to prmem.
+ */
+static int __init prmem_size_parse(char *cmdline)
+{
+ char *tmp, *cur = cmdline;
+ unsigned long size;
+
+ if (!cur)
+ return -EINVAL;
+
+ /* Get initial size. */
+ size = memparse(cur, &tmp);
+ if (cur == tmp || !size || size & (PAGE_SIZE - 1)) {
+ pr_warn("%s: Incorrect size %lx\n", __func__, size);
+ return -EINVAL;
+ }
+
+ prmem_size = size;
+ return 0;
+}
+early_param("prmem", prmem_size_parse);
diff --git a/kernel/prmem/prmem_region.c b/kernel/prmem/prmem_region.c
new file mode 100644
index 000000000000..8254dafcee13
--- /dev/null
+++ b/kernel/prmem/prmem_region.c
@@ -0,0 +1,21 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Persistent-Across-Kexec memory (prmem) - Regions.
+ *
+ * Copyright (C) 2023 Microsoft Corporation
+ * Author: Madhavan T. Venkataraman (madvenka@linux.microsoft.com)
+ */
+#include <linux/prmem.h>
+
+struct prmem_region *prmem_add_region(unsigned long pa, size_t size)
+{
+ struct prmem_region *region;
+
+ /* Allocate region structure from the base of the region itself. */
+ region = __va(pa);
+ region->pa = pa;
+ region->size = size;
+
+ list_add_tail(&region->node, &prmem->regions);
+ return region;
+}
diff --git a/kernel/prmem/prmem_reserve.c b/kernel/prmem/prmem_reserve.c
new file mode 100644
index 000000000000..e20e31a61d12
--- /dev/null
+++ b/kernel/prmem/prmem_reserve.c
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Persistent-Across-Kexec memory (prmem) - Reserve memory.
+ *
+ * Copyright (C) 2023 Microsoft Corporation
+ * Author: Madhavan T. Venkataraman (madvenka@linux.microsoft.com)
+ */
+#include <linux/prmem.h>
+
+struct prmem *prmem;
+unsigned long prmem_metadata;
+unsigned long prmem_pa;
+size_t prmem_size;
+
+void __init prmem_reserve(void)
+{
+ BUILD_BUG_ON(sizeof(*prmem) > PAGE_SIZE);
+
+ if (!prmem_size)
+ return;
+
+ /*
+ * prmem uses direct map addresses. If PAGE_OFFSET is randomized,
+ * these addresses will change across kexecs. Persistence cannot
+ * be supported.
+ */
+ if (kaslr_memory_enabled()) {
+ pr_warn("%s: Cannot support persistence because of KASLR.\n",
+ __func__);
+ return;
+ }
+
+ /* Allocate a metadata page. */
+ prmem_metadata = memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE);
+ if (!prmem_metadata) {
+ pr_warn("%s: Could not allocate the metadata page\n",
+ __func__);
+ return;
+ }
+
+ /* Allocate initial memory. */
+ prmem_pa = memblock_phys_alloc(prmem_size, PAGE_SIZE);
+ if (!prmem_pa) {
+ pr_warn("%s: Could not allocate initial memory\n", __func__);
+ goto free_metadata;
+ }
+
+ /* Clear metadata. */
+ prmem = __va(prmem_metadata);
+ memset(prmem, 0, sizeof(*prmem));
+ return;
+
+free_metadata:
+ memblock_phys_free(prmem_metadata, PAGE_SIZE);
+ prmem = NULL;
+}
diff --git a/mm/mm_init.c b/mm/mm_init.c
index a1963c3322af..f12757829281 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -24,6 +24,7 @@
#include <linux/page_ext.h>
#include <linux/pti.h>
#include <linux/pgtable.h>
+#include <linux/prmem.h>
#include <linux/swap.h>
#include <linux/cma.h>
#include "internal.h"
@@ -2804,4 +2805,5 @@ void __init mm_core_init(void)
pti_init();
kmsan_init_runtime();
mm_cache_init();
+ prmem_init();
}
--
2.25.1
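The KASLR comment in `prmem_reserve()` can be made concrete: the `struct list_head` links stored inside the persistent metadata and regions are direct-map virtual addresses, so they stay meaningful after kexec only if `PAGE_OFFSET` is unchanged. A toy model, where the `toy_*` names and offset values are hypothetical and the direct map is reduced to `va = page_offset + pa`:

```c
#include <assert.h>
#include <stdint.h>

/* Toy direct map: va = page_offset + pa; page_offset may differ per boot under KASLR. */
static uintptr_t toy_va(uintptr_t page_offset, uintptr_t pa)
{
	return page_offset + pa;
}

/* A persisted link, as a list_head's next pointer would be: a virtual address. */
struct toy_link {
	uintptr_t next_va;
};

/*
 * Hypothetical fix-up a relocating design would need: rebase a stored
 * virtual address from the old direct-map offset to the new one.
 */
static uintptr_t toy_relocate(uintptr_t old_offset, uintptr_t new_offset, uintptr_t va)
{
	assert(va >= old_offset);
	return va - old_offset + new_offset;
}
```

The patch takes the simpler route of refusing persistence when `kaslr_memory_enabled()` is true; `toy_relocate()` only illustrates the pointer fix-up a relocating design would require and is not something this patch implements.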
* Re: [RFC PATCH v1 01/10] mm/prmem: Allocate memory during boot for storing persistent data
From: kernel test robot @ 2023-10-17 18:36 UTC
To: madvenka; +Cc: oe-kbuild-all
Hi,
[This is a private test report for your RFC patch.]
kernel test robot noticed the following build warnings:
[auto build test WARNING on 2dde18cd1d8fac735875f2e4987f11817cc0bc2c]
url: https://github.com/intel-lab-lkp/linux/commits/madvenka-linux-microsoft-com/mm-prmem-Allocate-memory-during-boot-for-storing-persistent-data/20231017-194340
base: 2dde18cd1d8fac735875f2e4987f11817cc0bc2c
patch link: https://lore.kernel.org/r/20231016233215.13090-2-madvenka%40linux.microsoft.com
patch subject: [RFC PATCH v1 01/10] mm/prmem: Allocate memory during boot for storing persistent data
config: alpha-allyesconfig (https://download.01.org/0day-ci/archive/20231018/202310180205.XwQNDqOS-lkp@intel.com/config)
compiler: alpha-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231018/202310180205.XwQNDqOS-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202310180205.XwQNDqOS-lkp@intel.com/
All warnings (new ones prefixed by >>):
|
arch/alpha/include/uapi/asm/errno.h:39: note: this is the location of the previous definition
39 | #define ELOOP 62 /* Too many symbolic links encountered */
|
include/uapi/asm-generic/errno.h:23: warning: "ENOMSG" redefined
23 | #define ENOMSG 42 /* No message of desired type */
|
arch/alpha/include/uapi/asm/errno.h:53: note: this is the location of the previous definition
53 | #define ENOMSG 80 /* No message of desired type */
|
include/uapi/asm-generic/errno.h:24: warning: "EIDRM" redefined
24 | #define EIDRM 43 /* Identifier removed */
|
arch/alpha/include/uapi/asm/errno.h:54: note: this is the location of the previous definition
54 | #define EIDRM 81 /* Identifier removed */
|
include/uapi/asm-generic/errno.h:25: warning: "ECHRNG" redefined
25 | #define ECHRNG 44 /* Channel number out of range */
|
arch/alpha/include/uapi/asm/errno.h:67: note: this is the location of the previous definition
67 | #define ECHRNG 88 /* Channel number out of range */
|
include/uapi/asm-generic/errno.h:26: warning: "EL2NSYNC" redefined
26 | #define EL2NSYNC 45 /* Level 2 not synchronized */
|
arch/alpha/include/uapi/asm/errno.h:68: note: this is the location of the previous definition
68 | #define EL2NSYNC 89 /* Level 2 not synchronized */
|
include/uapi/asm-generic/errno.h:27: warning: "EL3HLT" redefined
27 | #define EL3HLT 46 /* Level 3 halted */
|
arch/alpha/include/uapi/asm/errno.h:69: note: this is the location of the previous definition
69 | #define EL3HLT 90 /* Level 3 halted */
|
include/uapi/asm-generic/errno.h:28: warning: "EL3RST" redefined
28 | #define EL3RST 47 /* Level 3 reset */
|
arch/alpha/include/uapi/asm/errno.h:70: note: this is the location of the previous definition
70 | #define EL3RST 91 /* Level 3 reset */
|
include/uapi/asm-generic/errno.h:29: warning: "ELNRNG" redefined
29 | #define ELNRNG 48 /* Link number out of range */
|
arch/alpha/include/uapi/asm/errno.h:72: note: this is the location of the previous definition
72 | #define ELNRNG 93 /* Link number out of range */
|
include/uapi/asm-generic/errno.h:30: warning: "EUNATCH" redefined
30 | #define EUNATCH 49 /* Protocol driver not attached */
|
arch/alpha/include/uapi/asm/errno.h:73: note: this is the location of the previous definition
73 | #define EUNATCH 94 /* Protocol driver not attached */
|
include/uapi/asm-generic/errno.h:31: warning: "ENOCSI" redefined
31 | #define ENOCSI 50 /* No CSI structure available */
|
arch/alpha/include/uapi/asm/errno.h:74: note: this is the location of the previous definition
74 | #define ENOCSI 95 /* No CSI structure available */
|
include/uapi/asm-generic/errno.h:32: warning: "EL2HLT" redefined
32 | #define EL2HLT 51 /* Level 2 halted */
|
arch/alpha/include/uapi/asm/errno.h:75: note: this is the location of the previous definition
75 | #define EL2HLT 96 /* Level 2 halted */
|
include/uapi/asm-generic/errno.h:33: warning: "EBADE" redefined
33 | #define EBADE 52 /* Invalid exchange */
|
arch/alpha/include/uapi/asm/errno.h:76: note: this is the location of the previous definition
76 | #define EBADE 97 /* Invalid exchange */
|
include/uapi/asm-generic/errno.h:34: warning: "EBADR" redefined
34 | #define EBADR 53 /* Invalid request descriptor */
|
arch/alpha/include/uapi/asm/errno.h:77: note: this is the location of the previous definition
77 | #define EBADR 98 /* Invalid request descriptor */
|
include/uapi/asm-generic/errno.h:35: warning: "EXFULL" redefined
35 | #define EXFULL 54 /* Exchange full */
|
arch/alpha/include/uapi/asm/errno.h:78: note: this is the location of the previous definition
78 | #define EXFULL 99 /* Exchange full */
|
include/uapi/asm-generic/errno.h:36: warning: "ENOANO" redefined
36 | #define ENOANO 55 /* No anode */
|
arch/alpha/include/uapi/asm/errno.h:79: note: this is the location of the previous definition
79 | #define ENOANO 100 /* No anode */
|
include/uapi/asm-generic/errno.h:37: warning: "EBADRQC" redefined
37 | #define EBADRQC 56 /* Invalid request code */
|
arch/alpha/include/uapi/asm/errno.h:80: note: this is the location of the previous definition
80 | #define EBADRQC 101 /* Invalid request code */
|
include/uapi/asm-generic/errno.h:38: warning: "EBADSLT" redefined
38 | #define EBADSLT 57 /* Invalid slot */
|
arch/alpha/include/uapi/asm/errno.h:81: note: this is the location of the previous definition
81 | #define EBADSLT 102 /* Invalid slot */
|
>> include/uapi/asm-generic/errno.h:42: warning: "EBFONT" redefined
42 | #define EBFONT 59 /* Bad font file format */
|
arch/alpha/include/uapi/asm/errno.h:85: note: this is the location of the previous definition
85 | #define EBFONT 104 /* Bad font file format */
|
>> include/uapi/asm-generic/errno.h:43: warning: "ENOSTR" redefined
43 | #define ENOSTR 60 /* Device not a stream */
|
arch/alpha/include/uapi/asm/errno.h:60: note: this is the location of the previous definition
60 | #define ENOSTR 87 /* Device not a stream */
|
>> include/uapi/asm-generic/errno.h:44: warning: "ENODATA" redefined
44 | #define ENODATA 61 /* No data available */
|
arch/alpha/include/uapi/asm/errno.h:59: note: this is the location of the previous definition
59 | #define ENODATA 86 /* No data available */
|
>> include/uapi/asm-generic/errno.h:45: warning: "ETIME" redefined
45 | #define ETIME 62 /* Timer expired */
|
arch/alpha/include/uapi/asm/errno.h:56: note: this is the location of the previous definition
56 | #define ETIME 83 /* Timer expired */
|
>> include/uapi/asm-generic/errno.h:46: warning: "ENOSR" redefined
46 | #define ENOSR 63 /* Out of streams resources */
|
arch/alpha/include/uapi/asm/errno.h:55: note: this is the location of the previous definition
55 | #define ENOSR 82 /* Out of streams resources */
|
>> include/uapi/asm-generic/errno.h:47: warning: "ENONET" redefined
47 | #define ENONET 64 /* Machine is not on the network */
|
arch/alpha/include/uapi/asm/errno.h:86: note: this is the location of the previous definition
86 | #define ENONET 105 /* Machine is not on the network */
|
>> include/uapi/asm-generic/errno.h:48: warning: "ENOPKG" redefined
48 | #define ENOPKG 65 /* Package not installed */
|
arch/alpha/include/uapi/asm/errno.h:62: note: this is the location of the previous definition
62 | #define ENOPKG 92 /* Package not installed */
|
>> include/uapi/asm-generic/errno.h:49: warning: "EREMOTE" redefined
49 | #define EREMOTE 66 /* Object is remote */
|
arch/alpha/include/uapi/asm/errno.h:48: note: this is the location of the previous definition
48 | #define EREMOTE 71 /* Object is remote */
|
>> include/uapi/asm-generic/errno.h:50: warning: "ENOLINK" redefined
50 | #define ENOLINK 67 /* Link has been severed */
|
arch/alpha/include/uapi/asm/errno.h:87: note: this is the location of the previous definition
87 | #define ENOLINK 106 /* Link has been severed */
|
>> include/uapi/asm-generic/errno.h:51: warning: "EADV" redefined
51 | #define EADV 68 /* Advertise error */
|
arch/alpha/include/uapi/asm/errno.h:88: note: this is the location of the previous definition
88 | #define EADV 107 /* Advertise error */
|
>> include/uapi/asm-generic/errno.h:52: warning: "ESRMNT" redefined
52 | #define ESRMNT 69 /* Srmount error */
|
arch/alpha/include/uapi/asm/errno.h:89: note: this is the location of the previous definition
89 | #define ESRMNT 108 /* Srmount error */
|
>> include/uapi/asm-generic/errno.h:53: warning: "ECOMM" redefined
53 | #define ECOMM 70 /* Communication error on send */
|
arch/alpha/include/uapi/asm/errno.h:90: note: this is the location of the previous definition
90 | #define ECOMM 109 /* Communication error on send */
|
>> include/uapi/asm-generic/errno.h:54: warning: "EPROTO" redefined
54 | #define EPROTO 71 /* Protocol error */
|
arch/alpha/include/uapi/asm/errno.h:58: note: this is the location of the previous definition
58 | #define EPROTO 85 /* Protocol error */
|
include/uapi/asm-generic/errno.h:55: warning: "EMULTIHOP" redefined
55 | #define EMULTIHOP 72 /* Multihop attempted */
|
arch/alpha/include/uapi/asm/errno.h:91: note: this is the location of the previous definition
91 | #define EMULTIHOP 110 /* Multihop attempted */
|
>> include/uapi/asm-generic/errno.h:56: warning: "EDOTDOT" redefined
56 | #define EDOTDOT 73 /* RFS specific error */
|
arch/alpha/include/uapi/asm/errno.h:92: note: this is the location of the previous definition
92 | #define EDOTDOT 111 /* RFS specific error */
|
include/uapi/asm-generic/errno.h:57: warning: "EBADMSG" redefined
57 | #define EBADMSG 74 /* Not a data message */
|
arch/alpha/include/uapi/asm/errno.h:57: note: this is the location of the previous definition
57 | #define EBADMSG 84 /* Not a data message */
|
include/uapi/asm-generic/errno.h:58: warning: "EOVERFLOW" redefined
58 | #define EOVERFLOW 75 /* Value too large for defined data type */
|
arch/alpha/include/uapi/asm/errno.h:93: note: this is the location of the previous definition
93 | #define EOVERFLOW 112 /* Value too large for defined data type */
|
include/uapi/asm-generic/errno.h:59: warning: "ENOTUNIQ" redefined
59 | #define ENOTUNIQ 76 /* Name not unique on network */
|
arch/alpha/include/uapi/asm/errno.h:94: note: this is the location of the previous definition
94 | #define ENOTUNIQ 113 /* Name not unique on network */
|
include/uapi/asm-generic/errno.h:60: warning: "EBADFD" redefined
60 | #define EBADFD 77 /* File descriptor in bad state */
|
arch/alpha/include/uapi/asm/errno.h:95: note: this is the location of the previous definition
95 | #define EBADFD 114 /* File descriptor in bad state */
|
include/uapi/asm-generic/errno.h:61: warning: "EREMCHG" redefined
61 | #define EREMCHG 78 /* Remote address changed */
|
arch/alpha/include/uapi/asm/errno.h:96: note: this is the location of the previous definition
96 | #define EREMCHG 115 /* Remote address changed */
|
include/uapi/asm-generic/errno.h:62: warning: "ELIBACC" redefined
62 | #define ELIBACC 79 /* Can not access a needed shared library */
|
arch/alpha/include/uapi/asm/errno.h:104: note: this is the location of the previous definition
104 | #define ELIBACC 122 /* Can not access a needed shared library */
|
include/uapi/asm-generic/errno.h:63: warning: "ELIBBAD" redefined
63 | #define ELIBBAD 80 /* Accessing a corrupted shared library */
|
arch/alpha/include/uapi/asm/errno.h:105: note: this is the location of the previous definition
105 | #define ELIBBAD 123 /* Accessing a corrupted shared library */
|
include/uapi/asm-generic/errno.h:64: warning: "ELIBSCN" redefined
64 | #define ELIBSCN 81 /* .lib section in a.out corrupted */
|
arch/alpha/include/uapi/asm/errno.h:106: note: this is the location of the previous definition
106 | #define ELIBSCN 124 /* .lib section in a.out corrupted */
|
include/uapi/asm-generic/errno.h:65: warning: "ELIBMAX" redefined
65 | #define ELIBMAX 82 /* Attempting to link in too many shared libraries */
|
arch/alpha/include/uapi/asm/errno.h:107: note: this is the location of the previous definition
107 | #define ELIBMAX 125 /* Attempting to link in too many shared libraries */
|
include/uapi/asm-generic/errno.h:66: warning: "ELIBEXEC" redefined
66 | #define ELIBEXEC 83 /* Cannot exec a shared library directly */
|
arch/alpha/include/uapi/asm/errno.h:108: note: this is the location of the previous definition
108 | #define ELIBEXEC 126 /* Cannot exec a shared library directly */
|
include/uapi/asm-generic/errno.h:67: warning: "EILSEQ" redefined
67 | #define EILSEQ 84 /* Illegal byte sequence */
|
arch/alpha/include/uapi/asm/errno.h:64: note: this is the location of the previous definition
64 | #define EILSEQ 116 /* Illegal byte sequence */
|
include/uapi/asm-generic/errno.h:68: warning: "ERESTART" redefined
68 | #define ERESTART 85 /* Interrupted system call should be restarted */
|
arch/alpha/include/uapi/asm/errno.h:109: note: this is the location of the previous definition
109 | #define ERESTART 127 /* Interrupted system call should be restarted */
|
include/uapi/asm-generic/errno.h:69: warning: "ESTRPIPE" redefined
69 | #define ESTRPIPE 86 /* Streams pipe error */
|
arch/alpha/include/uapi/asm/errno.h:110: note: this is the location of the previous definition
110 | #define ESTRPIPE 128 /* Streams pipe error */
|
include/uapi/asm-generic/errno.h:70: warning: "EUSERS" redefined
70 | #define EUSERS 87 /* Too many users */
|
arch/alpha/include/uapi/asm/errno.h:45: note: this is the location of the previous definition
45 | #define EUSERS 68 /* Too many users */
|
include/uapi/asm-generic/errno.h:71: warning: "ENOTSOCK" redefined
71 | #define ENOTSOCK 88 /* Socket operation on non-socket */
|
arch/alpha/include/uapi/asm/errno.h:15: note: this is the location of the previous definition
15 | #define ENOTSOCK 38 /* Socket operation on non-socket */
|
include/uapi/asm-generic/errno.h:72: warning: "EDESTADDRREQ" redefined
72 | #define EDESTADDRREQ 89 /* Destination address required */
|
arch/alpha/include/uapi/asm/errno.h:16: note: this is the location of the previous definition
16 | #define EDESTADDRREQ 39 /* Destination address required */
|
arch/alpha/include/uapi/asm/errno.h:96: note: this is the location of the previous definition
96 | #define EREMCHG 115 /* Remote address changed */
|
include/uapi/asm-generic/errno.h:62: warning: "ELIBACC" redefined
62 | #define ELIBACC 79 /* Can not access a needed shared library */
|
arch/alpha/include/uapi/asm/errno.h:104: note: this is the location of the previous definition
104 | #define ELIBACC 122 /* Can not access a needed shared library */
|
include/uapi/asm-generic/errno.h:63: warning: "ELIBBAD" redefined
63 | #define ELIBBAD 80 /* Accessing a corrupted shared library */
|
arch/alpha/include/uapi/asm/errno.h:105: note: this is the location of the previous definition
105 | #define ELIBBAD 123 /* Accessing a corrupted shared library */
|
include/uapi/asm-generic/errno.h:64: warning: "ELIBSCN" redefined
64 | #define ELIBSCN 81 /* .lib section in a.out corrupted */
|
arch/alpha/include/uapi/asm/errno.h:106: note: this is the location of the previous definition
106 | #define ELIBSCN 124 /* .lib section in a.out corrupted */
|
include/uapi/asm-generic/errno.h:65: warning: "ELIBMAX" redefined
65 | #define ELIBMAX 82 /* Attempting to link in too many shared libraries */
|
arch/alpha/include/uapi/asm/errno.h:107: note: this is the location of the previous definition
107 | #define ELIBMAX 125 /* Attempting to link in too many shared libraries */
|
include/uapi/asm-generic/errno.h:66: warning: "ELIBEXEC" redefined
66 | #define ELIBEXEC 83 /* Cannot exec a shared library directly */
|
arch/alpha/include/uapi/asm/errno.h:108: note: this is the location of the previous definition
108 | #define ELIBEXEC 126 /* Cannot exec a shared library directly */
|
include/uapi/asm-generic/errno.h:67: warning: "EILSEQ" redefined
67 | #define EILSEQ 84 /* Illegal byte sequence */
|
arch/alpha/include/uapi/asm/errno.h:64: note: this is the location of the previous definition
64 | #define EILSEQ 116 /* Illegal byte sequence */
|
include/uapi/asm-generic/errno.h:68: warning: "ERESTART" redefined
68 | #define ERESTART 85 /* Interrupted system call should be restarted */
|
arch/alpha/include/uapi/asm/errno.h:109: note: this is the location of the previous definition
109 | #define ERESTART 127 /* Interrupted system call should be restarted */
|
include/uapi/asm-generic/errno.h:69: warning: "ESTRPIPE" redefined
69 | #define ESTRPIPE 86 /* Streams pipe error */
|
arch/alpha/include/uapi/asm/errno.h:110: note: this is the location of the previous definition
110 | #define ESTRPIPE 128 /* Streams pipe error */
|
include/uapi/asm-generic/errno.h:70: warning: "EUSERS" redefined
70 | #define EUSERS 87 /* Too many users */
|
arch/alpha/include/uapi/asm/errno.h:45: note: this is the location of the previous definition
45 | #define EUSERS 68 /* Too many users */
|
include/uapi/asm-generic/errno.h:71: warning: "ENOTSOCK" redefined
71 | #define ENOTSOCK 88 /* Socket operation on non-socket */
|
arch/alpha/include/uapi/asm/errno.h:15: note: this is the location of the previous definition
15 | #define ENOTSOCK 38 /* Socket operation on non-socket */
|
include/uapi/asm-generic/errno.h:72: warning: "EDESTADDRREQ" redefined
72 | #define EDESTADDRREQ 89 /* Destination address required */
|
arch/alpha/include/uapi/asm/errno.h:16: note: this is the location of the previous definition
16 | #define EDESTADDRREQ 39 /* Destination address required */
vim +/EBFONT +42 include/uapi/asm-generic/errno.h
e15f431fe2d53c include/uapi/asm-generic/errno.h Andy Lutomirski 2015-04-16 19
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 20 #define ENOTEMPTY 39 /* Directory not empty */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 21 #define ELOOP 40 /* Too many symbolic links encountered */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 22 #define EWOULDBLOCK EAGAIN /* Operation would block */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 23 #define ENOMSG 42 /* No message of desired type */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 24 #define EIDRM 43 /* Identifier removed */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 25 #define ECHRNG 44 /* Channel number out of range */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 26 #define EL2NSYNC 45 /* Level 2 not synchronized */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 27 #define EL3HLT 46 /* Level 3 halted */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 28 #define EL3RST 47 /* Level 3 reset */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 29 #define ELNRNG 48 /* Link number out of range */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 30 #define EUNATCH 49 /* Protocol driver not attached */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 31 #define ENOCSI 50 /* No CSI structure available */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 32 #define EL2HLT 51 /* Level 2 halted */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 33 #define EBADE 52 /* Invalid exchange */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 34 #define EBADR 53 /* Invalid request descriptor */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 35 #define EXFULL 54 /* Exchange full */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 36 #define ENOANO 55 /* No anode */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 37 #define EBADRQC 56 /* Invalid request code */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 @38 #define EBADSLT 57 /* Invalid slot */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 39
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 40 #define EDEADLOCK EDEADLK
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 41
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 @42 #define EBFONT 59 /* Bad font file format */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 @43 #define ENOSTR 60 /* Device not a stream */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 @44 #define ENODATA 61 /* No data available */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 @45 #define ETIME 62 /* Timer expired */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 @46 #define ENOSR 63 /* Out of streams resources */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 @47 #define ENONET 64 /* Machine is not on the network */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 @48 #define ENOPKG 65 /* Package not installed */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 @49 #define EREMOTE 66 /* Object is remote */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 @50 #define ENOLINK 67 /* Link has been severed */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 @51 #define EADV 68 /* Advertise error */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 @52 #define ESRMNT 69 /* Srmount error */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 @53 #define ECOMM 70 /* Communication error on send */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 @54 #define EPROTO 71 /* Protocol error */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 @55 #define EMULTIHOP 72 /* Multihop attempted */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 @56 #define EDOTDOT 73 /* RFS specific error */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 57 #define EBADMSG 74 /* Not a data message */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 58 #define EOVERFLOW 75 /* Value too large for defined data type */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 59 #define ENOTUNIQ 76 /* Name not unique on network */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 60 #define EBADFD 77 /* File descriptor in bad state */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 61 #define EREMCHG 78 /* Remote address changed */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 62 #define ELIBACC 79 /* Can not access a needed shared library */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 63 #define ELIBBAD 80 /* Accessing a corrupted shared library */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 64 #define ELIBSCN 81 /* .lib section in a.out corrupted */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 65 #define ELIBMAX 82 /* Attempting to link in too many shared libraries */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 66 #define ELIBEXEC 83 /* Cannot exec a shared library directly */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 67 #define EILSEQ 84 /* Illegal byte sequence */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 68 #define ERESTART 85 /* Interrupted system call should be restarted */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 69 #define ESTRPIPE 86 /* Streams pipe error */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 70 #define EUSERS 87 /* Too many users */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 71 #define ENOTSOCK 88 /* Socket operation on non-socket */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 72 #define EDESTADDRREQ 89 /* Destination address required */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 73 #define EMSGSIZE 90 /* Message too long */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 74 #define EPROTOTYPE 91 /* Protocol wrong type for socket */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 75 #define ENOPROTOOPT 92 /* Protocol not available */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 76 #define EPROTONOSUPPORT 93 /* Protocol not supported */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 77 #define ESOCKTNOSUPPORT 94 /* Socket type not supported */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 78 #define EOPNOTSUPP 95 /* Operation not supported on transport endpoint */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 79 #define EPFNOSUPPORT 96 /* Protocol family not supported */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 80 #define EAFNOSUPPORT 97 /* Address family not supported by protocol */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 81 #define EADDRINUSE 98 /* Address already in use */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 82 #define EADDRNOTAVAIL 99 /* Cannot assign requested address */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 83 #define ENETDOWN 100 /* Network is down */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 84 #define ENETUNREACH 101 /* Network is unreachable */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 85 #define ENETRESET 102 /* Network dropped connection because of reset */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 86 #define ECONNABORTED 103 /* Software caused connection abort */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 87 #define ECONNRESET 104 /* Connection reset by peer */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 88 #define ENOBUFS 105 /* No buffer space available */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 89 #define EISCONN 106 /* Transport endpoint is already connected */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 90 #define ENOTCONN 107 /* Transport endpoint is not connected */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 91 #define ESHUTDOWN 108 /* Cannot send after transport endpoint shutdown */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 92 #define ETOOMANYREFS 109 /* Too many references: cannot splice */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 93 #define ETIMEDOUT 110 /* Connection timed out */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 94 #define ECONNREFUSED 111 /* Connection refused */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 95 #define EHOSTDOWN 112 /* Host is down */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 96 #define EHOSTUNREACH 113 /* No route to host */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 97 #define EALREADY 114 /* Operation already in progress */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 98 #define EINPROGRESS 115 /* Operation now in progress */
0ca43435188b9f include/uapi/asm-generic/errno.h Eric Sandeen 2013-11-12 99 #define ESTALE 116 /* Stale file handle */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 100 #define EUCLEAN 117 /* Structure needs cleaning */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 101 #define ENOTNAM 118 /* Not a XENIX named type file */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 102 #define ENAVAIL 119 /* No XENIX semaphores available */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 103 #define EISNAM 120 /* Is a named type file */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 104 #define EREMOTEIO 121 /* Remote I/O error */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 105 #define EDQUOT 122 /* Quota exceeded */
^1da177e4c3f41 include/asm-generic/errno.h Linus Torvalds 2005-04-16 106
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* [RFC PATCH v1 02/10] mm/prmem: Reserve metadata and persistent regions in early boot after kexec
2023-10-16 23:32 ` [RFC PATCH v1 00/10] mm/prmem: Implement the Persistent-Across-Kexec memory feature (prmem) madvenka
2023-10-16 23:32 ` [RFC PATCH v1 01/10] mm/prmem: Allocate memory during boot for storing persistent data madvenka
@ 2023-10-16 23:32 ` madvenka
2023-10-17 19:29 ` kernel test robot
2023-10-16 23:32 ` [RFC PATCH v1 03/10] mm/prmem: Manage persistent memory with the gen pool allocator madvenka
` (8 subsequent siblings)
10 siblings, 1 reply; 16+ messages in thread
From: madvenka @ 2023-10-16 23:32 UTC
To: gregkh, pbonzini, rppt, jgowans, graf, arnd, keescook,
stanislav.kinsburskii, anthony.yznaga, linux-mm, linux-kernel,
madvenka, jamorris
From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
Currently, only one memory region is given to prmem to store persistent
data. In the future, regions may be added dynamically.
The prmem metadata and the regions need to be reserved during early boot
after a kexec. For this to happen, the kernel must know where the metadata
is. To allow this, introduce a kernel command line parameter:
prmem_meta=metadata_address
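A minimal user-space sketch of the check applied to this parameter, assuming 4 KiB pages. The function name parse_prmem_meta() and the constant SKETCH_PAGE_SIZE are hypothetical. The patch itself uses memparse() (which also accepts K/M/G suffixes, omitted here) and PAGE_SIZE, and rejects an address that is missing or not page aligned.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Assumption: 4 KiB pages. The kernel code uses PAGE_SIZE instead. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Hypothetical analogue of the prmem_meta= parser: accept a number in any
 * base strtoull() understands (so "0x..." works), and reject empty input
 * or an address that is not page aligned.
 */
static bool parse_prmem_meta(const char *arg, unsigned long *addr_out)
{
	char *end;
	unsigned long long addr;

	if (!arg)
		return false;

	addr = strtoull(arg, &end, 0);
	if (end == arg || (addr & (SKETCH_PAGE_SIZE - 1)))
		return false;	/* no digits, or not page aligned */

	*addr_out = (unsigned long)addr;
	return true;
}
```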
When a kexec image is loaded into the kernel, add this parameter to the
kexec cmdline. Upon a kexec boot, get the metadata page from the cmdline
and reserve it. Then, walk the list of regions in the metadata and reserve
the regions.
Note that the cmdline modification is done automatically within the kernel.
Userland does not have to do anything.
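The cmdline rewrite can be illustrated with a hypothetical user-space helper, set_prmem_meta(). It uses the same fixed-width " prmem_meta=0x%.16lx" format as prmem_cmdline(), so the parameter always occupies 30 characters and the caller only has to leave MAX_META_LENGTH (31) spare bytes. The patch overwrites 30 bytes in place, which is fine when the old token was written by an earlier kernel in the same width; this sketch instead replaces the whole existing token, so it also copes with a shorter, hand-typed value. Treating a zero metadata address as "prmem not initialized" is a simplification of the prmem_inited check.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* " prmem_meta=0x" + 16 hex digits = 30 characters, plus the NUL. */
#define META_LEN 31

/*
 * Hypothetical sketch: set " prmem_meta=0x<16 hex digits>" in a
 * NUL-terminated command line stored in a buffer of bufsz bytes.
 * An existing parameter is replaced as a whole token; otherwise the
 * parameter is appended, but only when metadata is non-zero.
 * Returns false if the buffer is too small.
 */
static bool set_prmem_meta(char *cmdline, size_t bufsz, unsigned long metadata)
{
	char meta[META_LEN];
	size_t len = strlen(cmdline);
	size_t old_len, tail_len;
	char *str, *tok_end;

	snprintf(meta, sizeof(meta), " prmem_meta=0x%.16lx", metadata);

	str = strstr(cmdline, " prmem_meta");
	if (!str) {
		if (!metadata)
			return true;		/* cold boot: nothing to add */
		if (len + META_LEN > bufsz)
			return false;
		strcat(cmdline, meta);
		return true;
	}

	/* The old token ends at the next space or at the end of the string. */
	tok_end = strchr(str + 1, ' ');
	if (!tok_end)
		tok_end = cmdline + len;
	old_len = (size_t)(tok_end - str);
	tail_len = strlen(tok_end) + 1;		/* include the NUL */

	if (len - old_len + META_LEN > bufsz)
		return false;

	/* Shift the tail, then copy the fixed-width token into the gap. */
	memmove(str + META_LEN - 1, tok_end, tail_len);
	memcpy(str, meta, META_LEN - 1);
	return true;
}
```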
The metadata needs to be validated before it can be used. To allow this,
compute a checksum on the metadata and store it in the metadata at the end
of shutdown. During early boot, validate the metadata with the checksum.
If the validation fails, discard the metadata. Treat it as a cold boot.
That is, allocate a new metadata page and initial region and start over.
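The checksum scheme (a sum of the metadata taken as unsigned long words, computed while the checksum field is zero) can be sketched as follows. struct meta_sketch, meta_seal() and meta_validate() are hypothetical stand-ins for struct prmem, prmem_fini() and prmem_validate(). The sketch clears the field explicitly before summing so that the shutdown and boot sides always sum the same bytes. A plain additive sum catches most accidental corruption, but not, for example, two words that have been swapped, so it is a basic integrity check rather than a strong one.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for struct prmem: every field is an unsigned
 * long, so the size is a whole number of words and there is no padding.
 */
struct meta_sketch {
	unsigned long checksum;
	unsigned long metadata;
	unsigned long size;
	unsigned long nregions;
};

/* Word-wise additive checksum, in the style of prmem_checksum(). */
static unsigned long word_sum(const void *start, size_t size)
{
	const unsigned long *ptr = start;
	unsigned long sum = 0;

	for (size_t i = 0; i < size / sizeof(*ptr); i++)
		sum += ptr[i];
	return sum;
}

/* Shutdown side: the checksum field must be zero while summing. */
static void meta_seal(struct meta_sketch *m)
{
	m->checksum = 0;
	m->checksum = word_sum(m, sizeof(*m));
}

/*
 * Boot side: save and zero the field, recompute, compare. The field is
 * left at zero, ready for the next meta_seal().
 */
static bool meta_validate(struct meta_sketch *m)
{
	unsigned long saved = m->checksum;

	m->checksum = 0;
	return saved == word_sum(m, sizeof(*m));
}
```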
Similarly, if the reservation of the regions fails, treat it as a cold
boot and start over.
This means that all persistent data will be lost on any of these failures.
Note that no memory is leaked when this happens: any reservations made
before the failure are released.
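The all-or-nothing reservation with rollback done by prmem_reserve_early() follows a common pattern, sketched here against a small hypothetical reservation table that stands in for memblock. sketch_reserve(), sketch_unreserve() and reserve_regions() are illustrative names, not memblock functions. Undoing only the regions reserved so far is what keeps a failed kexec boot from leaking memory.

```c
#include <stdbool.h>
#include <stddef.h>

struct range {
	unsigned long pa, size;
};

/* Hypothetical reservation table standing in for memblock.reserved. */
#define MAX_RESV 16
static struct range resv[MAX_RESV];
static size_t nresv;

static bool overlaps(const struct range *a, const struct range *b)
{
	return a->pa < b->pa + b->size && b->pa < a->pa + a->size;
}

/* Fails if the table is full or the range overlaps a reservation. */
static bool sketch_reserve(struct range r)
{
	if (nresv == MAX_RESV)
		return false;
	for (size_t i = 0; i < nresv; i++)
		if (overlaps(&resv[i], &r))
			return false;
	resv[nresv++] = r;
	return true;
}

/* Removes an exact match; enough to undo our own reservations. */
static void sketch_unreserve(struct range r)
{
	for (size_t i = 0; i < nresv; i++) {
		if (resv[i].pa == r.pa && resv[i].size == r.size) {
			resv[i] = resv[--nresv];
			return;
		}
	}
}

/*
 * Reserve every region, or undo the ones already reserved and report
 * failure so that the caller can fall back to a cold boot.
 */
static bool reserve_regions(const struct range *regions, size_t n)
{
	size_t done;

	for (done = 0; done < n; done++)
		if (!sketch_reserve(regions[done]))
			goto rollback;
	return true;

rollback:
	while (done--)
		sketch_unreserve(regions[done]);
	return false;
}
```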
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
arch/x86/kernel/kexec-bzimage64.c | 5 +-
arch/x86/kernel/setup.c | 2 +
include/linux/memblock.h | 2 +
include/linux/prmem.h | 11 ++++
kernel/prmem/Makefile | 2 +-
kernel/prmem/prmem_init.c | 9 ++++
kernel/prmem/prmem_misc.c | 85 +++++++++++++++++++++++++++++++
kernel/prmem/prmem_parse.c | 29 +++++++++++
kernel/prmem/prmem_reserve.c | 70 ++++++++++++++++++++++++-
kernel/reboot.c | 2 +
mm/memblock.c | 12 +++++
11 files changed, 226 insertions(+), 3 deletions(-)
create mode 100644 kernel/prmem/prmem_misc.c
diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c
index a61c12c01270..a19f172be410 100644
--- a/arch/x86/kernel/kexec-bzimage64.c
+++ b/arch/x86/kernel/kexec-bzimage64.c
@@ -18,6 +18,7 @@
#include <linux/mm.h>
#include <linux/efi.h>
#include <linux/random.h>
+#include <linux/prmem.h>
#include <asm/bootparam.h>
#include <asm/setup.h>
@@ -82,6 +83,8 @@ static int setup_cmdline(struct kimage *image, struct boot_params *params,
cmdline_ptr[cmdline_len - 1] = '\0';
+ prmem_cmdline(cmdline_ptr);
+
pr_debug("Final command line is: %s\n", cmdline_ptr);
cmdline_ptr_phys = bootparams_load_addr + cmdline_offset;
cmdline_low_32 = cmdline_ptr_phys & 0xffffffffUL;
@@ -458,7 +461,7 @@ static void *bzImage64_load(struct kimage *image, char *kernel,
*/
efi_map_sz = efi_get_runtime_map_size();
params_cmdline_sz = sizeof(struct boot_params) + cmdline_len +
- MAX_ELFCOREHDR_STR_LEN;
+ MAX_ELFCOREHDR_STR_LEN + prmem_cmdline_size();
params_cmdline_sz = ALIGN(params_cmdline_sz, 16);
kbuf.bufsz = params_cmdline_sz + ALIGN(efi_map_sz, 16) +
sizeof(struct setup_data) +
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index f2b13b3d3ead..22f5cd494291 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1137,6 +1137,8 @@ void __init setup_arch(char **cmdline_p)
*/
efi_reserve_boot_services();
+ prmem_reserve_early();
+
/* preallocate 4k for mptable mpc */
e820__memblock_alloc_reserved_mpc_new();
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index f71ff9f0ec81..584bbb884c8e 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -114,6 +114,8 @@ int memblock_add(phys_addr_t base, phys_addr_t size);
int memblock_remove(phys_addr_t base, phys_addr_t size);
int memblock_phys_free(phys_addr_t base, phys_addr_t size);
int memblock_reserve(phys_addr_t base, phys_addr_t size);
+void memblock_unreserve(phys_addr_t base, phys_addr_t size);
+
#ifdef CONFIG_HAVE_MEMBLOCK_PHYS_MAP
int memblock_physmem_add(phys_addr_t base, phys_addr_t size);
#endif
diff --git a/include/linux/prmem.h b/include/linux/prmem.h
index 7f22016c4ad2..bc8054a86f49 100644
--- a/include/linux/prmem.h
+++ b/include/linux/prmem.h
@@ -48,12 +48,16 @@ struct prmem_region {
/*
* PRMEM metadata.
*
+ * checksum Just before reboot, a checksum is computed on the metadata. On
+ * the next kexec reboot, the metadata is validated with the
+ * checksum to make sure that the metadata has not been corrupted.
* metadata Physical address of the metadata page.
* size Size of initial memory allocated to prmem.
*
* regions List of memory regions.
*/
struct prmem {
+ unsigned long checksum;
unsigned long metadata;
size_t size;
@@ -65,12 +69,19 @@ extern struct prmem *prmem;
extern unsigned long prmem_metadata;
extern unsigned long prmem_pa;
extern size_t prmem_size;
+extern bool prmem_inited;
/* Kernel API. */
+void prmem_reserve_early(void);
void prmem_reserve(void);
void prmem_init(void);
+void prmem_fini(void);
+int prmem_cmdline_size(void);
/* Internal functions. */
struct prmem_region *prmem_add_region(unsigned long pa, size_t size);
+unsigned long prmem_checksum(void *start, size_t size);
+bool __init prmem_validate(void);
+void prmem_cmdline(char *cmdline);
#endif /* _LINUX_PRMEM_H */
diff --git a/kernel/prmem/Makefile b/kernel/prmem/Makefile
index 11a53d49312a..9b0a693bfee1 100644
--- a/kernel/prmem/Makefile
+++ b/kernel/prmem/Makefile
@@ -1,3 +1,3 @@
# SPDX-License-Identifier: GPL-2.0
-obj-y += prmem_parse.o prmem_reserve.o prmem_init.o prmem_region.o
+obj-y += prmem_parse.o prmem_reserve.o prmem_init.o prmem_region.o prmem_misc.o
diff --git a/kernel/prmem/prmem_init.c b/kernel/prmem/prmem_init.c
index 97b550252028..9cea1cd3b6a5 100644
--- a/kernel/prmem/prmem_init.c
+++ b/kernel/prmem/prmem_init.c
@@ -25,3 +25,12 @@ void __init prmem_init(void)
}
prmem_inited = true;
}
+
+void prmem_fini(void)
+{
+ if (!prmem_inited)
+ return;
+
+ /* Compute checksum over the metadata. */
+ prmem->checksum = prmem_checksum(prmem, sizeof(*prmem));
+}
diff --git a/kernel/prmem/prmem_misc.c b/kernel/prmem/prmem_misc.c
new file mode 100644
index 000000000000..49b6a7232c1a
--- /dev/null
+++ b/kernel/prmem/prmem_misc.c
@@ -0,0 +1,85 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Persistent-Across-Kexec memory (prmem) - Miscellaneous functions.
+ *
+ * Copyright (C) 2023 Microsoft Corporation
+ * Author: Madhavan T. Venkataraman (madvenka@linux.microsoft.com)
+ */
+#include <linux/prmem.h>
+
+#define MAX_META_LENGTH 31
+
+/*
+ * On a kexec, modify the kernel command line to include the boot parameter
+ * "prmem_meta=" so that the metadata can be found on the next boot. If the
+ * parameter is already present in cmdline, overwrite it. Else, add it.
+ */
+void prmem_cmdline(char *cmdline)
+{
+ char meta[MAX_META_LENGTH], *str;
+ unsigned long metadata;
+
+ metadata = prmem_inited ? prmem->metadata : 0;
+ snprintf(meta, MAX_META_LENGTH, " prmem_meta=0x%.16lx", metadata);
+
+ str = strstr(cmdline, " prmem_meta");
+ if (str) {
+ /*
+ * Boot parameter already exists. Overwrite it. We deliberately
+ * use strncpy() and rely on the fact that it will not NUL
+ * terminate the copy.
+ */
+ strncpy(str, meta, MAX_META_LENGTH - 1);
+ return;
+ }
+ if (prmem_inited) {
+ /* Boot parameter does not exist. Add it. */
+ strcat(cmdline, meta);
+ }
+}
+
+/*
+ * Make sure that the kexec command line can accommodate the prmem_meta
+ * command line parameter.
+ */
+int prmem_cmdline_size(void)
+{
+ return MAX_META_LENGTH;
+}
+
+unsigned long prmem_checksum(void *start, size_t size)
+{
+ unsigned long checksum = 0;
+ unsigned long *ptr;
+ void *end;
+
+ end = start + size;
+ for (ptr = start; (void *) ptr < end; ptr++)
+ checksum += *ptr;
+ return checksum;
+}
+
+/*
+ * Check if the metadata is sane. It would not be sane on a cold boot or if the
+ * metadata has been corrupted. In the latter case, we treat it as a cold boot.
+ */
+bool __init prmem_validate(void)
+{
+ unsigned long checksum;
+
+ /* Sanity check the boot parameter. */
+ if (prmem_metadata != prmem->metadata || prmem_size != prmem->size) {
+ pr_warn("%s: Boot parameter mismatch\n", __func__);
+ return false;
+ }
+
+ /* Compute and check the checksum of the metadata. */
+ checksum = prmem->checksum;
+ prmem->checksum = 0;
+
+ if (checksum != prmem_checksum(prmem, sizeof(*prmem))) {
+ pr_warn("%s: Checksum mismatch\n", __func__);
+ return false;
+ }
+ return true;
+}
diff --git a/kernel/prmem/prmem_parse.c b/kernel/prmem/prmem_parse.c
index 191655b53545..6c1a23c6b84e 100644
--- a/kernel/prmem/prmem_parse.c
+++ b/kernel/prmem/prmem_parse.c
@@ -31,3 +31,32 @@ static int __init prmem_size_parse(char *cmdline)
return 0;
}
early_param("prmem", prmem_size_parse);
+
+/*
+ * Syntax: prmem_meta=metadata_address
+ *
+ * Specifies the address of a single page where the prmem metadata resides.
+ *
+ * On a kexec, the following will be appended to the kernel command line -
+ * "prmem_meta=metadata_address". This is so that the metadata can be located
+ * easily on kexec reboots.
+ */
+static int __init prmem_meta_parse(char *cmdline)
+{
+ char *tmp, *cur = cmdline;
+ unsigned long addr;
+
+ if (!cur)
+ return -EINVAL;
+
+ /* Get metadata address. */
+ addr = memparse(cur, &tmp);
+ if (cur == tmp || addr & (PAGE_SIZE - 1)) {
+ pr_warn("%s: Incorrect address %lx\n", __func__, addr);
+ return -EINVAL;
+ }
+
+ prmem_metadata = addr;
+ return 0;
+}
+early_param("prmem_meta", prmem_meta_parse);
diff --git a/kernel/prmem/prmem_reserve.c b/kernel/prmem/prmem_reserve.c
index e20e31a61d12..8000fff05402 100644
--- a/kernel/prmem/prmem_reserve.c
+++ b/kernel/prmem/prmem_reserve.c
@@ -12,11 +12,79 @@ unsigned long prmem_metadata;
unsigned long prmem_pa;
unsigned long prmem_size;
+void __init prmem_reserve_early(void)
+{
+ struct prmem_region *region;
+ unsigned long nregions;
+
+ /* Need to specify an initial size to enable prmem. */
+ if (!prmem_size)
+ return;
+
+ /* Nothing to be done if it is a cold boot. */
+ if (!prmem_metadata)
+ return;
+
+ /*
+ * prmem uses direct map addresses. If PAGE_OFFSET is randomized,
+ * these addresses will change across kexecs. Persistence cannot
+ * be supported.
+ */
+ if (kaslr_memory_enabled()) {
+ pr_warn("%s: Cannot support persistence because of KASLR.\n",
+ __func__);
+ return;
+ }
+
+ /*
+ * This is a kexec reboot. If any step fails here, treat this like a
+ * cold boot. That is, forget all persistent data and start over.
+ */
+
+ /* Reserve metadata page. */
+ if (memblock_reserve(prmem_metadata, PAGE_SIZE)) {
+ pr_warn("%s: Unable to reserve metadata at %lx\n", __func__,
+ prmem_metadata);
+ return;
+ }
+ prmem = __va(prmem_metadata);
+
+ /* Make sure that the metadata is sane. */
+ if (!prmem_validate())
+ goto unreserve_metadata;
+
+ /* Reserve regions that were added to prmem. */
+ nregions = 0;
+ list_for_each_entry(region, &prmem->regions, node) {
+ if (memblock_reserve(region->pa, region->size)) {
+ pr_warn("%s: Unable to reserve %lx, %lx\n", __func__,
+ region->pa, region->size);
+ goto unreserve_regions;
+ }
+ nregions++;
+ }
+ return;
+
+unreserve_regions:
+ /* Unreserve regions. */
+ list_for_each_entry(region, &prmem->regions, node) {
+ if (!nregions)
+ break;
+ memblock_unreserve(region->pa, region->size);
+ nregions--;
+ }
+
+unreserve_metadata:
+ /* Unreserve the metadata page. */
+ memblock_unreserve(prmem_metadata, PAGE_SIZE);
+ prmem = NULL;
+}
+
void __init prmem_reserve(void)
{
BUILD_BUG_ON(sizeof(*prmem) > PAGE_SIZE);
- if (!prmem_size)
+ if (!prmem_size || prmem)
return;
/*
diff --git a/kernel/reboot.c b/kernel/reboot.c
index 3bba88c7ffc6..b4595b7e77f3 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -13,6 +13,7 @@
#include <linux/kexec.h>
#include <linux/kmod.h>
#include <linux/kmsg_dump.h>
+#include <linux/prmem.h>
#include <linux/reboot.h>
#include <linux/suspend.h>
#include <linux/syscalls.h>
@@ -84,6 +85,7 @@ void kernel_restart_prepare(char *cmd)
system_state = SYSTEM_RESTART;
usermodehelper_disable();
device_shutdown();
+ prmem_fini();
}
/**
diff --git a/mm/memblock.c b/mm/memblock.c
index f9e61e565a53..1f5070f7b5bc 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -873,6 +873,18 @@ int __init_memblock memblock_reserve(phys_addr_t base, phys_addr_t size)
return memblock_add_range(&memblock.reserved, base, size, MAX_NUMNODES, 0);
}
+void __init_memblock memblock_unreserve(phys_addr_t base, phys_addr_t size)
+{
+ phys_addr_t end = base + size - 1;
+
+ memblock_dbg("%s: [%pa-%pa] %pS\n", __func__,
+ &base, &end, (void *)_RET_IP_);
+
+ if (memblock_remove_range(&memblock.reserved, base, size))
+ return;
+ memblock_add_range(&memblock.memory, base, size, MAX_NUMNODES, 0);
+}
+
#ifdef CONFIG_HAVE_MEMBLOCK_PHYS_MAP
int __init_memblock memblock_physmem_add(phys_addr_t base, phys_addr_t size)
{
--
2.25.1
* Re: [RFC PATCH v1 02/10] mm/prmem: Reserve metadata and persistent regions in early boot after kexec
2023-10-16 23:32 ` [RFC PATCH v1 02/10] mm/prmem: Reserve metadata and persistent regions in early boot after kexec madvenka
@ 2023-10-17 19:29 ` kernel test robot
0 siblings, 0 replies; 16+ messages in thread
From: kernel test robot @ 2023-10-17 19:29 UTC
To: madvenka; +Cc: oe-kbuild-all
Hi,
[This is a private test report for your RFC patch.]
kernel test robot noticed the following build warnings:
[auto build test WARNING on 2dde18cd1d8fac735875f2e4987f11817cc0bc2c]
url: https://github.com/intel-lab-lkp/linux/commits/madvenka-linux-microsoft-com/mm-prmem-Allocate-memory-during-boot-for-storing-persistent-data/20231017-194340
base: 2dde18cd1d8fac735875f2e4987f11817cc0bc2c
patch link: https://lore.kernel.org/r/20231016233215.13090-3-madvenka%40linux.microsoft.com
patch subject: [RFC PATCH v1 02/10] mm/prmem: Reserve metadata and persistent regions in early boot after kexec
config: alpha-allyesconfig (https://download.01.org/0day-ci/archive/20231018/202310180358.7NmPpExl-lkp@intel.com/config)
compiler: alpha-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231018/202310180358.7NmPpExl-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202310180358.7NmPpExl-lkp@intel.com/
All warnings (new ones prefixed by >>):
arch/alpha/include/uapi/asm/errno.h:41: note: this is the location of the previous definition
41 | #define EHOSTDOWN 64 /* Host is down */
|
include/uapi/asm-generic/errno.h:96: warning: "EHOSTUNREACH" redefined
96 | #define EHOSTUNREACH 113 /* No route to host */
|
arch/alpha/include/uapi/asm/errno.h:42: note: this is the location of the previous definition
42 | #define EHOSTUNREACH 65 /* No route to host */
|
include/uapi/asm-generic/errno.h:97: warning: "EALREADY" redefined
97 | #define EALREADY 114 /* Operation already in progress */
|
arch/alpha/include/uapi/asm/errno.h:14: note: this is the location of the previous definition
14 | #define EALREADY 37 /* Operation already in progress */
|
include/uapi/asm-generic/errno.h:98: warning: "EINPROGRESS" redefined
98 | #define EINPROGRESS 115 /* Operation now in progress */
|
arch/alpha/include/uapi/asm/errno.h:13: note: this is the location of the previous definition
13 | #define EINPROGRESS 36 /* Operation now in progress */
|
include/uapi/asm-generic/errno.h:99: warning: "ESTALE" redefined
99 | #define ESTALE 116 /* Stale file handle */
|
arch/alpha/include/uapi/asm/errno.h:47: note: this is the location of the previous definition
47 | #define ESTALE 70 /* Stale file handle */
|
include/uapi/asm-generic/errno.h:105: warning: "EDQUOT" redefined
105 | #define EDQUOT 122 /* Quota exceeded */
|
arch/alpha/include/uapi/asm/errno.h:46: note: this is the location of the previous definition
46 | #define EDQUOT 69 /* Quota exceeded */
|
include/uapi/asm-generic/errno.h:107: warning: "ENOMEDIUM" redefined
107 | #define ENOMEDIUM 123 /* No medium found */
|
arch/alpha/include/uapi/asm/errno.h:112: note: this is the location of the previous definition
112 | #define ENOMEDIUM 129 /* No medium found */
|
include/uapi/asm-generic/errno.h:108: warning: "EMEDIUMTYPE" redefined
108 | #define EMEDIUMTYPE 124 /* Wrong medium type */
|
arch/alpha/include/uapi/asm/errno.h:113: note: this is the location of the previous definition
113 | #define EMEDIUMTYPE 130 /* Wrong medium type */
|
include/uapi/asm-generic/errno.h:109: warning: "ECANCELED" redefined
109 | #define ECANCELED 125 /* Operation Canceled */
|
arch/alpha/include/uapi/asm/errno.h:114: note: this is the location of the previous definition
114 | #define ECANCELED 131 /* Operation Cancelled */
|
include/uapi/asm-generic/errno.h:110: warning: "ENOKEY" redefined
110 | #define ENOKEY 126 /* Required key not available */
|
arch/alpha/include/uapi/asm/errno.h:115: note: this is the location of the previous definition
115 | #define ENOKEY 132 /* Required key not available */
|
include/uapi/asm-generic/errno.h:111: warning: "EKEYEXPIRED" redefined
111 | #define EKEYEXPIRED 127 /* Key has expired */
|
arch/alpha/include/uapi/asm/errno.h:116: note: this is the location of the previous definition
116 | #define EKEYEXPIRED 133 /* Key has expired */
|
include/uapi/asm-generic/errno.h:112: warning: "EKEYREVOKED" redefined
112 | #define EKEYREVOKED 128 /* Key has been revoked */
|
arch/alpha/include/uapi/asm/errno.h:117: note: this is the location of the previous definition
117 | #define EKEYREVOKED 134 /* Key has been revoked */
|
include/uapi/asm-generic/errno.h:113: warning: "EKEYREJECTED" redefined
113 | #define EKEYREJECTED 129 /* Key was rejected by service */
|
arch/alpha/include/uapi/asm/errno.h:118: note: this is the location of the previous definition
118 | #define EKEYREJECTED 135 /* Key was rejected by service */
|
include/uapi/asm-generic/errno.h:116: warning: "EOWNERDEAD" redefined
116 | #define EOWNERDEAD 130 /* Owner died */
|
arch/alpha/include/uapi/asm/errno.h:121: note: this is the location of the previous definition
121 | #define EOWNERDEAD 136 /* Owner died */
|
include/uapi/asm-generic/errno.h:117: warning: "ENOTRECOVERABLE" redefined
117 | #define ENOTRECOVERABLE 131 /* State not recoverable */
|
arch/alpha/include/uapi/asm/errno.h:122: note: this is the location of the previous definition
122 | #define ENOTRECOVERABLE 137 /* State not recoverable */
|
include/uapi/asm-generic/errno.h:119: warning: "ERFKILL" redefined
119 | #define ERFKILL 132 /* Operation not possible due to RF-kill */
|
arch/alpha/include/uapi/asm/errno.h:124: note: this is the location of the previous definition
124 | #define ERFKILL 138 /* Operation not possible due to RF-kill */
|
include/uapi/asm-generic/errno.h:121: warning: "EHWPOISON" redefined
121 | #define EHWPOISON 133 /* Memory page has hardware error */
|
arch/alpha/include/uapi/asm/errno.h:126: note: this is the location of the previous definition
126 | #define EHWPOISON 139 /* Memory page has hardware error */
|
kernel/prmem/prmem_misc.c: In function 'prmem_cmdline':
>> kernel/prmem/prmem_misc.c:32:17: warning: 'strncpy' output may be truncated copying 30 bytes from a string of length 30 [-Wstringop-truncation]
32 | strncpy(str, meta, MAX_META_LENGTH - 1);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vim +/strncpy +32 kernel/prmem/prmem_misc.c
11
12 /*
13 * On a kexec, modify the kernel command line to include the boot parameter
14 * "prmem_meta=" so that the metadata can be found on the next boot. If the
15 * parameter is already present in cmdline, overwrite it. Else, add it.
16 */
17 void prmem_cmdline(char *cmdline)
18 {
19 char meta[MAX_META_LENGTH], *str;
20 unsigned long metadata;
21
22 metadata = prmem_inited ? prmem->metadata : 0;
23 snprintf(meta, MAX_META_LENGTH, " prmem_meta=0x%.16lx", metadata);
24
25 str = strstr(cmdline, " prmem_meta");
26 if (str) {
27 /*
28 * Boot parameter already exists. Overwrite it. We deliberately
29 * use strncpy() and rely on the fact that it will not NULL
30 * terminate the copy.
31 */
> 32 strncpy(str, meta, MAX_META_LENGTH - 1);
33 return;
34 }
35 if (prmem_inited) {
36 /* Boot parameter does not exist. Add it. */
37 strcat(cmdline, meta);
38 }
39 }
40
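Why GCC warns here: `meta` always holds exactly 30 characters, and the code wants `strncpy()` to copy them without a terminating NUL so that the rest of the command line survives. `-Wstringop-truncation` does not apply to `memcpy()`, so copying the bytes with `memcpy()` states the same intent directly.

Below is a minimal userspace sketch of the same overwrite-or-append logic written that way. It rests on two assumptions:
- `MAX_META_LENGTH` is 31.
- Any existing `prmem_meta=` value has the same fixed width as the new one. This holds if the value was written by this code.

`prmem_cmdline_sketch()` is a hypothetical name, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Assumption: 30 visible characters plus the terminating NUL. */
#define MAX_META_LENGTH 31

/*
 * Hypothetical userspace model of prmem_cmdline(). 'cmdline' must have
 * room for MAX_META_LENGTH more bytes when the parameter is appended.
 */
static void prmem_cmdline_sketch(char *cmdline, unsigned long metadata,
				 bool inited)
{
	char meta[MAX_META_LENGTH];
	char *str;
	int len;

	/* %.16lx always yields 16 hex digits, so len is always 30. */
	len = snprintf(meta, sizeof(meta), " prmem_meta=0x%.16lx",
		       inited ? metadata : 0UL);
	assert(len == MAX_META_LENGTH - 1);

	str = strstr(cmdline, " prmem_meta");
	if (str) {
		/*
		 * Overwrite the old fixed-width value in place. memcpy() of
		 * exactly len bytes writes no NUL, so the text after the old
		 * value is preserved without relying on strncpy() semantics.
		 */
		memcpy(str, meta, (size_t)len);
		return;
	}
	if (inited)
		strcat(cmdline, meta);	/* not present yet: append it */
}
```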
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 16+ messages in thread
* [RFC PATCH v1 03/10] mm/prmem: Manage persistent memory with the gen pool allocator.
2023-10-16 23:32 ` [RFC PATCH v1 00/10] mm/prmem: Implement the Persistent-Across-Kexec memory feature (prmem) madvenka
2023-10-16 23:32 ` [RFC PATCH v1 01/10] mm/prmem: Allocate memory during boot for storing persistent data madvenka
2023-10-16 23:32 ` [RFC PATCH v1 02/10] mm/prmem: Reserve metadata and persistent regions in early boot after kexec madvenka
@ 2023-10-16 23:32 ` madvenka
2023-10-16 23:32 ` [RFC PATCH v1 04/10] mm/prmem: Implement a page allocator for persistent memory madvenka
` (7 subsequent siblings)
10 siblings, 0 replies; 16+ messages in thread
From: madvenka @ 2023-10-16 23:32 UTC (permalink / raw)
To: gregkh, pbonzini, rppt, jgowans, graf, arnd, keescook,
stanislav.kinsburskii, anthony.yznaga, linux-mm, linux-kernel,
madvenka, jamorris
From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
The memory in a prmem region must be managed by an allocator. Use
the Gen Pool allocator (lib/genalloc.c) for that purpose so that we
don't have to write a new allocator.
Now, the Gen Pool allocator uses a "struct gen_pool_chunk" to manage a
contiguous range of memory. The chunk is normally allocated using the kmem
allocator. However, for prmem, the chunk must be persisted across a
kexec reboot so that the allocations can be "remembered". To allow this,
allocate the chunk from the region itself and initialize it. Then, pass
the chunk to the Gen Pool allocator. In other words, persist the chunk.
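The space reserved at the base of each new region can be worked out ahead of time. The userspace sketch below mirrors the arithmetic in `gen_pool_chunk_size()` and `prmem_create_pool()`. The reserved prefix is the region structure plus the chunk (a header followed by a bitmap with one bit per page), rounded up to a page.

The header sizes are parameters because the real `sizeof(struct prmem_region)` and `sizeof(struct gen_pool_chunk)` depend on the configuration. The 4 KiB page size and the helper names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SHIFT 12			/* assumption: 4 KiB pages */
#define PAGE_SIZE (1UL << PAGE_SHIFT)
#define BITS_PER_LONG (8 * sizeof(long))
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

/* Chunk header plus one bitmap bit per (1 << order)-byte allocation unit. */
static size_t chunk_bytes(size_t chunk_hdr, size_t size, int order)
{
	size_t nbits = size >> order;

	return chunk_hdr + BITS_TO_LONGS(nbits) * sizeof(long);
}

/* Region header and chunk live at the base of the region, page aligned. */
static size_t reserved_bytes(size_t region_hdr, size_t chunk_hdr, size_t size)
{
	size_t total = region_hdr + chunk_bytes(chunk_hdr, size, PAGE_SHIFT);

	return ALIGN_UP(total, PAGE_SIZE);
}

/* Mirrors the "total_size >= region->size" rejection in prmem_create_pool(). */
static bool region_fits(size_t region_hdr, size_t chunk_hdr, size_t size)
{
	return reserved_bytes(region_hdr, chunk_hdr, size) < size;
}
```

Example with 64-byte headers:
- A 1 GiB region needs a 32 KiB bitmap, so the reserved prefix rounds up to nine pages (36864 bytes).
- A one-page region is rejected because the prefix alone fills it.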
Inside the Gen Pool allocator, distinguish between a chunk that is
allocated internally from kmem and a chunk that is passed by the caller
and handle it properly when the pool is destroyed.
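The `external` flag changes only the teardown path. The sketch below is a stripped-down userspace model of that rule. Hypothetical `struct pool` and `struct chunk` types stand in for the Gen Pool ones.

Teardown unlinks every chunk but frees only the chunks the pool allocated itself. A chunk that lives inside persistent memory is left untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, stripped-down stand-ins for gen_pool and gen_pool_chunk. */
struct chunk {
	struct chunk *next;
	bool external;		/* memory owned by the caller, not the pool */
};

struct pool {
	struct chunk *chunks;
};

static void pool_add_chunk(struct pool *pool, struct chunk *chunk)
{
	chunk->next = pool->chunks;
	pool->chunks = chunk;
}

/* Unlink every chunk; free only those the pool allocated. Returns # freed. */
static int pool_destroy(struct pool *pool)
{
	struct chunk *chunk = pool->chunks, *next;
	int freed = 0;

	while (chunk) {
		next = chunk->next;
		if (!chunk->external) {
			free(chunk);
			freed++;
		}
		chunk = next;
	}
	pool->chunks = NULL;
	return freed;
}
```

In the patch, the `continue` also skips the busy-bitmap `BUG_ON()` check. That matters because the region header and chunk of a new region remain allocated in their own bitmap.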
Provide wrapper functions around the Gen Pool allocator functions so that
the allocator can be changed in the future if needed:
prmem_create_pool()
prmem_alloc_pool()
prmem_free_pool()
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
include/linux/genalloc.h | 6 ++++
include/linux/prmem.h | 8 +++++
kernel/prmem/prmem_init.c | 8 +++++
kernel/prmem/prmem_region.c | 67 ++++++++++++++++++++++++++++++++++++-
lib/genalloc.c | 45 ++++++++++++++++++-------
5 files changed, 121 insertions(+), 13 deletions(-)
diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index 0bd581003cd5..186757b0aec7 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -73,6 +73,7 @@ struct gen_pool_chunk {
struct list_head next_chunk; /* next chunk in pool */
atomic_long_t avail;
phys_addr_t phys_addr; /* physical starting address of memory chunk */
+ bool external; /* Chunk is passed by caller. */
void *owner; /* private data to retrieve at alloc time */
unsigned long start_addr; /* start address of memory chunk */
unsigned long end_addr; /* end address of memory chunk (inclusive) */
@@ -121,6 +122,11 @@ static inline int gen_pool_add(struct gen_pool *pool, unsigned long addr,
{
return gen_pool_add_virt(pool, addr, -1, size, nid);
}
+extern size_t gen_pool_chunk_size(size_t size, int min_alloc_order);
+extern void gen_pool_init_chunk(struct gen_pool_chunk *chunk,
+ unsigned long addr, phys_addr_t phys,
+ size_t size, bool external, void *owner);
+void gen_pool_add_chunk(struct gen_pool *pool, struct gen_pool_chunk *chunk);
extern void gen_pool_destroy(struct gen_pool *);
unsigned long gen_pool_alloc_algo_owner(struct gen_pool *pool, size_t size,
genpool_algo_t algo, void *data, void **owner);
diff --git a/include/linux/prmem.h b/include/linux/prmem.h
index bc8054a86f49..f43f5b0d2b9c 100644
--- a/include/linux/prmem.h
+++ b/include/linux/prmem.h
@@ -24,6 +24,7 @@
* non-volatile storage is too slow.
*/
#include <linux/types.h>
+#include <linux/genalloc.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/memblock.h>
@@ -38,11 +39,15 @@
* node List node.
* pa Physical address of the region.
* size Size of the region in bytes.
+ * pool Gen Pool to manage region memory.
+ * chunk Persistent Gen Pool chunk.
*/
struct prmem_region {
struct list_head node;
unsigned long pa;
size_t size;
+ struct gen_pool *pool;
+ struct gen_pool_chunk *chunk;
};
/*
@@ -80,6 +85,9 @@ int prmem_cmdline_size(void);
/* Internal functions. */
struct prmem_region *prmem_add_region(unsigned long pa, size_t size);
+bool prmem_create_pool(struct prmem_region *region, bool new_region);
+void *prmem_alloc_pool(struct prmem_region *region, size_t size, int align);
+void prmem_free_pool(struct prmem_region *region, void *va, size_t size);
unsigned long prmem_checksum(void *start, size_t size);
bool __init prmem_validate(void);
void prmem_cmdline(char *cmdline);
diff --git a/kernel/prmem/prmem_init.c b/kernel/prmem/prmem_init.c
index 9cea1cd3b6a5..56df1e6d3ebc 100644
--- a/kernel/prmem/prmem_init.c
+++ b/kernel/prmem/prmem_init.c
@@ -22,6 +22,14 @@ void __init prmem_init(void)
if (!prmem_add_region(prmem_pa, prmem_size))
return;
+ } else {
+ /* Warm boot. */
+ struct prmem_region *region;
+
+ list_for_each_entry(region, &prmem->regions, node) {
+ if (!prmem_create_pool(region, false))
+ return;
+ }
}
prmem_inited = true;
}
diff --git a/kernel/prmem/prmem_region.c b/kernel/prmem/prmem_region.c
index 8254dafcee13..6dc88c74d9c8 100644
--- a/kernel/prmem/prmem_region.c
+++ b/kernel/prmem/prmem_region.c
@@ -1,12 +1,74 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
- * Persistent-Across-Kexec memory (prmem) - Regions.
+ * Persistent-Across-Kexec memory (prmem) - Regions and Region Pools.
*
* Copyright (C) 2023 Microsoft Corporation
* Author: Madhavan T. Venkataraman (madvenka@linux.microsoft.com)
*/
#include <linux/prmem.h>
+bool prmem_create_pool(struct prmem_region *region, bool new_region)
+{
+ size_t chunk_size, total_size;
+
+ chunk_size = gen_pool_chunk_size(region->size, PAGE_SHIFT);
+ total_size = sizeof(*region) + chunk_size;
+ total_size = ALIGN(total_size, PAGE_SIZE);
+
+ if (new_region) {
+ /*
+ * We place the region structure at the base of the region
+ * itself. Part of the region is a genpool chunk that is used
+ * to manage the region memory.
+ *
+ * Normally, the chunk is allocated from regular memory by
+ * genpool. But in the case of prmem, the chunk must be
+ * persisted across kexecs so allocations can be remembered.
+ * That is why it is allocated from the region memory itself
+ * and passed to genpool.
+ *
+ * Make sure there is enough space for the region and the chunk.
+ */
+ if (total_size >= region->size) {
+ pr_warn("%s: region size too small\n", __func__);
+ return false;
+ }
+
+ /* Initialize the persistent genpool chunk. */
+ region->chunk = (void *) (region + 1);
+ memset(region->chunk, 0, chunk_size);
+ gen_pool_init_chunk(region->chunk, (unsigned long) region,
+ region->pa, region->size, true, NULL);
+ }
+
+ region->pool = gen_pool_create(PAGE_SHIFT, NUMA_NO_NODE);
+ if (!region->pool) {
+ pr_warn("%s: Could not create genpool\n", __func__);
+ return false;
+ }
+
+ gen_pool_add_chunk(region->pool, region->chunk);
+
+ if (new_region) {
+ /* Reserve the region and chunk. */
+ gen_pool_alloc(region->pool, total_size);
+ }
+ return true;
+}
+
+void *prmem_alloc_pool(struct prmem_region *region, size_t size, int align)
+{
+ struct genpool_data_align data = { .align = align, };
+
+ return (void *) gen_pool_alloc_algo(region->pool, size,
+ gen_pool_first_fit_align, &data);
+}
+
+void prmem_free_pool(struct prmem_region *region, void *va, size_t size)
+{
+ gen_pool_free(region->pool, (unsigned long) va, size);
+}
+
struct prmem_region *prmem_add_region(unsigned long pa, size_t size)
{
struct prmem_region *region;
@@ -16,6 +78,9 @@ struct prmem_region *prmem_add_region(unsigned long pa, size_t size)
region->pa = pa;
region->size = size;
+ if (!prmem_create_pool(region, true))
+ return NULL;
+
list_add_tail(&region->node, &prmem->regions);
return region;
}
diff --git a/lib/genalloc.c b/lib/genalloc.c
index 6c644f954bc5..655db7b47ea9 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -165,6 +165,33 @@ struct gen_pool *gen_pool_create(int min_alloc_order, int nid)
}
EXPORT_SYMBOL(gen_pool_create);
+size_t gen_pool_chunk_size(size_t size, int min_alloc_order)
+{
+ unsigned long nbits = size >> min_alloc_order;
+ unsigned long nbytes = sizeof(struct gen_pool_chunk) +
+ BITS_TO_LONGS(nbits) * sizeof(long);
+ return nbytes;
+}
+
+void gen_pool_init_chunk(struct gen_pool_chunk *chunk, unsigned long virt,
+ phys_addr_t phys, size_t size, bool external,
+ void *owner)
+{
+ chunk->phys_addr = phys;
+ chunk->start_addr = virt;
+ chunk->end_addr = virt + size - 1;
+ chunk->external = external;
+ chunk->owner = owner;
+ atomic_long_set(&chunk->avail, size);
+}
+
+void gen_pool_add_chunk(struct gen_pool *pool, struct gen_pool_chunk *chunk)
+{
+ spin_lock(&pool->lock);
+ list_add_rcu(&chunk->next_chunk, &pool->chunks);
+ spin_unlock(&pool->lock);
+}
+
/**
* gen_pool_add_owner- add a new chunk of special memory to the pool
* @pool: pool to add new memory chunk to
@@ -183,23 +210,14 @@ int gen_pool_add_owner(struct gen_pool *pool, unsigned long virt, phys_addr_t ph
size_t size, int nid, void *owner)
{
struct gen_pool_chunk *chunk;
- unsigned long nbits = size >> pool->min_alloc_order;
- unsigned long nbytes = sizeof(struct gen_pool_chunk) +
- BITS_TO_LONGS(nbits) * sizeof(long);
+ unsigned long nbytes = gen_pool_chunk_size(size, pool->min_alloc_order);
chunk = vzalloc_node(nbytes, nid);
if (unlikely(chunk == NULL))
return -ENOMEM;
- chunk->phys_addr = phys;
- chunk->start_addr = virt;
- chunk->end_addr = virt + size - 1;
- chunk->owner = owner;
- atomic_long_set(&chunk->avail, size);
-
- spin_lock(&pool->lock);
- list_add_rcu(&chunk->next_chunk, &pool->chunks);
- spin_unlock(&pool->lock);
+ gen_pool_init_chunk(chunk, virt, phys, size, false, owner);
+ gen_pool_add_chunk(pool, chunk);
return 0;
}
@@ -248,6 +266,9 @@ void gen_pool_destroy(struct gen_pool *pool)
chunk = list_entry(_chunk, struct gen_pool_chunk, next_chunk);
list_del(&chunk->next_chunk);
+ if (chunk->external)
+ continue;
+
end_bit = chunk_size(chunk) >> order;
bit = find_first_bit(chunk->bits, end_bit);
BUG_ON(bit < end_bit);
--
2.25.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC PATCH v1 04/10] mm/prmem: Implement a page allocator for persistent memory
2023-10-16 23:32 ` [RFC PATCH v1 00/10] mm/prmem: Implement the Persistent-Across-Kexec memory feature (prmem) madvenka
` (2 preceding siblings ...)
2023-10-16 23:32 ` [RFC PATCH v1 03/10] mm/prmem: Manage persistent memory with the gen pool allocator madvenka
@ 2023-10-16 23:32 ` madvenka
2023-10-16 23:32 ` [RFC PATCH v1 05/10] mm/prmem: Implement a buffer " madvenka
` (6 subsequent siblings)
10 siblings, 0 replies; 16+ messages in thread
From: madvenka @ 2023-10-16 23:32 UTC (permalink / raw)
To: gregkh, pbonzini, rppt, jgowans, graf, arnd, keescook,
stanislav.kinsburskii, anthony.yznaga, linux-mm, linux-kernel,
madvenka, jamorris
From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
Define the following convenience wrapper functions for allocating and
freeing pages:
- prmem_alloc_pages()
- prmem_free_pages()
The functions look similar to alloc_pages() and __free_pages(). However,
the only GFP flag that is processed is __GFP_ZERO to zero out the
allocated memory.
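`prmem_alloc_pages_locked()` asks each region's pool for `size` bytes aligned to `size`. `gen_pool_first_fit_align` takes the chunk's start address into account when applying that alignment.

The toy model below sketches what an aligned first-fit search over a region does. It uses one `bool` per page and hypothetical `toy_*` names. It is an illustration only, not the genalloc bitmap code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 64

/* Toy model: one bool per page of a region whose first page is base_pfn. */
struct toy_region {
	unsigned long base_pfn;
	bool used[NPAGES];
};

/*
 * First-fit search for nr free pages whose starting pfn is a multiple of
 * align (a power of two, in pages). Returns the index into used[], or -1.
 */
static long toy_alloc(struct toy_region *r, size_t nr, unsigned long align)
{
	size_t i, j;

	for (i = 0; i + nr <= NPAGES; i++) {
		if ((r->base_pfn + i) & (align - 1))
			continue;	/* not suitably aligned */
		/* Count free pages starting at i. */
		for (j = 0; j < nr && !r->used[i + j]; j++)
			;
		if (j == nr) {
			for (j = 0; j < nr; j++)
				r->used[i + j] = true;
			return (long)i;
		}
	}
	return -1;
}

static void toy_free(struct toy_region *r, size_t idx, size_t nr)
{
	for (size_t j = 0; j < nr; j++)
		r->used[idx + j] = false;
}
```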
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
include/linux/prmem.h | 7 ++++
kernel/prmem/Makefile | 1 +
kernel/prmem/prmem_allocator.c | 74 ++++++++++++++++++++++++++++++++++
kernel/prmem/prmem_init.c | 2 +
4 files changed, 84 insertions(+)
create mode 100644 kernel/prmem/prmem_allocator.c
diff --git a/include/linux/prmem.h b/include/linux/prmem.h
index f43f5b0d2b9c..108683933c82 100644
--- a/include/linux/prmem.h
+++ b/include/linux/prmem.h
@@ -75,6 +75,7 @@ extern unsigned long prmem_metadata;
extern unsigned long prmem_pa;
extern size_t prmem_size;
extern bool prmem_inited;
+extern spinlock_t prmem_lock;
/* Kernel API. */
void prmem_reserve_early(void);
@@ -83,11 +84,17 @@ void prmem_init(void);
void prmem_fini(void);
int prmem_cmdline_size(void);
+/* Allocator API. */
+struct page *prmem_alloc_pages(unsigned int order, gfp_t gfp);
+void prmem_free_pages(struct page *pages, unsigned int order);
+
/* Internal functions. */
struct prmem_region *prmem_add_region(unsigned long pa, size_t size);
bool prmem_create_pool(struct prmem_region *region, bool new_region);
void *prmem_alloc_pool(struct prmem_region *region, size_t size, int align);
void prmem_free_pool(struct prmem_region *region, void *va, size_t size);
+void *prmem_alloc_pages_locked(unsigned int order);
+void prmem_free_pages_locked(void *va, unsigned int order);
unsigned long prmem_checksum(void *start, size_t size);
bool __init prmem_validate(void);
void prmem_cmdline(char *cmdline);
diff --git a/kernel/prmem/Makefile b/kernel/prmem/Makefile
index 9b0a693bfee1..99bb19f0afd3 100644
--- a/kernel/prmem/Makefile
+++ b/kernel/prmem/Makefile
@@ -1,3 +1,4 @@
# SPDX-License-Identifier: GPL-2.0
obj-y += prmem_parse.o prmem_reserve.o prmem_init.o prmem_region.o prmem_misc.o
+obj-y += prmem_allocator.o
diff --git a/kernel/prmem/prmem_allocator.c b/kernel/prmem/prmem_allocator.c
new file mode 100644
index 000000000000..07a5a430630c
--- /dev/null
+++ b/kernel/prmem/prmem_allocator.c
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Persistent-Across-Kexec memory feature (prmem) - Allocator.
+ *
+ * Copyright (C) 2023 Microsoft Corporation
+ * Author: Madhavan T. Venkataraman (madvenka@linux.microsoft.com)
+ */
+#include <linux/prmem.h>
+
+/* Page Allocation functions. */
+
+void *prmem_alloc_pages_locked(unsigned int order)
+{
+ struct prmem_region *region;
+ void *va;
+ size_t size = (1UL << order) << PAGE_SHIFT;
+
+ list_for_each_entry(region, &prmem->regions, node) {
+ va = prmem_alloc_pool(region, size, size);
+ if (va)
+ return va;
+ }
+ return NULL;
+}
+
+struct page *prmem_alloc_pages(unsigned int order, gfp_t gfp)
+{
+ void *va;
+ size_t size = (1UL << order) << PAGE_SHIFT;
+ bool zero = !!(gfp & __GFP_ZERO);
+
+ if (!prmem_inited || order > MAX_ORDER)
+ return NULL;
+
+ spin_lock(&prmem_lock);
+ va = prmem_alloc_pages_locked(order);
+ spin_unlock(&prmem_lock);
+
+ if (va) {
+ if (zero)
+ memset(va, 0, size);
+ return virt_to_page(va);
+ }
+ return NULL;
+}
+EXPORT_SYMBOL_GPL(prmem_alloc_pages);
+
+void prmem_free_pages_locked(void *va, unsigned int order)
+{
+ struct prmem_region *region;
+ size_t size = (1UL << order) << PAGE_SHIFT;
+ void *eva = va + size;
+ void *region_va;
+
+ list_for_each_entry(region, &prmem->regions, node) {
+ /* The region structure is at the base of the region memory. */
+ region_va = region;
+ if (va >= region_va && eva <= (region_va + region->size)) {
+ prmem_free_pool(region, va, size);
+ return;
+ }
+ }
+}
+
+void prmem_free_pages(struct page *pages, unsigned int order)
+{
+ if (!prmem_inited || order > MAX_ORDER)
+ return;
+
+ spin_lock(&prmem_lock);
+ prmem_free_pages_locked(page_to_virt(pages), order);
+ spin_unlock(&prmem_lock);
+}
+EXPORT_SYMBOL_GPL(prmem_free_pages);
diff --git a/kernel/prmem/prmem_init.c b/kernel/prmem/prmem_init.c
index 56df1e6d3ebc..d23833d296fe 100644
--- a/kernel/prmem/prmem_init.c
+++ b/kernel/prmem/prmem_init.c
@@ -9,6 +9,8 @@
bool prmem_inited;
+DEFINE_SPINLOCK(prmem_lock);
+
void __init prmem_init(void)
{
if (!prmem)
--
2.25.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC PATCH v1 05/10] mm/prmem: Implement a buffer allocator for persistent memory
2023-10-16 23:32 ` [RFC PATCH v1 00/10] mm/prmem: Implement the Persistent-Across-Kexec memory feature (prmem) madvenka
` (3 preceding siblings ...)
2023-10-16 23:32 ` [RFC PATCH v1 04/10] mm/prmem: Implement a page allocator for persistent memory madvenka
@ 2023-10-16 23:32 ` madvenka
2023-10-16 23:32 ` [RFC PATCH v1 06/10] mm/prmem: Implement persistent XArray (and Radix Tree) madvenka
` (5 subsequent siblings)
10 siblings, 0 replies; 16+ messages in thread
From: madvenka @ 2023-10-16 23:32 UTC (permalink / raw)
To: gregkh, pbonzini, rppt, jgowans, graf, arnd, keescook,
stanislav.kinsburskii, anthony.yznaga, linux-mm, linux-kernel,
madvenka, jamorris
From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
Implement functions that can allocate and free buffers of up to
PAGE_SIZE bytes:
- prmem_alloc()
- prmem_free()
These functions look like kmalloc() and kfree(). However, the only GFP flag
that is processed is __GFP_ZERO to zero out the allocated memory.
To make the implementation simpler, create allocation caches for different
object sizes:
8, 16, 32, 64, ..., PAGE_SIZE
For a given size, allocate from the appropriate cache. This idea has been
plagiarized from the kmem allocator.
To fill the cache of a specific size, allocate a page, break it up into
equal-sized objects and add the objects to the cache. This is just a very
simple allocator. It does not attempt to do sophisticated things like
cache coloring, coalescing objects that belong to the same page so the
page can be freed, etc.
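The size-class scheme can be sketched in userspace as below. This sketch makes the following assumptions:
- The names are hypothetical.
- `malloc()` stands in for `prmem_alloc_pages_locked(0)`.
- Pages are 4 KiB.
- Locking is omitted. The patch serializes callers with `prmem_lock`.

As in the patch, each free object stores the pointer to the next free object in its first bytes, so the caches need no separate bookkeeping memory. Refilled pages are never given back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096	/* assumption: 4 KiB pages */
#define TOY_NR_CACHES 10	/* size classes 8, 16, ..., 4096 */

/* Each cache is a LIFO free list threaded through its free objects. */
static void *caches[TOY_NR_CACHES];

/* Index of the smallest power-of-two class (minimum 8) that fits size. */
static int cache_index(size_t size)
{
	size_t csize = 8;
	int i = 0;

	assert(size >= 1 && size <= TOY_PAGE_SIZE);
	while (csize < size) {
		csize <<= 1;
		i++;
	}
	return i;
}

/* Carve one page into equal-sized objects and push them onto the list. */
static void refill(int index)
{
	size_t size = (size_t)8 << index;
	size_t n = TOY_PAGE_SIZE / size;
	char *page = malloc(TOY_PAGE_SIZE);	/* stands in for a prmem page */

	if (!page)
		return;
	for (size_t i = 0; i < n; i++) {
		void *obj = page + i * size;

		*(void **)obj = caches[index];
		caches[index] = obj;
	}
}

static void *toy_alloc(size_t size)
{
	void *obj;
	int index;

	if (size == 0 || size > TOY_PAGE_SIZE)
		return NULL;
	index = cache_index(size);
	if (!caches[index])
		refill(index);
	obj = caches[index];
	if (obj)
		caches[index] = *(void **)obj;	/* pop the head */
	return obj;
}

/* As with prmem_free(), the caller passes the size it allocated. */
static void toy_free(void *obj, size_t size)
{
	int index = cache_index(size);

	*(void **)obj = caches[index];		/* push onto the head */
	caches[index] = obj;
}
```

In the patch, the list heads live in `prmem->caches` inside the metadata, and the links live inside the persistent objects. As a result, the free lists themselves should survive a kexec along with the allocations.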
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
include/linux/prmem.h | 12 ++++
kernel/prmem/prmem_allocator.c | 112 ++++++++++++++++++++++++++++++++-
2 files changed, 123 insertions(+), 1 deletion(-)
diff --git a/include/linux/prmem.h b/include/linux/prmem.h
index 108683933c82..1cb4660cf35e 100644
--- a/include/linux/prmem.h
+++ b/include/linux/prmem.h
@@ -50,6 +50,8 @@ struct prmem_region {
struct gen_pool_chunk *chunk;
};
+#define PRMEM_MAX_CACHES 14
+
/*
* PRMEM metadata.
*
@@ -60,6 +62,9 @@ struct prmem_region {
* size Size of initial memory allocated to prmem.
*
* regions List of memory regions.
+ *
+ * caches Caches for different object sizes. Allocations of up to
+ * PAGE_SIZE bytes are served from these caches.
*/
struct prmem {
unsigned long checksum;
@@ -68,6 +73,9 @@ struct prmem {
/* Persistent Regions. */
struct list_head regions;
+
+ /* Allocation caches. */
+ void *caches[PRMEM_MAX_CACHES];
};
extern struct prmem *prmem;
@@ -87,6 +95,8 @@ int prmem_cmdline_size(void);
/* Allocator API. */
struct page *prmem_alloc_pages(unsigned int order, gfp_t gfp);
void prmem_free_pages(struct page *pages, unsigned int order);
+void *prmem_alloc(size_t size, gfp_t gfp);
+void prmem_free(void *va, size_t size);
/* Internal functions. */
struct prmem_region *prmem_add_region(unsigned long pa, size_t size);
@@ -95,6 +105,8 @@ void *prmem_alloc_pool(struct prmem_region *region, size_t size, int align);
void prmem_free_pool(struct prmem_region *region, void *va, size_t size);
void *prmem_alloc_pages_locked(unsigned int order);
void prmem_free_pages_locked(void *va, unsigned int order);
+void *prmem_alloc_locked(size_t size);
+void prmem_free_locked(void *va, size_t size);
unsigned long prmem_checksum(void *start, size_t size);
bool __init prmem_validate(void);
void prmem_cmdline(char *cmdline);
diff --git a/kernel/prmem/prmem_allocator.c b/kernel/prmem/prmem_allocator.c
index 07a5a430630c..f12975bc6777 100644
--- a/kernel/prmem/prmem_allocator.c
+++ b/kernel/prmem/prmem_allocator.c
@@ -1,6 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
/*
- * Persistent-Across-Kexec memory feature (prmem) - Allocator.
+ * Persistent-Across-Kexec memory (prmem) - Allocator.
*
* Copyright (C) 2023 Microsoft Corporation
* Author: Madhavan T. Venkataraman (madvenka@linux.microsoft.com)
@@ -72,3 +72,113 @@ void prmem_free_pages(struct page *pages, unsigned int order)
spin_unlock(&prmem_lock);
}
EXPORT_SYMBOL_GPL(prmem_free_pages);
+
+/* Buffer allocation functions. */
+
+#if PAGE_SIZE > 65536
+#error "Page size is too big"
+#endif
+
+static size_t prmem_cache_sizes[PRMEM_MAX_CACHES] = {
+ 8, 16, 32, 64, 128, 256, 512,
+ 1024, 2048, 4096, 8192, 16384, 32768, 65536,
+};
+
+static int prmem_cache_index(size_t size)
+{
+ int i;
+
+ for (i = 0; i < PRMEM_MAX_CACHES; i++) {
+ if (size <= prmem_cache_sizes[i])
+ return i;
+ }
+ BUG();
+}
+
+static void prmem_refill(void **cache, size_t size)
+{
+ void *va;
+ int i, n = PAGE_SIZE / size;
+
+ /* Allocate a page. */
+ va = prmem_alloc_pages_locked(0);
+ if (!va)
+ return;
+
+ /* Break up the page into pieces and put them in the cache. */
+ for (i = 0; i < n; i++, va += size) {
+ *((void **) va) = *cache;
+ *cache = va;
+ }
+}
+
+void *prmem_alloc_locked(size_t size)
+{
+ void *va;
+ int index;
+ void **cache;
+
+ index = prmem_cache_index(size);
+ size = prmem_cache_sizes[index];
+
+ cache = &prmem->caches[index];
+ if (!*cache) {
+ /* Refill the cache. */
+ prmem_refill(cache, size);
+ }
+
+ /* Allocate one from the cache. */
+ va = *cache;
+ if (va)
+ *cache = *((void **) va);
+ return va;
+}
+
+void *prmem_alloc(size_t size, gfp_t gfp)
+{
+ void *va;
+ bool zero = !!(gfp & __GFP_ZERO);
+
+ if (!prmem_inited || !size)
+ return NULL;
+
+ /* This function is only for sizes up to a PAGE_SIZE. */
+ if (size > PAGE_SIZE)
+ return NULL;
+
+ spin_lock(&prmem_lock);
+ va = prmem_alloc_locked(size);
+ spin_unlock(&prmem_lock);
+
+ if (va && zero)
+ memset(va, 0, size);
+ return va;
+}
+EXPORT_SYMBOL_GPL(prmem_alloc);
+
+void prmem_free_locked(void *va, size_t size)
+{
+ int index;
+ void **cache;
+
+ /* Free the object into its cache. */
+ index = prmem_cache_index(size);
+ cache = &prmem->caches[index];
+ *((void **) va) = *cache;
+ *cache = va;
+}
+
+void prmem_free(void *va, size_t size)
+{
+ if (!prmem_inited || !va || !size)
+ return;
+
+ /* This function is only for sizes up to a PAGE_SIZE. */
+ if (size > PAGE_SIZE)
+ return;
+
+ spin_lock(&prmem_lock);
+ prmem_free_locked(va, size);
+ spin_unlock(&prmem_lock);
+}
+EXPORT_SYMBOL_GPL(prmem_free);
--
2.25.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC PATCH v1 06/10] mm/prmem: Implement persistent XArray (and Radix Tree)
2023-10-16 23:32 ` [RFC PATCH v1 00/10] mm/prmem: Implement the Persistent-Across-Kexec memory feature (prmem) madvenka
` (4 preceding siblings ...)
2023-10-16 23:32 ` [RFC PATCH v1 05/10] mm/prmem: Implement a buffer " madvenka
@ 2023-10-16 23:32 ` madvenka
2023-10-16 23:32 ` [RFC PATCH v1 07/10] mm/prmem: Implement named Persistent Instances madvenka
` (4 subsequent siblings)
10 siblings, 0 replies; 16+ messages in thread
From: madvenka @ 2023-10-16 23:32 UTC (permalink / raw)
To: gregkh, pbonzini, rppt, jgowans, graf, arnd, keescook,
stanislav.kinsburskii, anthony.yznaga, linux-mm, linux-kernel,
madvenka, jamorris
From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
Consumers can persist their data structures by allocating persistent
memory for them.
Now, data structures are connected to one another using pointers, arrays,
linked lists, RB nodes, etc. These can all be persisted by allocating
memory for them from persistent memory. E.g., a linked list is persisted
if the data structures that embed the list head and the list nodes are
allocated from persistent memory. Ditto for RB trees.
One important exception is the XArray. The XArray itself can be embedded in
a persistent data structure. However, the XA nodes are allocated using the
kmem allocator.
Implement a persistent XArray. Introduce a new field, xa_persistent, in the
XArray. Implement an accessor function to set the field. If xa_persistent
is true, allocate XA nodes using the prmem allocator instead of the kmem
allocator. This makes the whole XArray persistent.
Since Radix Trees (lib/radix-tree.c) are implemented based on the XArray,
we also get persistent Radix Trees. The only difference is that pre-loading
is not supported for persistent Radix Tree nodes.
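The persistent XArray hinges on one decision at allocation time and a matching one at free time. The sketch below models that pattern using hypothetical `toy_*` types and counters, with `calloc()` standing in for both allocators:
- At allocation time, the root's flag picks the allocator.
- Each node records which allocator it came from.
- The free path only sees the node, like `radix_node_free()`, so it uses that record to dispatch correctly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins: a root that carries the flag, a node that records its origin. */
struct toy_root {
	bool persistent;
};

struct toy_node {
	bool persistent;	/* which allocator the node came from */
	void *slots[4];
};

/* Live-object counters standing in for the two separate allocators. */
static int persistent_live, regular_live;

/* Stand-in for radix_node_alloc(): the root decides; NULL root means regular. */
static struct toy_node *node_alloc(const struct toy_root *root)
{
	struct toy_node *node = calloc(1, sizeof(*node));

	if (!node)
		return NULL;
	node->persistent = root && root->persistent;
	if (node->persistent)
		persistent_live++;
	else
		regular_live++;
	return node;
}

/* Stand-in for radix_node_free(): dispatch on the node's own flag. */
static void node_free(struct toy_node *node)
{
	if (node->persistent)
		persistent_live--;
	else
		regular_live--;
	free(node);
}
```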
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
include/linux/radix-tree.h | 4 ++++
include/linux/xarray.h | 15 ++++++++++++
lib/radix-tree.c | 49 +++++++++++++++++++++++++++++++-------
lib/xarray.c | 11 +++++----
4 files changed, 66 insertions(+), 13 deletions(-)
diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index eae67015ce51..74f0bdc60bea 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -82,6 +82,7 @@ static inline bool radix_tree_is_internal_node(void *ptr)
struct radix_tree_root name = RADIX_TREE_INIT(name, mask)
#define INIT_RADIX_TREE(root, mask) xa_init_flags(root, mask)
+#define PERSIST_RADIX_TREE(root) xa_persistent(root)
static inline bool radix_tree_empty(const struct radix_tree_root *root)
{
@@ -254,6 +255,9 @@ unsigned int radix_tree_gang_lookup_tag_slot(const struct radix_tree_root *,
void __rcu ***results, unsigned long first_index,
unsigned int max_items, unsigned int tag);
int radix_tree_tagged(const struct radix_tree_root *, unsigned int tag);
+struct radix_tree_node *radix_node_alloc(struct radix_tree_root *root,
+ struct list_lru *lru, gfp_t gfp);
+void radix_node_free(struct radix_tree_node *node);
static inline void radix_tree_preload_end(void)
{
diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index 741703b45f61..3176a5f62caf 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -295,6 +295,7 @@ enum xa_lock_type {
*/
struct xarray {
spinlock_t xa_lock;
+ bool xa_persistent;
/* private: The rest of the data structure is not to be used directly. */
gfp_t xa_flags;
void __rcu * xa_head;
@@ -302,6 +303,7 @@ struct xarray {
#define XARRAY_INIT(name, flags) { \
.xa_lock = __SPIN_LOCK_UNLOCKED(name.xa_lock), \
+ .xa_persistent = false, \
.xa_flags = flags, \
.xa_head = NULL, \
}
@@ -378,6 +380,7 @@ void xa_destroy(struct xarray *);
static inline void xa_init_flags(struct xarray *xa, gfp_t flags)
{
spin_lock_init(&xa->xa_lock);
+ xa->xa_persistent = false;
xa->xa_flags = flags;
xa->xa_head = NULL;
}
@@ -395,6 +398,17 @@ static inline void xa_init(struct xarray *xa)
xa_init_flags(xa, 0);
}
+/**
+ * xa_persistent() - Allocate the XArray's nodes from persistent memory.
+ * @xa: XArray.
+ *
+ * Context: Any context.
+ */
+static inline void xa_persistent(struct xarray *xa)
+{
+ xa->xa_persistent = true;
+}
+
/**
* xa_empty() - Determine if an array has any present entries.
* @xa: XArray.
@@ -1142,6 +1156,7 @@ struct xa_node {
unsigned char offset; /* Slot offset in parent */
unsigned char count; /* Total entry count */
unsigned char nr_values; /* Value entry count */
+ bool persistent; /* Allocated from persistent memory. */
struct xa_node __rcu *parent; /* NULL at top of tree */
struct xarray *array; /* The array we belong to */
union {
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 976b9bd02a1b..d3af6ff6c625 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -21,6 +21,7 @@
#include <linux/kmemleak.h>
#include <linux/percpu.h>
#include <linux/preempt.h> /* in_interrupt() */
+#include <linux/prmem.h>
#include <linux/radix-tree.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>
@@ -225,6 +226,36 @@ static unsigned long next_index(unsigned long index,
return (index & ~node_maxindex(node)) + (offset << node->shift);
}
+static void radix_tree_node_ctor(void *arg);
+
+struct radix_tree_node *
+radix_node_alloc(struct radix_tree_root *root, struct list_lru *lru, gfp_t gfp)
+{
+ struct radix_tree_node *node;
+
+ if (root && root->xa_persistent) {
+ node = prmem_alloc(sizeof(struct radix_tree_node), gfp);
+ if (node) {
+ radix_tree_node_ctor(node);
+ node->persistent = true;
+ }
+ } else {
+ node = kmem_cache_alloc_lru(radix_tree_node_cachep, lru, gfp);
+ if (node)
+ node->persistent = false;
+ }
+ return node;
+}
+
+void radix_node_free(struct radix_tree_node *node)
+{
+ if (node->persistent) {
+ prmem_free(node, sizeof(*node));
+ return;
+ }
+ kmem_cache_free(radix_tree_node_cachep, node);
+}
+
/*
* This assumes that the caller has performed appropriate preallocation, and
* that the caller has pinned this thread of control to the current CPU.
@@ -241,8 +272,11 @@ radix_tree_node_alloc(gfp_t gfp_mask, struct radix_tree_node *parent,
* Preload code isn't irq safe and it doesn't make sense to use
* preloading during an interrupt anyway as all the allocations have
* to be atomic. So just do normal allocation when in interrupt.
+ *
+ * Also, there is no preloading for persistent trees.
*/
- if (!gfpflags_allow_blocking(gfp_mask) && !in_interrupt()) {
+ if (!gfpflags_allow_blocking(gfp_mask) && !in_interrupt() &&
+ !root->xa_persistent) {
struct radix_tree_preload *rtp;
/*
@@ -250,8 +284,7 @@ radix_tree_node_alloc(gfp_t gfp_mask, struct radix_tree_node *parent,
* cache first for the new node to get accounted to the memory
* cgroup.
*/
- ret = kmem_cache_alloc(radix_tree_node_cachep,
- gfp_mask | __GFP_NOWARN);
+ ret = radix_node_alloc(root, NULL, gfp_mask | __GFP_NOWARN);
if (ret)
goto out;
@@ -273,7 +306,7 @@ radix_tree_node_alloc(gfp_t gfp_mask, struct radix_tree_node *parent,
kmemleak_update_trace(ret);
goto out;
}
- ret = kmem_cache_alloc(radix_tree_node_cachep, gfp_mask);
+ ret = radix_node_alloc(root, NULL, gfp_mask);
out:
BUG_ON(radix_tree_is_internal_node(ret));
if (ret) {
@@ -301,7 +334,7 @@ void radix_tree_node_rcu_free(struct rcu_head *head)
memset(node->tags, 0, sizeof(node->tags));
INIT_LIST_HEAD(&node->private_list);
- kmem_cache_free(radix_tree_node_cachep, node);
+ radix_node_free(node);
}
static inline void
@@ -335,7 +368,7 @@ static __must_check int __radix_tree_preload(gfp_t gfp_mask, unsigned nr)
rtp = this_cpu_ptr(&radix_tree_preloads);
while (rtp->nr < nr) {
local_unlock(&radix_tree_preloads.lock);
- node = kmem_cache_alloc(radix_tree_node_cachep, gfp_mask);
+ node = radix_node_alloc(NULL, NULL, gfp_mask);
if (node == NULL)
goto out;
local_lock(&radix_tree_preloads.lock);
@@ -345,7 +378,7 @@ static __must_check int __radix_tree_preload(gfp_t gfp_mask, unsigned nr)
rtp->nodes = node;
rtp->nr++;
} else {
- kmem_cache_free(radix_tree_node_cachep, node);
+ radix_node_free(node);
}
}
ret = 0;
@@ -1585,7 +1618,7 @@ static int radix_tree_cpu_dead(unsigned int cpu)
while (rtp->nr) {
node = rtp->nodes;
rtp->nodes = node->parent;
- kmem_cache_free(radix_tree_node_cachep, node);
+ radix_node_free(node);
rtp->nr--;
}
return 0;
diff --git a/lib/xarray.c b/lib/xarray.c
index 2071a3718f4e..33a74b713e6a 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -9,6 +9,7 @@
#include <linux/bitmap.h>
#include <linux/export.h>
#include <linux/list.h>
+#include <linux/prmem.h>
#include <linux/slab.h>
#include <linux/xarray.h>
@@ -303,7 +304,7 @@ bool xas_nomem(struct xa_state *xas, gfp_t gfp)
}
if (xas->xa->xa_flags & XA_FLAGS_ACCOUNT)
gfp |= __GFP_ACCOUNT;
- xas->xa_alloc = kmem_cache_alloc_lru(radix_tree_node_cachep, xas->xa_lru, gfp);
+ xas->xa_alloc = radix_node_alloc(xas->xa, xas->xa_lru, gfp);
if (!xas->xa_alloc)
return false;
xas->xa_alloc->parent = NULL;
@@ -335,10 +336,10 @@ static bool __xas_nomem(struct xa_state *xas, gfp_t gfp)
gfp |= __GFP_ACCOUNT;
if (gfpflags_allow_blocking(gfp)) {
xas_unlock_type(xas, lock_type);
- xas->xa_alloc = kmem_cache_alloc_lru(radix_tree_node_cachep, xas->xa_lru, gfp);
+ xas->xa_alloc = radix_node_alloc(xas->xa, xas->xa_lru, gfp);
xas_lock_type(xas, lock_type);
} else {
- xas->xa_alloc = kmem_cache_alloc_lru(radix_tree_node_cachep, xas->xa_lru, gfp);
+ xas->xa_alloc = radix_node_alloc(xas->xa, xas->xa_lru, gfp);
}
if (!xas->xa_alloc)
return false;
@@ -372,7 +373,7 @@ static void *xas_alloc(struct xa_state *xas, unsigned int shift)
if (xas->xa->xa_flags & XA_FLAGS_ACCOUNT)
gfp |= __GFP_ACCOUNT;
- node = kmem_cache_alloc_lru(radix_tree_node_cachep, xas->xa_lru, gfp);
+ node = radix_node_alloc(xas->xa, xas->xa_lru, gfp);
if (!node) {
xas_set_err(xas, -ENOMEM);
return NULL;
@@ -1017,7 +1018,7 @@ void xas_split_alloc(struct xa_state *xas, void *entry, unsigned int order,
void *sibling = NULL;
struct xa_node *node;
- node = kmem_cache_alloc_lru(radix_tree_node_cachep, xas->xa_lru, gfp);
+ node = radix_node_alloc(xas->xa, xas->xa_lru, gfp);
if (!node)
goto nomem;
node->array = xas->xa;
--
2.25.1
* [RFC PATCH v1 07/10] mm/prmem: Implement named Persistent Instances.
2023-10-16 23:32 ` [RFC PATCH v1 00/10] mm/prmem: Implement the Persistent-Across-Kexec memory feature (prmem) madvenka
` (5 preceding siblings ...)
2023-10-16 23:32 ` [RFC PATCH v1 06/10] mm/prmem: Implement persistent XArray (and Radix Tree) madvenka
@ 2023-10-16 23:32 ` madvenka
2023-10-16 23:32 ` [RFC PATCH v1 08/10] mm/prmem: Implement Persistent Ramdisk instances madvenka
` (3 subsequent siblings)
10 siblings, 0 replies; 16+ messages in thread
From: madvenka @ 2023-10-16 23:32 UTC (permalink / raw)
To: gregkh, pbonzini, rppt, jgowans, graf, arnd, keescook,
stanislav.kinsburskii, anthony.yznaga, linux-mm, linux-kernel,
madvenka, jamorris
From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
To persist any data, a consumer needs to do the following:
- Create a persistent instance for it. The instance is recorded
in the metadata.
- Name the instance.
- Record the data in the instance.
- After kexec, retrieve the instance by name.
- Retrieve the instance data.
Implement the following API for consumers:
prmem_get(subsystem, name, create)
Get/Create a persistent instance. The consumer provides the name
of the subsystem and the name of the instance within the subsystem.
E.g., for a persistent ramdisk block device:
subsystem = "ramdisk"
instance = "pram0"
prmem_set_data()
Record a data pointer and a size for the instance. An instance may
contain many data structures connected to each other using pointers,
etc. A consumer is expected to record the top level data structure
in the instance. All other data structures must be reachable from
the top level data structure.
prmem_get_data()
Retrieve the data pointer and the size for the instance.
prmem_put()
Destroy a persistent instance. The instance data must already be
NULL, so the consumer is responsible for freeing the instance data
and clearing it with prmem_set_data() before calling prmem_put().
prmem_put() returns false if the data pointer is still set.
prmem_list()
Walk the instances of a subsystem and call a callback for each.
This allows a consumer to enumerate all of the instances associated
with a subsystem.
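For illustration only, the instance semantics listed above (lookup by subsystem and name, optional creation, refusing to destroy an instance that still holds data, and per-subsystem enumeration) can be modeled in userspace C. This is a sketch under stated assumptions, not the kernel implementation: model_get(), model_put(), model_list() and struct model_instance are hypothetical names, malloc() stands in for persistent memory, and locking is omitted.

```c
/*
 * Hypothetical userspace model of prmem_get()/prmem_put()/prmem_list().
 * Not kernel code: malloc() replaces persistent memory, a singly linked
 * list replaces list_head, and there is no prmem_lock.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_MAX_NAME 32	/* mirrors PRMEM_MAX_NAME */

struct model_instance {
	struct model_instance *next;
	char subsystem[MODEL_MAX_NAME];
	char name[MODEL_MAX_NAME];
	void *data;		/* top-level persistent structure */
	size_t size;
};

static struct model_instance *instances;	/* stands in for prmem->instances */

/* Look up (subsystem, name); create an empty instance if asked to. */
struct model_instance *model_get(const char *subsystem, const char *name,
				 bool create)
{
	size_t slen = strlen(subsystem);
	size_t nlen = strlen(name);
	struct model_instance *inst;

	/* Names must be non-empty and fit in the array with the NUL. */
	if (!slen || slen >= MODEL_MAX_NAME || !nlen || nlen >= MODEL_MAX_NAME)
		return NULL;

	for (inst = instances; inst; inst = inst->next)
		if (!strcmp(inst->subsystem, subsystem) &&
		    !strcmp(inst->name, name))
			return inst;	/* existing instance */

	if (!create)
		return NULL;

	inst = calloc(1, sizeof(*inst));	/* data = NULL, size = 0 */
	if (!inst)
		return NULL;
	strcpy(inst->subsystem, subsystem);
	strcpy(inst->name, name);
	inst->next = instances;		/* prepends; the kernel code appends */
	instances = inst;
	return inst;
}

/* Destroy an instance; refused while the consumer still owns data. */
bool model_put(struct model_instance *inst)
{
	struct model_instance **pp;

	if (inst->data)
		return false;
	for (pp = &instances; *pp; pp = &(*pp)->next) {
		if (*pp == inst) {
			*pp = inst->next;
			free(inst);
			return true;
		}
	}
	return false;
}

/* Call func() for each instance of a subsystem; stop on non-zero. */
int model_list(const char *subsystem,
	       int (*func)(struct model_instance *, void *), void *arg)
{
	struct model_instance *inst;
	int ret = 0;

	for (inst = instances; inst; inst = inst->next) {
		if (strcmp(inst->subsystem, subsystem))
			continue;
		ret = func(inst, arg);
		if (ret)
			break;
	}
	return ret;
}
```

Because this model prepends new instances, model_list() visits them in the reverse of creation order, whereas the kernel list built with list_add_tail() is walked in creation order.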
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
include/linux/prmem.h | 36 +++++++++
kernel/prmem/Makefile | 2 +-
kernel/prmem/prmem_init.c | 1 +
kernel/prmem/prmem_instance.c | 139 ++++++++++++++++++++++++++++++++++
4 files changed, 177 insertions(+), 1 deletion(-)
create mode 100644 kernel/prmem/prmem_instance.c
diff --git a/include/linux/prmem.h b/include/linux/prmem.h
index 1cb4660cf35e..c7034690f7cb 100644
--- a/include/linux/prmem.h
+++ b/include/linux/prmem.h
@@ -50,6 +50,28 @@ struct prmem_region {
struct gen_pool_chunk *chunk;
};
+#define PRMEM_MAX_NAME 32
+
+/*
+ * To persist any data, a persistent instance is created for it and the data is
+ * "remembered" in the instance.
+ *
+ * node List node
+ * subsystem Subsystem/driver/module that created the instance. E.g.,
+ * "ramdisk" for the ramdisk driver.
+ * name Instance name within the subsystem/driver/module. E.g., "pram0"
+ * for a persistent ramdisk instance.
+ * data Pointer to data. E.g., the radix tree of pages in a ram disk.
+ * size Size of data.
+ */
+struct prmem_instance {
+ struct list_head node;
+ char subsystem[PRMEM_MAX_NAME];
+ char name[PRMEM_MAX_NAME];
+ void *data;
+ size_t size;
+};
+
#define PRMEM_MAX_CACHES 14
/*
@@ -63,6 +85,8 @@ struct prmem_region {
*
* regions List of memory regions.
*
+ * instances Persistent instances.
+ *
* caches Caches for different object sizes. For allocations smaller than
* PAGE_SIZE, these caches are used.
*/
@@ -74,6 +98,9 @@ struct prmem {
/* Persistent Regions. */
struct list_head regions;
+ /* Persistent Instances. */
+ struct list_head instances;
+
/* Allocation caches. */
void *caches[PRMEM_MAX_CACHES];
};
@@ -85,6 +112,8 @@ extern size_t prmem_size;
extern bool prmem_inited;
extern spinlock_t prmem_lock;
+typedef int (*prmem_list_func_t)(struct prmem_instance *instance, void *arg);
+
/* Kernel API. */
void prmem_reserve_early(void);
void prmem_reserve(void);
@@ -98,6 +127,13 @@ void prmem_free_pages(struct page *pages, unsigned int order);
void *prmem_alloc(size_t size, gfp_t gfp);
void prmem_free(void *va, size_t size);
+/* Persistent Instance API. */
+void *prmem_get(char *subsystem, char *name, bool create);
+void prmem_set_data(struct prmem_instance *instance, void *data, size_t size);
+void prmem_get_data(struct prmem_instance *instance, void **data, size_t *size);
+bool prmem_put(struct prmem_instance *instance);
+int prmem_list(char *subsystem, prmem_list_func_t func, void *arg);
+
/* Internal functions. */
struct prmem_region *prmem_add_region(unsigned long pa, size_t size);
bool prmem_create_pool(struct prmem_region *region, bool new_region);
diff --git a/kernel/prmem/Makefile b/kernel/prmem/Makefile
index 99bb19f0afd3..0ed7976580d6 100644
--- a/kernel/prmem/Makefile
+++ b/kernel/prmem/Makefile
@@ -1,4 +1,4 @@
# SPDX-License-Identifier: GPL-2.0
obj-y += prmem_parse.o prmem_reserve.o prmem_init.o prmem_region.o prmem_misc.o
-obj-y += prmem_allocator.o
+obj-y += prmem_allocator.o prmem_instance.o
diff --git a/kernel/prmem/prmem_init.c b/kernel/prmem/prmem_init.c
index d23833d296fe..166fca688ab3 100644
--- a/kernel/prmem/prmem_init.c
+++ b/kernel/prmem/prmem_init.c
@@ -21,6 +21,7 @@ void __init prmem_init(void)
prmem->metadata = prmem_metadata;
prmem->size = prmem_size;
INIT_LIST_HEAD(&prmem->regions);
+ INIT_LIST_HEAD(&prmem->instances);
if (!prmem_add_region(prmem_pa, prmem_size))
return;
diff --git a/kernel/prmem/prmem_instance.c b/kernel/prmem/prmem_instance.c
new file mode 100644
index 000000000000..ee3554d0ab8b
--- /dev/null
+++ b/kernel/prmem/prmem_instance.c
@@ -0,0 +1,139 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Persistent-Across-Kexec memory (prmem) - Persistent instances.
+ *
+ * Copyright (C) 2023 Microsoft Corporation
+ * Author: Madhavan T. Venkataraman (madvenka@linux.microsoft.com)
+ */
+#include <linux/prmem.h>
+
+static struct prmem_instance *prmem_find(char *subsystem, char *name)
+{
+ struct prmem_instance *instance;
+
+ list_for_each_entry(instance, &prmem->instances, node) {
+ if (!strcmp(instance->subsystem, subsystem) &&
+ !strcmp(instance->name, name)) {
+ return instance;
+ }
+ }
+ return NULL;
+}
+
+void *prmem_get(char *subsystem, char *name, bool create)
+{
+ int subsystem_len = strlen(subsystem);
+ int name_len = strlen(name);
+ struct prmem_instance *instance;
+
+ /*
+ * In early boot, you are allowed to get an existing instance. But
+ * you are not allowed to create one until prmem is fully initialized.
+ */
+ if (!prmem || (!prmem_inited && create))
+ return NULL;
+
+ if (!subsystem_len || subsystem_len >= PRMEM_MAX_NAME ||
+ !name_len || name_len >= PRMEM_MAX_NAME) {
+ return NULL;
+ }
+
+ spin_lock(&prmem_lock);
+
+ /* Check if it already exists. */
+ instance = prmem_find(subsystem, name);
+ if (instance || !create)
+ goto unlock;
+
+ instance = prmem_alloc_locked(sizeof(*instance));
+ if (!instance)
+ goto unlock;
+
+ strcpy(instance->subsystem, subsystem);
+ strcpy(instance->name, name);
+ instance->data = NULL;
+ instance->size = 0;
+
+ list_add_tail(&instance->node, &prmem->instances);
+unlock:
+ spin_unlock(&prmem_lock);
+ return instance;
+}
+EXPORT_SYMBOL_GPL(prmem_get);
+
+void prmem_set_data(struct prmem_instance *instance, void *data, size_t size)
+{
+ if (!prmem_inited)
+ return;
+
+ spin_lock(&prmem_lock);
+ instance->data = data;
+ instance->size = size;
+ spin_unlock(&prmem_lock);
+}
+EXPORT_SYMBOL_GPL(prmem_set_data);
+
+void prmem_get_data(struct prmem_instance *instance, void **data, size_t *size)
+{
+ if (!prmem)
+ return;
+
+ spin_lock(&prmem_lock);
+ *data = instance->data;
+ *size = instance->size;
+ spin_unlock(&prmem_lock);
+}
+EXPORT_SYMBOL_GPL(prmem_get_data);
+
+bool prmem_put(struct prmem_instance *instance)
+{
+ if (!prmem_inited)
+ return true;
+
+ spin_lock(&prmem_lock);
+
+ if (instance->data) {
+ /*
+ * Caller is responsible for freeing instance data and setting
+ * it to NULL.
+ */
+ spin_unlock(&prmem_lock);
+ return false;
+ }
+
+ /* Free instance. */
+ list_del(&instance->node);
+ prmem_free_locked(instance, sizeof(*instance));
+
+ spin_unlock(&prmem_lock);
+ return true;
+}
+EXPORT_SYMBOL_GPL(prmem_put);
+
+int prmem_list(char *subsystem, prmem_list_func_t func, void *arg)
+{
+ int subsystem_len = strlen(subsystem);
+ struct prmem_instance *instance;
+ int ret = 0;
+
+ if (!prmem)
+ return 0;
+
+ if (!subsystem_len || subsystem_len >= PRMEM_MAX_NAME)
+ return -EINVAL;
+
+ spin_lock(&prmem_lock);
+
+ list_for_each_entry(instance, &prmem->instances, node) {
+ if (strcmp(instance->subsystem, subsystem))
+ continue;
+
+ ret = func(instance, arg);
+ if (ret)
+ break;
+ }
+
+ spin_unlock(&prmem_lock);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(prmem_list);
--
2.25.1
* [RFC PATCH v1 08/10] mm/prmem: Implement Persistent Ramdisk instances.
2023-10-16 23:32 ` [RFC PATCH v1 00/10] mm/prmem: Implement the Persistent-Across-Kexec memory feature (prmem) madvenka
` (6 preceding siblings ...)
2023-10-16 23:32 ` [RFC PATCH v1 07/10] mm/prmem: Implement named Persistent Instances madvenka
@ 2023-10-16 23:32 ` madvenka
2023-10-17 16:39 ` kernel test robot
2023-10-16 23:32 ` [RFC PATCH v1 09/10] mm/prmem: Implement DAX support for Persistent Ramdisks madvenka
` (2 subsequent siblings)
10 siblings, 1 reply; 16+ messages in thread
From: madvenka @ 2023-10-16 23:32 UTC (permalink / raw)
To: gregkh, pbonzini, rppt, jgowans, graf, arnd, keescook,
stanislav.kinsburskii, anthony.yznaga, linux-mm, linux-kernel,
madvenka, jamorris
From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
Using the prmem APIs, any kernel subsystem can persist its data. For
persisting user data, we need a filesystem.
Implement persistent ramdisk block device instances so that any filesystem
can be created on them.
Normal ramdisk devices are named "ram0", "ram1", "ram2", etc. Persistent
ramdisk devices will be named "pram0", "pram1", "pram2", etc.
For normal ramdisks, ramdisk pages are allocated using alloc_pages(). For
persistent ones, ramdisk pages are allocated using prmem_alloc_pages().
Each ram disk has a device structure - struct brd_device. For persistent
ram disks, allocate this from persistent memory and record it as the
instance data of the ram disk instance. The structure contains an XArray
of pages allocated to the ram disk. Make it a persistent XArray.
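The allocation scheme described above, where each object comes from either normal or persistent memory and must later be freed back to the same allocator, can be sketched in userspace C. In the patches, radix_node_alloc()/radix_node_free() record the choice in node->persistent, and brd_alloc_page()/brd_free_page() derive it from brd->brd_type. In this sketch, obj_alloc(), obj_free() and the static pool are hypothetical stand-ins for those functions and for prmem.

```c
/*
 * Hypothetical userspace model of per-object allocator dispatch.
 * A fixed static array with a "used" map stands in for persistent
 * memory; malloc()/free() stand in for the normal slab/page allocator.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_OBJS 4

struct obj {
	bool persistent;	/* records which allocator owns this object */
	char payload[56];
};

static struct obj pool[POOL_OBJS];	/* stand-in for persistent memory */
static bool pool_used[POOL_OBJS];

/* Pick the allocator from the caller's type and tag the object. */
struct obj *obj_alloc(bool persistent)
{
	struct obj *o = NULL;
	int i;

	if (persistent) {
		for (i = 0; i < POOL_OBJS; i++) {
			if (!pool_used[i]) {
				pool_used[i] = true;
				o = &pool[i];
				break;
			}
		}
	} else {
		o = malloc(sizeof(*o));
	}
	if (o)
		o->persistent = persistent;
	return o;
}

/* Return the object to the allocator recorded at allocation time. */
void obj_free(struct obj *o)
{
	if (o->persistent)
		pool_used[o - pool] = false;
	else
		free(o);
}
```

Recording the origin in the object itself means free paths such as RCU callbacks and per-CPU preload draining do not need to know which tree the object belonged to.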
The disk size for all normal ramdisks is specified via a module parameter
"rd_size". This forces all of the ramdisks to have the same size.
For persistent ram disks, take a different approach. Define a module
parameter called "prd_sizes" which specifies a comma-separated list of
sizes. The sizes are applied in the order in which they are listed to
"pram0", "pram1", etc.
Ram Disk Usage
--------------
sudo modprobe brd prd_sizes="1G,2G"
This creates two ram disks with the specified sizes. That
is, /dev/pram0 will have a size of 1G. /dev/pram1 will
have a size of 2G.
sudo mkfs.ext4 /dev/pram0
sudo mkfs.ext4 /dev/pram1
Make filesystems on the persistent ram disks.
sudo mount -t ext4 /dev/pram0 /path/to/mountpoint0
sudo mount -t ext4 /dev/pram1 /path/to/mountpoint1
Mount them somewhere.
sudo umount /path/to/mountpoint0
sudo umount /path/to/mountpoint1
Unmount the filesystems.
After kexec
-----------
sudo modprobe brd (you may omit "prd_sizes")
This remembers the previously created persistent ram disks.
sudo mount -t ext4 /dev/pram0 /path/to/mountpoint0
sudo mount -t ext4 /dev/pram1 /path/to/mountpoint1
Mount the same filesystems.
The maximum number of persistent ram disk instances is specified via
CONFIG_BLK_DEV_PRAM_MAX. By default, this is zero.
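A minimal userspace sketch of the prd_sizes parsing that brd_parse() performs, assuming a simplified stand-in for the kernel's memparse(): parse_size() and parse_prd_sizes() are hypothetical names, only the K, M and G suffixes are handled, and MAX_PRAM plays the role of CONFIG_BLK_DEV_PRAM_MAX.

```c
/*
 * Hypothetical userspace model of parsing prd_sizes (e.g. "1G,2G").
 * Sizes are applied in order to pram0, pram1, ...; parsing stops
 * after MAX_PRAM entries.
 */
#include <stdlib.h>

#define MAX_PRAM 4	/* stands in for CONFIG_BLK_DEV_PRAM_MAX */

/* Simplified memparse(): a number with an optional K/M/G suffix. */
static unsigned long long parse_size(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	if (*end == s)
		return 0;	/* no digits: leave *end at s */
	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	}
	return v;
}

/* Fill sizes[] in bytes; return the number parsed, or -1 on a bad entry. */
int parse_prd_sizes(const char *list, unsigned long long sizes[MAX_PRAM])
{
	const char *cur = list;
	char *end;
	int n = 0;

	while (n < MAX_PRAM) {
		sizes[n] = parse_size(cur, &end);
		if (end == cur || !sizes[n])
			return -1;	/* "Memory value expected" */
		n++;
		if (*end != ',')
			break;
		cur = end + 1;
	}
	return n;
}
```

The parsed values are byte counts. set_capacity() takes a count of 512-byte sectors, so a byte size has to be shifted right by 9 before it is used as a disk capacity.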
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
drivers/block/Kconfig | 11 +++
drivers/block/brd.c | 214 +++++++++++++++++++++++++++++++++++++++---
2 files changed, 213 insertions(+), 12 deletions(-)
diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index 5b9d4aaebb81..08fa40f6e2de 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -256,6 +256,17 @@ config BLK_DEV_RAM_SIZE
The default value is 4096 kilobytes. Only change this if you know
what you are doing.
+config BLK_DEV_PRAM_MAX
+ int "Maximum number of Persistent RAM disks"
+ default "0"
+ depends on BLK_DEV_RAM
+ help
+ This allows the creation of persistent RAM disks. Persistent RAM
+ disks retain their data across a kexec reboot. The default value
+ is 0 (no persistent RAM disks). Only change this if you know what
+ you are doing. The sizes of the RAM disks are specified via the
+ module parameter "prd_sizes" as a comma-separated list of sizes.
+
config CDROM_PKTCDVD
tristate "Packet writing on CD/DVD media (DEPRECATED)"
depends on !UML
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 970bd6ff38c4..3a05e56ca16f 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -24,9 +24,12 @@
#include <linux/slab.h>
#include <linux/backing-dev.h>
#include <linux/debugfs.h>
+#include <linux/prmem.h>
#include <linux/uaccess.h>
+enum brd_type { BRD_NORMAL = 0, BRD_PERSISTENT, };
+
/*
* Each block ramdisk device has a xarray brd_pages of pages that stores
* the pages containing the block device's contents. A brd page's ->index is
@@ -36,6 +39,7 @@
*/
struct brd_device {
int brd_number;
+ enum brd_type brd_type;
struct gendisk *brd_disk;
struct list_head brd_list;
@@ -46,6 +50,15 @@ struct brd_device {
u64 brd_nr_pages;
};
+/* Each of these functions performs an action based on brd_type. */
+static struct brd_device *brd_alloc_device(int i, enum brd_type type);
+static void brd_free_device(struct brd_device *brd);
+static struct page *brd_alloc_page(struct brd_device *brd, gfp_t gfp);
+static void brd_free_page(struct brd_device *brd, struct page *page);
+static void brd_xa_init(struct brd_device *brd);
+static void brd_init_name(struct brd_device *brd, char *name);
+static void brd_set_capacity(struct brd_device *brd);
+
/*
* Look up and return a brd's page for a given sector.
*/
@@ -75,7 +88,7 @@ static int brd_insert_page(struct brd_device *brd, sector_t sector, gfp_t gfp)
if (page)
return 0;
- page = alloc_page(gfp | __GFP_ZERO | __GFP_HIGHMEM);
+ page = brd_alloc_page(brd, gfp | __GFP_ZERO | __GFP_HIGHMEM);
if (!page)
return -ENOMEM;
@@ -87,7 +100,7 @@ static int brd_insert_page(struct brd_device *brd, sector_t sector, gfp_t gfp)
cur = __xa_cmpxchg(&brd->brd_pages, idx, NULL, page, gfp);
if (unlikely(cur)) {
- __free_page(page);
+ brd_free_page(brd, page);
ret = xa_err(cur);
if (!ret && (cur->index != idx))
ret = -EIO;
@@ -110,7 +123,7 @@ static void brd_free_pages(struct brd_device *brd)
pgoff_t idx;
xa_for_each(&brd->brd_pages, idx, page) {
- __free_page(page);
+ brd_free_page(brd, page);
cond_resched();
}
@@ -287,6 +300,18 @@ unsigned long rd_size = CONFIG_BLK_DEV_RAM_SIZE;
module_param(rd_size, ulong, 0444);
MODULE_PARM_DESC(rd_size, "Size of each RAM disk in kbytes.");
+/* Sizes of persistent ram disks are specified in a comma-separated list. */
+static char *prd_sizes;
+module_param(prd_sizes, charp, 0444);
+MODULE_PARM_DESC(prd_sizes, "Sizes of persistent RAM disks.");
+
+/* Persistent ram disk specific data. */
+struct prd_data {
+ struct prmem_instance *instance;
+ unsigned long size;
+};
+static struct prd_data prd_data[CONFIG_BLK_DEV_PRAM_MAX];
+
static int max_part = 1;
module_param(max_part, int, 0444);
MODULE_PARM_DESC(max_part, "Num Minors to reserve between devices");
@@ -295,6 +320,32 @@ MODULE_LICENSE("GPL");
MODULE_ALIAS_BLOCKDEV_MAJOR(RAMDISK_MAJOR);
MODULE_ALIAS("rd");
+static void __init brd_parse(void)
+{
+ unsigned long size;
+ char *cur, *tmp;
+ int i = 0;
+
+ if (!CONFIG_BLK_DEV_PRAM_MAX || !prd_sizes)
+ return;
+
+ /* Parse persistent ram disk sizes. */
+ cur = prd_sizes;
+ do {
+ /* Get the size of a ramdisk. Sanity check it. */
+ size = memparse(cur, &tmp);
+ if (cur == tmp || !size) {
+ pr_warn("%s: Memory value expected\n", __func__);
+ return;
+ }
+ cur = tmp;
+
+ /* Record the size in kbytes, the unit that rd_size uses. */
+ prd_data[i++].size = size >> 10;
+
+ } while (*cur++ == ',' && i < CONFIG_BLK_DEV_PRAM_MAX);
+}
+
#ifndef MODULE
/* Legacy boot options - nonmodular */
static int __init ramdisk_size(char *str)
@@ -314,23 +365,33 @@ static struct dentry *brd_debugfs_dir;
static int brd_alloc(int i)
{
+ int brd_number;
+ enum brd_type brd_type;
struct brd_device *brd;
struct gendisk *disk;
char buf[DISK_NAME_LEN];
int err = -ENOMEM;
+ if (i < rd_nr) {
+ brd_number = i;
+ brd_type = BRD_NORMAL;
+ } else {
+ brd_number = i - rd_nr;
+ brd_type = BRD_PERSISTENT;
+ }
+
list_for_each_entry(brd, &brd_devices, brd_list)
- if (brd->brd_number == i)
+ if (brd->brd_number == brd_number && brd->brd_type == brd_type)
return -EEXIST;
- brd = kzalloc(sizeof(*brd), GFP_KERNEL);
+ brd = brd_alloc_device(brd_number, brd_type);
if (!brd)
return -ENOMEM;
- brd->brd_number = i;
+ brd->brd_number = brd_number;
list_add_tail(&brd->brd_list, &brd_devices);
- xa_init(&brd->brd_pages);
+ brd_xa_init(brd);
- snprintf(buf, DISK_NAME_LEN, "ram%d", i);
+ brd_init_name(brd, buf);
if (!IS_ERR_OR_NULL(brd_debugfs_dir))
debugfs_create_u64(buf, 0444, brd_debugfs_dir,
&brd->brd_nr_pages);
@@ -345,7 +406,7 @@ static int brd_alloc(int i)
disk->fops = &brd_fops;
disk->private_data = brd;
strscpy(disk->disk_name, buf, DISK_NAME_LEN);
- set_capacity(disk, rd_size * 2);
+ brd_set_capacity(brd);
/*
* This is so fdisk will align partitions on 4k, because of
@@ -370,7 +431,7 @@ static int brd_alloc(int i)
put_disk(disk);
out_free_dev:
list_del(&brd->brd_list);
- kfree(brd);
+ brd_free_device(brd);
return err;
}
@@ -390,7 +451,7 @@ static void brd_cleanup(void)
put_disk(brd->brd_disk);
brd_free_pages(brd);
list_del(&brd->brd_list);
- kfree(brd);
+ brd_free_device(brd);
}
}
@@ -427,13 +488,21 @@ static int __init brd_init(void)
goto out_free;
}
+ /* Parse persistent ram disk sizes. */
+ brd_parse();
+
+ /* Create persistent ram disks. */
+ for (i = 0; i < CONFIG_BLK_DEV_PRAM_MAX; i++)
+ brd_alloc(i + rd_nr);
+
/*
* brd module now has a feature to instantiate underlying device
* structure on-demand, provided that there is an access dev node.
*
* (1) if rd_nr is specified, create that many upfront. else
* it defaults to CONFIG_BLK_DEV_RAM_COUNT
- * (2) User can further extend brd devices by create dev node themselves
+ * (2) if prd_sizes is specified, create that many upfront.
+ * (3) User can further extend brd devices by create dev node themselves
* and have kernel automatically instantiate actual device
* on-demand. Example:
* mknod /path/devnod_name b 1 X # 1 is the rd major
@@ -469,3 +538,124 @@ static void __exit brd_exit(void)
module_init(brd_init);
module_exit(brd_exit);
+/* Each of these functions performs an action based on brd_type. */
+
+static struct brd_device *brd_alloc_device(int i, enum brd_type type)
+{
+ char name[PRMEM_MAX_NAME];
+ struct brd_device *brd;
+ struct prmem_instance *instance;
+ size_t size;
+ bool create;
+
+ if (type == BRD_NORMAL)
+ return kzalloc(sizeof(struct brd_device), GFP_KERNEL);
+
+ /*
+ * Get the persistent ramdisk instance. If it does not exist, it will
+ * be created, if a size has been specified.
+ */
+ create = !!prd_data[i].size;
+ snprintf(name, PRMEM_MAX_NAME, "pram%d", i);
+ instance = prmem_get("ramdisk", name, create);
+ if (!instance)
+ return NULL;
+
+ prmem_get_data(instance, (void **) &brd, &size);
+ if (brd) {
+ /* Existing instance. Ignore the module parameter. */
+ prd_data[i].size = size;
+ prd_data[i].instance = instance;
+ return brd;
+ }
+
+ /*
+ * New instance. Allocate brd from persistent memory and set it as
+ * instance data.
+ */
+ brd = prmem_alloc(sizeof(*brd), __GFP_ZERO);
+ if (!brd) {
+ prmem_put(instance);
+ return NULL;
+ }
+ brd->brd_type = BRD_PERSISTENT;
+ prmem_set_data(instance, brd, prd_data[i].size);
+
+ prd_data[i].instance = instance;
+ return brd;
+}
+
+static void brd_free_device(struct brd_device *brd)
+{
+ struct prmem_instance *instance;
+
+ if (brd->brd_type == BRD_NORMAL) {
+ kfree(brd);
+ return;
+ }
+
+ instance = prd_data[brd->brd_number].instance;
+ prmem_set_data(instance, NULL, 0);
+ prmem_free(brd, sizeof(*brd));
+ prmem_put(instance);
+}
+
+static struct page *brd_alloc_page(struct brd_device *brd, gfp_t gfp)
+{
+ if (brd->brd_type == BRD_NORMAL)
+ return alloc_page(gfp);
+ return prmem_alloc_pages(0, gfp);
+}
+
+static void brd_free_page(struct brd_device *brd, struct page *page)
+{
+ if (brd->brd_type == BRD_NORMAL)
+ __free_page(page);
+ else
+ prmem_free_pages(page, 0);
+}
+
+static void brd_xa_init(struct brd_device *brd)
+{
+ if (brd->brd_type == BRD_NORMAL) {
+ xa_init(&brd->brd_pages);
+ return;
+ }
+
+ if (brd->brd_nr_pages) {
+ /* Existing persistent instance. */
+ struct page *page;
+ pgoff_t idx;
+
+ /*
+ * The xarray of pages is persistent. However, the page
+ * indexes are not. Set them here.
+ */
+ xa_for_each(&brd->brd_pages, idx, page) {
+ page->index = idx;
+ }
+ } else {
+ /* New persistent instance. */
+ xa_init(&brd->brd_pages);
+ xa_persistent(&brd->brd_pages);
+ }
+}
+
+static void brd_init_name(struct brd_device *brd, char *name)
+{
+ if (brd->brd_type == BRD_NORMAL)
+ snprintf(name, DISK_NAME_LEN, "ram%d", brd->brd_number);
+ else
+ snprintf(name, DISK_NAME_LEN, "pram%d", brd->brd_number);
+}
+
+static void brd_set_capacity(struct brd_device *brd)
+{
+ unsigned long disksize;
+
+ if (brd->brd_type == BRD_NORMAL)
+ disksize = rd_size;
+ else
+ disksize = prd_data[brd->brd_number].size;
+ set_capacity(brd->brd_disk, disksize * 2);
+}
--
2.25.1
* [RFC PATCH v1 09/10] mm/prmem: Implement DAX support for Persistent Ramdisks.
2023-10-16 23:32 ` [RFC PATCH v1 00/10] mm/prmem: Implement the Persistent-Across-Kexec memory feature (prmem) madvenka
` (7 preceding siblings ...)
2023-10-16 23:32 ` [RFC PATCH v1 08/10] mm/prmem: Implement Persistent Ramdisk instances madvenka
@ 2023-10-16 23:32 ` madvenka
2023-10-16 23:32 ` [RFC PATCH v1 10/10] mm/prmem: Implement dynamic expansion of prmem madvenka
2023-10-17 8:31 ` [RFC PATCH v1 00/10] mm/prmem: Implement the Persistent-Across-Kexec memory feature (prmem) Alexander Graf
10 siblings, 0 replies; 16+ messages in thread
From: madvenka @ 2023-10-16 23:32 UTC (permalink / raw)
To: gregkh, pbonzini, rppt, jgowans, graf, arnd, keescook,
stanislav.kinsburskii, anthony.yznaga, linux-mm, linux-kernel,
madvenka, jamorris
From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
One problem with using a ramdisk is that the page cache will contain
redundant copies of ramdisk data. To avoid this, implement DAX support
for persistent ramdisks.
To use this, the filesystem created on the ramdisk must support DAX
(e.g., ext4), and it must be mounted with the dax option. E.g.,
sudo mount -t ext4 -o dax /dev/pram0 /path/to/mountpoint
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
drivers/block/brd.c | 106 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 106 insertions(+)
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 3a05e56ca16f..d4a42d3bd212 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -25,6 +25,9 @@
#include <linux/backing-dev.h>
#include <linux/debugfs.h>
#include <linux/prmem.h>
+#include <linux/pfn_t.h>
+#include <linux/dax.h>
+#include <linux/uio.h>
#include <linux/uaccess.h>
@@ -42,6 +45,7 @@ struct brd_device {
enum brd_type brd_type;
struct gendisk *brd_disk;
struct list_head brd_list;
+ struct dax_device *brd_dax;
/*
* Backing store of pages. This is the contents of the block device.
@@ -58,6 +62,8 @@ static void brd_free_page(struct brd_device *brd, struct page *page);
static void brd_xa_init(struct brd_device *brd);
static void brd_init_name(struct brd_device *brd, char *name);
static void brd_set_capacity(struct brd_device *brd);
+static int brd_dax_init(struct brd_device *brd);
+static void brd_dax_cleanup(struct brd_device *brd);
/*
* Look up and return a brd's page for a given sector.
@@ -408,6 +414,9 @@ static int brd_alloc(int i)
strscpy(disk->disk_name, buf, DISK_NAME_LEN);
brd_set_capacity(brd);
+ if (brd_dax_init(brd))
+ goto out_clean_dax;
+
/*
* This is so fdisk will align partitions on 4k, because of
* direct_access API needing 4k alignment, returning a PFN
@@ -421,6 +430,8 @@ static int brd_alloc(int i)
blk_queue_flag_set(QUEUE_FLAG_NONROT, disk->queue);
blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, disk->queue);
blk_queue_flag_set(QUEUE_FLAG_NOWAIT, disk->queue);
+ if (brd->brd_dax)
+ blk_queue_flag_set(QUEUE_FLAG_DAX, disk->queue);
err = add_disk(disk);
if (err)
goto out_cleanup_disk;
@@ -429,6 +440,8 @@ static int brd_alloc(int i)
out_cleanup_disk:
put_disk(disk);
+out_clean_dax:
+ brd_dax_cleanup(brd);
out_free_dev:
list_del(&brd->brd_list);
brd_free_device(brd);
@@ -447,6 +460,7 @@ static void brd_cleanup(void)
debugfs_remove_recursive(brd_debugfs_dir);
list_for_each_entry_safe(brd, next, &brd_devices, brd_list) {
+ brd_dax_cleanup(brd);
del_gendisk(brd->brd_disk);
put_disk(brd->brd_disk);
brd_free_pages(brd);
@@ -659,3 +673,95 @@ static void brd_set_capacity(struct brd_device *brd)
disksize = prd_data[brd->brd_number].size;
set_capacity(brd->brd_disk, disksize * 2);
}
+
+static bool prd_dax_enabled = IS_ENABLED(CONFIG_FS_DAX);
+
+static long brd_dax_direct_access(struct dax_device *dax_dev,
+ pgoff_t pgoff, long nr_pages,
+ enum dax_access_mode mode,
+ void **kaddr, pfn_t *pfn);
+static int brd_dax_zero_page_range(struct dax_device *dax_dev,
+ pgoff_t pgoff, size_t nr_pages);
+
+static const struct dax_operations brd_dax_ops = {
+ .direct_access = brd_dax_direct_access,
+ .zero_page_range = brd_dax_zero_page_range,
+};
+
+static int brd_dax_init(struct brd_device *brd)
+{
+ if (!prd_dax_enabled || brd->brd_type == BRD_NORMAL)
+ return 0;
+
+ brd->brd_dax = alloc_dax(brd, &brd_dax_ops);
+ if (IS_ERR(brd->brd_dax)) {
+ pr_warn("%s: DAX failed\n", __func__);
+ brd->brd_dax = NULL;
+ return -ENOMEM;
+ }
+
+ if (dax_add_host(brd->brd_dax, brd->brd_disk)) {
+ pr_warn("%s: DAX add failed\n", __func__);
+ return -ENOMEM;
+ }
+ return 0;
+}
+
+static void brd_dax_cleanup(struct brd_device *brd)
+{
+ if (!prd_dax_enabled || brd->brd_type == BRD_NORMAL)
+ return;
+
+ if (brd->brd_dax) {
+ dax_remove_host(brd->brd_disk);
+ kill_dax(brd->brd_dax);
+ put_dax(brd->brd_dax);
+ }
+}
+static int brd_dax_zero_page_range(struct dax_device *dax_dev,
+ pgoff_t pgoff, size_t nr_pages)
+{
+ long rc;
+ void *kaddr;
+
+ rc = dax_direct_access(dax_dev, pgoff, nr_pages, DAX_ACCESS,
+ &kaddr, NULL);
+ if (rc < 0)
+ return rc;
+ memset(kaddr, 0, nr_pages << PAGE_SHIFT);
+ return 0;
+}
+
+static long __brd_direct_access(struct brd_device *brd, pgoff_t pgoff,
+ long nr_pages, void **kaddr, pfn_t *pfn)
+{
+ struct page *page;
+ sector_t sector = (sector_t) pgoff << PAGE_SECTORS_SHIFT;
+ int ret;
+
+ if (!brd)
+ return -ENODEV;
+
+ ret = brd_insert_page(brd, sector, GFP_NOWAIT);
+ if (ret)
+ return ret;
+
+ page = brd_lookup_page(brd, sector);
+ if (!page)
+ return -ENOSPC;
+
+ *kaddr = page_address(page);
+ if (pfn)
+ *pfn = page_to_pfn_t(page);
+
+ return 1;
+}
+
+static long brd_dax_direct_access(struct dax_device *dax_dev,
+ pgoff_t pgoff, long nr_pages, enum dax_access_mode mode,
+ void **kaddr, pfn_t *pfn)
+{
+ struct brd_device *brd = dax_get_private(dax_dev);
+
+ return __brd_direct_access(brd, pgoff, nr_pages, kaddr, pfn);
+}
--
2.25.1
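A note on the DAX hunk above. As written, `__brd_direct_access()` always reports a single page: it returns 1, because ramdisk pages are allocated individually and need not be physically contiguous. `brd_dax_zero_page_range()` then clears `nr_pages << PAGE_SHIFT` bytes starting from that one mapping, which appears to rely on a contiguity that the lookup does not promise. The user-space sketch below is not kernel code, and `fake_direct_access()` and `zero_range()` are hypothetical names. It shows the per-page alternative: ask for the remaining range, honor however many pages come back, and advance.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096UL   /* assumption: 4 KiB pages */
#define SKETCH_NR_PAGES  8

/* Pages are allocated one at a time, so they are not contiguous in memory. */
static unsigned char *sketch_pages[SKETCH_NR_PAGES];

/*
 * Hypothetical stand-in for a direct_access callback: map page @pgoff and
 * report how many contiguous pages are available from there. Like
 * __brd_direct_access(), it always reports exactly one page.
 */
static long fake_direct_access(size_t pgoff, long nr_pages, void **kaddr)
{
	if (pgoff >= SKETCH_NR_PAGES || nr_pages <= 0)
		return -1;
	if (!sketch_pages[pgoff]) {
		sketch_pages[pgoff] = malloc(SKETCH_PAGE_SIZE);
		if (!sketch_pages[pgoff])
			return -1;
	}
	*kaddr = sketch_pages[pgoff];
	return 1;
}

/* Zero @nr_pages pages starting at @pgoff, honoring each partial result. */
static int zero_range(size_t pgoff, size_t nr_pages)
{
	while (nr_pages > 0) {
		void *kaddr;
		long avail = fake_direct_access(pgoff, (long)nr_pages, &kaddr);

		if (avail <= 0)
			return -1;
		if ((size_t)avail > nr_pages)
			avail = (long)nr_pages;
		/* Only clear what this mapping actually covers. */
		memset(kaddr, 0, (size_t)avail * SKETCH_PAGE_SIZE);
		pgoff += (size_t)avail;
		nr_pages -= (size_t)avail;
	}
	return 0;
}
```

The loop costs one lookup per page, which is the price of a backing store whose pages are scattered.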
* [RFC PATCH v1 10/10] mm/prmem: Implement dynamic expansion of prmem.
2023-10-16 23:32 ` [RFC PATCH v1 00/10] mm/prmem: Implement the Persistent-Across-Kexec memory feature (prmem) madvenka
` (8 preceding siblings ...)
2023-10-16 23:32 ` [RFC PATCH v1 09/10] mm/prmem: Implement DAX support for Persistent Ramdisks madvenka
@ 2023-10-16 23:32 ` madvenka
2023-10-17 8:31 ` [RFC PATCH v1 00/10] mm/prmem: Implement the Persistent-Across-Kexec memory feature (prmem) Alexander Graf
10 siblings, 0 replies; 16+ messages in thread
From: madvenka @ 2023-10-16 23:32 UTC (permalink / raw)
To: gregkh, pbonzini, rppt, jgowans, graf, arnd, keescook,
stanislav.kinsburskii, anthony.yznaga, linux-mm, linux-kernel,
madvenka, jamorris
From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
For some use cases, it is hard to predict how much memory is actually
needed to store persistent data; this depends on the workload. We would
either have to overcommit memory for persistent data or allow dynamic
expansion of prmem memory.
Implement dynamic expansion of prmem. When the allocator runs out of memory,
it calls alloc_pages() with order MAX_ORDER to allocate a max-order page. It
creates a region for that memory and adds it to the list of regions. The
allocator can then allocate from that region.
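The expand-on-exhaustion flow can be sketched in user space as follows. This is a minimal sketch under stated assumptions: `aligned_alloc()` stands in for `alloc_pages()`, a 64 KiB chunk stands in for a max-order page, and a bump allocator stands in for `prmem_alloc_pool()`. All `sketch_*` names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CHUNK_SIZE  (64UL * 1024)   /* assumption: stands in for a max-order page */
#define MAX_REGIONS 16

struct sketch_region {
	uintptr_t start;   /* first byte of the region */
	size_t size;       /* region length in bytes */
	size_t used;       /* bump-allocator offset */
};

struct sketch_prmem {
	struct sketch_region regions[MAX_REGIONS];
	int nr_regions;
	size_t cur_size;   /* bytes currently owned by prmem */
	size_t max_size;   /* upper bound on cur_size */
};

/* Bump-allocate @size bytes aligned to @align (a power of two) from @r. */
static void *sketch_alloc_pool(struct sketch_region *r, size_t size, size_t align)
{
	uintptr_t p = (r->start + r->used + align - 1) & ~(uintptr_t)(align - 1);

	if (p + size > r->start + r->size)
		return NULL;
	r->used = p + size - r->start;
	return (void *)p;
}

/* Add one CHUNK_SIZE region if that keeps cur_size within max_size. */
static bool sketch_expand(struct sketch_prmem *pm)
{
	void *mem;

	if (pm->nr_regions == MAX_REGIONS ||
	    pm->cur_size + CHUNK_SIZE > pm->max_size)
		return false;
	mem = aligned_alloc(CHUNK_SIZE, CHUNK_SIZE);
	if (!mem)
		return false;
	pm->regions[pm->nr_regions++] = (struct sketch_region){
		.start = (uintptr_t)mem, .size = CHUNK_SIZE, .used = 0 };
	pm->cur_size += CHUNK_SIZE;
	return true;
}

/* Try every region; if all are full, expand at most once and retry. */
static void *sketch_alloc(struct sketch_prmem *pm, size_t size)
{
	bool expand = true;

retry:
	for (int i = 0; i < pm->nr_regions; i++) {
		void *va = sketch_alloc_pool(&pm->regions[i], size, size);

		if (va)
			return va;
	}
	if (expand) {
		expand = false;
		if (sketch_expand(pm))
			goto retry;
	}
	return NULL;
}
```

Unlike the patch, this sketch only retries when the expansion actually added a region, which skips a pointless second walk of the region list.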
To allow this, extend the command line parameter:
prmem=size[KMG][,max_size[KMG]]
The initial size is allocated up front, as before. Between size and
max_size, prmem is expanded dynamically as described above.
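The parameter rules can be illustrated with a user-space sketch of the parser. `sketch_memparse()` is a hypothetical stand-in for the kernel's `memparse()` that handles only the K/M/G suffixes. The checks follow the patch (size nonzero and page aligned; max_size page aligned and larger than size; max_size defaults to size). The sketch is slightly stricter in one respect: it rejects trailing characters after either value.

```c
#include <stdbool.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096ULL   /* assumption: 4 KiB pages */

/* User-space stand-in for memparse(): a number plus an optional K/M/G suffix. */
static unsigned long long sketch_memparse(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;   /* consume the suffix character */
		break;
	default:
		break;
	}
	return v;
}

/*
 * Parse "size[KMG][,max_size[KMG]]". Returns false on malformed input.
 * Without ",max_size", max_size == size, so no dynamic expansion happens.
 */
static bool sketch_parse_prmem(const char *arg, unsigned long long *size,
			       unsigned long long *max_size)
{
	char *end;
	const char *cur = arg;

	*size = sketch_memparse(cur, &end);
	if (end == cur || *size == 0 || (*size & (SKETCH_PAGE_SIZE - 1)))
		return false;
	*max_size = *size;

	if (*end == '\0')
		return true;
	if (*end != ',')
		return false;

	cur = end + 1;
	*max_size = sketch_memparse(cur, &end);
	if (end == cur || *end != '\0' ||
	    (*max_size & (SKETCH_PAGE_SIZE - 1)) || *max_size <= *size)
		return false;
	return true;
}
```

For example, `prmem=256M,1G` reserves 256 MiB at boot and allows growth to 1 GiB, while `prmem=1G,512M` is rejected.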
Using max-order pages means that expansion does not fragment memory for
transparent huge pages or kmem slabs. It may, however, fragment memory for
1GB pages. This is not a problem for 1GB pages that are reserved up front,
but it could be a problem for 1GB pages that are allocated dynamically at
run time.
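The trade-off can be made concrete with the size formula used by prmem_expand(), `(1UL << order) << PAGE_SHIFT`. The sketch below assumes 4 KiB pages and a MAX_ORDER of 10. Both values are illustrative assumptions and depend on the architecture and kernel configuration. Under those assumptions, each expansion adds 4 MiB, `prmem=256M,1G` permits at most 192 expansions, and 256 such chunks make up one 1 GiB block. A single chunk landing inside a 1 GiB block is enough to keep that block from backing a dynamically allocated 1GB page.

```c
#include <stddef.h>

/* Assumptions for illustration only: 4 KiB pages, MAX_ORDER == 10. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_MAX_ORDER  10

/* Bytes added by one expansion: one max-order block. */
static size_t sketch_chunk_size(void)
{
	return (1UL << SKETCH_MAX_ORDER) << SKETCH_PAGE_SHIFT;
}

/* Number of whole chunks that fit between size and max_size. */
static size_t sketch_max_expansions(size_t size, size_t max_size)
{
	return max_size > size ? (max_size - size) / sketch_chunk_size() : 0;
}

/* Chunks per 1 GiB block; taking any one of them breaks up that block. */
static size_t sketch_chunks_per_gigantic_page(void)
{
	return ((size_t)1 << 30) / sketch_chunk_size();
}
```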
If max_size is omitted from the command line parameter, no dynamic
expansion will happen.
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
---
include/linux/prmem.h | 8 +++++++
kernel/prmem/prmem_allocator.c | 38 ++++++++++++++++++++++++++++++++++
kernel/prmem/prmem_init.c | 1 +
kernel/prmem/prmem_misc.c | 3 ++-
kernel/prmem/prmem_parse.c | 20 +++++++++++++++++-
kernel/prmem/prmem_region.c | 1 +
kernel/prmem/prmem_reserve.c | 1 +
7 files changed, 70 insertions(+), 2 deletions(-)
diff --git a/include/linux/prmem.h b/include/linux/prmem.h
index c7034690f7cb..bb552946cb5b 100644
--- a/include/linux/prmem.h
+++ b/include/linux/prmem.h
@@ -83,6 +83,9 @@ struct prmem_instance {
* metadata Physical address of the metadata page.
* size Size of initial memory allocated to prmem.
*
+ * cur_size Current amount of memory allocated to prmem.
+ * max_size Maximum amount of memory that can be allocated to prmem.
+ *
* regions List of memory regions.
*
* instances Persistent instances.
@@ -95,6 +98,10 @@ struct prmem {
unsigned long metadata;
size_t size;
+ /* Dynamic expansion. */
+ size_t cur_size;
+ size_t max_size;
+
/* Persistent Regions. */
struct list_head regions;
@@ -109,6 +116,7 @@ extern struct prmem *prmem;
extern unsigned long prmem_metadata;
extern unsigned long prmem_pa;
extern size_t prmem_size;
+extern size_t prmem_max_size;
extern bool prmem_inited;
extern spinlock_t prmem_lock;
diff --git a/kernel/prmem/prmem_allocator.c b/kernel/prmem/prmem_allocator.c
index f12975bc6777..1cb3eae8a3e7 100644
--- a/kernel/prmem/prmem_allocator.c
+++ b/kernel/prmem/prmem_allocator.c
@@ -9,17 +9,55 @@
/* Page Allocation functions. */
+static void prmem_expand(void)
+{
+ struct prmem_region *region;
+ struct page *pages;
+ unsigned int order = MAX_ORDER;
+ size_t size = (1UL << order) << PAGE_SHIFT;
+
+ if (prmem->cur_size + size > prmem->max_size)
+ return;
+
+ spin_unlock(&prmem_lock);
+ pages = alloc_pages(GFP_NOWAIT, order);
+ spin_lock(&prmem_lock);
+
+ if (!pages)
+ return;
+
+ /* cur_size may have changed. Recheck. */
+ if (prmem->cur_size + size > prmem->max_size)
+ goto free;
+
+ region = prmem_add_region(page_to_phys(pages), size);
+ if (!region)
+ goto free;
+
+ pr_warn("%s: prmem expanded by %ld\n", __func__, size);
+ return;
+free:
+ __free_pages(pages, order);
+}
+
void *prmem_alloc_pages_locked(unsigned int order)
{
struct prmem_region *region;
void *va;
size_t size = (1UL << order) << PAGE_SHIFT;
+ bool expand = true;
+retry:
list_for_each_entry(region, &prmem->regions, node) {
va = prmem_alloc_pool(region, size, size);
if (va)
return va;
}
+ if (expand) {
+ expand = false;
+ prmem_expand();
+ goto retry;
+ }
return NULL;
}
diff --git a/kernel/prmem/prmem_init.c b/kernel/prmem/prmem_init.c
index 166fca688ab3..f4814cc88508 100644
--- a/kernel/prmem/prmem_init.c
+++ b/kernel/prmem/prmem_init.c
@@ -20,6 +20,7 @@ void __init prmem_init(void)
/* Cold boot. */
prmem->metadata = prmem_metadata;
prmem->size = prmem_size;
+ prmem->max_size = prmem_max_size;
INIT_LIST_HEAD(&prmem->regions);
INIT_LIST_HEAD(&prmem->instances);
diff --git a/kernel/prmem/prmem_misc.c b/kernel/prmem/prmem_misc.c
index 49b6a7232c1a..3100662d2cbe 100644
--- a/kernel/prmem/prmem_misc.c
+++ b/kernel/prmem/prmem_misc.c
@@ -68,7 +68,8 @@ bool __init prmem_validate(void)
unsigned long checksum;
/* Sanity check the boot parameter. */
- if (prmem_metadata != prmem->metadata || prmem_size != prmem->size) {
+ if (prmem_metadata != prmem->metadata || prmem_size != prmem->size ||
+ prmem_max_size != prmem->max_size) {
pr_warn("%s: Boot parameter mismatch\n", __func__);
return false;
}
diff --git a/kernel/prmem/prmem_parse.c b/kernel/prmem/prmem_parse.c
index 6c1a23c6b84e..3a57b37fa191 100644
--- a/kernel/prmem/prmem_parse.c
+++ b/kernel/prmem/prmem_parse.c
@@ -8,9 +8,11 @@
#include <linux/prmem.h>
/*
- * Syntax: prmem=size[KMG]
+ * Syntax: prmem=size[KMG][,max_size[KMG]]
*
* Specifies the size of the initial memory to be allocated to prmem.
+ * Optionally, specifies the maximum amount of memory to be allocated to
+ * prmem. prmem will expand dynamically between size and max_size.
*/
static int __init prmem_size_parse(char *cmdline)
{
@@ -28,6 +30,22 @@ static int __init prmem_size_parse(char *cmdline)
}
prmem_size = size;
+ prmem_max_size = size;
+
+ cur = tmp;
+ if (*cur++ == ',') {
+ /* Get max size. */
+ size = memparse(cur, &tmp);
+ if (cur == tmp || !size || size & (PAGE_SIZE - 1) ||
+ size <= prmem_size) {
+ prmem_size = 0;
+ prmem_max_size = 0;
+ pr_warn("%s: Incorrect max size %lx\n", __func__, size);
+ return -EINVAL;
+ }
+ prmem_max_size = size;
+ }
+
return 0;
}
early_param("prmem", prmem_size_parse);
diff --git a/kernel/prmem/prmem_region.c b/kernel/prmem/prmem_region.c
index 6dc88c74d9c8..390329a34b74 100644
--- a/kernel/prmem/prmem_region.c
+++ b/kernel/prmem/prmem_region.c
@@ -82,5 +82,6 @@ struct prmem_region *prmem_add_region(unsigned long pa, size_t size)
return NULL;
 list_add_tail(&region->node, &prmem->regions);
+ prmem->cur_size += size;
return region;
}
diff --git a/kernel/prmem/prmem_reserve.c b/kernel/prmem/prmem_reserve.c
index 8000fff05402..c5ae5d7d8f0a 100644
--- a/kernel/prmem/prmem_reserve.c
+++ b/kernel/prmem/prmem_reserve.c
@@ -11,6 +11,7 @@ struct prmem *prmem;
unsigned long prmem_metadata;
unsigned long prmem_pa;
unsigned long prmem_size;
+unsigned long prmem_max_size;
void __init prmem_reserve_early(void)
{
--
2.25.1
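prmem_expand() drops prmem_lock around the page allocation and re-checks cur_size after it reacquires the lock, because another CPU may have expanded prmem in the meantime. The user-space sketch below shows why the second check matters. It uses a pthread mutex in place of the spinlock and malloc() in place of alloc_pages(), and all `sketch_*` names are hypothetical. Two threads race to expand while max_size leaves room for only one more chunk. Whichever interleaving occurs, exactly one expansion is accepted.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_CHUNK      (4UL << 20)  /* assumption: one max-order block */
#define SKETCH_MAX_CHUNKS 16

static pthread_mutex_t sketch_lock = PTHREAD_MUTEX_INITIALIZER;
static size_t sketch_cur_size;                  /* protected by sketch_lock */
static size_t sketch_max_size;                  /* set before threads start */
static void *sketch_chunks[SKETCH_MAX_CHUNKS];  /* protected by sketch_lock */
static int sketch_added;                        /* protected by sketch_lock */

/* Called with sketch_lock held; returns with it held. */
static void sketch_expand_locked(void)
{
	void *chunk;

	if (sketch_cur_size + SKETCH_CHUNK > sketch_max_size)
		return;

	pthread_mutex_unlock(&sketch_lock);
	chunk = malloc(SKETCH_CHUNK);   /* allocate without holding the lock */
	pthread_mutex_lock(&sketch_lock);

	if (!chunk)
		return;
	/* Another thread may have expanded while the lock was dropped. */
	if (sketch_cur_size + SKETCH_CHUNK > sketch_max_size ||
	    sketch_added == SKETCH_MAX_CHUNKS) {
		free(chunk);
		return;
	}
	sketch_chunks[sketch_added++] = chunk;  /* i.e. add a new region */
	sketch_cur_size += SKETCH_CHUNK;
}

static void *sketch_worker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&sketch_lock);
	sketch_expand_locked();
	pthread_mutex_unlock(&sketch_lock);
	return NULL;
}

/* Race two expanders when the limit leaves room for exactly one chunk. */
static int sketch_race(void)
{
	pthread_t t[2];

	sketch_cur_size = 0;
	sketch_max_size = SKETCH_CHUNK;
	sketch_added = 0;
	for (int i = 0; i < 2; i++)
		if (pthread_create(&t[i], NULL, sketch_worker, NULL) != 0)
			abort();
	for (int i = 0; i < 2; i++)
		pthread_join(t[i], NULL);
	for (int i = 0; i < sketch_added; i++)
		free(sketch_chunks[i]);
	return sketch_added;
}
```

Without the recheck, both threads could pass the first test and push cur_size past max_size.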
* Re: [RFC PATCH v1 00/10] mm/prmem: Implement the Persistent-Across-Kexec memory feature (prmem)
2023-10-16 23:32 ` [RFC PATCH v1 00/10] mm/prmem: Implement the Persistent-Across-Kexec memory feature (prmem) madvenka
` (9 preceding siblings ...)
2023-10-16 23:32 ` [RFC PATCH v1 10/10] mm/prmem: Implement dynamic expansion of prmem madvenka
@ 2023-10-17 8:31 ` Alexander Graf
2023-10-17 18:08 ` Madhavan T. Venkataraman
10 siblings, 1 reply; 16+ messages in thread
From: Alexander Graf @ 2023-10-17 8:31 UTC (permalink / raw)
To: madvenka, gregkh, pbonzini, rppt, jgowans, arnd, keescook,
stanislav.kinsburskii, anthony.yznaga, linux-mm, linux-kernel,
jamorris, rostedt, kvm
Hey Madhavan!
This patch set looks super exciting - thanks a lot for putting it
together. We've been poking at a very similar direction for a while as
well and will discuss the fundamental problem of how to persist kernel
metadata across kexec at LPC:
https://lpc.events/event/17/contributions/1485/
It would be great to have you in the room as well then.
Some more comments inline.
On 17.10.23 01:32, madvenka@linux.microsoft.com wrote:
> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
>
> Introduction
> ============
>
> This feature can be used to persist kernel and user data across kexec reboots
> in RAM for various uses. E.g., persisting:
>
> - cached data. E.g., database caches.
> - state. E.g., KVM guest states.
> - historical information since the last cold boot. E.g., events, logs
> and journals.
> - measurements for integrity checks on the next boot.
> - driver data.
> - IOMMU mappings.
> - MMIO config information.
>
> This is useful on systems where there is no non-volatile storage or
> non-volatile storage is too small or too slow.
This is useful in more situations. We for example need it to do a kexec
while a virtual machine is in suspended state, but has IOMMU mappings
intact (Live Update). For that, we need to ensure DMA can still reach
the VM memory and that everything gets reassembled identically and
without interruptions on the receiving end.
> The following sections describe the implementation.
>
> I have enhanced the ram disk block device driver to provide persistent ram
> disks on which any filesystem can be created. This is for persisting user data.
> I have also implemented DAX support for the persistent ram disks.
This is probably the least interesting of the enablements, right? You
can already today reserve RAM on boot as DAX block device and use it for
that purpose.
> I am also working on making ZRAM persistent.
>
> I have also briefly discussed the following use cases:
>
> - Persisting IOMMU mappings
> - Remembering DMA pages
> - Reserving pages that encounter memory errors
> - Remembering IMA measurements for integrity checks
> - Remembering MMIO config info
> - Implementing prmemfs (special filesystem tailored for persistence)
>
> Allocate metadata
> =================
>
> Define a metadata structure to store all persistent memory related information.
> The metadata fits into one page. On a cold boot, allocate and initialize the
> metadata page.
>
> Allocate data
> =============
>
> On a cold boot, allocate some memory for storing persistent data. Call it
> persistent memory. Specify the size in a command line parameter:
>
> prmem=size[KMG][,max_size[KMG]]
>
> size Initial amount of memory allocated to prmem during boot
> max_size Maximum amount of memory that can be allocated to prmem
>
> When the initial memory is exhausted via allocations, expand prmem dynamically
> up to max_size. Expansion is done by allocating from the buddy allocator.
> Record all allocations in the metadata.
I don't understand why we need a separate allocator. Why can't we just
use normal Linux allocations and serialize their location for handover?
We would obviously still need to find a large contiguous piece of memory
for the target kernel to bootstrap itself into until it can read which
pages it can and can not use, but we can do that allocation in the
source environment using CMA, no?
What I'm trying to say is: I think we're better off separating the
handover mechanism from the allocation mechanism. If we can implement
handover without a new allocator, we can use it for simple things with a
slight runtime penalty. To accelerate the handover then, we can later
add a compacting allocator that can use the handover mechanism we
already built to persist itself.
I have a WIP branch where I'm toying with such a handover mechanism that
uses device tree to serialize/deserialize state. By standardizing the
property naming, we can in the receiving kernel mark all persistent
allocations as reserved and then slowly either free them again or mark
them as in-use one by one:
https://github.com/agraf/linux/commit/fd5736a21d549a9a86c178c91acb29ed7f364f42
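The handover idea described above can be illustrated with a user-space sketch. The source side serializes the location of each ordinary allocation. The receiving side marks every listed range as reserved, then lets subsystems claim the ranges they recognize and frees the rest. The flat record layout and all `handover_*` names are hypothetical assumptions. This is not the device-tree format used in the linked branch, and the sketch assumes both sides share the same endianness and struct layout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HANDOVER_MAX 32

/* Hypothetical record: one persistent allocation. */
struct handover_rec {
	uint64_t pa;    /* physical address of the allocation */
	uint64_t size;  /* length in bytes */
};

enum rec_state { REC_RESERVED, REC_IN_USE, REC_FREED };

struct handover_table {
	struct handover_rec recs[HANDOVER_MAX];
	enum rec_state state[HANDOVER_MAX];
	size_t nr;
};

/* Source kernel: write a count, then the records. Returns bytes used, or 0. */
static size_t handover_serialize(const struct handover_rec *recs, size_t nr,
				 unsigned char *buf, size_t buflen)
{
	uint64_t count = nr;
	size_t need = sizeof(count) + nr * sizeof(*recs);

	if (nr > HANDOVER_MAX || need > buflen)
		return 0;
	memcpy(buf, &count, sizeof(count));
	memcpy(buf + sizeof(count), recs, nr * sizeof(*recs));
	return need;
}

/* Target kernel: every handed-over range starts out reserved. */
static bool handover_deserialize(const unsigned char *buf, size_t len,
				 struct handover_table *t)
{
	uint64_t count;

	if (len < sizeof(count))
		return false;
	memcpy(&count, buf, sizeof(count));
	if (count > HANDOVER_MAX ||
	    len < sizeof(count) + count * sizeof(t->recs[0]))
		return false;
	memcpy(t->recs, buf + sizeof(count), count * sizeof(t->recs[0]));
	for (size_t i = 0; i < count; i++)
		t->state[i] = REC_RESERVED;
	t->nr = count;
	return true;
}

/* A subsystem that recognizes a range claims it as in use. */
static bool handover_claim(struct handover_table *t, uint64_t pa)
{
	for (size_t i = 0; i < t->nr; i++) {
		if (t->recs[i].pa == pa && t->state[i] == REC_RESERVED) {
			t->state[i] = REC_IN_USE;
			return true;
		}
	}
	return false;
}

/* Anything still unclaimed goes back to the allocator. Returns the count. */
static size_t handover_release_unclaimed(struct handover_table *t)
{
	size_t freed = 0;

	for (size_t i = 0; i < t->nr; i++) {
		if (t->state[i] == REC_RESERVED) {
			t->state[i] = REC_FREED;
			freed++;
		}
	}
	return freed;
}
```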
I used ftrace as example payload to persist: With the handover mechanism
in place, we serialize/deserialize ftrace ring buffer metadata and are
thus able to read traces of the previous system after kexec. This way,
you can for example profile the kexec exit path.
It's not even in RFC state yet, there are a few things where I would
need a couple days to think hard about data structures, layouts and
other problems :). But I believe from the patch you get the idea.
One such user of kho could be a new allocator like prmem and each
subsystem's serialization code could choose to rely on the prmem
subsystem to persist data instead of doing it themselves. That way you
get a very non-intrusive enablement path for kexec handover, easily
amendable data structures that can change compatibly over time as well
as the ability to recreate ephemeral data structure based on persistent
information - which will be necessary to persist VFIO containers.
Alex
Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
* Re: [RFC PATCH v1 00/10] mm/prmem: Implement the Persistent-Across-Kexec memory feature (prmem)
2023-10-17 8:31 ` [RFC PATCH v1 00/10] mm/prmem: Implement the Persistent-Across-Kexec memory feature (prmem) Alexander Graf
@ 2023-10-17 18:08 ` Madhavan T. Venkataraman
0 siblings, 0 replies; 16+ messages in thread
From: Madhavan T. Venkataraman @ 2023-10-17 18:08 UTC (permalink / raw)
To: Alexander Graf, gregkh, pbonzini, rppt, jgowans, arnd, keescook,
stanislav.kinsburskii, anthony.yznaga, linux-mm, linux-kernel,
jamorris, rostedt, kvm
Hey Alex,
Thanks a lot for your comments!
On 10/17/23 03:31, Alexander Graf wrote:
> Hey Madhavan!
>
> This patch set looks super exciting - thanks a lot for putting it together. We've been poking at a very similar direction for a while as well and will discuss the fundamental problem of how to persist kernel metadata across kexec at LPC:
>
> https://lpc.events/event/17/contributions/1485/
>
> It would be great to have you in the room as well then.
>
Yes, I am planning to attend, but virtually, as I am not able to travel.
> Some more comments inline.
>
> On 17.10.23 01:32, madvenka@linux.microsoft.com wrote:
>> From: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
>>
>> Introduction
>> ============
>>
>> This feature can be used to persist kernel and user data across kexec reboots
>> in RAM for various uses. E.g., persisting:
>>
>> - cached data. E.g., database caches.
>> - state. E.g., KVM guest states.
>> - historical information since the last cold boot. E.g., events, logs
>> and journals.
>> - measurements for integrity checks on the next boot.
>> - driver data.
>> - IOMMU mappings.
>> - MMIO config information.
>>
>> This is useful on systems where there is no non-volatile storage or
>> non-volatile storage is too small or too slow.
>
>
> This is useful in more situations. We for example need it to do a kexec while a virtual machine is in suspended state, but has IOMMU mappings intact (Live Update). For that, we need to ensure DMA can still reach the VM memory and that everything gets reassembled identically and without interruptions on the receiving end.
>
>
I see.
>> The following sections describe the implementation.
>>
>> I have enhanced the ram disk block device driver to provide persistent ram
>> disks on which any filesystem can be created. This is for persisting user data.
>> I have also implemented DAX support for the persistent ram disks.
>
>
> This is probably the least interesting of the enablements, right? You can already today reserve RAM on boot as DAX block device and use it for that purpose.
>
Yes. pmem provides that functionality.
There are a few differences, though I don't have a good feel for how important they are to users. Maybe they are not very significant. E.g.,
- pmem regions need some setup using the ndctl command.
- IIUC, one needs to specify a starting address and a size for a pmem region. Having to specify a starting address may make it somewhat less flexible from a configuration point of view.
- In the case of pmem, the entire range of memory is set aside. In the case of the prmem persistent ram disk, pages are allocated as needed. So, persistent memory is shared among multiple
consumers more flexibly.
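The allocate-as-needed behavior in the last point can be sketched in user space. The sketch is similar in spirit to the brd_insert_page()/brd_lookup_page() pair in the DAX patch, but it is not the brd code, and the `lazy_*` names and the 4 MiB disk size are hypothetical. A page is only backed by memory after its first write, and unwritten pages read back as zeros. A pmem-style region, by contrast, commits its whole range up front.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define LAZY_PAGE_SIZE 4096
#define LAZY_NR_PAGES  1024   /* assumption: 4 MiB virtual disk */

struct lazy_disk {
	unsigned char *pages[LAZY_NR_PAGES];  /* NULL until first written */
	size_t nr_allocated;                  /* pages actually backed */
};

/* Write one page of data, allocating the backing page on demand. */
static bool lazy_write(struct lazy_disk *d, size_t pgoff, const void *data)
{
	if (pgoff >= LAZY_NR_PAGES)
		return false;
	if (!d->pages[pgoff]) {
		d->pages[pgoff] = malloc(LAZY_PAGE_SIZE);
		if (!d->pages[pgoff])
			return false;
		d->nr_allocated++;
	}
	memcpy(d->pages[pgoff], data, LAZY_PAGE_SIZE);
	return true;
}

/* Read one page; holes read back as zeros and consume no memory. */
static bool lazy_read(const struct lazy_disk *d, size_t pgoff, void *out)
{
	if (pgoff >= LAZY_NR_PAGES)
		return false;
	if (d->pages[pgoff])
		memcpy(out, d->pages[pgoff], LAZY_PAGE_SIZE);
	else
		memset(out, 0, LAZY_PAGE_SIZE);
	return true;
}
```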
Also, Greg H. wanted to see a filesystem-based use case presented for persistent memory so we can see how it all comes together. I am working on prmemfs (a special FS tailored for persistence), but that will take some time. So, I wanted to present this ram disk use case as a more flexible alternative to pmem.
But you are right. They are equivalent for all practical purposes.
>
>> I am also working on making ZRAM persistent.
>>
>> I have also briefly discussed the following use cases:
>>
>> - Persisting IOMMU mappings
>> - Remembering DMA pages
>> - Reserving pages that encounter memory errors
>> - Remembering IMA measurements for integrity checks
>> - Remembering MMIO config info
>> - Implementing prmemfs (special filesystem tailored for persistence)
>>
>> Allocate metadata
>> =================
>>
>> Define a metadata structure to store all persistent memory related information.
>> The metadata fits into one page. On a cold boot, allocate and initialize the
>> metadata page.
>>
>> Allocate data
>> =============
>>
>> On a cold boot, allocate some memory for storing persistent data. Call it
>> persistent memory. Specify the size in a command line parameter:
>>
>> prmem=size[KMG][,max_size[KMG]]
>>
>> size Initial amount of memory allocated to prmem during boot
>> max_size Maximum amount of memory that can be allocated to prmem
>>
>> When the initial memory is exhausted via allocations, expand prmem dynamically
>> up to max_size. Expansion is done by allocating from the buddy allocator.
>> Record all allocations in the metadata.
>
>
> I don't understand why we need a separate allocator. Why can't we just use normal Linux allocations and serialize their location for handover? We would obviously still need to find a large contiguous piece of memory for the target kernel to bootstrap itself into until it can read which pages it can and can not use, but we can do that allocation in the source environment using CMA, no?
>
> What I'm trying to say is: I think we're better off separating the handover mechanism from the allocation mechanism. If we can implement handover without a new allocator, we can use it for simple things with a slight runtime penalty. To accelerate the handover then, we can later add a compacting allocator that can use the handover mechanism we already built to persist itself.
>
>
>
> I have a WIP branch where I'm toying with such a handover mechanism that uses device tree to serialize/deserialize state. By standardizing the property naming, we can in the receiving kernel mark all persistent allocations as reserved and then slowly either free them again or mark them as in-use one by one:
>
> https://github.com/agraf/linux/commit/fd5736a21d549a9a86c178c91acb29ed7f364f42
>
> I used ftrace as example payload to persist: With the handover mechanism in place, we serialize/deserialize ftrace ring buffer metadata and are thus able to read traces of the previous system after kexec. This way, you can for example profile the kexec exit path.
>
> It's not even in RFC state yet, there are a few things where I would need a couple days to think hard about data structures, layouts and other problems :). But I believe from the patch you get the idea.
>
> One such user of kho could be a new allocator like prmem and each subsystem's serialization code could choose to rely on the prmem subsystem to persist data instead of doing it themselves. That way you get a very non-intrusive enablement path for kexec handover, easily amendable data structures that can change compatibly over time as well as the ability to recreate ephemeral data structure based on persistent information - which will be necessary to persist VFIO containers.
>
OK. I will study your changes and your comments. I will send my feedback as well.
Thanks again!
Madhavan
>
> Alex