* [RFC PATCH v3 0/3] pmem memmap dump support
From: Li Zhijian @ 2023-06-02 10:26 UTC
  To: kexec, nvdimm
  Cc: linux-kernel, dan.j.williams, bhe, ruansy.fnst, y-goto, yangx.jy,
	Li Zhijian, Borislav Petkov, Dave Hansen, Dave Jiang, Dave Young,
	Eric Biederman, H. Peter Anvin, Ingo Molnar, Ira Weiny,
	Thomas Gleixner, Vishal Verma, Vivek Goyal, x86

Hello folks,

Since sending out the previous version of this patch set, we have received
some comments, and we really appreciate the input. However, as you can see,
the patch set is still at an early stage, especially with regard to the
choice of solution, which may still change.

Changes in V3:
Based mainly on feedback on the first version, I implemented the proposal
suggested by Dan: in the kdump kernel, the device's superblock is read through
a device file interface to calculate the metadata range. In the second version,
by contrast, the first kernel wrote the metadata range to vmcoreinfo, so that
after a crash the kdump kernel could read it directly from /proc/vmcore.

Comparing the two approaches, the advantage of V3 is that it requires fewer
kernel modifications; the downside is that it introduces a new external
library, libndctl, to walk the namespaces, which couples the tooling more
tightly to ndctl.
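
For reference, here is a minimal sketch of the namespace walk that the V3
makedumpfile change performs via libndctl (the iteration helpers are
libndctl's standard API; the per-namespace superblock parsing is omitted):

  #include <stdio.h>
  #include <ndctl/libndctl.h>

  /* Walk every bus/region/namespace known to libndctl. The RFC then
   * reads each namespace's infoblock to derive its metadata range. */
  static int list_namespaces(void)
  {
          struct ndctl_ctx *ctx;
          struct ndctl_bus *bus;
          struct ndctl_region *region;
          struct ndctl_namespace *ndns;

          if (ndctl_new(&ctx) < 0)
                  return -1;

          ndctl_bus_foreach(ctx, bus)
                  ndctl_region_foreach(bus, region)
                          ndctl_namespace_foreach(region, ndns)
                                  printf("%s\n",
                                         ndctl_namespace_get_devname(ndns));

          ndctl_unref(ctx);
          return 0;
  }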

One important thing to note about both V2 and V3 is the introduction of a new
ELF program header flag, PF_DEV, to indicate that a range lives on a device.
I'm not sure whether there are better alternatives, or whether we could use
this flag internally without exposing it in elf.h.
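
Concretely, PF_DEV takes bit 4 of p_flags, alongside the standard permission
bits PF_X/PF_W/PF_R in bits 0-2. A consumer-side check, mirroring what the
makedumpfile patch below does, could look like:

  #include <elf.h>

  #define PF_DEV (1 << 4)  /* proposed by this series; not (yet) in elf.h */

  /* Nonzero if this PT_LOAD segment was marked as a device (pmem)
   * range by the first kernel or by kexec-tools. */
  static int phdr_is_pmem(const Elf64_Phdr *phdr)
  {
          return phdr->p_type == PT_LOAD && (phdr->p_flags & PF_DEV);
  }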

We greatly appreciate your feedback and look forward to your responses.

At the RFC stage, I have folded the changes for all three projects into this
single cover letter for review convenience.
kernel(3):
  nvdimm: set force_raw=1 in kdump kernel
  x86/crash: Add pmem region into PT_LOADs of vmcore
  kernel/kexec_file: Mark pmem region with new flag PF_DEV
kexec-tools(1):
  kexec: Add and mark pmem region into PT_LOADs
makedumpfile(3):
  elf_info.c: Introduce is_pmem_pt_load_range
  makedumpfile.c: Exclude all pmem pages
  makedumpfile: get metadata boundaries from pmem's infoblock

Currently, this RFC implements support for case D (see the table below).
Cases A and B are deliberately disabled in makedumpfile.
---

Here, the pmem memmap may also be referred to as pmem metadata.

### Background and motivation overview ###
---
Crash dump is an important feature for kernel troubleshooting: it is the last
resort for working out what happened at a kernel panic, slowdown, and so on,
and the most important tool for customer support. However, part of the data on
pmem is not included in the crash dump, which makes it difficult to analyze
problems involving pmem (especially Filesystem-DAX).

A pmem namespace in "fsdax" or "devdax" mode requires the allocation of
per-page metadata[1]. The allocation can be drawn from either mem (system
memory) or dev (the pmem device itself); see `ndctl help create-namespace` for
more details. In fsdax, the struct page array becomes very important: it is
one of the key pieces of data for determining the state of the reverse map.
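
(In the map=dev case, the kernel draws the struct page storage from the
device itself through a vmem_altmap. An abridged sketch of that structure,
after include/linux/memremap.h -- the exact field set varies by kernel
version:)

  /* Pre-allocated storage for the memmap, carved out of the device. */
  struct vmem_altmap {
          unsigned long base_pfn; /* first pfn of the device range */
          unsigned long reserve;  /* pages reserved ahead of the memmap */
          unsigned long free;     /* pages set aside for memmap storage */
          unsigned long align;
          unsigned long alloc;    /* pages already consumed */
  };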

So when the metadata is stored on pmem, even pmem's own per-page metadata is
not dumped, which means troubleshooters are unable to examine pmem in any
detail from the dump file.

### Make pmem memmap dump support ###
---
Our goal is that, whether the metadata is stored on mem or on pmem, it can be
dumped so that the crash utilities can read more details about the pmem. Of
course, this feature can be enabled or disabled.

First, based on our previous investigation, we can divide the problem into
the following four cases, A to D, according to the location of the metadata
and the scope of the dump. Note that although cases A and B appear below, we
do not want them to be part of this feature: dumping the entire pmem would
consume a lot of space and, more importantly, may contain sensitive user data.

+-------------+-----------------------+
|             |   metadata location   |
| dump scope  +----------+------------+
|             |   mem    |   PMEM     |
+-------------+----------+------------+
| entire pmem |     A    |     B      |
+-------------+----------+------------+
| metadata    |     C    |     D      |
+-------------+----------+------------+

### Testing ###
Only x86_64 has been tested so far. Please note that we have to disable
libnvdimm in the 2nd kernel to ensure that the metadata is not touched again
there.

The two commits below use sha256 to verify that the metadata seen by the 1st
kernel at panic time matches what makedumpfile reads in the 2nd kernel:
https://github.com/zhijianli88/makedumpfile/commit/91a135be6980e6e87b9e00b909aaaf8ef9566ec0
https://github.com/zhijianli88/linux/commit/55bef07f8f0b2e587737b796e73b92f242947e5a

### TODO ###
Only x86 is fully supported for both kexec_load(2) and kexec_file_load(2);
kexec_file_load(2) on other architectures remains a TODO.
---
[1] Pmem region layout:
   ^<--namespace0.0---->^<--namespace0.1------>^
   |                    |                      |
   +--+m----------------+--+m------------------+---------------------+-+a
   |++|e                |++|e                  |                     |+|l
   |++|t                |++|t                  |                     |+|i
   |++|a                |++|a                  |                     |+|g
   |++|d  namespace0.0  |++|d  namespace0.1    |     un-allocated    |+|n
   |++|a    fsdax       |++|a     devdax       |                     |+|m
   |++|t                |++|t                  |                     |+|e
   +--+a----------------+--+a------------------+---------------------+-+n
   |                                                                   |t
   v<-----------------------pmem region------------------------------->v

[2] https://lore.kernel.org/linux-mm/70F971CF-1A96-4D87-B70C-B971C2A1747C@roc.cs.umass.edu/T/
[3] https://lore.kernel.org/linux-mm/3c752fc2-b6a0-2975-ffec-dba3edcf4155@fujitsu.com/

### makedumpfile output in case B ###
kdump.sh[224]: makedumpfile: version 1.7.2++ (released on 20 Oct 2022)
kdump.sh[224]: command line: makedumpfile -l --message-level 31 -d 31 /proc/vmcore /sysroot/var/crash/127.0.0.1-2023-04-21-02:50:57//vmcore-incomplete
kdump.sh[224]: sadump: does not have partition header
kdump.sh[224]: sadump: read dump device as unknown format
kdump.sh[224]: sadump: unknown format
kdump.sh[224]:                phys_start         phys_end       virt_start         virt_end  is_pmem
kdump.sh[224]: LOAD[ 0]          1000000          3c26000 ffffffff81000000 ffffffff83c26000    false
kdump.sh[224]: LOAD[ 1]           100000         7f000000 ffff888000100000 ffff88807f000000    false
kdump.sh[224]: LOAD[ 2]         bf000000         bffd7000 ffff8880bf000000 ffff8880bffd7000    false
kdump.sh[224]: LOAD[ 3]        100000000        140000000 ffff888100000000 ffff888140000000    false
kdump.sh[224]: LOAD[ 4]        140000000        23e200000 ffff888140000000 ffff88823e200000     true
kdump.sh[224]: Linux kdump
kdump.sh[224]: VMCOREINFO   :
kdump.sh[224]:   OSRELEASE=6.3.0-rc3-pmem-bad+
kdump.sh[224]:   BUILD-ID=0546bd82db93706799d3eea38194ac648790aa85
kdump.sh[224]:   PAGESIZE=4096
kdump.sh[224]: page_size    : 4096
kdump.sh[224]:   SYMBOL(init_uts_ns)=ffffffff82671300
kdump.sh[224]:   OFFSET(uts_namespace.name)=0
kdump.sh[224]:   SYMBOL(node_online_map)=ffffffff826bbe08
kdump.sh[224]:   SYMBOL(swapper_pg_dir)=ffffffff82446000
kdump.sh[224]:   SYMBOL(_stext)=ffffffff81000000
kdump.sh[224]:   SYMBOL(vmap_area_list)=ffffffff82585fb0
kdump.sh[224]:   SYMBOL(devm_memmap_vmcore_head)=ffffffff825603c0
kdump.sh[224]:   SIZE(devm_memmap_vmcore)=40
kdump.sh[224]:   OFFSET(devm_memmap_vmcore.entry)=0
kdump.sh[224]:   OFFSET(devm_memmap_vmcore.start)=16
kdump.sh[224]:   OFFSET(devm_memmap_vmcore.end)=24
kdump.sh[224]:   SYMBOL(mem_section)=ffff88813fff4000
kdump.sh[224]:   LENGTH(mem_section)=2048
kdump.sh[224]:   SIZE(mem_section)=16
kdump.sh[224]:   OFFSET(mem_section.section_mem_map)=0
...
kdump.sh[224]: STEP [Checking for memory holes  ] : 0.012699 seconds
kdump.sh[224]: STEP [Excluding unnecessary pages] : 0.538059 seconds
kdump.sh[224]: STEP [Copying data               ] : 0.995418 seconds
kdump.sh[224]: STEP [Copying data               ] : 0.000067 seconds
kdump.sh[224]: Writing erase info...
kdump.sh[224]: offset_eraseinfo: 5d02266, size_eraseinfo: 0
kdump.sh[224]: Original pages  : 0x00000000001c0cfd
kdump.sh[224]:   Excluded pages   : 0x00000000001a58d2
kdump.sh[224]:     Pages filled with zero  : 0x0000000000006805
kdump.sh[224]:     Non-private cache pages : 0x0000000000019e93
kdump.sh[224]:     Private cache pages     : 0x0000000000077572
kdump.sh[224]:     User process data pages : 0x0000000000002c3b
kdump.sh[224]:     Free pages              : 0x0000000000010e8d
kdump.sh[224]:     Hwpoison pages          : 0x0000000000000000
kdump.sh[224]:     Offline pages           : 0x0000000000000000
kdump.sh[224]:     pmem metadata pages     : 0x0000000000000000
kdump.sh[224]:     pmem userdata pages     : 0x00000000000fa200
kdump.sh[224]:   Remaining pages  : 0x000000000001b42b
kdump.sh[224]:   (The number of pages is reduced to 6%.)
kdump.sh[224]: Memory Hole     : 0x000000000007d503
kdump.sh[224]: --------------------------------------------------
kdump.sh[224]: Total pages     : 0x000000000023e200
kdump.sh[224]: Write bytes     : 97522590
kdump.sh[224]: Cache hit: 191669, miss: 292, hit rate: 99.8%
kdump.sh[224]: The dumpfile is saved to /sysroot/var/crash/127.0.0.1-2023-04-21-02:50:57//vmcore-incomplete.
kdump.sh[224]: makedumpfile Completed.

CC: Baoquan He <bhe@redhat.com>
CC: Borislav Petkov <bp@alien8.de>
CC: Dan Williams <dan.j.williams@intel.com>
CC: Dave Hansen <dave.hansen@linux.intel.com>
CC: Dave Jiang <dave.jiang@intel.com>
CC: Dave Young <dyoung@redhat.com>
CC: Eric Biederman <ebiederm@xmission.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: Ira Weiny <ira.weiny@intel.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Vishal Verma <vishal.l.verma@intel.com>
CC: Vivek Goyal <vgoyal@redhat.com>
CC: x86@kernel.org
CC: kexec@lists.infradead.org
CC: nvdimm@lists.linux.dev

-- 
2.29.2


* [RFC PATCH v3 1/3] nvdimm: set force_raw=1 in kdump kernel
From: Li Zhijian @ 2023-06-02 10:26 UTC
  To: kexec, nvdimm
  Cc: linux-kernel, dan.j.williams, bhe, ruansy.fnst, y-goto, yangx.jy,
	Li Zhijian, Vishal Verma, Dave Jiang, Ira Weiny

The virtually mapped memory map allows storing struct page objects for
persistent memory devices in pre-allocated storage on those devices.
These 'struct page objects' on devices are also known as metadata.

While libnvdimm/nd_pmem are loading, the existing metadata is reconstructed
to fit the currently running kernel. For kdump purposes, this metadata must
not be touched until the dump is complete, so that it stays identical to what
the crashed kernel left behind.

To achieve this, we have a few options:
1. Don't ship the libnvdimm driver in the kdump kernel rootfs/initramfs.
2. Disable the libnvdimm driver via command line parameters
   (initcall_blacklist=libnvdimm_init libnvdimm.blacklist=1 rd.driver.blacklist=libnvdimm).
3. Enforce force_raw=1 for every nvdimm namespace: with force_raw=1, the
   metadata is not reconstructed. This may also mean the pmem does not work
   until some extra configuration is done.

Here we choose the 3rd option, because the kdump application in this RFC
relies on some /sys interfaces exported by libnvdimm, nd_pmem, etc.

CC: Dan Williams <dan.j.williams@intel.com>
CC: Vishal Verma <vishal.l.verma@intel.com>
CC: Dave Jiang <dave.jiang@intel.com>
CC: Ira Weiny <ira.weiny@intel.com>
CC: nvdimm@lists.linux.dev
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
V3: new patch
---
 drivers/nvdimm/namespace_devs.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index c60ec0b373c5..2e59be8b9c78 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -8,6 +8,7 @@
 #include <linux/slab.h>
 #include <linux/list.h>
 #include <linux/nd.h>
+#include <linux/crash_dump.h>
 #include "nd-core.h"
 #include "pmem.h"
 #include "pfn.h"
@@ -1504,6 +1505,8 @@ struct nd_namespace_common *nvdimm_namespace_common_probe(struct device *dev)
 			return ERR_PTR(-ENODEV);
 	}
 
+	if (is_kdump_kernel())
+		ndns->force_raw = true;
 	return ndns;
 }
 EXPORT_SYMBOL(nvdimm_namespace_common_probe);
-- 
2.29.2


* [RFC PATCH v3 2/3] x86/crash: Add pmem region into PT_LOADs of vmcore
From: Li Zhijian @ 2023-06-02 10:26 UTC
  To: kexec, nvdimm
  Cc: linux-kernel, dan.j.williams, bhe, ruansy.fnst, y-goto, yangx.jy,
	Li Zhijian, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, x86

Only the regions described by the PT_LOADs of /proc/vmcore are
dumpable/readable by dumping applications. Previously, on x86/x86_64, only
System RAM resources were injected into the PT_LOADs. So, to make the entire
pmem resource dumpable/readable, we need to add the pmem regions to the
PT_LOADs of /proc/vmcore as well.

Here we introduce a new API, walk_pmem_res(), to pick out the pmem regions.
Note that, unlike the other walk_xxx_res() APIs in resource.c, we walk the
pmem resources without requiring the IORESOURCE_BUSY flag.

This is specific to kexec_file_load(); for kexec_load(), kexec-tools will
carry a similar change.

CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: Borislav Petkov <bp@alien8.de>
CC: Dave Hansen <dave.hansen@linux.intel.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Baoquan He <bhe@redhat.com>
CC: x86@kernel.org
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
 arch/x86/kernel/crash.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index cdd92ab43cda..97763ea804c6 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -168,6 +168,17 @@ static int get_nr_ram_ranges_callback(struct resource *res, void *arg)
 	return 0;
 }
 
+/*
+ * This function calls the @func callback against all memory ranges, which
+ * are ranges marked as IORESOURCE_MEM and IORES_DESC_PERSISTENT_MEMORY.
+ */
+static int walk_pmem_res(u64 start, u64 end, void *arg,
+			 int (*func)(struct resource *, void *))
+{
+	return walk_iomem_res_desc(IORES_DESC_PERSISTENT_MEMORY, IORESOURCE_MEM,
+				   start, end, arg, func);
+}
+
 /* Gather all the required information to prepare elf headers for ram regions */
 static struct crash_mem *fill_up_crash_elf_data(void)
 {
@@ -178,6 +189,7 @@ static struct crash_mem *fill_up_crash_elf_data(void)
 	if (!nr_ranges)
 		return NULL;
 
+	walk_pmem_res(0, -1, &nr_ranges, get_nr_ram_ranges_callback);
 	/*
 	 * Exclusion of crash region and/or crashk_low_res may cause
 	 * another range split. So add extra two slots here.
@@ -243,6 +255,7 @@ static int prepare_elf_headers(struct kimage *image, void **addr,
 	ret = walk_system_ram_res(0, -1, cmem, prepare_elf64_ram_headers_callback);
 	if (ret)
 		goto out;
+	walk_pmem_res(0, -1, cmem, prepare_elf64_ram_headers_callback);
 
 	/* Exclude unwanted mem ranges */
 	ret = elf_header_exclude_ranges(cmem);
-- 
2.29.2


* [RFC PATCH v3 3/3] kernel/kexec_file: Mark pmem region with new flag PF_DEV
From: Li Zhijian @ 2023-06-02 10:26 UTC
  To: kexec, nvdimm
  Cc: linux-kernel, dan.j.williams, bhe, ruansy.fnst, y-goto, yangx.jy,
	Li Zhijian, Eric Biederman

For pmem, metadata is specific to a namespace rather than to the entire pmem
region. Therefore, ranges for which no namespace has yet been created, or
which are unusable for alignment reasons, have no metadata associated with
them.

When an application attempts to access a region that has no corresponding
metadata, it will hit an access error. With this flag, dumping applications
can recognize such regions in advance and take special action accordingly.

This is specific to kexec_file_load(); for the traditional kexec_load(),
kexec-tools will carry a similar change.

CC: Eric Biederman <ebiederm@xmission.com>
CC: Baoquan He <bhe@redhat.com>
CC: kexec@lists.infradead.org
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
 kernel/kexec_file.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index f989f5f1933b..0d5b516b96ee 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -29,6 +29,8 @@
 #include <linux/vmalloc.h>
 #include "kexec_internal.h"
 
+#define PF_DEV (1 << 4)
+
 #ifdef CONFIG_KEXEC_SIG
 static bool sig_enforce = IS_ENABLED(CONFIG_KEXEC_SIG_FORCE);
 
@@ -1221,6 +1223,12 @@ int crash_exclude_mem_range(struct crash_mem *mem,
 	return 0;
 }
 
+static bool is_pmem_range(u64 start, u64 size)
+{
+	return REGION_INTERSECTS == region_intersects(start, size,
+			IORESOURCE_MEM, IORES_DESC_PERSISTENT_MEMORY);
+}
+
 int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
 			  void **addr, unsigned long *sz)
 {
@@ -1302,6 +1310,8 @@ int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
 
 		phdr->p_type = PT_LOAD;
 		phdr->p_flags = PF_R|PF_W|PF_X;
+		if (is_pmem_range(mstart, mend - mstart))
+			phdr->p_flags |= PF_DEV;
 		phdr->p_offset  = mstart;
 
 		phdr->p_paddr = mstart;
-- 
2.29.2


* [RFC PATCH kexec-tools v3 1/1] kexec: Add and mark pmem region into PT_LOADs
From: Li Zhijian @ 2023-06-02 10:26 UTC
  To: kexec, nvdimm
  Cc: linux-kernel, dan.j.williams, bhe, ruansy.fnst, y-goto, yangx.jy,
	Li Zhijian, Vivek Goyal, Dave Young

This patch does two things:

1. Add the pmem regions to the PT_LOADs of vmcore so that they are dumpable.
Only the regions described by the PT_LOADs of /proc/vmcore are
dumpable/readable by dumping applications. Previously, on x86/x86_64, only
System RAM resources were injected into the PT_LOADs. So, to make the entire
pmem resource dumpable/readable, we need to add the pmem regions to the
PT_LOADs of /proc/vmcore as well.

2. Mark the pmem regions' p_flags with PF_DEV so that specific pages can be
ignored.
For pmem, metadata is specific to a namespace rather than to the entire pmem
region. Therefore, ranges for which no namespace has yet been created, or
which are unusable for alignment reasons, have no metadata associated with
them.

When an application attempts to access a region that has no corresponding
metadata, it will hit an access error. With this flag, dumping applications
can recognize such regions in advance and take special action accordingly.

CC: Baoquan He <bhe@redhat.com>
CC: Vivek Goyal <vgoyal@redhat.com>
CC: Dave Young <dyoung@redhat.com>
CC: kexec@lists.infradead.org
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
 kexec/crashdump-elf.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kexec/crashdump-elf.c b/kexec/crashdump-elf.c
index b8bb686a17ca..ab257e825187 100644
--- a/kexec/crashdump-elf.c
+++ b/kexec/crashdump-elf.c
@@ -25,6 +25,8 @@ do {									\
 } while(0)
 #endif
 
+#define PF_DEV (1 << 4)
+
 /* Prepares the crash memory headers and stores in supplied buffer. */
 int FUNC(struct kexec_info *info,
 	 struct crash_elf_info *elf_info,
@@ -199,7 +201,7 @@ int FUNC(struct kexec_info *info,
 	 * A seprate program header for Backup Region*/
 	for (i = 0; i < ranges; i++, range++) {
 		unsigned long long mstart, mend;
-		if (range->type != RANGE_RAM)
+		if (range->type != RANGE_RAM && range->type != RANGE_PMEM)
 			continue;
 		mstart = range->start;
 		mend = range->end;
@@ -209,6 +211,8 @@ int FUNC(struct kexec_info *info,
 		bufp += sizeof(PHDR);
 		phdr->p_type	= PT_LOAD;
 		phdr->p_flags	= PF_R|PF_W|PF_X;
+		if (range->type == RANGE_PMEM)
+			phdr->p_flags |= PF_DEV;
 		phdr->p_offset	= mstart;
 
 		if (mstart == info->backup_src_start
-- 
2.29.2


* [RFC PATCH makedumpfile v3 1/3] elf_info.c: Introduce is_pmem_pt_load_range
From: Li Zhijian @ 2023-06-02 10:26 UTC
  To: kexec, nvdimm
  Cc: linux-kernel, dan.j.williams, bhe, ruansy.fnst, y-goto, yangx.jy,
	Li Zhijian, Vivek Goyal, Dave Young

This checks BIT(4) of the Elf64_Phdr p_flags; currently only the lowest 3
bits are used by ELF. In kexec-tools and the kernel, we extend BIT(4) to
indicate whether or not a range is pmem.

dump_Elf_load:                phys_start         phys_end       virt_start         virt_end  is_pmem
dump_Elf_load: LOAD[ 0]         6b800000         6e42c000 ffffffffbcc00000 ffffffffbf82c000    false
dump_Elf_load: LOAD[ 1]             1000            9fc00 ffff975980001000 ffff97598009fc00    false
dump_Elf_load: LOAD[ 2]           100000         7f000000 ffff975980100000 ffff9759ff000000    false
dump_Elf_load: LOAD[ 3]         bf000000         bffd7000 ffff975a3f000000 ffff975a3ffd7000    false
dump_Elf_load: LOAD[ 4]        100000000        140000000 ffff975a80000000 ffff975ac0000000    false
dump_Elf_load: LOAD[ 5]        140000000        23e200000 ffff975ac0000000 ffff975bbe200000     true

CC: Baoquan He <bhe@redhat.com>
CC: Vivek Goyal <vgoyal@redhat.com>
CC: Dave Young <dyoung@redhat.com>
CC: kexec@lists.infradead.org
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
 elf_info.c | 31 +++++++++++++++++++++++++++----
 elf_info.h |  1 +
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/elf_info.c b/elf_info.c
index bc24083655d6..41b36b2804d2 100644
--- a/elf_info.c
+++ b/elf_info.c
@@ -43,6 +43,7 @@ struct pt_load_segment {
 	unsigned long long	phys_end;
 	unsigned long long	virt_start;
 	unsigned long long	virt_end;
+	int			is_pmem;
 };
 
 static int			nr_cpus;             /* number of cpu */
@@ -153,6 +154,8 @@ check_elf_format(int fd, char *filename, int *phnum, unsigned int *num_load)
 	return FALSE;
 }
 
+#define PF_DEV (1 << 4)
+
 static int
 dump_Elf_load(Elf64_Phdr *prog, int num_load)
 {
@@ -170,17 +173,37 @@ dump_Elf_load(Elf64_Phdr *prog, int num_load)
 	pls->virt_end    = pls->virt_start + prog->p_memsz;
 	pls->file_offset = prog->p_offset;
 	pls->file_size   = prog->p_filesz;
+	pls->is_pmem     = !!(prog->p_flags & PF_DEV);
 
 	if (num_load == 0)
-		DEBUG_MSG("%8s %16s %16s %16s %16s\n", "",
-			"phys_start", "phys_end", "virt_start", "virt_end");
+		DEBUG_MSG("%8s %16s %16s %16s %16s %8s\n", "",
+			"phys_start", "phys_end", "virt_start", "virt_end",
+			"is_pmem");
 
-	DEBUG_MSG("LOAD[%2d] %16llx %16llx %16llx %16llx\n", num_load,
-		pls->phys_start, pls->phys_end, pls->virt_start, pls->virt_end);
+	DEBUG_MSG("LOAD[%2d] %16llx %16llx %16llx %16llx %8s\n", num_load,
+		pls->phys_start, pls->phys_end, pls->virt_start, pls->virt_end,
+		pls->is_pmem ? "true": "false");
 
 	return TRUE;
 }
 
+int is_pmem_pt_load_range(unsigned long long start, unsigned long long end)
+{
+	int i;
+	struct pt_load_segment *pls;
+
+	for (i = 0; i < num_pt_loads; i++) {
+		pls = &pt_loads[i];
+		if (pls->is_pmem && pls->phys_start == NOT_PADDR)
+			return TRUE;
+		if (pls->is_pmem && pls->phys_start != NOT_PADDR &&
+		    pls->phys_start <= start && pls->phys_end >= end)
+			return TRUE;
+	}
+
+	return FALSE;
+}
+
 static off_t
 offset_next_note(void *note)
 {
diff --git a/elf_info.h b/elf_info.h
index d5416b32cdd7..a08d59a331f6 100644
--- a/elf_info.h
+++ b/elf_info.h
@@ -64,6 +64,7 @@ int get_pt_load_extents(int idx,
 	off_t *file_offset,
 	off_t *file_size);
 unsigned int get_num_pt_loads(void);
+int is_pmem_pt_load_range(unsigned long long start, unsigned long long end);
 
 void set_nr_cpus(int num);
 int get_nr_cpus(void);
-- 
2.29.2


* [RFC PATCH makedumpfile v3 2/3] makedumpfile.c: Exclude all pmem pages
From: Li Zhijian @ 2023-06-02 10:26 UTC
  To: kexec, nvdimm
  Cc: linux-kernel, dan.j.williams, bhe, ruansy.fnst, y-goto, yangx.jy,
	Li Zhijian, Vivek Goyal, Dave Young

Generally, pmem is too large to be suitable for dumping. Further, only the
namespaces on the pmem are dumpable, and we currently have no idea of the
exact layout of the namespaces within the pmem, so for now we exclude all
pmem pages. Later, we will try to support including/excluding the metadata
via a dedicated parameter.

CC: Baoquan He <bhe@redhat.com>
CC: Vivek Goyal <vgoyal@redhat.com>
CC: Dave Young <dyoung@redhat.com>
CC: kexec@lists.infradead.org
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
 makedumpfile.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/makedumpfile.c b/makedumpfile.c
index cadc59662bef..f304f752b0ec 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -100,6 +100,7 @@ mdf_pfn_t pfn_user;
 mdf_pfn_t pfn_free;
 mdf_pfn_t pfn_hwpoison;
 mdf_pfn_t pfn_offline;
+mdf_pfn_t pfn_pmem_userdata;
 mdf_pfn_t pfn_elf_excluded;
 
 mdf_pfn_t num_dumped;
@@ -6389,6 +6390,7 @@ __exclude_unnecessary_pages(unsigned long mem_map,
 	unsigned int order_offset, dtor_offset;
 	unsigned long flags, mapping, private = 0;
 	unsigned long compound_dtor, compound_head = 0;
+	unsigned int is_pmem;
 
 	/*
 	 * If a multi-page exclusion is pending, do it first
@@ -6443,6 +6445,13 @@ __exclude_unnecessary_pages(unsigned long mem_map,
 				continue;
 		}
 
+		is_pmem = is_pmem_pt_load_range(pfn << PAGESHIFT(), (pfn + 1) << PAGESHIFT());
+		if (is_pmem) {
+			pfn_pmem_userdata++;
+			clear_bit_on_2nd_bitmap_for_kernel(pfn, cycle);
+			continue;
+		}
+
 		index_pg = pfn % PGMM_CACHED;
 		pcache  = page_cache + (index_pg * SIZE(page));
 
@@ -8122,7 +8131,7 @@ write_elf_pages_cyclic(struct cache_data *cd_header, struct cache_data *cd_page)
 	 */
 	if (info->flag_cyclic) {
 		pfn_zero = pfn_cache = pfn_cache_private = 0;
-		pfn_user = pfn_free = pfn_hwpoison = pfn_offline = 0;
+		pfn_user = pfn_free = pfn_hwpoison = pfn_offline = pfn_pmem_userdata = 0;
 		pfn_memhole = info->max_mapnr;
 	}
 
@@ -9460,7 +9469,7 @@ write_kdump_pages_and_bitmap_cyclic(struct cache_data *cd_header, struct cache_d
 		 * Reset counter for debug message.
 		 */
 		pfn_zero = pfn_cache = pfn_cache_private = 0;
-		pfn_user = pfn_free = pfn_hwpoison = pfn_offline = 0;
+		pfn_user = pfn_free = pfn_hwpoison = pfn_offline = pfn_pmem_userdata = 0;
 		pfn_memhole = info->max_mapnr;
 
 		/*
@@ -10408,7 +10417,7 @@ print_report(void)
 	 */
 	pfn_original = info->max_mapnr - pfn_memhole;
 
-	pfn_excluded = pfn_zero + pfn_cache + pfn_cache_private
+	pfn_excluded = pfn_zero + pfn_cache + pfn_cache_private + pfn_pmem_userdata
 	    + pfn_user + pfn_free + pfn_hwpoison + pfn_offline;
 
 	REPORT_MSG("\n");
@@ -10425,6 +10434,7 @@ print_report(void)
 	REPORT_MSG("    Free pages              : 0x%016llx\n", pfn_free);
 	REPORT_MSG("    Hwpoison pages          : 0x%016llx\n", pfn_hwpoison);
 	REPORT_MSG("    Offline pages           : 0x%016llx\n", pfn_offline);
+	REPORT_MSG("    pmem userdata pages     : 0x%016llx\n", pfn_pmem_userdata);
 	REPORT_MSG("  Remaining pages  : 0x%016llx\n",
 	    pfn_original - pfn_excluded);
 
@@ -10464,7 +10474,7 @@ print_mem_usage(void)
 	*/
 	pfn_original = info->max_mapnr - pfn_memhole;
 
-	pfn_excluded = pfn_zero + pfn_cache + pfn_cache_private
+	pfn_excluded = pfn_zero + pfn_cache + pfn_cache_private + pfn_pmem_userdata
 	    + pfn_user + pfn_free + pfn_hwpoison + pfn_offline;
 	shrinking = (pfn_original - pfn_excluded) * 100;
 	shrinking = shrinking / pfn_original;
-- 
2.29.2


* [RFC PATCH makedumpfile v3 3/3] makedumpfile: get metadata boundaries from pmem's infoblock
  2023-06-02 10:26 ` Li Zhijian
@ 2023-06-02 10:26   ` Li Zhijian
  -1 siblings, 0 replies; 22+ messages in thread
From: Li Zhijian @ 2023-06-02 10:26 UTC (permalink / raw)
  To: kexec, nvdimm
  Cc: linux-kernel, dan.j.williams, bhe, ruansy.fnst, y-goto, yangx.jy,
	Li Zhijian, Vivek Goyal, Dave Young

Some code is copied from ndctl.

This change requires libndctl, which provides an interface to walk
through all existing namespaces.

This also requires that the namespace has entered force_raw mode (the
kernel will ensure that).

The resource interface provides the start of the namespace, and the
device's superblock provides the userdata offset within the namespace.
From this information, we can calculate the extent of the metadata.
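
(For manual verification: assuming an ndctl recent enough to ship the
read-infoblock command, "ndctl read-infoblock namespace0.0" should show
the same dataoff field that this patch reads from the raw block device.)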

A new dump level (-d 63) is introduced in this patch to skip the
metadata as well.
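
As a worked example (the namespace start matches the test layout quoted
later in this thread; the dataoff value is made up):

	namespace start (ndctl_namespace_get_resource): 0x140000000
	dataoff from the pfn superblock (assumed)     : 0x4200000

	metadata pfn range = [0x140000000 >> 12, (0x140000000 + 0x4200000) >> 12)
	                   = [0x140000, 0x144200)

-d 63 is 31 (all existing exclusions) plus 32 (the new
DL_EXCLUDE_PMEM_META bit), so "makedumpfile -d 63 ..." drops the
metadata as well, while "-d 31" keeps it in the dump.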

CC: Baoquan He <bhe@redhat.com>
CC: Vivek Goyal <vgoyal@redhat.com>
CC: Dave Young <dyoung@redhat.com>
CC: kexec@lists.infradead.org
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
 Makefile       |   2 +-
 makedumpfile.c | 196 +++++++++++++++++++++++++++++++++++++++++++++++--
 makedumpfile.h |   3 +-
 3 files changed, 192 insertions(+), 9 deletions(-)

diff --git a/Makefile b/Makefile
index 0608035e913f..fd0a792c5647 100644
--- a/Makefile
+++ b/Makefile
@@ -50,7 +50,7 @@ OBJ_PART=$(patsubst %.c,%.o,$(SRC_PART))
 SRC_ARCH = arch/arm.c arch/arm64.c arch/x86.c arch/x86_64.c arch/ia64.c arch/ppc64.c arch/s390x.c arch/ppc.c arch/sparc64.c arch/mips64.c arch/loongarch64.c
 OBJ_ARCH=$(patsubst %.c,%.o,$(SRC_ARCH))
 
-LIBS = -ldw -lbz2 -ldl -lelf -lz
+LIBS = -ldw -lbz2 -ldl -lelf -lz -lndctl
 ifneq ($(LINKTYPE), dynamic)
 LIBS := -static $(LIBS) -llzma
 endif
diff --git a/makedumpfile.c b/makedumpfile.c
index f304f752b0ec..b68d261f3d1e 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -27,6 +27,8 @@
 #include <limits.h>
 #include <assert.h>
 #include <zlib.h>
+#include <sys/types.h>
+#include <ndctl/libndctl.h>
 
 struct symbol_table	symbol_table;
 struct size_table	size_table;
@@ -100,6 +102,7 @@ mdf_pfn_t pfn_user;
 mdf_pfn_t pfn_free;
 mdf_pfn_t pfn_hwpoison;
 mdf_pfn_t pfn_offline;
+mdf_pfn_t pfn_pmem_metadata;
 mdf_pfn_t pfn_pmem_userdata;
 mdf_pfn_t pfn_elf_excluded;
 
@@ -6374,6 +6377,173 @@ exclude_range(mdf_pfn_t *counter, mdf_pfn_t pfn, mdf_pfn_t endpfn,
 	}
 }
 
+struct pmem_metadata_node {
+	unsigned long long start;	/* first PFN of the metadata range */
+	unsigned long long end;		/* PFN just past the metadata range */
+	struct pmem_metadata_node *next;
+};
+
+struct pmem_metadata_node *pmem_head = NULL;
+
+static void pmem_add_next(unsigned long long start, unsigned long long dataoff)
+{
+	struct pmem_metadata_node *tail = pmem_head, *node;
+
+	node = calloc(1, sizeof(*node));
+	if (!node)
+		return;
+
+	node->start = start >> info->page_shift;
+	node->end = (start + dataoff) >> info->page_shift;
+	node->next = NULL;
+
+	if (!pmem_head) {
+		pmem_head = node;
+		return;
+	}
+	while (tail->next)
+		tail = tail->next;
+	tail->next = node;
+}
+
+static void cleanup_pmem_metadata(void)
+{
+	struct pmem_metadata_node *head = pmem_head;
+	while (head) {
+		struct pmem_metadata_node *next = head->next;
+		free(head);
+		head = next;
+	}
+}
+
+static int is_pmem_metadata_range(unsigned long long start, unsigned long long end)
+{
+	struct pmem_metadata_node *head = pmem_head;
+	while (head) {
+		if (head->start <= start && head->end >= end)
+			return TRUE;
+		head = head->next;
+	}
+
+	return FALSE;
+}
+
+static void dump_pmem_range(void)
+{
+	int i = 0;
+	struct pmem_metadata_node *node = pmem_head;
+
+	fprintf(stderr, "dump_pmem_range start......\n\n\n");
+	while (node) {
+		fprintf(stderr, "namespace[%d]: metadata[%llx, %llx]\n", i++, node->start, node->end);
+		node = node->next;
+	}
+	fprintf(stderr, "dump_pmem_range end........\n\n\n");
+}
+
+#define INFOBLOCK_SZ (8192)
+#define SZ_4K (4096)
+#define PFN_SIG_LEN 16
+
+typedef uint64_t u64;
+typedef int64_t s64;
+typedef uint32_t u32;
+typedef int32_t s32;
+typedef uint16_t u16;
+typedef int16_t s16;
+typedef uint8_t u8;
+typedef int8_t s8;
+
+typedef int64_t le64;
+typedef int32_t le32;
+typedef int16_t le16;
+
+struct pfn_sb {
+	u8 signature[PFN_SIG_LEN];
+	u8 uuid[16];
+	u8 parent_uuid[16];
+	le32 flags;
+	le16 version_major;
+	le16 version_minor;
+	le64 dataoff; /* relative to namespace_base + start_pad */
+	le64 npfns;
+	le32 mode;
+	/* minor-version-1 additions for section alignment */
+	le32 start_pad;
+	le32 end_trunc;
+	/* minor-version-2 record the base alignment of the mapping */
+	le32 align;
+	/* minor-version-3 guarantee the padding and flags are zero */
+	/* minor-version-4 record the page size and struct page size */
+	le32 page_size;
+	le16 page_struct_size;
+	u8 padding[3994];
+	le64 checksum;
+};
+
+static long long nd_read_infoblock_dataoff(struct ndctl_namespace *ndns)
+{
+	int fd, rc;
+	char path[50];
+	char buf[INFOBLOCK_SZ + 1];
+	struct pfn_sb *pfn_sb = (struct pfn_sb *)(buf + SZ_4K);
+
+	snprintf(path, sizeof(path), "/dev/%s", ndctl_namespace_get_block_device(ndns));
+
+	fd = open(path, O_RDONLY|O_EXCL);
+	if (fd < 0)
+		return -1;
+
+	rc = read(fd, buf, INFOBLOCK_SZ);
+	close(fd);	/* avoid leaking the fd on either path */
+	if (rc < INFOBLOCK_SZ)
+		return -1;
+
+	return pfn_sb->dataoff;
+}
+
+int inspect_pmem_namespace(void)
+{
+	struct ndctl_ctx *ctx;
+	struct ndctl_bus *bus;
+	int rc = -1;
+
+	fprintf(stderr, "\n\ninspect_pmem_namespace!!\n\n");
+	rc = ndctl_new(&ctx);
+	if (rc)
+		return -1;
+
+	ndctl_bus_foreach(ctx, bus) {
+		struct ndctl_region *region;
+
+		ndctl_region_foreach(bus, region) {
+			struct ndctl_namespace *ndns;
+
+			ndctl_namespace_foreach(region, ndns) {
+				enum ndctl_namespace_mode mode;
+				long long start, end_metadata;
+
+				mode = ndctl_namespace_get_mode(ndns);
+				/* kdump kernel sets force_raw; skip "safe" (btt) namespaces */
+				if (mode == NDCTL_NS_MODE_SAFE) {
+					fprintf(stderr, "Only raw can be dumpable\n");
+					continue;
+				}
+
+				start = ndctl_namespace_get_resource(ndns);
+				end_metadata = nd_read_infoblock_dataoff(ndns);
+
+				/* the metadata region really starts at the 2M alignment boundary */
+				if (start != ULLONG_MAX && end_metadata > 2 * 1024 * 1024)
+					pmem_add_next(start, end_metadata);
+			}
+		}
+	}
+
+	ndctl_unref(ctx);
+	return 0;
+}
+
 int
 __exclude_unnecessary_pages(unsigned long mem_map,
     mdf_pfn_t pfn_start, mdf_pfn_t pfn_end, struct cycle *cycle)
@@ -6447,9 +6617,17 @@ __exclude_unnecessary_pages(unsigned long mem_map,
 
 		is_pmem = is_pmem_pt_load_range(pfn << PAGESHIFT(), (pfn + 1) << PAGESHIFT());
 		if (is_pmem) {
-			pfn_pmem_userdata++;
-			clear_bit_on_2nd_bitmap_for_kernel(pfn, cycle);
-			continue;
+			if (is_pmem_metadata_range(pfn, pfn + 1)) {
+				if (info->dump_level & DL_EXCLUDE_PMEM_META) {
+					pfn_pmem_metadata++;
+					clear_bit_on_2nd_bitmap_for_kernel(pfn, cycle);
+					continue;
+				}
+			} else {
+				pfn_pmem_userdata++;
+				clear_bit_on_2nd_bitmap_for_kernel(pfn, cycle);
+				continue;
+			}
 		}
 
 		index_pg = pfn % PGMM_CACHED;
@@ -8130,7 +8308,7 @@ write_elf_pages_cyclic(struct cache_data *cd_header, struct cache_data *cd_page)
 	 * Reset counter for debug message.
 	 */
 	if (info->flag_cyclic) {
-		pfn_zero = pfn_cache = pfn_cache_private = 0;
+		pfn_zero = pfn_cache = pfn_cache_private = pfn_pmem_metadata = 0;
 		pfn_user = pfn_free = pfn_hwpoison = pfn_offline = pfn_pmem_userdata = 0;
 		pfn_memhole = info->max_mapnr;
 	}
@@ -9468,7 +9646,7 @@ write_kdump_pages_and_bitmap_cyclic(struct cache_data *cd_header, struct cache_d
 		/*
 		 * Reset counter for debug message.
 		 */
-		pfn_zero = pfn_cache = pfn_cache_private = 0;
+		pfn_zero = pfn_cache = pfn_cache_private = pfn_pmem_metadata = 0;
 		pfn_user = pfn_free = pfn_hwpoison = pfn_offline = pfn_pmem_userdata = 0;
 		pfn_memhole = info->max_mapnr;
 
@@ -10418,7 +10596,7 @@ print_report(void)
 	pfn_original = info->max_mapnr - pfn_memhole;
 
 	pfn_excluded = pfn_zero + pfn_cache + pfn_cache_private + pfn_pmem_userdata
-	    + pfn_user + pfn_free + pfn_hwpoison + pfn_offline;
+	    + pfn_user + pfn_free + pfn_hwpoison + pfn_offline + pfn_pmem_metadata;
 
 	REPORT_MSG("\n");
 	REPORT_MSG("Original pages  : 0x%016llx\n", pfn_original);
@@ -10434,6 +10612,7 @@ print_report(void)
 	REPORT_MSG("    Free pages              : 0x%016llx\n", pfn_free);
 	REPORT_MSG("    Hwpoison pages          : 0x%016llx\n", pfn_hwpoison);
 	REPORT_MSG("    Offline pages           : 0x%016llx\n", pfn_offline);
+	REPORT_MSG("    pmem metadata pages     : 0x%016llx\n", pfn_pmem_metadata);
 	REPORT_MSG("    pmem userdata pages     : 0x%016llx\n", pfn_pmem_userdata);
 	REPORT_MSG("  Remaining pages  : 0x%016llx\n",
 	    pfn_original - pfn_excluded);
@@ -10475,7 +10654,7 @@ print_mem_usage(void)
 	pfn_original = info->max_mapnr - pfn_memhole;
 
 	pfn_excluded = pfn_zero + pfn_cache + pfn_cache_private + pfn_pmem_userdata
-	    + pfn_user + pfn_free + pfn_hwpoison + pfn_offline;
+	    + pfn_user + pfn_free + pfn_hwpoison + pfn_offline + pfn_pmem_metadata;
 	shrinking = (pfn_original - pfn_excluded) * 100;
 	shrinking = shrinking / pfn_original;
 	total_size = info->page_size * pfn_original;
@@ -10768,6 +10947,8 @@ create_dumpfile(void)
 		}
 	}
 
+	inspect_pmem_namespace();
+	dump_pmem_range();
 	print_vtop();
 
 	num_retry = 0;
@@ -12441,6 +12622,7 @@ out:
 		}
 	}
 	free_elf_info();
+	cleanup_pmem_metadata();
 
 	return retcd;
 }
diff --git a/makedumpfile.h b/makedumpfile.h
index 85e5a4932983..ecb2fb4d7a4c 100644
--- a/makedumpfile.h
+++ b/makedumpfile.h
@@ -206,7 +206,7 @@ test_bit(int nr, unsigned long addr)
  * Dump Level
  */
 #define MIN_DUMP_LEVEL		(0)
-#define MAX_DUMP_LEVEL		(31)
+#define MAX_DUMP_LEVEL		(63)
 #define NUM_ARRAY_DUMP_LEVEL	(MAX_DUMP_LEVEL + 1) /* enough to allocate
 							all the dump_level */
 #define DL_EXCLUDE_ZERO		(0x001) /* Exclude Pages filled with Zeros */
@@ -216,6 +216,7 @@ test_bit(int nr, unsigned long addr)
 				           with Private Pages */
 #define DL_EXCLUDE_USER_DATA	(0x008) /* Exclude UserProcessData Pages */
 #define DL_EXCLUDE_FREE		(0x010)	/* Exclude Free Pages */
+#define DL_EXCLUDE_PMEM_META   (0x020) /* Exclude pmem metadata Pages */
 
 
 /*
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v3 0/3] pmem memmap dump support
  2023-06-02 10:26 ` Li Zhijian
@ 2023-06-04 12:59   ` Baoquan He
  -1 siblings, 0 replies; 22+ messages in thread
From: Baoquan He @ 2023-06-04 12:59 UTC (permalink / raw)
  To: Li Zhijian, dan.j.williams
  Cc: kexec, nvdimm, Kazuhito Hagio, Simon Horman, linux-kernel,
	dan.j.williams, ruansy.fnst, y-goto, yangx.jy, Borislav Petkov,
	Dave Hansen, Dave Jiang, Dave Young, Eric Biederman,
	H. Peter Anvin, Ingo Molnar, Ira Weiny, Thomas Gleixner,
	Vishal Verma, Vivek Goyal, x86

Hi Zhijian,

On 06/02/23 at 06:26pm, Li Zhijian wrote:
> Hello folks,
> 
> After sending out the previous version of the patch set, we received some comments,
> and we really appreciate your input. However, as you can see, the current patch
> set is still in its early stages, especially in terms of the solution selection,
> which may still undergo changes.

Thanks for the effort to make and improve this. Adding Kazu and Simon to
the CC because they maintain the kexec-tools and makedumpfile utilities.

For better reviewing, I would suggest splitting the patches into
different patchsets for the different components/repos. Here, there are
obviously a kernel patchset, a kexec-tools patch and a makedumpfile
patchset.

The kernel patches look straightforward and clear; if Dan can approve
them from the nvdimm side, everything should be fine. Then we can focus
on the relevant kexec-tools and makedumpfile support.

Thanks
Baoquan


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v3 0/3] pmem memmap dump support
  2023-06-04 12:59   ` Baoquan He
@ 2023-06-09  1:21     ` Zhijian Li (Fujitsu)
  -1 siblings, 0 replies; 22+ messages in thread
From: Zhijian Li (Fujitsu) @ 2023-06-09  1:21 UTC (permalink / raw)
  To: Baoquan He, dan.j.williams
  Cc: kexec, nvdimm, Kazuhito Hagio, Simon Horman, linux-kernel,
	Shiyang Ruan (Fujitsu), Yasunori Gotou (Fujitsu),
	Xiao Yang (Fujitsu),
	Borislav Petkov, Dave Hansen, Dave Jiang, Dave Young,
	Eric Biederman, H. Peter Anvin, Ingo Molnar, Ira Weiny,
	Thomas Gleixner, Vishal Verma, Vivek Goyal, x86

Baoquan,


On 04/06/2023 20:59, Baoquan He wrote:
> Hi Zhijian,
> 
> On 06/02/23 at 06:26pm, Li Zhijian wrote:
>> Hello folks,
>>
>> After sending out the previous version of the patch set, we received some comments,
>> and we really appreciate your input. However, as you can see, the current patch
>> set is still in its early stages, especially in terms of the solution selection,
>> which may still undergo changes.
> 
> Thanks for the effort to make and improve this. Adding Kazu and Simon to
> the CC because they maintain the kexec-tools and makedumpfile utilities.
> 
> For better reviewing, I would suggest splitting the patches into
> different patchsets for the different components/repos. Here, there are
> obviously a kernel patchset, a kexec-tools patch and a makedumpfile patchset.

Thank you very much for your feedback.
Agreed, I will split them out once we reach a basic proposal.


Thanks
Zhijian

> 
> The kernel patches look straightforward and clear; if Dan can approve
> them from the nvdimm side, everything should be fine. Then we can focus
> on the relevant kexec-tools and makedumpfile support.
> 
> Thanks
> Baoquan
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v3 0/3] pmem memmap dump support
  2023-06-02 10:26 ` Li Zhijian
@ 2023-06-25 10:27   ` Li, Zhijian
  -1 siblings, 0 replies; 22+ messages in thread
From: Li, Zhijian @ 2023-06-25 10:27 UTC (permalink / raw)
  To: kexec, nvdimm
  Cc: linux-kernel, dan.j.williams, bhe, ruansy.fnst, y-goto, yangx.jy,
	Borislav Petkov, Dave Hansen, Dave Jiang, Dave Young,
	Eric Biederman, H. Peter Anvin, Ingo Molnar, Ira Weiny,
	Thomas Gleixner, Vishal Verma, Vivek Goyal, x86

Kindly ping. I do need your feedback, especially some voices from the
nvdimm side. If any clarifications are needed or if my initial email
requires further details, please do not hesitate to let me know. I am
more than willing to provide additional information.

Thanks
Zhijian

on 6/2/2023 6:26 PM, Li Zhijian wrote:
> Hello folks,
>
> After sending out the previous version of the patch set, we received some comments,
> and we really appreciate your input. However, as you can see, the current patch
> set is still in its early stages, especially in terms of the solution selection,
> which may still undergo changes.
>
> Changes in V3:
> Mainly based on the understanding from the first version, I implemented the proposal
> suggested by Dan. In the kdump kernel, the device's superblock is read through
> a device file interface to calculate the metadata range. In the second version,
> the first kernel writes the metadata range to vmcoreinfo, and after kdump occurs,
> the kdump kernel can directly read it from /proc/vmcore.
>
> Comparing these two approaches, the advantage of Version 3 is fewer kernel
> modifications, but the downside is the introduction of a new external library,
> libndctl, to search for each namespace, which introduces a higher level of
> coupling with ndctl.
>
> One important thing to note about both V2 and V3 is the introduction of a new
> ELF flag, PF_DEV, to indicate whether a range is on a device. I'm not sure if
> there are better alternatives or if we can use this flag internally without
> exposing it in elf.h.
>
> We greatly appreciate your feedback and would like to hear your response.
>
> In RFC stage, I folded these 3 projects in this same cover letter for reviewing convenience.
> kernel(3):
>    nvdimm: set force_raw=1 in kdump kernel
>    x86/crash: Add pmem region into PT_LOADs of vmcore
>    kernel/kexec_file: Mark pmem region with new flag PF_DEV
> kexec-tools(1):
>    kexec: Add and mark pmem region into PT_LOADs
> makedumpfile(3):
>    elf_info.c: Introduce is_pmem_pt_load_range
>    makedumpfile.c: Exclude all pmem pages
>    makedumpfile: get metadata boundaries from pmem's infoblock
>
> Currently, this RFC has already implemented to supported case D*. And the case A&B is disabled
> deliberately in makedumpfile.
> ---
>
> pmem memmap can also be called pmem metadata here.
>
> ### Background and motivate overview ###
> ---
> Crash dump is an important feature for trouble shooting of kernel. It is the final way to chase what
> happened at the kernel panic, slowdown, and so on. It is the most important tool for customer support.
> However, a part of data on pmem is not included in crash dump, it may cause difficulty to analyze
> trouble around pmem (especially Filesystem-DAX).
>
> A pmem namespace in "fsdax" or "devdax" mode requires allocation of per-page metadata[1]. The allocation
> can be drawn from either mem(system memory) or dev(pmem device), see `ndctl help create-namespace` for
> more details. In fsdax, struct page array becomes very important, it is one of the key data to find
> status of reverse map.
>
> So, when metadata was stored in pmem, even pmem's per-page metadata will not be dumped. That means
> troubleshooters are unable to check more details about pmem from the dumpfile.
>
> ### Make pmem memmap dump support ###
> ---
> Our goal is that whether metadata is stored on mem or pmem, its metadata can be dumped and then the
> crash-utilities can read more details about the pmem. Of course, this feature can be enabled/disabled.
>
> First, based on our previous investigation, we can divide the problem into the following four
> cases (A, B, C, D) according to the location of the metadata and the scope of the dump.
> It should be noted that although we mention cases A&B below, we do not want these two cases to be
> part of this feature, because dumping the entire pmem would consume a lot of space and, more importantly,
> it may contain sensitive user data.
>
> +-------------+-----------------------+
> |             |   metadata location   |
> +-------------+----------+------------+
> | dump scope  |  mem     |   PMEM     |
> +-------------+----------+------------+
> | entire pmem |     A    |     B      |
> +-------------+----------+------------+
> | metadata    |     C    |     D      |
> +-------------+----------+------------+
>
> ### Testing ###
> Only x86_64 is tested. Please note that we have to disable the 2nd kernel's libnvdimm to ensure
> the metadata will not be touched again in the 2nd kernel.
>
> The 2 commits below use sha256 to check the metadata, in the 1st kernel at panic time and in makedumpfile in the 2nd kernel.
> https://github.com/zhijianli88/makedumpfile/commit/91a135be6980e6e87b9e00b909aaaf8ef9566ec0
> https://github.com/zhijianli88/linux/commit/55bef07f8f0b2e587737b796e73b92f242947e5a
>
> ### TODO ###
> Only x86 is fully supported for both kexec_load(2) and kexec_file_load(2);
> kexec_file_load(2) on other architectures is a TODO.
> ---
> [1] Pmem region layout:
>     ^<--namespace0.0---->^<--namespace0.1------>^
>     |                    |                      |
>     +--+m----------------+--+m------------------+---------------------+-+a
>     |++|e                |++|e                  |                     |+|l
>     |++|t                |++|t                  |                     |+|i
>     |++|a                |++|a                  |                     |+|g
>     |++|d  namespace0.0  |++|d  namespace0.1    |     un-allocated    |+|n
>     |++|a    fsdax       |++|a     devdax       |                     |+|m
>     |++|t                |++|t                  |                     |+|e
>     +--+a----------------+--+a------------------+---------------------+-+n
>     |                                                                   |t
>     v<-----------------------pmem region------------------------------->v
>
> [2] https://lore.kernel.org/linux-mm/70F971CF-1A96-4D87-B70C-B971C2A1747C@roc.cs.umass.edu/T/
> [3] https://lore.kernel.org/linux-mm/3c752fc2-b6a0-2975-ffec-dba3edcf4155@fujitsu.com/
>
> ### makedumpfile output in case B ###
> kdump.sh[224]: makedumpfile: version 1.7.2++ (released on 20 Oct 2022)
> kdump.sh[224]: command line: makedumpfile -l --message-level 31 -d 31 /proc/vmcore /sysroot/var/crash/127.0.0.1-2023-04-21-02:50:57//vmcore-incomplete
> kdump.sh[224]: sadump: does not have partition header
> kdump.sh[224]: sadump: read dump device as unknown format
> kdump.sh[224]: sadump: unknown format
> kdump.sh[224]:                phys_start         phys_end       virt_start         virt_end  is_pmem
> kdump.sh[224]: LOAD[ 0]          1000000          3c26000 ffffffff81000000 ffffffff83c26000    false
> kdump.sh[224]: LOAD[ 1]           100000         7f000000 ffff888000100000 ffff88807f000000    false
> kdump.sh[224]: LOAD[ 2]         bf000000         bffd7000 ffff8880bf000000 ffff8880bffd7000    false
> kdump.sh[224]: LOAD[ 3]        100000000        140000000 ffff888100000000 ffff888140000000    false
> kdump.sh[224]: LOAD[ 4]        140000000        23e200000 ffff888140000000 ffff88823e200000     true
> kdump.sh[224]: Linux kdump
> kdump.sh[224]: VMCOREINFO   :
> kdump.sh[224]:   OSRELEASE=6.3.0-rc3-pmem-bad+
> kdump.sh[224]:   BUILD-ID=0546bd82db93706799d3eea38194ac648790aa85
> kdump.sh[224]:   PAGESIZE=4096
> kdump.sh[224]: page_size    : 4096
> kdump.sh[224]:   SYMBOL(init_uts_ns)=ffffffff82671300
> kdump.sh[224]:   OFFSET(uts_namespace.name)=0
> kdump.sh[224]:   SYMBOL(node_online_map)=ffffffff826bbe08
> kdump.sh[224]:   SYMBOL(swapper_pg_dir)=ffffffff82446000
> kdump.sh[224]:   SYMBOL(_stext)=ffffffff81000000
> kdump.sh[224]:   SYMBOL(vmap_area_list)=ffffffff82585fb0
> kdump.sh[224]:   SYMBOL(devm_memmap_vmcore_head)=ffffffff825603c0
> kdump.sh[224]:   SIZE(devm_memmap_vmcore)=40
> kdump.sh[224]:   OFFSET(devm_memmap_vmcore.entry)=0
> kdump.sh[224]:   OFFSET(devm_memmap_vmcore.start)=16
> kdump.sh[224]:   OFFSET(devm_memmap_vmcore.end)=24
> kdump.sh[224]:   SYMBOL(mem_section)=ffff88813fff4000
> kdump.sh[224]:   LENGTH(mem_section)=2048
> kdump.sh[224]:   SIZE(mem_section)=16
> kdump.sh[224]:   OFFSET(mem_section.section_mem_map)=0
> ...
> kdump.sh[224]: STEP [Checking for memory holes  ] : 0.012699 seconds
> kdump.sh[224]: STEP [Excluding unnecessary pages] : 0.538059 seconds
> kdump.sh[224]: STEP [Copying data               ] : 0.995418 seconds
> kdump.sh[224]: STEP [Copying data               ] : 0.000067 seconds
> kdump.sh[224]: Writing erase info...
> kdump.sh[224]: offset_eraseinfo: 5d02266, size_eraseinfo: 0
> kdump.sh[224]: Original pages  : 0x00000000001c0cfd
> kdump.sh[224]:   Excluded pages   : 0x00000000001a58d2
> kdump.sh[224]:     Pages filled with zero  : 0x0000000000006805
> kdump.sh[224]:     Non-private cache pages : 0x0000000000019e93
> kdump.sh[224]:     Private cache pages     : 0x0000000000077572
> kdump.sh[224]:     User process data pages : 0x0000000000002c3b
> kdump.sh[224]:     Free pages              : 0x0000000000010e8d
> kdump.sh[224]:     Hwpoison pages          : 0x0000000000000000
> kdump.sh[224]:     Offline pages           : 0x0000000000000000
> kdump.sh[224]:     pmem metadata pages     : 0x0000000000000000
> kdump.sh[224]:     pmem userdata pages     : 0x00000000000fa200
> kdump.sh[224]:   Remaining pages  : 0x000000000001b42b
> kdump.sh[224]:   (The number of pages is reduced to 6%.)
> kdump.sh[224]: Memory Hole     : 0x000000000007d503
> kdump.sh[224]: --------------------------------------------------
> kdump.sh[224]: Total pages     : 0x000000000023e200
> kdump.sh[224]: Write bytes     : 97522590
> kdump.sh[224]: Cache hit: 191669, miss: 292, hit rate: 99.8%
> kdump.sh[224]: The dumpfile is saved to /sysroot/var/crash/127.0.0.1-2023-04-21-02:50:57//vmcore-incomplete.
> kdump.sh[224]: makedumpfile Completed.
>
> CC: Baoquan He <bhe@redhat.com>
> CC: Borislav Petkov <bp@alien8.de>
> CC: Dan Williams <dan.j.williams@intel.com>
> CC: Dave Hansen <dave.hansen@linux.intel.com>
> CC: Dave Jiang <dave.jiang@intel.com>
> CC: Dave Young <dyoung@redhat.com>
> CC: Eric Biederman <ebiederm@xmission.com>
> CC: "H. Peter Anvin" <hpa@zytor.com>
> CC: Ingo Molnar <mingo@redhat.com>
> CC: Ira Weiny <ira.weiny@intel.com>
> CC: Thomas Gleixner <tglx@linutronix.de>
> CC: Vishal Verma <vishal.l.verma@intel.com>
> CC: Vivek Goyal <vgoyal@redhat.com>
> CC: x86@kernel.org
> CC: kexec@lists.infradead.org
> CC: nvdimm@lists.linux.dev
>


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2023-06-25 10:28 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-02 10:26 [RFC PATCH v3 0/3] pmem memmap dump support Li Zhijian
2023-06-02 10:26 ` Li Zhijian
2023-06-02 10:26 ` [RFC PATCH v3 1/3] nvdimm: set force_raw=1 in kdump kernel Li Zhijian
2023-06-02 10:26   ` Li Zhijian
2023-06-02 10:26 ` [RFC PATCH v3 2/3] x86/crash: Add pmem region into PT_LOADs of vmcore Li Zhijian
2023-06-02 10:26   ` Li Zhijian
2023-06-02 10:26 ` [RFC PATCH v3 3/3] kernel/kexec_file: Mark pmem region with new flag PF_DEV Li Zhijian
2023-06-02 10:26   ` Li Zhijian
2023-06-02 10:26 ` [RFC PATCH kexec-tools v3 1/1] kexec: Add and mark pmem region into PT_LOADs Li Zhijian
2023-06-02 10:26   ` Li Zhijian
2023-06-02 10:26 ` [RFC PATCH makedumpfile v3 1/3] elf_info.c: Introduce is_pmem_pt_load_range Li Zhijian
2023-06-02 10:26   ` Li Zhijian
2023-06-02 10:26 ` [RFC PATCH makedumpfile v3 2/3] makedumpfile.c: Exclude all pmem pages Li Zhijian
2023-06-02 10:26   ` Li Zhijian
2023-06-02 10:26 ` [RFC PATCH makedumpfile v3 3/3] makedumpfile: get metadata boundaries from pmem's infoblock Li Zhijian
2023-06-02 10:26   ` Li Zhijian
2023-06-04 12:59 ` [RFC PATCH v3 0/3] pmem memmap dump support Baoquan He
2023-06-04 12:59   ` Baoquan He
2023-06-09  1:21   ` Zhijian Li (Fujitsu)
2023-06-09  1:21     ` Zhijian Li (Fujitsu)
2023-06-25 10:27 ` Li, Zhijian
2023-06-25 10:27   ` Li, Zhijian
