* [RFC PATCH 0/3] Add writing support to vmcore for reusing oldmem
@ 2020-09-09  7:50 ` Kairui Song
  0 siblings, 0 replies; 17+ messages in thread
From: Kairui Song @ 2020-09-09  7:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Young, Baoquan He, Vivek Goyal, Alexey Dobriyan,
	Eric Biederman, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	kexec, Kairui Song

Currently vmcore only supports reading. This patch series is an RFC
to add writing support to vmcore. It's x86_64 only for now; I'll add
other architectures later if there are no problems with this idea.

My purpose in adding writing support is to reuse the crashed kernel's
old memory in the kdump kernel, reduce kdump memory pressure, and
allow kdump to run with a smaller crashkernel reservation.

This is doable because in most cases, after a kernel panic, users are
only interested in the crashed kernel itself, and userspace/cache/free
memory pages are not dumped. `makedumpfile` is widely used to skip
these pages. Kernel pages usually take only a small part of
the whole old memory, so there will be many reusable pages.

By adding writing support, userspace can then use these pages as fast,
temporary storage. This helps reduce memory pressure in many ways.

For example, I've written a POC program based on this: it finds
the reusable pages and creates an NBD device which maps to them.
The NBD device can then be used as swap, or to hold some temp files
which previously lived in RAM.

The link of the POC tool: https://github.com/ryncsn/kdumpd
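The essential translation such a userspace tool needs is mapping a physical address of a reusable old-memory page to a file offset in /proc/vmcore, using the ELF LOAD program headers. A minimal sketch of that lookup (the struct and helper names here are illustrative, not taken from the POC):

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Simplified view of an ELF LOAD segment as exposed by /proc/vmcore. */
struct load_seg {
	uint64_t p_offset;  /* file offset of the segment in vmcore */
	uint64_t p_paddr;   /* physical address the segment starts at */
	uint64_t p_filesz;  /* segment size in bytes */
};

/*
 * Translate a physical address into a vmcore file offset, or return
 * -1 if the address is not covered by any LOAD segment.
 */
static int64_t paddr_to_vmcore_off(const struct load_seg *segs, size_t n,
				   uint64_t paddr)
{
	for (size_t i = 0; i < n; i++) {
		if (paddr >= segs[i].p_paddr &&
		    paddr < segs[i].p_paddr + segs[i].p_filesz)
			return (int64_t)(segs[i].p_offset +
					 (paddr - segs[i].p_paddr));
	}
	return -1;
}
```

With that offset in hand, the tool can service NBD reads with pread() on /proc/vmcore and, with this series applied, NBD writes with pwrite().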

I tested it on x86_64 on the latest Fedora by using it as swap, with
the following steps in the kdump kernel:

  1. Install this tool in the kdump initramfs
  2. Execute the following commands in kdump:
     /sbin/modprobe nbd nbds_max=1
     /bin/kdumpd &
     /sbin/mkswap /dev/nbd0
     /sbin/swapon /dev/nbd0
  3. Observe the swap is being used:
     SwapTotal:        131068 kB
     SwapFree:         121852 kB

It helped reduce the crashkernel reservation from 168M to 110M for a
successful kdump run over NFSv3. There are still many work items that
could be done based on this idea, e.g. moving the initramfs content to
the old memory, which may save another ~10-20M of memory.

Kdump has long suffered from OOM issues with limited crashkernel
memory, so reusing old memory could be very helpful.

This method has its limitations:
- Swap only works for userspace. But kdump userspace is a major memory
  consumer, so in general this should be helpful enough.
- For users who want to dump the whole memory area, this won't help as
  there are no reusable pages.

I've tried other ways to improve the crashkernel situation, e.g.:
- Reserve some smaller memory segments in the first kernel for crashkernel:
  It's only a supplement to the default crashkernel reservation and only
  makes the crashkernel value more adjustable, still not solving the real
  problem.

- Reuse old memory, but hotplug chunks of reusable old memory into the
  kdump kernel's memory:
  It's hard to find large chunks of contiguous memory; especially on
  systems with a heavy workload, the reusable regions can be very
  fragmented. So it can only hotplug small fragments of memory,
  which looks hackish and may have a high page table overhead.

- Implement the old-memory-based block device as a kernel
  module. It doesn't look good to have a module for this sole
  usage, and it doesn't have much performance/implementation advantage
  compared to this RFC.

Besides, keeping all the complex logic for parsing and reusing old
memory in userspace seems a better idea.

And as a plus, this could make a crashkernel=auto param more doable
and reasonable. If there is swap available, userspace will have less
memory pressure, and crashkernel=auto can focus on the kernel usage.

Kairui Song (3):
  vmcore: simplify read_from_oldmem
  vmcore: Add interface to write to old mem
  x86_64: implement copy_to_oldmem_page

 arch/x86/kernel/crash_dump_64.c |  49 ++++++++--
 fs/proc/vmcore.c                | 154 ++++++++++++++++++++++++++------
 include/linux/crash_dump.h      |  18 +++-
 3 files changed, 180 insertions(+), 41 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [RFC PATCH 1/3] vmcore: simplify read_from_oldmem
  2020-09-09  7:50 ` Kairui Song
@ 2020-09-09  7:50   ` Kairui Song
  -1 siblings, 0 replies; 17+ messages in thread
From: Kairui Song @ 2020-09-09  7:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Young, Baoquan He, Vivek Goyal, Alexey Dobriyan,
	Eric Biederman, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	kexec, Kairui Song

Simplify the code logic; this also helps reduce object size and stack usage.

Stack usage:
  Before: fs/proc/vmcore.c:106:9:read_from_oldmem.part.0  80     static
          fs/proc/vmcore.c:106:9:read_from_oldmem         16     static
  After:  fs/proc/vmcore.c:106:9:read_from_oldmem         80     static

Size of vmcore.o:
          text    data     bss     dec     hex filename
  Before: 7677     109      88    7874    1ec2 fs/proc/vmcore.o
  After:  7669     109      88    7866    1eba fs/proc/vmcore.o
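The reworked loop can be illustrated in isolation. Below is a userspace model of the simplified chunking logic, with a plain memory buffer standing in for oldmem and PG_SIZE standing in for PAGE_SIZE; it is a sketch of the same structure, not the kernel function itself:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PG_SIZE 4096UL  /* stand-in for PAGE_SIZE */

/*
 * Model of the simplified read_from_oldmem() loop: copy `count` bytes
 * from `oldmem` starting at *ppos, one page-bounded chunk at a time.
 * The first chunk may start mid-page; later chunks are page-aligned.
 */
static size_t oldmem_read(char *buf, size_t count, size_t *ppos,
			  const char *oldmem)
{
	size_t offset = *ppos & (PG_SIZE - 1);
	size_t to_copy = count;

	while (to_copy) {
		size_t nr_bytes = to_copy < PG_SIZE - offset ?
				  to_copy : PG_SIZE - offset;

		memcpy(buf, oldmem + *ppos, nr_bytes);
		*ppos += nr_bytes;
		buf += nr_bytes;
		to_copy -= nr_bytes;
		offset = 0;  /* subsequent chunks start page-aligned */
	}
	return count;  /* on success the full count is returned, as in the patch */
}
```

Tracking `to_copy` down to zero removes the need for a separate `read` accumulator and the up-front `!count` check, which is what shrinks the object code.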

Signed-off-by: Kairui Song <kasong@redhat.com>
---
 fs/proc/vmcore.c | 27 ++++++++++-----------------
 1 file changed, 10 insertions(+), 17 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index c3a345c28a93..124c2066f3e5 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -108,25 +108,19 @@ ssize_t read_from_oldmem(char *buf, size_t count,
 			 bool encrypted)
 {
 	unsigned long pfn, offset;
-	size_t nr_bytes;
-	ssize_t read = 0, tmp;
+	size_t nr_bytes, to_copy = count;
+	ssize_t tmp;
 
-	if (!count)
-		return 0;
-
-	offset = (unsigned long)(*ppos % PAGE_SIZE);
+	offset = (unsigned long)(*ppos & (PAGE_SIZE - 1));
 	pfn = (unsigned long)(*ppos / PAGE_SIZE);
 
-	do {
-		if (count > (PAGE_SIZE - offset))
-			nr_bytes = PAGE_SIZE - offset;
-		else
-			nr_bytes = count;
+	while (to_copy) {
+		nr_bytes = min(to_copy, PAGE_SIZE - offset);
 
 		/* If pfn is not ram, return zeros for sparse dump files */
-		if (pfn_is_ram(pfn) == 0)
+		if (pfn_is_ram(pfn) == 0) {
 			memset(buf, 0, nr_bytes);
-		else {
+		} else {
 			if (encrypted)
 				tmp = copy_oldmem_page_encrypted(pfn, buf,
 								 nr_bytes,
@@ -140,14 +134,13 @@ ssize_t read_from_oldmem(char *buf, size_t count,
 				return tmp;
 		}
 		*ppos += nr_bytes;
-		count -= nr_bytes;
 		buf += nr_bytes;
-		read += nr_bytes;
+		to_copy -= nr_bytes;
 		++pfn;
 		offset = 0;
-	} while (count);
+	}
 
-	return read;
+	return count;
 }
 
 /*
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC PATCH 2/3] vmcore: Add interface to write to old mem
  2020-09-09  7:50 ` Kairui Song
@ 2020-09-09  7:50   ` Kairui Song
  -1 siblings, 0 replies; 17+ messages in thread
From: Kairui Song @ 2020-09-09  7:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Young, Baoquan He, Vivek Goyal, Alexey Dobriyan,
	Eric Biederman, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	kexec, Kairui Song

vmcore is used as the interface to access the crashed kernel's memory
in kdump, and currently vmcore only supports reading.

Adding writing support is useful for enabling userspace to make better
use of the old memory.

For kdump, `makedumpfile` is widely used to reduce the dumped vmcore
size, and in most setups it will drop user space memory and caches.
This means these memory pages are reusable.

Kdump runs in a limited pre-reserved memory region, so if these old
memory pages are reused, it can help reduce memory pressure in the
kdump kernel, hence allowing the first kernel to reserve less memory
for kdump.

Adding write support to vmcore is the first step; then user space can
do I/O on the old memory. There are multiple ways to reuse the memory.
For example, userspace can register an NBD device and redirect the I/O
on the device to old memory. The NBD device can be used as swap, or
used to hold some temp files.
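One semantic worth noting is that writes must never touch the ELF headers and notes at the start of the vmcore file. That guard can be modeled on its own (the sizes below are illustrative constants; the kernel uses elfcorebuf_sz and elfnotes_sz computed at init time):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes standing in for elfcorebuf_sz and elfnotes_sz. */
#define ELFCOREBUF_SZ 4096u
#define ELFNOTES_SZ   1024u
#define ERR_PERM      (-1)

/*
 * Model of the __write_vmcore() guard: any write whose starting file
 * position falls inside the ELF header/notes region is rejected
 * (-EPERM in the real code); writes past that region may proceed.
 */
static int write_allowed(uint64_t fpos)
{
	if (fpos < ELFCOREBUF_SZ + ELFNOTES_SZ)
		return ERR_PERM;
	return 0;
}
```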

Signed-off-by: Kairui Song <kasong@redhat.com>
---
 fs/proc/vmcore.c           | 129 +++++++++++++++++++++++++++++++++----
 include/linux/crash_dump.h |  18 ++++--
 2 files changed, 131 insertions(+), 16 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 124c2066f3e5..23acc0f2ecd7 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -103,9 +103,9 @@ static int pfn_is_ram(unsigned long pfn)
 }
 
 /* Reads a page from the oldmem device from given offset. */
-ssize_t read_from_oldmem(char *buf, size_t count,
-			 u64 *ppos, int userbuf,
-			 bool encrypted)
+static ssize_t oldmem_rw_page(char *buf, size_t count,
+			      u64 *ppos, int userbuf,
+			      bool encrypted, bool is_write)
 {
 	unsigned long pfn, offset;
 	size_t nr_bytes, to_copy = count;
@@ -119,20 +119,33 @@ ssize_t read_from_oldmem(char *buf, size_t count,
 
 		/* If pfn is not ram, return zeros for sparse dump files */
 		if (pfn_is_ram(pfn) == 0) {
-			memset(buf, 0, nr_bytes);
-		} else {
-			if (encrypted)
-				tmp = copy_oldmem_page_encrypted(pfn, buf,
-								 nr_bytes,
-								 offset,
-								 userbuf);
+			if (is_write)
+				return -EINVAL;
 			else
-				tmp = copy_oldmem_page(pfn, buf, nr_bytes,
-						       offset, userbuf);
+				memset(buf, 0, nr_bytes);
+		} else {
+			if (encrypted) {
+				tmp = is_write ?
+					copy_to_oldmem_page_encrypted(pfn, buf,
+								      nr_bytes,
+								      offset,
+								      userbuf) :
+					copy_oldmem_page_encrypted(pfn, buf,
+								   nr_bytes,
+								   offset,
+								   userbuf);
+			} else {
+				tmp = is_write ?
+					copy_to_oldmem_page(pfn, buf, nr_bytes,
+							    offset, userbuf) :
+					copy_oldmem_page(pfn, buf, nr_bytes,
+							offset, userbuf);
+			}
 
 			if (tmp < 0)
 				return tmp;
 		}
+
 		*ppos += nr_bytes;
 		buf += nr_bytes;
 		to_copy -= nr_bytes;
@@ -143,6 +156,22 @@ ssize_t read_from_oldmem(char *buf, size_t count,
 	return count;
 }
 
+/* Reads a page from the oldmem device from given offset. */
+ssize_t read_from_oldmem(char *buf, size_t count,
+			 u64 *ppos, int userbuf,
+			 bool encrypted)
+{
+	return oldmem_rw_page(buf, count, ppos, userbuf, encrypted, 0);
+}
+
+/* Writes a page to the oldmem device at the given offset. */
+ssize_t write_to_oldmem(char *buf, size_t count,
+			u64 *ppos, int userbuf,
+			bool encrypted)
+{
+	return oldmem_rw_page(buf, count, ppos, userbuf, encrypted, 1);
+}
+
 /*
  * Architectures may override this function to allocate ELF header in 2nd kernel
  */
@@ -184,6 +213,26 @@ int __weak remap_oldmem_pfn_range(struct vm_area_struct *vma,
 	return remap_pfn_range(vma, from, pfn, size, prot);
 }
 
+/*
+ * Architectures which support writing to oldmem override this.
+ */
+ssize_t __weak
+copy_to_oldmem_page(unsigned long pfn, char *buf, size_t csize,
+			   unsigned long offset, int userbuf)
+{
+	return -EOPNOTSUPP;
+}
+
+/*
+ * Architectures which support memory encryption override this.
+ */
+ssize_t __weak
+copy_to_oldmem_page_encrypted(unsigned long pfn, char *buf, size_t csize,
+			   unsigned long offset, int userbuf)
+{
+	return copy_to_oldmem_page(pfn, buf, csize, offset, userbuf);
+}
+
 /*
  * Architectures which support memory encryption override this.
  */
@@ -394,6 +443,61 @@ static ssize_t read_vmcore(struct file *file, char __user *buffer,
 	return __read_vmcore((__force char *) buffer, buflen, fpos, 1);
 }
 
+/*
+ * Write to the old memory region; overwriting ELF headers is not allowed.
+ * On error, a negative value is returned; otherwise the number of bytes
+ * written is returned.
+ */
+static ssize_t __write_vmcore(char *buffer, size_t buflen, loff_t *fpos,
+			     int userbuf)
+{
+	ssize_t acc = 0, tmp;
+	size_t tsz;
+	u64 start;
+	struct vmcore *m = NULL;
+
+	if (buflen == 0 || *fpos >= vmcore_size)
+		return 0;
+
+	/* trim buflen to not go beyond EOF */
+	if (buflen > vmcore_size - *fpos)
+		buflen = vmcore_size - *fpos;
+
+	/* Deny writing to ELF headers */
+	if (*fpos < elfcorebuf_sz + elfnotes_sz)
+		return -EPERM;
+
+	list_for_each_entry(m, &vmcore_list, list) {
+		if (*fpos < m->offset + m->size) {
+			tsz = (size_t)min_t(unsigned long long,
+					    m->offset + m->size - *fpos,
+					    buflen);
+			start = m->paddr + *fpos - m->offset;
+			tmp = write_to_oldmem(buffer, tsz, &start,
+					      userbuf, mem_encrypt_active());
+			if (tmp < 0)
+				return tmp;
+			buflen -= tsz;
+			*fpos += tsz;
+			buffer += tsz;
+			acc += tsz;
+
+			/* leave now if filled buffer already */
+			if (buflen == 0)
+				return acc;
+		}
+	}
+
+	return acc;
+}
+
+
+static ssize_t write_vmcore(struct file *file, const char __user *buffer,
+			   size_t buflen, loff_t *fpos)
+{
+	return __write_vmcore((__force char *) buffer, buflen, fpos, 1);
+}
+
 /*
  * The vmcore fault handler uses the page cache and fills data using the
  * standard __vmcore_read() function.
@@ -662,6 +766,7 @@ static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
 
 static const struct proc_ops vmcore_proc_ops = {
 	.proc_read	= read_vmcore,
+	.proc_write	= write_vmcore,
 	.proc_lseek	= default_llseek,
 	.proc_mmap	= mmap_vmcore,
 };
diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
index a5192b718dbe..8d8e75c08073 100644
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -26,10 +26,15 @@ extern int remap_oldmem_pfn_range(struct vm_area_struct *vma,
 				  unsigned long size, pgprot_t prot);
 
 extern ssize_t copy_oldmem_page(unsigned long, char *, size_t,
-						unsigned long, int);
+				unsigned long, int);
 extern ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf,
 					  size_t csize, unsigned long offset,
 					  int userbuf);
+extern ssize_t copy_to_oldmem_page(unsigned long pfn, char *buf, size_t csize,
+				   unsigned long offset, int userbuf);
+extern ssize_t copy_to_oldmem_page_encrypted(unsigned long pfn, char *buf,
+					     size_t csize, unsigned long offset,
+					     int userbuf);
 
 void vmcore_cleanup(void);
 
@@ -119,10 +124,15 @@ static inline int vmcore_add_device_dump(struct vmcoredd_data *data)
 ssize_t read_from_oldmem(char *buf, size_t count,
 			 u64 *ppos, int userbuf,
 			 bool encrypted);
+ssize_t write_to_oldmem(char *buf, size_t count,
+			u64 *ppos, int userbuf,
+			bool encrypted);
 #else
-static inline ssize_t read_from_oldmem(char *buf, size_t count,
-				       u64 *ppos, int userbuf,
-				       bool encrypted)
+static inline ssize_t read_from_oldmem(char *buf, size_t count, u64 *ppos, int userbuf, bool encrypted)
+{
+	return -EOPNOTSUPP;
+}
+static inline ssize_t write_to_oldmem(char *buf, size_t count, u64 *ppos, int userbuf, bool encrypted)
 {
 	return -EOPNOTSUPP;
 }
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC PATCH 3/3] x86_64: implement copy_to_oldmem_page
  2020-09-09  7:50 ` Kairui Song
@ 2020-09-09  7:50   ` Kairui Song
  -1 siblings, 0 replies; 17+ messages in thread
From: Kairui Song @ 2020-09-09  7:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Young, Baoquan He, Vivek Goyal, Alexey Dobriyan,
	Eric Biederman, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	kexec, Kairui Song

The previous commit introduced writing support for vmcore; it requires
a per-architecture implementation of the writing function.
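The direction handling the patch adds can be sketched as a userspace model, with memcpy on a plain buffer standing in for the ioremap mapping and the copy_{to,from}_user calls of the real code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Model of __copy_oldmem_page()'s is_write branch: the same helper
 * moves csize bytes at `offset` within a mapped old-memory page,
 * in either direction depending on is_write.
 */
static size_t copy_page_chunk(char *buf, char *page, size_t csize,
			      size_t offset, int is_write)
{
	if (is_write)
		memcpy(page + offset, buf, csize);  /* buf -> old memory */
	else
		memcpy(buf, page + offset, csize);  /* old memory -> buf */
	return csize;
}
```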

Signed-off-by: Kairui Song <kasong@redhat.com>
---
 arch/x86/kernel/crash_dump_64.c | 49 +++++++++++++++++++++++++++------
 1 file changed, 40 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/crash_dump_64.c b/arch/x86/kernel/crash_dump_64.c
index 045e82e8945b..ec80da75b287 100644
--- a/arch/x86/kernel/crash_dump_64.c
+++ b/arch/x86/kernel/crash_dump_64.c
@@ -13,7 +13,7 @@
 
 static ssize_t __copy_oldmem_page(unsigned long pfn, char *buf, size_t csize,
 				  unsigned long offset, int userbuf,
-				  bool encrypted)
+				  bool encrypted, bool is_write)
 {
 	void  *vaddr;
 
@@ -28,13 +28,25 @@ static ssize_t __copy_oldmem_page(unsigned long pfn, char *buf, size_t csize,
 	if (!vaddr)
 		return -ENOMEM;
 
-	if (userbuf) {
-		if (copy_to_user((void __user *)buf, vaddr + offset, csize)) {
-			iounmap((void __iomem *)vaddr);
-			return -EFAULT;
+	if (is_write) {
+		if (userbuf) {
+			if (copy_from_user(vaddr + offset, (void __user *)buf, csize)) {
+				iounmap((void __iomem *)vaddr);
+				return -EFAULT;
+			}
+		} else {
+			memcpy(vaddr + offset, buf, csize);
 		}
-	} else
-		memcpy(buf, vaddr + offset, csize);
+	} else {
+		if (userbuf) {
+			if (copy_to_user((void __user *)buf, vaddr + offset, csize)) {
+				iounmap((void __iomem *)vaddr);
+				return -EFAULT;
+			}
+		} else {
+			memcpy(buf, vaddr + offset, csize);
+		}
+	}
 
 	set_iounmap_nonlazy();
 	iounmap((void __iomem *)vaddr);
@@ -57,7 +69,7 @@ static ssize_t __copy_oldmem_page(unsigned long pfn, char *buf, size_t csize,
 ssize_t copy_oldmem_page(unsigned long pfn, char *buf, size_t csize,
 			 unsigned long offset, int userbuf)
 {
-	return __copy_oldmem_page(pfn, buf, csize, offset, userbuf, false);
+	return __copy_oldmem_page(pfn, buf, csize, offset, userbuf, false, false);
 }
 
 /**
@@ -68,7 +80,26 @@ ssize_t copy_oldmem_page(unsigned long pfn, char *buf, size_t csize,
 ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf, size_t csize,
 				   unsigned long offset, int userbuf)
 {
-	return __copy_oldmem_page(pfn, buf, csize, offset, userbuf, true);
+	return __copy_oldmem_page(pfn, buf, csize, offset, userbuf, true, false);
+}
+
+/**
+ * copy_to_oldmem_page - similar to copy_oldmem_page but in opposite direction.
+ */
+ssize_t copy_to_oldmem_page(unsigned long pfn, char *src, size_t csize,
+		unsigned long offset, int userbuf)
+{
+	return __copy_oldmem_page(pfn, src, csize, offset, userbuf, false, true);
+}
+
+/**
+ * copy_to_oldmem_page_encrypted - similar to copy_oldmem_page_encrypted but
+ * in opposite direction.
+ */
+ssize_t copy_to_oldmem_page_encrypted(unsigned long pfn, char *src, size_t csize,
+		unsigned long offset, int userbuf)
+{
+	return __copy_oldmem_page(pfn, src, csize, offset, userbuf, true, true);
 }
 
 ssize_t elfcorehdr_read(char *buf, size_t count, u64 *ppos)
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC PATCH 3/3] x86_64: implement copy_to_oldmem_page
@ 2020-09-09  7:50   ` Kairui Song
  0 siblings, 0 replies; 17+ messages in thread
From: Kairui Song @ 2020-09-09  7:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kairui Song, Baoquan He, kexec, Ingo Molnar, Borislav Petkov,
	Eric Biederman, Thomas Gleixner, Dave Young, Alexey Dobriyan,
	Vivek Goyal

The previous commit introduced write support for vmcore; it requires a
per-architecture implementation of the write function.

Signed-off-by: Kairui Song <kasong@redhat.com>
---
 arch/x86/kernel/crash_dump_64.c | 49 +++++++++++++++++++++++++++------
 1 file changed, 40 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/crash_dump_64.c b/arch/x86/kernel/crash_dump_64.c
index 045e82e8945b..ec80da75b287 100644
--- a/arch/x86/kernel/crash_dump_64.c
+++ b/arch/x86/kernel/crash_dump_64.c
@@ -13,7 +13,7 @@
 
 static ssize_t __copy_oldmem_page(unsigned long pfn, char *buf, size_t csize,
 				  unsigned long offset, int userbuf,
-				  bool encrypted)
+				  bool encrypted, bool is_write)
 {
 	void  *vaddr;
 
@@ -28,13 +28,25 @@ static ssize_t __copy_oldmem_page(unsigned long pfn, char *buf, size_t csize,
 	if (!vaddr)
 		return -ENOMEM;
 
-	if (userbuf) {
-		if (copy_to_user((void __user *)buf, vaddr + offset, csize)) {
-			iounmap((void __iomem *)vaddr);
-			return -EFAULT;
+	if (is_write) {
+		if (userbuf) {
+			if (copy_from_user(vaddr + offset, (void __user *)buf, csize)) {
+				iounmap((void __iomem *)vaddr);
+				return -EFAULT;
+			}
+		} else {
+			memcpy(vaddr + offset, buf, csize);
 		}
-	} else
-		memcpy(buf, vaddr + offset, csize);
+	} else {
+		if (userbuf) {
+			if (copy_to_user((void __user *)buf, vaddr + offset, csize)) {
+				iounmap((void __iomem *)vaddr);
+				return -EFAULT;
+			}
+		} else {
+			memcpy(buf, vaddr + offset, csize);
+		}
+	}
 
 	set_iounmap_nonlazy();
 	iounmap((void __iomem *)vaddr);
@@ -57,7 +69,7 @@ static ssize_t __copy_oldmem_page(unsigned long pfn, char *buf, size_t csize,
 ssize_t copy_oldmem_page(unsigned long pfn, char *buf, size_t csize,
 			 unsigned long offset, int userbuf)
 {
-	return __copy_oldmem_page(pfn, buf, csize, offset, userbuf, false);
+	return __copy_oldmem_page(pfn, buf, csize, offset, userbuf, false, false);
 }
 
 /**
@@ -68,7 +80,26 @@ ssize_t copy_oldmem_page(unsigned long pfn, char *buf, size_t csize,
 ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf, size_t csize,
 				   unsigned long offset, int userbuf)
 {
-	return __copy_oldmem_page(pfn, buf, csize, offset, userbuf, true);
+	return __copy_oldmem_page(pfn, buf, csize, offset, userbuf, true, false);
+}
+
+/**
+ * copy_to_oldmem_page - similar to copy_oldmem_page but in opposite direction.
+ */
+ssize_t copy_to_oldmem_page(unsigned long pfn, char *src, size_t csize,
+		unsigned long offset, int userbuf)
+{
+	return __copy_oldmem_page(pfn, src, csize, offset, userbuf, false, true);
+}
+
+/**
+ * copy_to_oldmem_page_encrypted - similar to copy_oldmem_page_encrypted but
+ * in opposite direction.
+ */
+ssize_t copy_to_oldmem_page_encrypted(unsigned long pfn, char *src, size_t csize,
+		unsigned long offset, int userbuf)
+{
+	return __copy_oldmem_page(pfn, src, csize, offset, userbuf, true, true);
 }
 
 ssize_t elfcorehdr_read(char *buf, size_t count, u64 *ppos)
-- 
2.26.2
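For illustration only, the four-way dispatch this patch introduces in __copy_oldmem_page() can be modeled in user space as follows. The model_* names are invented for this sketch, a plain buffer stands in for the ioremap'd old-kernel page, and the userbuf/copy_{to,from}_user variants are collapsed into memcpy:

```c
#include <string.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Simplified user-space model of __copy_oldmem_page() after this patch:
 * the new is_write flag selects the copy direction.  "oldmem" stands in
 * for the ioremap'd old-kernel page; the copy_{to,from}_user branches
 * are collapsed into plain memcpy for this sketch. */
static ssize_t model_copy_oldmem_page(char *oldmem, char *buf, size_t csize,
				      unsigned long offset, bool is_write)
{
	if (is_write)
		memcpy(oldmem + offset, buf, csize);  /* new: buf -> old memory */
	else
		memcpy(buf, oldmem + offset, csize);  /* existing: old memory -> buf */
	return csize;
}
```

The real function additionally distinguishes user-space buffers (copy_to_user/copy_from_user) and encrypted mappings (ioremap_encrypted), which this sketch omits.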


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 1/3] vmcore: simplify read_from_oldmem
  2020-09-09  7:50   ` Kairui Song
  (?)
@ 2020-09-09 10:55   ` kernel test robot
  -1 siblings, 0 replies; 17+ messages in thread
From: kernel test robot @ 2020-09-09 10:55 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 4161 bytes --]

Hi Kairui,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on linux/master]
[also build test WARNING on linus/master v5.9-rc4 next-20200908]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Kairui-Song/Add-writing-support-to-vmcore-for-reusing-oldmem/20200909-155222
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git bcf876870b95592b52519ed4aafcf9d95999bc9c
config: mips-randconfig-r035-20200909 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 8893d0816ccdf8998d2e21b5430e9d6abe7ef465)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install mips cross compiling tool for clang build
        # apt-get install binutils-mips-linux-gnu
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=mips 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> fs/proc/vmcore.c:118:14: warning: comparison of distinct pointer types ('typeof (to_copy) *' (aka 'unsigned int *') and 'typeof (((1UL) << 14) - offset) *' (aka 'unsigned long *')) [-Wcompare-distinct-pointer-types]
                   nr_bytes = min(to_copy, PAGE_SIZE - offset);
                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/kernel.h:884:19: note: expanded from macro 'min'
   #define min(x, y)       __careful_cmp(x, y, <)
                           ^~~~~~~~~~~~~~~~~~~~~~
   include/linux/kernel.h:875:24: note: expanded from macro '__careful_cmp'
           __builtin_choose_expr(__safe_cmp(x, y), \
                                 ^~~~~~~~~~~~~~~~
   include/linux/kernel.h:865:4: note: expanded from macro '__safe_cmp'
                   (__typecheck(x, y) && __no_side_effects(x, y))
                    ^~~~~~~~~~~~~~~~~
   include/linux/kernel.h:851:29: note: expanded from macro '__typecheck'
                   (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
                              ~~~~~~~~~~~~~~ ^  ~~~~~~~~~~~~~~
   1 warning generated.

# https://github.com/0day-ci/linux/commit/03450912209f6cc4521a2d3a83d1cc8f4b5a850e
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Kairui-Song/Add-writing-support-to-vmcore-for-reusing-oldmem/20200909-155222
git checkout 03450912209f6cc4521a2d3a83d1cc8f4b5a850e
vim +118 fs/proc/vmcore.c

   104	
   105	/* Reads a page from the oldmem device from given offset. */
   106	ssize_t read_from_oldmem(char *buf, size_t count,
   107				 u64 *ppos, int userbuf,
   108				 bool encrypted)
   109	{
   110		unsigned long pfn, offset;
   111		size_t nr_bytes, to_copy = count;
   112		ssize_t tmp;
   113	
   114		offset = (unsigned long)(*ppos & (PAGE_SIZE - 1));
   115		pfn = (unsigned long)(*ppos / PAGE_SIZE);
   116	
   117		while (to_copy) {
 > 118			nr_bytes = min(to_copy, PAGE_SIZE - offset);
   119	
   120			/* If pfn is not ram, return zeros for sparse dump files */
   121			if (pfn_is_ram(pfn) == 0) {
   122				memset(buf, 0, nr_bytes);
   123			} else {
   124				if (encrypted)
   125					tmp = copy_oldmem_page_encrypted(pfn, buf,
   126									 nr_bytes,
   127									 offset,
   128									 userbuf);
   129				else
   130					tmp = copy_oldmem_page(pfn, buf, nr_bytes,
   131							       offset, userbuf);
   132	
   133				if (tmp < 0)
   134					return tmp;
   135			}
   136			*ppos += nr_bytes;
   137			buf += nr_bytes;
   138			to_copy -= nr_bytes;
   139			++pfn;
   140			offset = 0;
   141		}
   142	
   143		return count;
   144	}
   145	
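The likely fix for the warning at line 118 is min_t(), which casts both operands to a single type (to_copy is a size_t while PAGE_SIZE - offset is an unsigned long). The following user-space sketch models the paging loop above with that fix applied; MODEL_PAGE_SIZE and model_read() are invented names, and a local min_t definition stands in for the kernel macro:

```c
#include <stddef.h>
#include <string.h>

/* Local stand-in for the kernel's min_t(): force both operands to one
 * type before comparing, which avoids the distinct-pointer-types check
 * that plain min() performs. */
#define min_t(type, x, y) ({ type _x = (x); type _y = (y); _x < _y ? _x : _y; })

#define MODEL_PAGE_SIZE 8UL	/* tiny page size so the loop iterates */

/* Model of the read_from_oldmem() loop: copy count bytes starting at
 * pos, never crossing a model page boundary in a single copy. */
static size_t model_read(const char *oldmem, char *buf, size_t count,
			 unsigned long pos)
{
	unsigned long offset = pos & (MODEL_PAGE_SIZE - 1);
	size_t nr_bytes, to_copy = count;

	while (to_copy) {
		nr_bytes = min_t(size_t, to_copy, MODEL_PAGE_SIZE - offset);
		memcpy(buf, oldmem + pos, nr_bytes);
		pos += nr_bytes;
		buf += nr_bytes;
		to_copy -= nr_bytes;
		offset = 0;	/* subsequent copies start page-aligned */
	}
	return count;
}
```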

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 27883 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 2/3] vmcore: Add interface to write to old mem
  2020-09-09  7:50   ` Kairui Song
  (?)
@ 2020-09-09 12:26   ` kernel test robot
  -1 siblings, 0 replies; 17+ messages in thread
From: kernel test robot @ 2020-09-09 12:26 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 3913 bytes --]

Hi Kairui,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on linux/master]
[also build test ERROR on tip/x86/core linus/master v5.9-rc4 next-20200908]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Kairui-Song/Add-writing-support-to-vmcore-for-reusing-oldmem/20200909-155222
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git bcf876870b95592b52519ed4aafcf9d95999bc9c
config: nios2-randconfig-r026-20200909 (attached as .config)
compiler: nios2-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=nios2 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from net/core/dev.c:140:
   include/linux/crash_dump.h: In function 'read_from_oldmem':
>> include/linux/crash_dump.h:131:40: error: parameter name omitted
     131 | static inline ssize_t read_from_oldmem(char*, size_t, u64*, int, bool)
         |                                        ^~~~~
   include/linux/crash_dump.h:131:47: error: parameter name omitted
     131 | static inline ssize_t read_from_oldmem(char*, size_t, u64*, int, bool)
         |                                               ^~~~~~
   include/linux/crash_dump.h:131:55: error: parameter name omitted
     131 | static inline ssize_t read_from_oldmem(char*, size_t, u64*, int, bool)
         |                                                       ^~~~
   include/linux/crash_dump.h:131:61: error: parameter name omitted
     131 | static inline ssize_t read_from_oldmem(char*, size_t, u64*, int, bool)
         |                                                             ^~~
   include/linux/crash_dump.h:131:66: error: parameter name omitted
     131 | static inline ssize_t read_from_oldmem(char*, size_t, u64*, int, bool)
         |                                                                  ^~~~
   include/linux/crash_dump.h: At top level:
>> include/linux/crash_dump.h:136:1: error: expected identifier or '(' before '{' token
     136 | {
         | ^
   include/linux/crash_dump.h:135:23: warning: 'write_to_oldmem' declared 'static' but never defined [-Wunused-function]
     135 | static inline ssize_t write_to_oldmem(char*, size_t, u64*, int, bool);
         |                       ^~~~~~~~~~~~~~~

# https://github.com/0day-ci/linux/commit/6d641ec8d1a1d979916bca93ddf975a9a860c8f2
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Kairui-Song/Add-writing-support-to-vmcore-for-reusing-oldmem/20200909-155222
git checkout 6d641ec8d1a1d979916bca93ddf975a9a860c8f2
vim +131 include/linux/crash_dump.h

   122	
   123	#ifdef CONFIG_PROC_VMCORE
   124	ssize_t read_from_oldmem(char *buf, size_t count,
   125				 u64 *ppos, int userbuf,
   126				 bool encrypted);
   127	ssize_t write_to_oldmem(char *buf, size_t count,
   128				u64 *ppos, int userbuf,
   129				bool encrypted);
   130	#else
 > 131	static inline ssize_t read_from_oldmem(char*, size_t, u64*, int, bool)
   132	{
   133		return -EOPNOTSUPP;
   134	}
   135	static inline ssize_t write_to_oldmem(char*, size_t, u64*, int, bool);
 > 136	{
   137		return -EOPNOTSUPP;
   138	}
   139	#endif /* CONFIG_PROC_VMCORE */
   140	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 22524 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 2/3] vmcore: Add interface to write to old mem
  2020-09-09  7:50   ` Kairui Song
  (?)
  (?)
@ 2020-09-09 12:27   ` kernel test robot
  -1 siblings, 0 replies; 17+ messages in thread
From: kernel test robot @ 2020-09-09 12:27 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 11747 bytes --]

Hi Kairui,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on linux/master]
[also build test ERROR on linus/master v5.9-rc4 next-20200908]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Kairui-Song/Add-writing-support-to-vmcore-for-reusing-oldmem/20200909-155222
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git bcf876870b95592b52519ed4aafcf9d95999bc9c
config: riscv-randconfig-r015-20200909 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 8893d0816ccdf8998d2e21b5430e9d6abe7ef465)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install riscv cross compiling tool for clang build
        # apt-get install binutils-riscv64-linux-gnu
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=riscv 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

   In file included from ./arch/riscv/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:13:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/riscv/include/asm/io.h:148:
   include/asm-generic/io.h:564:9: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           return inw(addr);
                  ^~~~~~~~~
   arch/riscv/include/asm/io.h:55:76: note: expanded from macro 'inw'
   #define inw(c)          ({ u16 __v; __io_pbr(); __v = readw_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; })
                                                                           ~~~~~~~~~~ ^
   arch/riscv/include/asm/mmio.h:94:76: note: expanded from macro 'readw_cpu'
   #define readw_cpu(c)            ({ u16 __r = le16_to_cpu((__force __le16)__raw_readw(c)); __r; })
                                                                                        ^
   include/uapi/linux/byteorder/little_endian.h:36:51: note: expanded from macro '__le16_to_cpu'
   #define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
                                                     ^
   In file included from net/core/dev.c:88:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:10:
   In file included from ./arch/riscv/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:13:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/riscv/include/asm/io.h:148:
   include/asm-generic/io.h:572:9: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           return inl(addr);
                  ^~~~~~~~~
   arch/riscv/include/asm/io.h:56:76: note: expanded from macro 'inl'
   #define inl(c)          ({ u32 __v; __io_pbr(); __v = readl_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; })
                                                                           ~~~~~~~~~~ ^
   arch/riscv/include/asm/mmio.h:95:76: note: expanded from macro 'readl_cpu'
   #define readl_cpu(c)            ({ u32 __r = le32_to_cpu((__force __le32)__raw_readl(c)); __r; })
                                                                                        ^
   include/uapi/linux/byteorder/little_endian.h:34:51: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
                                                     ^
   In file included from net/core/dev.c:88:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:10:
   In file included from ./arch/riscv/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:13:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/riscv/include/asm/io.h:148:
   include/asm-generic/io.h:580:2: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           outb(value, addr);
           ^~~~~~~~~~~~~~~~~
   arch/riscv/include/asm/io.h:58:68: note: expanded from macro 'outb'
   #define outb(v,c)       ({ __io_pbw(); writeb_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); })
                                                                 ~~~~~~~~~~ ^
   arch/riscv/include/asm/mmio.h:97:52: note: expanded from macro 'writeb_cpu'
   #define writeb_cpu(v, c)        ((void)__raw_writeb((v), (c)))
                                                             ^
   In file included from net/core/dev.c:88:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:10:
   In file included from ./arch/riscv/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:13:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/riscv/include/asm/io.h:148:
   include/asm-generic/io.h:588:2: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           outw(value, addr);
           ^~~~~~~~~~~~~~~~~
   arch/riscv/include/asm/io.h:59:68: note: expanded from macro 'outw'
   #define outw(v,c)       ({ __io_pbw(); writew_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); })
                                                                 ~~~~~~~~~~ ^
   arch/riscv/include/asm/mmio.h:98:76: note: expanded from macro 'writew_cpu'
   #define writew_cpu(v, c)        ((void)__raw_writew((__force u16)cpu_to_le16(v), (c)))
                                                                                     ^
   In file included from net/core/dev.c:88:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:10:
   In file included from ./arch/riscv/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:13:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/riscv/include/asm/io.h:148:
   include/asm-generic/io.h:596:2: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           outl(value, addr);
           ^~~~~~~~~~~~~~~~~
   arch/riscv/include/asm/io.h:60:68: note: expanded from macro 'outl'
   #define outl(v,c)       ({ __io_pbw(); writel_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); })
                                                                 ~~~~~~~~~~ ^
   arch/riscv/include/asm/mmio.h:99:76: note: expanded from macro 'writel_cpu'
   #define writel_cpu(v, c)        ((void)__raw_writel((__force u32)cpu_to_le32(v), (c)))
                                                                                     ^
   In file included from net/core/dev.c:88:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:10:
   In file included from ./arch/riscv/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:13:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/riscv/include/asm/io.h:148:
   include/asm-generic/io.h:1017:55: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           return (port > MMIO_UPPER_LIMIT) ? NULL : PCI_IOBASE + port;
                                                     ~~~~~~~~~~ ^
   In file included from net/core/dev.c:140:
>> include/linux/crash_dump.h:131:45: warning: omitting the parameter name in a function definition is a C2x extension [-Wc2x-extensions]
   static inline ssize_t read_from_oldmem(char*, size_t, u64*, int, bool)
                                               ^
   include/linux/crash_dump.h:131:53: warning: omitting the parameter name in a function definition is a C2x extension [-Wc2x-extensions]
   static inline ssize_t read_from_oldmem(char*, size_t, u64*, int, bool)
                                                       ^
   include/linux/crash_dump.h:131:59: warning: omitting the parameter name in a function definition is a C2x extension [-Wc2x-extensions]
   static inline ssize_t read_from_oldmem(char*, size_t, u64*, int, bool)
                                                             ^
   include/linux/crash_dump.h:131:64: warning: omitting the parameter name in a function definition is a C2x extension [-Wc2x-extensions]
   static inline ssize_t read_from_oldmem(char*, size_t, u64*, int, bool)
                                                                  ^
   include/linux/crash_dump.h:131:70: warning: omitting the parameter name in a function definition is a C2x extension [-Wc2x-extensions]
   static inline ssize_t read_from_oldmem(char*, size_t, u64*, int, bool)
                                                                        ^
>> include/linux/crash_dump.h:136:1: error: expected identifier or '('
   {
   ^
   12 warnings and 1 error generated.

# https://github.com/0day-ci/linux/commit/6d641ec8d1a1d979916bca93ddf975a9a860c8f2
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Kairui-Song/Add-writing-support-to-vmcore-for-reusing-oldmem/20200909-155222
git checkout 6d641ec8d1a1d979916bca93ddf975a9a860c8f2
vim +136 include/linux/crash_dump.h

ae7eb82a92fae4 Thiago Jung Bauermann 2019-08-06  122  
ae7eb82a92fae4 Thiago Jung Bauermann 2019-08-06  123  #ifdef CONFIG_PROC_VMCORE
ae7eb82a92fae4 Thiago Jung Bauermann 2019-08-06  124  ssize_t read_from_oldmem(char *buf, size_t count,
ae7eb82a92fae4 Thiago Jung Bauermann 2019-08-06  125  			 u64 *ppos, int userbuf,
ae7eb82a92fae4 Thiago Jung Bauermann 2019-08-06  126  			 bool encrypted);
6d641ec8d1a1d9 Kairui Song           2020-09-09  127  ssize_t write_to_oldmem(char *buf, size_t count,
ae7eb82a92fae4 Thiago Jung Bauermann 2019-08-06  128  			u64 *ppos, int userbuf,
6d641ec8d1a1d9 Kairui Song           2020-09-09  129  			bool encrypted);
6d641ec8d1a1d9 Kairui Song           2020-09-09  130  #else
6d641ec8d1a1d9 Kairui Song           2020-09-09 @131  static inline ssize_t read_from_oldmem(char*, size_t, u64*, int, bool)
6d641ec8d1a1d9 Kairui Song           2020-09-09  132  {
6d641ec8d1a1d9 Kairui Song           2020-09-09  133  	return -EOPNOTSUPP;
6d641ec8d1a1d9 Kairui Song           2020-09-09  134  }
6d641ec8d1a1d9 Kairui Song           2020-09-09  135  static inline ssize_t write_to_oldmem(char*, size_t, u64*, int, bool);
ae7eb82a92fae4 Thiago Jung Bauermann 2019-08-06 @136  {
ae7eb82a92fae4 Thiago Jung Bauermann 2019-08-06  137  	return -EOPNOTSUPP;
ae7eb82a92fae4 Thiago Jung Bauermann 2019-08-06  138  }
ae7eb82a92fae4 Thiago Jung Bauermann 2019-08-06  139  #endif /* CONFIG_PROC_VMCORE */
ae7eb82a92fae4 Thiago Jung Bauermann 2019-08-06  140  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 26971 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 0/3] Add writing support to vmcore for reusing oldmem
  2020-09-09  7:50 ` Kairui Song
@ 2020-09-09 14:04   ` Eric W. Biederman
  -1 siblings, 0 replies; 17+ messages in thread
From: Eric W. Biederman @ 2020-09-09 14:04 UTC (permalink / raw)
  To: Kairui Song
  Cc: linux-kernel, Dave Young, Baoquan He, Vivek Goyal,
	Alexey Dobriyan, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	kexec

Kairui Song <kasong@redhat.com> writes:

> Currently vmcore only supports reading; this patch series is an RFC
> to add write support to vmcore. It is x86_64-only for now; I'll add other
> architectures later if there is no problem with this idea.
>
> My purpose of adding writing support is to reuse the crashed kernel's
> old memory in kdump kernel, reduce kdump memory pressure, and
> allow kdump to run with a smaller crashkernel reservation.
>
> This is doable because in most cases, after a kernel panic, users are
> only interested in the crashed kernel itself, and userspace/cache/free
> memory pages are not dumped. `makedumpfile` is widely used to skip
> these pages. Kernel pages usually only take a small part of
> the whole old memory. So there will be many reusable pages.
>
> By adding writing support, userspace then can use these pages as a fast
> and temporary storage. This helps reduce memory pressure in many ways.
>
> For example, I've written a POC program based on this, it will find
> the reusable pages, and creates an NBD device which maps to these pages.
> The NBD device can then be used as swap, or to hold some temp files
> which previously lived in RAM.
>
> The link of the POC tool: https://github.com/ryncsn/kdumpd

A couple of thoughts.
1) Unless I am completely mistaken, treating this as an exercise in
   memory hotplug would be much simpler.

   AKA just plug in the memory that is not needed as part of the kdump.

   I see below that you have problems doing this because
   of fragmentation.  I still think hotplug is doable using some
   kind of fragmented memory zone.
   
2) The purpose of the memory reservation is because hardware is
   still potentially running against the memory of the old kernel.

   By the time we have brought up a new kernel enough of the hardware
   may have been reinitialized that we don't have to worry about
   hardware randomly dma'ing into the memory used by the old kernel.

   With IOMMUs and care we may be able to guarantee for some machine
   configurations it is impossible for DMA to come from some piece of
   hardware that is present but the kernel does not have a driver
   loaded for.


I really do not like this approach because it is fundamentally doing the
wrong thing.  Adding write support to read-only drivers.  I do not see
anywhere that you even mentioned the hard problem and the reason we
reserve memory in the first place.  Hardware spontaneously DMA'ing onto
it.

> It's been a long-standing issue that kdump suffers from OOM
> with limited crashkernel memory. So reusing old memory could be very
> helpful.

There is a very fine line here between reusing existing code (aka
drivers and userspace) and doing something that should work.

It might make sense to figure out what is using so much memory
that an OOM is triggered.

Ages ago I did something that was essentially dumping the kernel's printk
buffer to the serial console in case of a crash, and I had things down to
something comparatively minuscule like 8M or less.

My memory is that historically it has been high performance scsi raid
drivers or something like that, that are behind the need to have such
large memory reservations.

Now that I think about it, you aren't by any chance doing something
silly like running systemd in your initrd are you?  Are these OOMs by
any chance a userspace problem rather than a problem with inefficient
drivers?


In summary you either need to show that it is safe to reuse the
memory of the old kernel, or do some work to reduce the memory footprint
of the crashdump kernel, and the crashdump userspace. 

Eric

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 0/3] Add writing support to vmcore for reusing oldmem
@ 2020-09-09 14:04   ` Eric W. Biederman
  0 siblings, 0 replies; 17+ messages in thread
From: Eric W. Biederman @ 2020-09-09 14:04 UTC (permalink / raw)
  To: Kairui Song
  Cc: Baoquan He, kexec, linux-kernel, Ingo Molnar, Borislav Petkov,
	Thomas Gleixner, Dave Young, Alexey Dobriyan, Vivek Goyal

Kairui Song <kasong@redhat.com> writes:

> Currently vmcore only supports reading, this patch series is an RFC
> to add writing support to vmcore. It's x86_64 only yet, I'll add other
> architecture later if there is no problem with this idea.
>
> My purpose of adding writing support is to reuse the crashed kernel's
> old memory in kdump kernel, reduce kdump memory pressure, and
> allow kdump to run with a smaller crashkernel reservation.
>
> This is doable because in most cases, after kernel panic, user only
> interested in the crashed kernel itself, and userspace/cache/free
> memory pages are not dumped. `makedumpfile` is widely used to skip
> these pages. Kernel pages usually only take a small part of
> the whole old memory. So there will be many reusable pages.
>
> By adding writing support, userspace then can use these pages as a fast
> and temporary storage. This helps reduce memory pressure in many ways.
>
> For example, I've written a POC program based on this, it will find
> the reusable pages, and creates an NBD device which maps to these pages.
> The NBD device can then be used as swap, or to hold some temp files
> which previouly live in RAM.
>
> The link of the POC tool: https://github.com/ryncsn/kdumpd

A couple of thoughts.
1) Unless I am completely mistaken treating this as a exercise in
   memory hotplug would be much simpler.

   AKA just plug in the memory that is not needed as part of the kdump.

   I see below that you have problems doing this because
   of fragmentation.  I still think hotplug is doable using some
   kind of fragmented memory zone.
   
2) The purpose of the memory reservation is because hardware is
   still potentially running agains the memory of the old kernel.

   By the time we have brought up a new kernel enough of the hardware
   may have been reinitialized that we don't have to worry about
   hardware randomly dma'ing into the memory used by the old kernel.

   With IOMMUs and care we may be able to guarantee for some machine
   configurations that it is impossible for DMA to come from some piece
   of hardware that is present but the kernel does not have a driver
   loaded for.


I really do not like this approach because it is fundamentally doing the
wrong thing: adding write support to read-only drivers.  I do not see
anywhere that you even mentioned the hard problem and the reason we
reserve memory in the first place: hardware spontaneously DMA'ing onto
it.

> It has been a long-standing issue that kdump suffers from OOM
> with limited crashkernel memory. So reusing old memory could be very
> helpful.

There is a very fine line here between reusing existing code (aka
drivers and userspace) and doing something that should work.

It might make sense to figure out what is using so much memory
that an OOM is triggered.

Ages ago I did something that was essentially dumping the kernel's printk
buffer to the serial console in case of a crash, and I had things down to
something comparatively minuscule like 8M or less.

My memory is that historically it has been high-performance SCSI RAID
drivers, or something like that, that are behind the need to have such
large memory reservations.

Now that I think about it, you aren't by any chance doing something
silly like running systemd in your initrd, are you?  Are these OOMs by
any chance a userspace problem rather than a problem with inefficient
drivers?


In summary, you either need to show that it is safe to reuse the
memory of the old kernel, or do some work to reduce the memory footprint
of the crashdump kernel and the crashdump userspace.

Eric

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


* Re: [RFC PATCH 0/3] Add writing support to vmcore for reusing oldmem
  2020-09-09 14:04   ` Eric W. Biederman
@ 2020-09-09 16:43     ` Kairui Song
  -1 siblings, 0 replies; 17+ messages in thread
From: Kairui Song @ 2020-09-09 16:43 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Kernel Mailing List, Dave Young, Baoquan He, Vivek Goyal,
	Alexey Dobriyan, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	kexec

On Wed, Sep 9, 2020 at 10:04 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Kairui Song <kasong@redhat.com> writes:
>
> > Currently vmcore only supports reading; this patch series is an RFC
> > to add writing support to vmcore. It's x86_64-only for now; I'll add
> > other architectures later if there is no problem with this idea.
> >
> > My purpose of adding writing support is to reuse the crashed kernel's
> > old memory in the kdump kernel, reduce kdump memory pressure, and
> > allow kdump to run with a smaller crashkernel reservation.
> >
> > This is doable because in most cases, after a kernel panic, users are
> > only interested in the crashed kernel itself, and userspace/cache/free
> > memory pages are not dumped. `makedumpfile` is widely used to skip
> > these pages. Kernel pages usually only take a small part of
> > the whole old memory. So there will be many reusable pages.
> >
> > By adding writing support, userspace can then use these pages as fast,
> > temporary storage. This helps reduce memory pressure in many ways.
> >
> > For example, I've written a POC program based on this: it finds the
> > reusable pages and creates an NBD device which maps to these pages.
> > The NBD device can then be used as swap, or to hold some temp files
> > which previously lived in RAM.
> >
> > The link of the POC tool: https://github.com/ryncsn/kdumpd
>
> A couple of thoughts.
> 1) Unless I am completely mistaken, treating this as an exercise in
>    memory hotplug would be much simpler.
>
>    AKA just plug in the memory that is not needed as part of the kdump.
>
>    I see below that you have problems doing this because
>    of fragmentation.  I still think hotplug is doable using some
>    kind of fragmented memory zone.
>
> 2) The purpose of the memory reservation is that hardware is
>    still potentially running against the memory of the old kernel.
>
>    By the time we have brought up a new kernel, enough of the hardware
>    may have been reinitialized that we don't have to worry about
>    hardware randomly DMA'ing into the memory used by the old kernel.
>
>    With IOMMUs and care we may be able to guarantee for some machine
>    configurations that it is impossible for DMA to come from some piece
>    of hardware that is present but the kernel does not have a driver
>    loaded for.
>
> I really do not like this approach because it is fundamentally doing the
> wrong thing: adding write support to read-only drivers.  I do not see
> anywhere that you even mentioned the hard problem and the reason we
> reserve memory in the first place: hardware spontaneously DMA'ing onto
> it.
>
Sorry that the POC tool looks ugly for now; it is only a draft to prove
this works.

For the patch, yes, it is expecting the IOMMU to lower the chance of
potential DMA issues, and expecting DMA will not hit userspace/free
pages, or at least won't overwrite a massive amount of reusable old
memory. I have also thought about some solutions for the potential DMA
issue.

As the old memory is used as a block device proxied by userspace, upon
each IO the userspace tool could do an integrity check of the
corresponding data stored in old mem, and keep multiple copies of the
data (e.g. use 512M of old memory to hold a 128M block device). These
copies will be kept far away from each other in terms of physical
memory location. The reusable old memory is sparse, so the actual
memory containing the data should also be sparse.
So if some part is corrupted, it is still recoverable, unless the DMA
went very wrong and wiped a large region of memory; but if such a thing
happens, it's most likely that kernel pages are also being wiped by
DMA, so the vmcore is already corrupted and kdump may not help. At
least it won't fail silently: the userspace tool can still do something
like dumping some available data to an easy-to-set-up target.

That's also one of the reasons for not using the old memory as kdump's
memory directly.

> > It has been a long-standing issue that kdump suffers from OOM
> > with limited crashkernel memory. So reusing old memory could be very
> > helpful.
>
> There is a very fine line here between reusing existing code (aka
> drivers and userspace) and doing something that should work.
>
> It might make sense to figure out what is using so much memory
> that an OOM is triggered.
>
> Ages ago I did something that was essentially dumping the kernel's printk
> buffer to the serial console in case of a crash, and I had things down to
> something comparatively minuscule like 8M or less.
>
> My memory is that historically it has been high-performance SCSI RAID
> drivers, or something like that, that are behind the need to have such
> large memory reservations.
>
> Now that I think about it, you aren't by any chance doing something
> silly like running systemd in your initrd, are you?  Are these OOMs by
> any chance a userspace problem rather than a problem with inefficient
> drivers?

The problem with the userspace is that kdump is expected to dump to
different kinds of dump targets, and there are some very
memory-consuming ones, e.g. the NFSv3 case. And there are many other
labor-heavy jobs for the dump target setup, like network setup, LVM
setup, iSCSI setup, multipath, etc., not to mention potential corner
cases with these dump targets. It is not practical to reimplement them
all in a memory-friendly way.

And the userspace is growing: even if the user only includes "bash
makedumpfile ssh ip" and the required libs in the initramfs, which are
essential for dumping the vmcore over SSH (dump over SSH is commonly
used), the initramfs will take 20M after decompression.

The kernel driver memory usage is trackable, and I have only
encountered a few such issues; many drivers have applied workarounds
for kdump. And if userspace memory pressure is reduced, the kernel will
also have more memory.

And Fedora is now using the existing tool dracut to generate the
initramfs, which indeed depends heavily on systemd. Even with these
helpers, it has taken quite some work to support all the dump targets.

>
> In summary, you either need to show that it is safe to reuse the
> memory of the old kernel, or do some work to reduce the memory footprint
> of the crashdump kernel and the crashdump userspace.
>
> Eric
>


--
Best Regards,
Kairui Song



* Re: [RFC PATCH 0/3] Add writing support to vmcore for reusing oldmem
  2020-09-09 16:43     ` Kairui Song
@ 2020-09-21  7:17       ` Kairui Song
  -1 siblings, 0 replies; 17+ messages in thread
From: Kairui Song @ 2020-09-21  7:17 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Kernel Mailing List, Dave Young, Baoquan He, Vivek Goyal,
	Alexey Dobriyan, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	kexec

On Thu, Sep 10, 2020 at 12:43 AM Kairui Song <kasong@redhat.com> wrote:
>
> On Wed, Sep 9, 2020 at 10:04 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> >
> > Kairui Song <kasong@redhat.com> writes:
> >
> > > Currently vmcore only supports reading; this patch series is an RFC
> > > to add writing support to vmcore. It's x86_64-only for now; I'll add
> > > other architectures later if there is no problem with this idea.
> > >
> > > My purpose of adding writing support is to reuse the crashed kernel's
> > > old memory in the kdump kernel, reduce kdump memory pressure, and
> > > allow kdump to run with a smaller crashkernel reservation.
> > >
> > > This is doable because in most cases, after a kernel panic, users are
> > > only interested in the crashed kernel itself, and userspace/cache/free
> > > memory pages are not dumped. `makedumpfile` is widely used to skip
> > > these pages. Kernel pages usually only take a small part of
> > > the whole old memory. So there will be many reusable pages.
> > >
> > > By adding writing support, userspace can then use these pages as fast,
> > > temporary storage. This helps reduce memory pressure in many ways.
> > >
> > > For example, I've written a POC program based on this: it finds the
> > > reusable pages and creates an NBD device which maps to these pages.
> > > The NBD device can then be used as swap, or to hold some temp files
> > > which previously lived in RAM.
> > >
> > > The link of the POC tool: https://github.com/ryncsn/kdumpd
> >
> > A couple of thoughts.
> > 1) Unless I am completely mistaken, treating this as an exercise in
> >    memory hotplug would be much simpler.
> >
> >    AKA just plug in the memory that is not needed as part of the kdump.
> >
> >    I see below that you have problems doing this because
> >    of fragmentation.  I still think hotplug is doable using some
> >    kind of fragmented memory zone.
> >
> > 2) The purpose of the memory reservation is that hardware is
> >    still potentially running against the memory of the old kernel.
> >
> >    By the time we have brought up a new kernel, enough of the hardware
> >    may have been reinitialized that we don't have to worry about
> >    hardware randomly DMA'ing into the memory used by the old kernel.
> >
> >    With IOMMUs and care we may be able to guarantee for some machine
> >    configurations that it is impossible for DMA to come from some piece
> >    of hardware that is present but the kernel does not have a driver
> >    loaded for.
> >
> > I really do not like this approach because it is fundamentally doing the
> > wrong thing: adding write support to read-only drivers.  I do not see
> > anywhere that you even mentioned the hard problem and the reason we
> > reserve memory in the first place: hardware spontaneously DMA'ing onto
> > it.
> >
> Sorry that the POC tool looks ugly for now; it is only a draft to prove
> this works.
>
> For the patch, yes, it is expecting the IOMMU to lower the chance of
> potential DMA issues, and expecting DMA will not hit userspace/free
> pages, or at least won't overwrite a massive amount of reusable old
> memory. I have also thought about some solutions for the potential DMA
> issue.
>
> As the old memory is used as a block device proxied by userspace, upon
> each IO the userspace tool could do an integrity check of the
> corresponding data stored in old mem, and keep multiple copies of the
> data (e.g. use 512M of old memory to hold a 128M block device). These
> copies will be kept far away from each other in terms of physical
> memory location. The reusable old memory is sparse, so the actual
> memory containing the data should also be sparse.
> So if some part is corrupted, it is still recoverable, unless the DMA
> went very wrong and wiped a large region of memory; but if such a thing
> happens, it's most likely that kernel pages are also being wiped by
> DMA, so the vmcore is already corrupted and kdump may not help. At
> least it won't fail silently: the userspace tool can still do something
> like dumping some available data to an easy-to-set-up target.
>
> That's also one of the reasons for not using the old memory as kdump's
> memory directly.
>
> > It has been a long-standing issue that kdump suffers from OOM
> > with limited crashkernel memory. So reusing old memory could be very
> > helpful.
> >
> > There is a very fine line here between reusing existing code (aka
> > drivers and userspace) and doing something that should work.
> >
> > It might make sense to figure out what is using so much memory
> > that an OOM is triggered.
> >
> > Ages ago I did something that was essentially dumping the kernel's printk
> > buffer to the serial console in case of a crash, and I had things down to
> > something comparatively minuscule like 8M or less.
> >
> > My memory is that historically it has been high-performance SCSI RAID
> > drivers, or something like that, that are behind the need to have such
> > large memory reservations.
> >
> > Now that I think about it, you aren't by any chance doing something
> > silly like running systemd in your initrd, are you?  Are these OOMs by
> > any chance a userspace problem rather than a problem with inefficient
> > drivers?
>
> The problem with the userspace is that kdump is expected to dump to
> different kinds of dump targets, and there are some very
> memory-consuming ones, e.g. the NFSv3 case. And there are many other
> labor-heavy jobs for the dump target setup, like network setup, LVM
> setup, iSCSI setup, multipath, etc., not to mention potential corner
> cases with these dump targets. It is not practical to reimplement them
> all in a memory-friendly way.
>
> And the userspace is growing: even if the user only includes "bash
> makedumpfile ssh ip" and the required libs in the initramfs, which are
> essential for dumping the vmcore over SSH (dump over SSH is commonly
> used), the initramfs will take 20M after decompression.
>
> The kernel driver memory usage is trackable, and I have only
> encountered a few such issues; many drivers have applied workarounds
> for kdump. And if userspace memory pressure is reduced, the kernel will
> also have more memory.
>
> And Fedora is now using the existing tool dracut to generate the
> initramfs, which indeed depends heavily on systemd. Even with these
> helpers, it has taken quite some work to support all the dump targets.
>
> >
> > In summary, you either need to show that it is safe to reuse the
> > memory of the old kernel, or do some work to reduce the memory footprint
> > of the crashdump kernel and the crashdump userspace.
> >
> > Eric
> >
>
>

Hi Eric,

I'm trying a new idea: stop the DMA by clearing the PCI devices' bus
master bit. It's still a best-effort try, but should be helpful for
most cases:

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 03d37128a24f..736e3d13287f 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1825,6 +1825,10 @@ int pci_setup_device(struct pci_dev *dev)
        /* Early fixups, before probing the BARs */
        pci_fixup_device(pci_fixup_early, dev);

+       if (reset_devices && !(class & (PCI_CLASS_DISPLAY_VGA |
+                                       PCI_BASE_CLASS_BRIDGE)))
+               pci_clear_master(dev);
+
        pci_info(dev, "[%04x:%04x] type %02x class %#08x\n",
                 dev->vendor, dev->device, dev->hdr_type, dev->class);

With these techniques (some are mentioned in the previous reply), do
you think reusing old memory will be acceptable?

1. Try to stop DMA by clearing the bus master bit.
2. Do some verification upon each IO to detect DMA issues, and store
redundant data for recovery.
3. If an unrecoverable DMA issue still occurs, do a fallback dump to
an easy-to-set-up target.

--
Best Regards,
Kairui Song



Thread overview: 17+ messages
2020-09-09  7:50 [RFC PATCH 0/3] Add writing support to vmcore for reusing oldmem Kairui Song
2020-09-09  7:50 ` Kairui Song
2020-09-09  7:50 ` [RFC PATCH 1/3] vmcore: simplify read_from_olemem Kairui Song
2020-09-09  7:50   ` Kairui Song
2020-09-09 10:55   ` kernel test robot
2020-09-09  7:50 ` [RFC PATCH 2/3] vmcore: Add interface to write to old mem Kairui Song
2020-09-09  7:50   ` Kairui Song
2020-09-09 12:26   ` kernel test robot
2020-09-09 12:27   ` kernel test robot
2020-09-09  7:50 ` [RFC PATCH 3/3] x86_64: implement copy_to_oldmem_page Kairui Song
2020-09-09  7:50   ` Kairui Song
2020-09-09 14:04 ` [RFC PATCH 0/3] Add writing support to vmcore for reusing oldmem Eric W. Biederman
2020-09-09 14:04   ` Eric W. Biederman
2020-09-09 16:43   ` Kairui Song
2020-09-09 16:43     ` Kairui Song
2020-09-21  7:17     ` Kairui Song
2020-09-21  7:17       ` Kairui Song
