* [PATCH v2 0/2] makedumpfile: Allow 'info->page_offset' calc via VMCOREINFO in '/proc/kcore'
@ 2018-10-29  7:50 Bhupesh Sharma
  2018-10-29  7:50 ` [PATCH v2 1/2] makedumpfile/arm64: Use VMCOREINFO inside '/proc/kcore' (if available) Bhupesh Sharma
  2018-10-29  7:50 ` [PATCH v2 2/2] makedumpfile/x86_64: Add 'page_offset_base' reading capability from VMCOREINFO Bhupesh Sharma
  0 siblings, 2 replies; 11+ messages in thread
From: Bhupesh Sharma @ 2018-10-29  7:50 UTC (permalink / raw)
  To: kexec; +Cc: bhsharma, bhupesh.linux, Baoquan He, Kazuhito Hagio

This two-patch series introduces an architecture-independent
method of calculating the 'info->page_offset' value for the x86_64
and arm64 architectures, by reading the VMCOREINFO note in
'/proc/kcore'.

Since Linux kernel commit 23c85094fe1895caefdd
["proc/kcore: add vmcoreinfo note to /proc/kcore"], '/proc/kcore'
contains a new PT_NOTE which carries the VMCOREINFO information.

If this note is available, user-land tools can use it to
retrieve machine-specific symbols or strings appended to the
vmcoreinfo, even for live debugging of the primary kernel, as a
standard interface exposed by the kernel for sharing machine-specific
details with user-land.

I understand that there might be some reservations about exporting
such machine-specific details in the vmcoreinfo, but to unify
the implementations across user-land and architectures, this is
perhaps a good starting point for the discussion.

Here are brief details of the patches in this patchset:
- PATCH 1/2: Calculates 'info->page_offset' for arm64 arch via
  vmcoreinfo note inside '/proc/kcore'.
- PATCH 2/2: Calculates 'info->page_offset' for x86_64 arch via
  vmcoreinfo note inside '/proc/kcore', thus fixing a makedumpfile
  '--mem-usage' breakage with latest kernels on x86_64.

Cc: Baoquan He <bhe@redhat.com>
Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com>

Bhupesh Sharma (2):
  makedumpfile/arm64: Use VMCOREINFO inside '/proc/kcore' (if available)
  makedumpfile/x86_64: Add 'page_offset_base' reading capability from
    VMCOREINFO

 arch/arm64.c   | 114 +++++++++++++++++++++++++++++++++++++++++++--------------
 arch/x86_64.c  |  17 +++++++++
 makedumpfile.c |  69 +++++++++++++++++++++++++++++++---
 makedumpfile.h |   3 +-
 4 files changed, 169 insertions(+), 34 deletions(-)

-- 
2.7.4


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


* [PATCH v2 1/2] makedumpfile/arm64: Use VMCOREINFO inside '/proc/kcore' (if available)
  2018-10-29  7:50 [PATCH v2 0/2] makedumpfile: Allow 'info->page_offset' calc via VMCOREINFO in '/proc/kcore' Bhupesh Sharma
@ 2018-10-29  7:50 ` Bhupesh Sharma
  2018-11-02 21:00   ` Kazuhito Hagio
  2018-10-29  7:50 ` [PATCH v2 2/2] makedumpfile/x86_64: Add 'page_offset_base' reading capability from VMCOREINFO Bhupesh Sharma
  1 sibling, 1 reply; 11+ messages in thread
From: Bhupesh Sharma @ 2018-10-29  7:50 UTC (permalink / raw)
  To: kexec; +Cc: bhsharma, bhupesh.linux, Baoquan He, Kazuhito Hagio

makedumpfile commit 94c97db3fe859ca14d7b38b0ae9ee0ffb83689d2 (arm64: Get
'info->page_offset' from PT_LOAD segments to support KASLR boot cases)
added a method to determine 'info->page_offset' from PT_LOAD segments
for arm64 platforms.

In this commit, I hardcoded the 'NOT_PADDR_ARM64' macro as
0x0000000010a80000UL, which was a valid value on qualcomm-amberwing
boards (the arm64 board available to me at the time).

However, while testing this change on several other arm64 boards,
such as apm-mustang, huawei-taishan and hp-moonshot (now that I have
access to them), I found that this value can vary depending on the
placement of the "Kernel code" memory range.

To fix this, this patchset uses a new approach. Since kernel
version 4.19-rc5 (commit
23c85094fe1895caefdd19ef624ee687ec5f4507 ["proc/kcore: add vmcoreinfo
note to /proc/kcore"]), '/proc/kcore' contains a new
PT_NOTE which carries the VMCOREINFO information.

If this note is available, we can use it for makedumpfile's
'show_mem_usage()' functionality. This is especially useful
for architectures like arm64, as we can get kernel symbols like
'VA_BITS' and 'kimage_voffset' from '/proc/kcore' itself and use them
to calculate 'info->page_offset' when we make a call to
'get_page_offset()'.

This VMCOREINFO note provides a standard interface which can be
leveraged while debugging a live (or primary) kernel with makedumpfile
(and other user-space tools), especially to derive machine-specific
details (e.g. VA_BITS, PHYS_OFFSET and kimage_voffset for the arm64
arch). This has been suggested by the arm64 kernel maintainers (see
[0]) as the standard interface exposed by the kernel for sharing
machine-specific details with user-land via vmcoreinfo.

[0]. https://www.mail-archive.com/kexec@lists.infradead.org/msg20300.html

I will send a follow-up patch to read 'kaslr_offset' for arm64 cases
from vmcoreinfo inside '/proc/kcore' after this patchset is applied.

Here are some details of the tests I ran:

Testing:
--------
1. arm64 boards tested:
   huawei-taishan, apm-mustang and qualcomm-amberwing boards.

2. Use-cases tested:
   a) Primary kernel ->
      [] --mem-usage:
         # makedumpfile -f --mem-usage /proc/kcore

      [] filtering use-case:
         # makedumpfile --split -d 31 -x vmlinux --config scrub.conf vmcore dumpfile_{1,2,3}

      [] dumpfile creation:
         # makedumpfile -d 31 -x vmlinux vmcore dumpfile

   b) Crash kernel ->
      [] dumpfile creation:
         # makedumpfile -l --message-level 31 -d 31 /proc/vmcore dump

3. Kernel versions tested:
   a) Kernel version 4.19-rc5 and above.
   b) Kernel version 4.14.

Fixes: 94c97db3fe859ca14d7b38b0ae9ee0ffb83689d2 "arm64: Get 'info->page_offset' from PT_LOAD segments to support KASLR boot cases"
Cc: Baoquan He <bhe@redhat.com>
Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
---
 arch/arm64.c   | 114 +++++++++++++++++++++++++++++++++++++++++++--------------
 makedumpfile.c |  67 ++++++++++++++++++++++++++++++---
 makedumpfile.h |   2 +-
 3 files changed, 149 insertions(+), 34 deletions(-)

diff --git a/arch/arm64.c b/arch/arm64.c
index 362609668ea2..d695eff628f0 100644
--- a/arch/arm64.c
+++ b/arch/arm64.c
@@ -53,6 +53,7 @@ static unsigned long kimage_voffset;
 #define PAGE_OFFSET_42 ((0xffffffffffffffffUL) << 42)
 #define PAGE_OFFSET_47 ((0xffffffffffffffffUL) << 47)
 #define PAGE_OFFSET_48 ((0xffffffffffffffffUL) << 48)
+#define __PAGE_OFFSET(x) ((0xffffffffffffffffUL) << (x - 1))
 
 #define pgd_val(x)		((x).pgd)
 #define pud_val(x)		(pgd_val((x).pgd))
@@ -311,45 +312,104 @@ get_versiondep_info_arm64(void)
 	unsigned long long virt_start;
 	ulong _stext;
 
-	_stext = get_stext_symbol();
-	if (!_stext) {
-		ERRMSG("Can't get the symbol of _stext.\n");
-		return FALSE;
-	}
+	/* Calculate 'VA_BITS'. */
 
-	/* Derive va_bits as per arch/arm64/Kconfig */
-	if ((_stext & PAGE_OFFSET_36) == PAGE_OFFSET_36) {
-		va_bits = 36;
-	} else if ((_stext & PAGE_OFFSET_39) == PAGE_OFFSET_39) {
-		va_bits = 39;
-	} else if ((_stext & PAGE_OFFSET_42) == PAGE_OFFSET_42) {
-		va_bits = 42;
-	} else if ((_stext & PAGE_OFFSET_47) == PAGE_OFFSET_47) {
-		va_bits = 47;
-	} else if ((_stext & PAGE_OFFSET_48) == PAGE_OFFSET_48) {
-		va_bits = 48;
+	/* Since kernel version 4.19, '/proc/kcore' contains a new
+	 * PT_NOTE which carries the VMCOREINFO information.
+	 *
+	 * If the same is available, use it as it already contains the
+	 * value of 'VA_BITS' on the machine.
+	 *
+	 * Otherwise, we can read the '_stext' symbol and determine the
+	 * 'VA_BITS' value from the same as well.
+	 */
+	if (info->flag_kcore_contains_vmcoreinfo &&
+	    (NUMBER(VA_BITS) != NOT_FOUND_NUMBER)) {
+		va_bits = NUMBER(VA_BITS);
 	} else {
-		ERRMSG("Cannot find a proper _stext for calculating VA_BITS\n");
-		return FALSE;
+		_stext = get_stext_symbol();
+		if (!_stext) {
+			ERRMSG("Can't get the symbol of _stext.\n");
+			return FALSE;
+		}
+
+		/* Derive va_bits as per arch/arm64/Kconfig */
+		if ((_stext & PAGE_OFFSET_36) == PAGE_OFFSET_36) {
+			va_bits = 36;
+		} else if ((_stext & PAGE_OFFSET_39) == PAGE_OFFSET_39) {
+			va_bits = 39;
+		} else if ((_stext & PAGE_OFFSET_42) == PAGE_OFFSET_42) {
+			va_bits = 42;
+		} else if ((_stext & PAGE_OFFSET_47) == PAGE_OFFSET_47) {
+			va_bits = 47;
+		} else if ((_stext & PAGE_OFFSET_48) == PAGE_OFFSET_48) {
+			va_bits = 48;
+		} else {
+			ERRMSG("Cannot find a proper _stext for calculating VA_BITS\n");
+			return FALSE;
+		}
+	}
+
+	/* Calculate 'info->page_offset'. */
+
+	/* Since kernel version 4.19, '/proc/kcore' contains a new
+	 * PT_NOTE which carries the VMCOREINFO information.
+	 *
+	 * If the same is available, use it as it already contains the
+	 * value of 'kimage_voffset' on the machine.
+	 */
+	if (info->flag_kcore_contains_vmcoreinfo &&
+	    (NUMBER(kimage_voffset) != NOT_FOUND_NUMBER)) {
+		kimage_voffset = NUMBER(kimage_voffset);
 	}
 
+	/* First, lets try and calculate the 'info->page_offset' value
+	 * from PT_LOAD segments, if they are available.
+	 */
 	if (get_num_pt_loads()) {
 		for (i = 0;
 		    get_pt_load(i, &phys_start, NULL, &virt_start, NULL);
 		    i++) {
-			if (virt_start != NOT_KV_ADDR
-			    && virt_start < __START_KERNEL_map
-			    && phys_start != NOT_PADDR
-			    && phys_start != NOT_PADDR_ARM64) {
-				info->page_offset = virt_start - phys_start;
-				DEBUG_MSG("info->page_offset: %lx, VA_BITS: %d\n",
-						info->page_offset, va_bits);
-				return TRUE;
+			/* On systems where we have a valid 'kimage_voffset'
+			 * available by now, we should give preference to the same
+			 * while calculating 'info->page_offset'.
+			 *
+			 * Otherwise, we can ensure that we consider
+			 * only those PT_LOAD segments whose 'virt_start'
+			 * is greater than the PAGE_OFFSET value (as defined
+			 * in 'arch/arm64/include/asm/memory.h').
+			 */
+			if (!kimage_voffset) {
+				if (virt_start != NOT_KV_ADDR &&
+				   virt_start > __PAGE_OFFSET(va_bits) &&
+				   phys_start != NOT_PADDR) {
+					info->page_offset = virt_start - phys_start;
+					DEBUG_MSG("info->page_offset: %lx, VA_BITS: %d\n",
+							info->page_offset, va_bits);
+					return TRUE;
+				}
+			} else {
+				if (virt_start != NOT_KV_ADDR &&
+				   phys_start != NOT_PADDR &&
+				   (virt_start - phys_start) != kimage_voffset) {
+					info->page_offset = virt_start - phys_start;
+					DEBUG_MSG("info->page_offset: %lx, VA_BITS: %d\n",
+							info->page_offset, va_bits);
+					return TRUE;
+				}
 			}
 		}
 	}
 
-	info->page_offset = (0xffffffffffffffffUL) << (va_bits - 1);
+	/* Fallback to hard-coded value (equal to PAGE_OFFSET macro
+	 * defined in 'arch/arm64/include/asm/memory.h'), as the last
+	 * resort.
+	 *
+	 * Note that this will not be a valid value on KASLR enabled
+	 * kernels as the start address of linear range is also
+	 * randomized for KASLR boot cases.
+	 */
+	info->page_offset = __PAGE_OFFSET(va_bits);
 	DEBUG_MSG("page_offset=%lx, va_bits=%d\n", info->page_offset,
 			va_bits);
 
diff --git a/makedumpfile.c b/makedumpfile.c
index 91c1ab47fca4..a1b8d8ff5a84 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -1302,6 +1302,20 @@ error:
 	return FALSE;
 }
 
+static int
+check_kcore_contains_vmcoreinfo(int fd, char *name)
+{
+	if (!get_elf_info(fd, name))
+		return FALSE;
+
+	if (!has_vmcoreinfo())
+		return FALSE;
+
+	DEBUG_MSG("VMCOREINFO PT_NOTE found in %s\n", name);
+
+	return TRUE;
+}
+
 int
 open_dump_memory(void)
 {
@@ -1314,6 +1328,23 @@ open_dump_memory(void)
 	}
 	info->fd_memory = fd;
 
+	/* Since kernel version 4.19, '/proc/kcore' contains a new
+	 * PT_NOTE which carries the VMCOREINFO information.
+	 *
+	 * If the same is available, use it for makedumpfile
+	 * show_mem_usage() cases.
+	 */
+	if (info->flag_mem_usage &&
+	    !(strcmp(info->name_memory, "/proc/kcore")) &&
+	    (info->kernel_version >= KERNEL_VERSION(4, 19, 0))){
+		status = check_kcore_contains_vmcoreinfo(fd,
+						info->name_memory);
+		if (status == TRUE) {
+			info->flag_kcore_contains_vmcoreinfo = TRUE;
+			return TRUE;
+		}
+	}
+
 	status = check_kdump_compressed(info->name_memory);
 	if (status == TRUE) {
 		info->flag_refiltering = TRUE;
@@ -11195,6 +11226,8 @@ static int get_sys_kernel_vmcoreinfo(uint64_t *addr, uint64_t *len)
 
 int show_mem_usage(void)
 {
+	off_t offset;
+	unsigned long size;
 	uint64_t vmcoreinfo_addr, vmcoreinfo_len;
 	struct cycle cycle = {0};
 
@@ -11208,17 +11241,39 @@ int show_mem_usage(void)
 	if (!open_files_for_creating_dumpfile())
 		return FALSE;
 
-	if (!get_elf_loads(info->fd_memory, info->name_memory))
-		return FALSE;
+	/* Since kernel version 4.19, '/proc/kcore' contains a new
+	 * PT_NOTE which carries the VMCOREINFO information.
+	 *
+	 * If the same is available, use it for makedumpfile
+	 * show_mem_usage(). This is especially useful for architectures
+	 * like arm64 as we can get symbols like 'VA_BITS' and
+	 * 'kimage_voffset' before we call get_page_offset().
+	 */
+
+	if (!info->flag_kcore_contains_vmcoreinfo) {
+		if (!get_elf_loads(info->fd_memory, info->name_memory))
+			return FALSE;
+	} else {
+		if (has_vmcoreinfo()) {
+			get_vmcoreinfo(&offset, &size);
+			if (!read_vmcoreinfo_from_vmcore(offset, size, FALSE))
+				return FALSE;
+		}
+	}
 
 	if (!get_page_offset())
 		return FALSE;
 
-	if (!get_sys_kernel_vmcoreinfo(&vmcoreinfo_addr, &vmcoreinfo_len))
-		return FALSE;
+	/* If flag_kcore_contains_vmcoreinfo is TRUE when we are here,
+	 * we don't need to read the vmcoreinfo again.
+	 */
+	if (!info->flag_kcore_contains_vmcoreinfo)
+		if (!get_sys_kernel_vmcoreinfo(&vmcoreinfo_addr, &vmcoreinfo_len))
+			return FALSE;
 
-	if (!set_kcore_vmcoreinfo(vmcoreinfo_addr, vmcoreinfo_len))
-		return FALSE;
+	if (!info->flag_kcore_contains_vmcoreinfo)
+		if (!set_kcore_vmcoreinfo(vmcoreinfo_addr, vmcoreinfo_len))
+			return FALSE;
 
 	if (!initial())
 		return FALSE;
diff --git a/makedumpfile.h b/makedumpfile.h
index 46ebe2ee5a1f..77f7ec8e3867 100644
--- a/makedumpfile.h
+++ b/makedumpfile.h
@@ -544,7 +544,6 @@ unsigned long get_kvbase_arm64(void);
 #define KVBASE			get_kvbase_arm64()
 
 #define __START_KERNEL_map	(0xffffffff80000000UL)
-#define NOT_PADDR_ARM64		(0x0000000010a80000UL)
 
 #endif /* aarch64 */
 
@@ -1308,6 +1307,7 @@ struct DumpInfo {
 	int		flag_vmemmap;        /* kernel supports vmemmap address space */
 	int		flag_excludevm;      /* -e - excluding unused vmemmap pages */
 	int		flag_use_count;      /* _refcount is named _count in struct page */
+	int		flag_kcore_contains_vmcoreinfo;    /* '/proc/kcore' contains a VMCOREINFO PT_NOTE */
 	unsigned long	vaddr_for_vtop;      /* virtual address for debugging */
 	long		page_size;           /* size of page */
 	long		page_shift;
-- 
2.7.4



* [PATCH v2 2/2] makedumpfile/x86_64: Add 'page_offset_base' reading capability from VMCOREINFO
  2018-10-29  7:50 [PATCH v2 0/2] makedumpfile: Allow 'info->page_offset' calc via VMCOREINFO in '/proc/kcore' Bhupesh Sharma
  2018-10-29  7:50 ` [PATCH v2 1/2] makedumpfile/arm64: Use VMCOREINFO inside '/proc/kcore' (if available) Bhupesh Sharma
@ 2018-10-29  7:50 ` Bhupesh Sharma
  2018-11-16 16:32   ` Kazuhito Hagio
  1 sibling, 1 reply; 11+ messages in thread
From: Bhupesh Sharma @ 2018-10-29  7:50 UTC (permalink / raw)
  To: kexec; +Cc: bhsharma, bhupesh.linux, Baoquan He, Kazuhito Hagio

The makedumpfile '--mem-usage' use-case is currently broken on x86_64
with recent kernels (see a sample log below from a 4.19 kernel):

[root@hp-dl380pgen8-02-vm-15 ~]# makedumpfile -f --mem-usage /proc/kcore
   The kernel version is not supported.
   The makedumpfile operation may be incomplete.
   <snip..>
   set_kcore_vmcoreinfo: Can't get the offset of VMCOREINFO(/proc/kcore).
   Success

   makedumpfile Failed.

It seems that the issue is caused by KCORE_REMAP, which was added
to recent kernels and appears as a PT_LOAD segment in '/proc/kcore'.

I have recently proposed a kernel patch which appends
'page_offset_base' to vmcoreinfo for x86_64 platforms, so that
user-space tools can use it as a standard interface to determine
the start of the direct mapping of all physical memory (see [0]).

This is in line with the philosophy proposed by the kernel
maintainers, whereby the vmcoreinfo note (from '/proc/kcore') can
be used in user-land to retrieve machine-specific symbols or strings
appended to the vmcoreinfo, even for live debugging of the primary
kernel, as a standard interface exposed by the kernel for sharing
machine-specific details with user-land.

With this fix, the '--mem-usage' use-case works fine on x86_64
machines even with newer kernels:

[root@ibm-x3650m4-01-vm-16 makedumpfile]# makedumpfile -f --mem-usage /proc/kcore -D
   The kernel version is not supported.
   The makedumpfile operation may be incomplete.
   <..snip..>
   Linux kdump
   VMCOREINFO PT_NOTE found in /proc/kcore
   page_size    : 4096
   info->page_offset : ffff880000000000
   <..snip..>The kernel doesn't support mmap(),read() will be used instead.

   TYPE		PAGES			EXCLUDABLE	DESCRIPTION
   ----------------------------------------------------------------------
   ZERO		20846           	yes		Pages filled with zero
   NON_PRI_CACHE	57371           	yes		Cache pages without private flag
   PRI_CACHE	1               	yes		Cache pages with private flag
   USER		13841           	yes		User process pages
   FREE		777956          	yes		Free pages
   KERN_DATA	55625           	no		Dumpable kernel data

   page size:		4096
   Total pages on system:	925640
   Total size on system:	3791421440       Byte

[0]. https://lkml.org/lkml/2018/10/27/86

I will send a follow-up patch to read 'kaslr_offset' for x86_64 cases
from vmcoreinfo inside '/proc/kcore' after this patchset is applied.

Cc: Baoquan He <bhe@redhat.com>
Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
---
 arch/x86_64.c  | 17 +++++++++++++++++
 makedumpfile.c |  2 ++
 makedumpfile.h |  1 +
 3 files changed, 20 insertions(+)

diff --git a/arch/x86_64.c b/arch/x86_64.c
index 2b3c0bb356a1..a5fbdf26b8e7 100644
--- a/arch/x86_64.c
+++ b/arch/x86_64.c
@@ -98,6 +98,22 @@ get_page_offset_x86_64(void)
 		return TRUE;
 	}
 
+	/* Calculate 'info->page_offset'. */
+
+	/* Since kernel version 4.19, '/proc/kcore' contains a new
+	 * PT_NOTE which carries the VMCOREINFO information.
+	 *
+	 * If the same is available, use it as it already contains the
+	 * value of 'page_offset_base' on the machine.
+	 */
+	if (info->flag_kcore_contains_vmcoreinfo &&
+	    (NUMBER(page_offset_base) != NOT_FOUND_NUMBER)) {
+		page_offset_base = NUMBER(page_offset_base);
+		info->page_offset = page_offset_base;
+		DEBUG_MSG("info->page_offset : %lx\n", info->page_offset);
+		return TRUE;
+	}
+
 	if (get_num_pt_loads()) {
 		for (i = 0;
 			get_pt_load(i, &phys_start, NULL, &virt_start, NULL);
@@ -106,6 +122,7 @@ get_page_offset_x86_64(void)
 					&& virt_start < __START_KERNEL_map
 					&& phys_start != NOT_PADDR) {
 				info->page_offset = virt_start - phys_start;
+				DEBUG_MSG("info->page_offset : %lx\n", info->page_offset);
 				return TRUE;
 			}
 		}
diff --git a/makedumpfile.c b/makedumpfile.c
index a1b8d8ff5a84..668e3ec3956c 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -2319,6 +2319,7 @@ write_vmcoreinfo_data(void)
 
 	WRITE_NUMBER("PAGE_BUDDY_MAPCOUNT_VALUE", PAGE_BUDDY_MAPCOUNT_VALUE);
 	WRITE_NUMBER("phys_base", phys_base);
+	WRITE_NUMBER("page_offset_base", page_offset_base);
 
 	WRITE_NUMBER("HUGETLB_PAGE_DTOR", HUGETLB_PAGE_DTOR);
 #ifdef __aarch64__
@@ -2719,6 +2720,7 @@ read_vmcoreinfo(void)
 
 	READ_NUMBER("PAGE_BUDDY_MAPCOUNT_VALUE", PAGE_BUDDY_MAPCOUNT_VALUE);
 	READ_NUMBER("phys_base", phys_base);
+	READ_NUMBER("page_offset_base", page_offset_base);
 #ifdef __aarch64__
 	READ_NUMBER("VA_BITS", VA_BITS);
 	READ_NUMBER_UNSIGNED("PHYS_OFFSET", PHYS_OFFSET);
diff --git a/makedumpfile.h b/makedumpfile.h
index 77f7ec8e3867..09e14f0ef86c 100644
--- a/makedumpfile.h
+++ b/makedumpfile.h
@@ -1920,6 +1920,7 @@ struct number_table {
 	long	MAX_PHYSMEM_BITS;
 	long    HUGETLB_PAGE_DTOR;
 	long	phys_base;
+	unsigned long	page_offset_base;
 #ifdef __aarch64__
 	long 	VA_BITS;
 	unsigned long	PHYS_OFFSET;
-- 
2.7.4



* RE: [PATCH v2 1/2] makedumpfile/arm64: Use VMCOREINFO inside '/proc/kcore' (if available)
  2018-10-29  7:50 ` [PATCH v2 1/2] makedumpfile/arm64: Use VMCOREINFO inside '/proc/kcore' (if available) Bhupesh Sharma
@ 2018-11-02 21:00   ` Kazuhito Hagio
  2018-11-04 22:16     ` Bhupesh Sharma
  2018-11-06 20:17     ` Kazuhito Hagio
  0 siblings, 2 replies; 11+ messages in thread
From: Kazuhito Hagio @ 2018-11-02 21:00 UTC (permalink / raw)
  To: Bhupesh Sharma; +Cc: kexec, Baoquan He

Hi Bhupesh,

On 10/29/2018 3:50 AM, Bhupesh Sharma wrote:
> makedumpfile commit 94c97db3fe859ca14d7b38b0ae9ee0ffb83689d2 (arm64: Get
> 'info->page_offset' from PT_LOAD segments to support KASLR boot cases)
> added a method to determine 'info->page_offset' from PT_LOAD segments
> for arm64 platforms.
> 
> In this commit, I hardcoded the 'NOT_PADDR_ARM64' macro as
> 0x0000000010a80000UL, which was a valid value on qualcomm-amberwing
> boards (the arm64 board available to me at the time).
> 
> However, while testing this change on several other arm64 boards,
> such as apm-mustang, huawei-taishan and hp-moonshot (now that I have
> access to them), I found that this value can vary depending on the
> placement of the "Kernel code" memory range.
> 
> To fix this, this patchset uses a new approach. Since kernel
> version 4.19-rc5 (commit
> 23c85094fe1895caefdd19ef624ee687ec5f4507 ["proc/kcore: add vmcoreinfo
> note to /proc/kcore"]), '/proc/kcore' contains a new
> PT_NOTE which carries the VMCOREINFO information.
> 
> If this note is available, we can use it for makedumpfile's
> 'show_mem_usage()' functionality. This is especially useful
> for architectures like arm64, as we can get kernel symbols like
> 'VA_BITS' and 'kimage_voffset' from '/proc/kcore' itself and use them
> to calculate 'info->page_offset' when we make a call to
> 'get_page_offset()'.

First, in show_mem_usage(), we set info->page_offset in advance
in order to get the offset of vmcoreinfo in /proc/kcore (not in PT_NOTE)
from /sys/kernel/vmcoreinfo information.

Since now PT_NOTE has vmcoreinfo in /proc/kcore as well, we don't
need get_page_offset() here any longer, right?

So I suggest the following patch for show_mem_usage(); in addition,
we can remove the get_elf_loads() function.

diff --git a/makedumpfile.c b/makedumpfile.c
index 91c1ab4..a02665b 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -11208,17 +11208,23 @@ int show_mem_usage(void)
 	if (!open_files_for_creating_dumpfile())
 		return FALSE;
 
-	if (!get_elf_loads(info->fd_memory, info->name_memory))
+	if (!get_elf_info(info->fd_memory, info->name_memory))
 		return FALSE;
 
-	if (!get_page_offset())
-		return FALSE;
+	/*
+	 * Since kernel 4.19, /proc/kcore also has vmcoreinfo in PT_NOTE,
+	 * in that case we don't need /sys/kernel/vmcoreinfo.
+	 */
+	if (!has_vmcoreinfo()) {
+		if (!get_page_offset())
+			return FALSE;
 
-	if (!get_sys_kernel_vmcoreinfo(&vmcoreinfo_addr, &vmcoreinfo_len))
-		return FALSE;
+		if (!get_sys_kernel_vmcoreinfo(&vmcoreinfo_addr, &vmcoreinfo_len))
+			return FALSE;
 
-	if (!set_kcore_vmcoreinfo(vmcoreinfo_addr, vmcoreinfo_len))
-		return FALSE;
+		if (!set_kcore_vmcoreinfo(vmcoreinfo_addr, vmcoreinfo_len))
+			return FALSE;
+	}
 
 	if (!initial())
 		return FALSE;


> 
> This VMCOREINFO note provides a standard interface which can be
> leveraged while debugging a live (or primary) kernel with makedumpfile
> (and other user-space tools), especially to derive machine-specific
> details (e.g. VA_BITS, PHYS_OFFSET and kimage_voffset for the arm64
> arch). This has been suggested by the arm64 kernel maintainers (see
> [0]) as the standard interface exposed by the kernel for sharing
> machine-specific details with user-land via vmcoreinfo.
> 
> [0]. https://www.mail-archive.com/kexec@lists.infradead.org/msg20300.html
> 
> I will send a follow-up patch to read 'kaslr_offset' for arm64 cases
> from vmcoreinfo inside '/proc/kcore' after this patchset is applied.
> 
> Here are some details of the tests I ran:
> 
> Testing:
> --------
> 1. arm64 boards tested:
>    huawei-taishan, apm-mustang and qualcomm-amberwing boards.
> 
> 2. Use-cases tested:
>    a) Primary kernel ->
>       [] --mem-usage:
>          # makedumpfile -f --mem-usage /proc/kcore
> 
>       [] filtering use-case:
>          # makedumpfile --split -d 31 -x vmlinux --config scrub.conf vmcore dumpfile_{1,2,3}
> 
>       [] dumpfile creation:
>          # makedumpfile -d 31 -x vmlinux vmcore dumpfile
> 
>    b) Crash kernel ->
>       [] dumpfile creation:
>          # makedumpfile -l --message-level 31 -d 31 /proc/vmcore dump
> 
> 3. Kernel versions tested:
>    a) Kernel version 4.19-rc5 and above.
>    b) Kernel version 4.14.
> 
> Fixes: 94c97db3fe859ca14d7b38b0ae9ee0ffb83689d2 "arm64: Get 'info->page_offset' from PT_LOAD segments to
> support KASLR boot cases"
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
> Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
> ---
>  arch/arm64.c   | 114 +++++++++++++++++++++++++++++++++++++++++++--------------
>  makedumpfile.c |  67 ++++++++++++++++++++++++++++++---
>  makedumpfile.h |   2 +-
>  3 files changed, 149 insertions(+), 34 deletions(-)
> 
> diff --git a/arch/arm64.c b/arch/arm64.c
> index 362609668ea2..d695eff628f0 100644
> --- a/arch/arm64.c
> +++ b/arch/arm64.c
> @@ -53,6 +53,7 @@ static unsigned long kimage_voffset;
>  #define PAGE_OFFSET_42 ((0xffffffffffffffffUL) << 42)
>  #define PAGE_OFFSET_47 ((0xffffffffffffffffUL) << 47)
>  #define PAGE_OFFSET_48 ((0xffffffffffffffffUL) << 48)
> +#define __PAGE_OFFSET(x) ((0xffffffffffffffffUL) << (x - 1))
> 
>  #define pgd_val(x)		((x).pgd)
>  #define pud_val(x)		(pgd_val((x).pgd))
> @@ -311,45 +312,104 @@ get_versiondep_info_arm64(void)
>  	unsigned long long virt_start;
>  	ulong _stext;
> 
> -	_stext = get_stext_symbol();
> -	if (!_stext) {
> -		ERRMSG("Can't get the symbol of _stext.\n");
> -		return FALSE;
> -	}
> +	/* Calculate 'VA_BITS'. */
> 
> -	/* Derive va_bits as per arch/arm64/Kconfig */
> -	if ((_stext & PAGE_OFFSET_36) == PAGE_OFFSET_36) {
> -		va_bits = 36;
> -	} else if ((_stext & PAGE_OFFSET_39) == PAGE_OFFSET_39) {
> -		va_bits = 39;
> -	} else if ((_stext & PAGE_OFFSET_42) == PAGE_OFFSET_42) {
> -		va_bits = 42;
> -	} else if ((_stext & PAGE_OFFSET_47) == PAGE_OFFSET_47) {
> -		va_bits = 47;
> -	} else if ((_stext & PAGE_OFFSET_48) == PAGE_OFFSET_48) {
> -		va_bits = 48;
> +	/* Since kernel version 4.19, '/proc/kcore' contains a new
> +	 * PT_NOTE which carries the VMCOREINFO information.
> +	 *
> +	 * If the same is available, use it as it already contains the
> +	 * value of 'VA_BITS' on the machine.
> +	 *
> +	 * Otherwise, we can read the '_stext' symbol and determine the
> +	 * 'VA_BITS' value from the same as well.
> +	 */
> +	if (info->flag_kcore_contains_vmcoreinfo &&
> +	    (NUMBER(VA_BITS) != NOT_FOUND_NUMBER)) {
> +		va_bits = NUMBER(VA_BITS);
>  	} else {
> -		ERRMSG("Cannot find a proper _stext for calculating VA_BITS\n");
> -		return FALSE;
> +		_stext = get_stext_symbol();
> +		if (!_stext) {
> +			ERRMSG("Can't get the symbol of _stext.\n");
> +			return FALSE;
> +		}
> +
> +		/* Derive va_bits as per arch/arm64/Kconfig */
> +		if ((_stext & PAGE_OFFSET_36) == PAGE_OFFSET_36) {
> +			va_bits = 36;
> +		} else if ((_stext & PAGE_OFFSET_39) == PAGE_OFFSET_39) {
> +			va_bits = 39;
> +		} else if ((_stext & PAGE_OFFSET_42) == PAGE_OFFSET_42) {
> +			va_bits = 42;
> +		} else if ((_stext & PAGE_OFFSET_47) == PAGE_OFFSET_47) {
> +			va_bits = 47;
> +		} else if ((_stext & PAGE_OFFSET_48) == PAGE_OFFSET_48) {
> +			va_bits = 48;
> +		} else {
> +			ERRMSG("Cannot find a proper _stext for calculating VA_BITS\n");
> +			return FALSE;
> +		}
> +	}
> +
> +	/* Calculate 'info->page_offset'. */
> +
> +	/* Since kernel version 4.19, '/proc/kcore' contains a new
> +	 * PT_NOTE which carries the VMCOREINFO information.
> +	 *
> +	 * If the same is available, use it as it already contains the
> +	 * value of 'kimage_voffset' on the machine.
> +	 */
> +	if (info->flag_kcore_contains_vmcoreinfo &&
> +	    (NUMBER(kimage_voffset) != NOT_FOUND_NUMBER)) {
> +		kimage_voffset = NUMBER(kimage_voffset);
>  	}
> 
> +	/* First, lets try and calculate the 'info->page_offset' value
> +	 * from PT_LOAD segments, if they are available.
> +	 */
>  	if (get_num_pt_loads()) {
>  		for (i = 0;
>  		    get_pt_load(i, &phys_start, NULL, &virt_start, NULL);
>  		    i++) {
> -			if (virt_start != NOT_KV_ADDR
> -			    && virt_start < __START_KERNEL_map
> -			    && phys_start != NOT_PADDR
> -			    && phys_start != NOT_PADDR_ARM64) {
> -				info->page_offset = virt_start - phys_start;
> -				DEBUG_MSG("info->page_offset: %lx, VA_BITS: %d\n",
> -						info->page_offset, va_bits);
> -				return TRUE;
> +			/* On systems where we have a valid 'kimage_voffset'
> +			 * available by now, we should give preference to the same
> +			 * while calculating 'info->page_offset'.
> +			 *
> +			 * Otherwise, we can ensure that we consider
> +			 * only those PT_LOAD segments whose 'virt_start'
> +			 * is greater than the PAGE_OFFSET value (as defined
> +			 * in 'arch/arm64/include/asm/memory.h').
> +			 */
> +			if (!kimage_voffset) {
> +				if (virt_start != NOT_KV_ADDR &&
> +				   virt_start > __PAGE_OFFSET(va_bits) &&
> +				   phys_start != NOT_PADDR) {
> +					info->page_offset = virt_start - phys_start;
> +					DEBUG_MSG("info->page_offset: %lx, VA_BITS: %d\n",
> +							info->page_offset, va_bits);
> +					return TRUE;
> +				}
> +			} else {
> +				if (virt_start != NOT_KV_ADDR &&
> +				   phys_start != NOT_PADDR &&
> +				   (virt_start - phys_start) != kimage_voffset) {
> +					info->page_offset = virt_start - phys_start;
> +					DEBUG_MSG("info->page_offset: %lx, VA_BITS: %d\n",
> +							info->page_offset, va_bits);
> +					return TRUE;
> +				}

If something like KCORE_REMAP is added to kcore on arm64, even with
vmcoreinfo, this calculation can return a wrong page_offset, right?
Can we calculate info->page_offset without PT_LOAD information?

Originally, info->page_offset (PAGE_OFFSET) is used in the following
parts on arm64.

arch/arm64.c:
__pa(unsigned long vaddr)
{
        if (kimage_voffset == NOT_FOUND_NUMBER ||
                        (vaddr >= PAGE_OFFSET))
                return (vaddr - PAGE_OFFSET + info->phys_base);
        else
                return (vaddr - kimage_voffset);
}

elf_info.c:
        kvstart = (ulong)start + PAGE_OFFSET;
        kvend = (ulong)end + PAGE_OFFSET;
--
        kvaddr = (ulong)vmcoreinfo_addr + PAGE_OFFSET;

I'm wondering about its consistency.  Is it OK to set
info->page_offset to (virt_start - phys_start) on arm64?
In other words, on an arm64 system with info->phys_base != 0, does it
work correctly for both /proc/vmcore and --mem-usage /proc/kcore?

Maybe I have some misunderstanding..

Thanks,
Kazu

>  			}
>  		}
>  	}
> 
> -	info->page_offset = (0xffffffffffffffffUL) << (va_bits - 1);
> +	/* Fallback to hard-coded value (equal to PAGE_OFFSET macro
> +	 * defined in 'arch/arm64/include/asm/memory.h'), as the last
> +	 * resort.
> +	 *
> +	 * Note that this will not be a valid value on KASLR enabled
> +	 * kernels as the start address of linear range is also
> +	 * randomized for KASLR boot cases.
> +	 */
> +	info->page_offset = __PAGE_OFFSET(va_bits);
>  	DEBUG_MSG("page_offset=%lx, va_bits=%d\n", info->page_offset,
>  			va_bits);
> 
> diff --git a/makedumpfile.c b/makedumpfile.c
> index 91c1ab47fca4..a1b8d8ff5a84 100644
> --- a/makedumpfile.c
> +++ b/makedumpfile.c
> @@ -1302,6 +1302,20 @@ error:
>  	return FALSE;
>  }
> 
> +static int
> +check_kcore_contains_vmcoreinfo(int fd, char *name)
> +{
> +	if (!get_elf_info(fd, name))
> +		return FALSE;
> +
> +	if (!has_vmcoreinfo())
> +		return FALSE;
> +
> +	DEBUG_MSG("VMCOREINFO PT_NOTE found in %s\n", name);
> +
> +	return TRUE;
> +}
> +
>  int
>  open_dump_memory(void)
>  {
> @@ -1314,6 +1328,23 @@ open_dump_memory(void)
>  	}
>  	info->fd_memory = fd;
> 
> +	/* Since kernel version 4.19, '/proc/kcore' contains a new
> +	 * PT_NOTE which carries the VMCOREINFO information.
> +	 *
> +	 * If the same is available, use it for makedumpfile
> +	 * show_mem_usage() cases.
> +	 */
> +	if (info->flag_mem_usage &&
> +	    !(strcmp(info->name_memory, "/proc/kcore")) &&
> +	    (info->kernel_version >= KERNEL_VERSION(4, 19, 0))){
> +		status = check_kcore_contains_vmcoreinfo(fd,
> +						info->name_memory);
> +		if (status == TRUE) {
> +			info->flag_kcore_contains_vmcoreinfo = TRUE;
> +			return TRUE;
> +		}
> +	}
> +
>  	status = check_kdump_compressed(info->name_memory);
>  	if (status == TRUE) {
>  		info->flag_refiltering = TRUE;
> @@ -11195,6 +11226,8 @@ static int get_sys_kernel_vmcoreinfo(uint64_t *addr, uint64_t *len)
> 
>  int show_mem_usage(void)
>  {
> +	off_t offset;
> +	unsigned long size;
>  	uint64_t vmcoreinfo_addr, vmcoreinfo_len;
>  	struct cycle cycle = {0};
> 
> @@ -11208,17 +11241,39 @@ int show_mem_usage(void)
>  	if (!open_files_for_creating_dumpfile())
>  		return FALSE;
> 
> -	if (!get_elf_loads(info->fd_memory, info->name_memory))
> -		return FALSE;
> +	/* Since kernel version 4.19, '/proc/kcore' contains a new
> +	 * PT_NOTE which carries the VMCOREINFO information.
> +	 *
> +	 * If the same is available, use it for makedumpfile
> +	 * show_mem_usage(). This is especially useful for architectures
> +	 * like arm64 as we can get symbols like 'VA_BITS' and
> +	 * 'kimage_voffset' before we call get_page_offset().
> +	 */
> +
> +	if (!info->flag_kcore_contains_vmcoreinfo) {
> +		if (!get_elf_loads(info->fd_memory, info->name_memory))
> +			return FALSE;
> +	} else {
> +		if (has_vmcoreinfo()) {
> +			get_vmcoreinfo(&offset, &size);
> +			if (!read_vmcoreinfo_from_vmcore(offset, size, FALSE))
> +				return FALSE;
> +		}
> +	}
> 
>  	if (!get_page_offset())
>  		return FALSE;
> 
> -	if (!get_sys_kernel_vmcoreinfo(&vmcoreinfo_addr, &vmcoreinfo_len))
> -		return FALSE;
> +	/* If flag_kcore_contains_vmcoreinfo is TRUE when we are here,
> +	 * we don't need to read the vmcoreinfo again.
> +	 */
> +	if (!info->flag_kcore_contains_vmcoreinfo)
> +		if (!get_sys_kernel_vmcoreinfo(&vmcoreinfo_addr, &vmcoreinfo_len))
> +			return FALSE;
> 
> -	if (!set_kcore_vmcoreinfo(vmcoreinfo_addr, vmcoreinfo_len))
> -		return FALSE;
> +	if (!info->flag_kcore_contains_vmcoreinfo)
> +		if (!set_kcore_vmcoreinfo(vmcoreinfo_addr, vmcoreinfo_len))
> +			return FALSE;
> 
>  	if (!initial())
>  		return FALSE;
> diff --git a/makedumpfile.h b/makedumpfile.h
> index 46ebe2ee5a1f..77f7ec8e3867 100644
> --- a/makedumpfile.h
> +++ b/makedumpfile.h
> @@ -544,7 +544,6 @@ unsigned long get_kvbase_arm64(void);
>  #define KVBASE			get_kvbase_arm64()
> 
>  #define __START_KERNEL_map	(0xffffffff80000000UL)
> -#define NOT_PADDR_ARM64		(0x0000000010a80000UL)
> 
>  #endif /* aarch64 */
> 
> @@ -1308,6 +1307,7 @@ struct DumpInfo {
>  	int		flag_vmemmap;        /* kernel supports vmemmap address space */
>  	int		flag_excludevm;      /* -e - excluding unused vmemmap pages */
>  	int		flag_use_count;      /* _refcount is named _count in struct page */
> +	int		flag_kcore_contains_vmcoreinfo;    /* '/proc/kcore' contains a VMCOREINFO PT_NOTE */
>  	unsigned long	vaddr_for_vtop;      /* virtual address for debugging */
>  	long		page_size;           /* size of page */
>  	long		page_shift;
> --
> 2.7.4
> 



_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/2] makedumpfile/arm64: Use VMCOREINFO inside '/proc/kcore' (if available)
  2018-11-02 21:00   ` Kazuhito Hagio
@ 2018-11-04 22:16     ` Bhupesh Sharma
  2018-11-06 20:17     ` Kazuhito Hagio
  1 sibling, 0 replies; 11+ messages in thread
From: Bhupesh Sharma @ 2018-11-04 22:16 UTC (permalink / raw)
  To: Kazuhito Hagio; +Cc: kexec mailing list, Baoquan He

Hi Kazu,

Thanks for your email. I am travelling this week for my holidays.
I will come back with my replies on this once I return (hopefully by
the coming Friday or earlier).

Thanks for your patience.

Regards,
Bhupesh


On Sat, Nov 3, 2018 at 2:32 AM Kazuhito Hagio <k-hagio@ab.jp.nec.com> wrote:
>
> Hi Bhupesh,
>
> On 10/29/2018 3:50 AM, Bhupesh Sharma wrote:
> > makedumpfile commit 94c97db3fe859ca14d7b38b0ae9ee0ffb83689d2 (arm64: Get
> > 'info->page_offset' from PT_LOAD segments to support KASLR boot cases)
> > added a method to determine 'info->page_offset' from PT_LOAD segments
> > for arm64 platforms.
> >
> > In this commit, I hardcoded the 'NOT_PADDR_ARM64' macro as
> > 0x0000000010a80000UL which was a valid value on qualcomm-amberwing
> > boards (which was the arm64 board available with me at the time).
> >
> > However, I was testing this change on several other arm64 boards
> > like apm-mustang, huawei-taishan and hp-moonshot (now that I have
> > access to them) and saw that this value can vary on the basis of
> > the "Kernel code" memory range placement.
> >
> > To fix the same, this patchset uses a new approach. Since kernel
> > version 4.19-rc5 (commit
> > 23c85094fe1895caefdd19ef624ee687ec5f4507 ["proc/kcore: add vmcoreinfo
> > note to /proc/kcore"]), '/proc/kcore' contains a new
> > PT_NOTE which carries the VMCOREINFO information.
> >
> > If the same is available, we can use it for makedumpfile
> > 'show_mem_usage()' functionality. This is especially useful
> > for architectures like arm64 as we can get kernel symbols like
> > 'VA_BITS' and 'kimage_voffset' from the '/proc/kcore' itself and use it
> > to calculate 'info->page_offset' when we make a call to
> > 'get_page_offset()'.
>
> First, in show_mem_usage(), we set info->page_offset in advance
> in order to get the offset of vmcoreinfo in /proc/kcore (not in PT_NOTE)
> from /sys/kernel/vmcoreinfo information.
>
> Since now PT_NOTE has vmcoreinfo in /proc/kcore as well, we don't
> need get_page_offset() here any longer, right?
>
> So I suggest the following patch for show_mem_usage(),
> in addition, we can remove the get_elf_loads() function.
>
> diff --git a/makedumpfile.c b/makedumpfile.c
> index 91c1ab4..a02665b 100644
> --- a/makedumpfile.c
> +++ b/makedumpfile.c
> @@ -11208,17 +11208,23 @@ int show_mem_usage(void)
>         if (!open_files_for_creating_dumpfile())
>                 return FALSE;
>
> -       if (!get_elf_loads(info->fd_memory, info->name_memory))
> +       if (!get_elf_info(info->fd_memory, info->name_memory))
>                 return FALSE;
>
> -       if (!get_page_offset())
> -               return FALSE;
> +       /*
> +        * Since kernel 4.19, /proc/kcore also has vmcoreinfo in PT_NOTE,
> +        * in that case we don't need /sys/kernel/vmcoreinfo.
> +        */
> +       if (!has_vmcoreinfo()) {
> +               if (!get_page_offset())
> +                       return FALSE;
>
> -       if (!get_sys_kernel_vmcoreinfo(&vmcoreinfo_addr, &vmcoreinfo_len))
> -               return FALSE;
> +               if (!get_sys_kernel_vmcoreinfo(&vmcoreinfo_addr, &vmcoreinfo_len))
> +                       return FALSE;
>
> -       if (!set_kcore_vmcoreinfo(vmcoreinfo_addr, vmcoreinfo_len))
> -               return FALSE;
> +               if (!set_kcore_vmcoreinfo(vmcoreinfo_addr, vmcoreinfo_len))
> +                       return FALSE;
> +       }
>
>         if (!initial())
>                 return FALSE;
>
>
> >
> > This VMCOREINFO note provides us a standard interface which can be
> > leveraged while debugging live (or primary) kernel with makedumpfile
> > (and other user-space tools), especially to derive the machine specific
> > details (for e.g. VA_BITS, PHYS_OFFSET and kimage_voffset for arm64
> > arch). The same has been suggested by the arm64 kernel maintainers (see
> > [0]) as the standard interface exposed by kernel for sharing
> > machine specific details with the user-land via vmcoreinfo.
> >
> > [0]. https://www.mail-archive.com/kexec@lists.infradead.org/msg20300.html
> >
> > I will send a follow-up patch to read 'kaslr_offset' for arm64 cases
> > from vmcoreinfo inside '/proc/kcore' after this patchset is applied.
> >
> > Here are some details of the tests I ran:
> >
> > Testing:
> > --------
> > 1. arm64 boards tested:
> >    huawei-taishan, apm-mustang and qualcomm-amberwing boards.
> >
> > 2. Use-cases tested:
> >    a) Primary kernel ->
> >       [] --mem-usage:
> >          # makedumpfile -f --mem-usage /proc/kcore
> >
> >       [] filtering use-case:
> >          # makedumpfile --split -d 31 -x vmlinux --config scrub.conf vmcore dumpfile_{1,2,3}
> >
> >       [] dumpfile creation:
> >          # makedumpfile -d 31 -x vmlinux vmcore dumpfile
> >
> >    b) Crash kernel ->
> >       [] dumpfile creation:
> >          # makedumpfile -l --message-level 31 -d 31 /proc/vmcore dump
> >
> > 3. Kernel versions tested:
> >    a) Kernel version 4.19-rc5 and above.
> >    b) Kernel version 4.14.
> >
> > Fixes: 94c97db3fe859ca14d7b38b0ae9ee0ffb83689d2 "arm64: Get 'info->page_offset' from PT_LOAD segments to
> > support KASLR boot cases"
> > Cc: Baoquan He <bhe@redhat.com>
> > Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
> > Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
> > ---
> >  arch/arm64.c   | 114 +++++++++++++++++++++++++++++++++++++++++++--------------
> >  makedumpfile.c |  67 ++++++++++++++++++++++++++++++---
> >  makedumpfile.h |   2 +-
> >  3 files changed, 149 insertions(+), 34 deletions(-)
> >
> > diff --git a/arch/arm64.c b/arch/arm64.c
> > index 362609668ea2..d695eff628f0 100644
> > --- a/arch/arm64.c
> > +++ b/arch/arm64.c
> > @@ -53,6 +53,7 @@ static unsigned long kimage_voffset;
> >  #define PAGE_OFFSET_42 ((0xffffffffffffffffUL) << 42)
> >  #define PAGE_OFFSET_47 ((0xffffffffffffffffUL) << 47)
> >  #define PAGE_OFFSET_48 ((0xffffffffffffffffUL) << 48)
> > +#define __PAGE_OFFSET(x) ((0xffffffffffffffffUL) << (x - 1))
> >
> >  #define pgd_val(x)           ((x).pgd)
> >  #define pud_val(x)           (pgd_val((x).pgd))
> > @@ -311,45 +312,104 @@ get_versiondep_info_arm64(void)
> >       unsigned long long virt_start;
> >       ulong _stext;
> >
> > -     _stext = get_stext_symbol();
> > -     if (!_stext) {
> > -             ERRMSG("Can't get the symbol of _stext.\n");
> > -             return FALSE;
> > -     }
> > +     /* Calculate 'VA_BITS'. */
> >
> > -     /* Derive va_bits as per arch/arm64/Kconfig */
> > -     if ((_stext & PAGE_OFFSET_36) == PAGE_OFFSET_36) {
> > -             va_bits = 36;
> > -     } else if ((_stext & PAGE_OFFSET_39) == PAGE_OFFSET_39) {
> > -             va_bits = 39;
> > -     } else if ((_stext & PAGE_OFFSET_42) == PAGE_OFFSET_42) {
> > -             va_bits = 42;
> > -     } else if ((_stext & PAGE_OFFSET_47) == PAGE_OFFSET_47) {
> > -             va_bits = 47;
> > -     } else if ((_stext & PAGE_OFFSET_48) == PAGE_OFFSET_48) {
> > -             va_bits = 48;
> > +     /* Since kernel version 4.19, '/proc/kcore' contains a new
> > +      * PT_NOTE which carries the VMCOREINFO information.
> > +      *
> > +      * If the same is available, use it as it already contains the
> > +      * value of 'VA_BITS' on the machine.
> > +      *
> > +      * Otherwise, we can read the '_stext' symbol and determine the
> > +      * 'VA_BITS' value from the same as well.
> > +      */
> > +     if (info->flag_kcore_contains_vmcoreinfo &&
> > +         (NUMBER(VA_BITS) != NOT_FOUND_NUMBER)) {
> > +             va_bits = NUMBER(VA_BITS);
> >       } else {
> > -             ERRMSG("Cannot find a proper _stext for calculating VA_BITS\n");
> > -             return FALSE;
> > +             _stext = get_stext_symbol();
> > +             if (!_stext) {
> > +                     ERRMSG("Can't get the symbol of _stext.\n");
> > +                     return FALSE;
> > +             }
> > +
> > +             /* Derive va_bits as per arch/arm64/Kconfig */
> > +             if ((_stext & PAGE_OFFSET_36) == PAGE_OFFSET_36) {
> > +                     va_bits = 36;
> > +             } else if ((_stext & PAGE_OFFSET_39) == PAGE_OFFSET_39) {
> > +                     va_bits = 39;
> > +             } else if ((_stext & PAGE_OFFSET_42) == PAGE_OFFSET_42) {
> > +                     va_bits = 42;
> > +             } else if ((_stext & PAGE_OFFSET_47) == PAGE_OFFSET_47) {
> > +                     va_bits = 47;
> > +             } else if ((_stext & PAGE_OFFSET_48) == PAGE_OFFSET_48) {
> > +                     va_bits = 48;
> > +             } else {
> > +                     ERRMSG("Cannot find a proper _stext for calculating VA_BITS\n");
> > +                     return FALSE;
> > +             }
> > +     }
> > +
> > +     /* Calculate 'info->page_offset'. */
> > +
> > +     /* Since kernel version 4.19, '/proc/kcore' contains a new
> > +      * PT_NOTE which carries the VMCOREINFO information.
> > +      *
> > +      * If the same is available, use it as it already contains the
> > +      * value of 'kimage_voffset' on the machine.
> > +      */
> > +     if (info->flag_kcore_contains_vmcoreinfo &&
> > +         (NUMBER(kimage_voffset) != NOT_FOUND_NUMBER)) {
> > +             kimage_voffset = NUMBER(kimage_voffset);
> >       }
> >
> > +     /* First, lets try and calculate the 'info->page_offset' value
> > +      * from PT_LOAD segments, if they are available.
> > +      */
> >       if (get_num_pt_loads()) {
> >               for (i = 0;
> >                   get_pt_load(i, &phys_start, NULL, &virt_start, NULL);
> >                   i++) {
> > -                     if (virt_start != NOT_KV_ADDR
> > -                         && virt_start < __START_KERNEL_map
> > -                         && phys_start != NOT_PADDR
> > -                         && phys_start != NOT_PADDR_ARM64) {
> > -                             info->page_offset = virt_start - phys_start;
> > -                             DEBUG_MSG("info->page_offset: %lx, VA_BITS: %d\n",
> > -                                             info->page_offset, va_bits);
> > -                             return TRUE;
> > +                     /* On systems where we have a valid 'kimage_voffset'
> > +                      * available by now, we should give preference to the same
> > +                      * while calculating 'info->page_offset'.
> > +                      *
> > +                      * Otherwise, we can ensure that we consider
> > +                      * only those PT_LOAD segments whose 'virt_start'
> > +                      * is greater than the PAGE_OFFSET value (as defined
> > +                      * in 'arch/arm64/include/asm/memory.h').
> > +                      */
> > +                     if (!kimage_voffset) {
> > +                             if (virt_start != NOT_KV_ADDR &&
> > +                                virt_start > __PAGE_OFFSET(va_bits) &&
> > +                                phys_start != NOT_PADDR) {
> > +                                     info->page_offset = virt_start - phys_start;
> > +                                     DEBUG_MSG("info->page_offset: %lx, VA_BITS: %d\n",
> > +                                                     info->page_offset, va_bits);
> > +                                     return TRUE;
> > +                             }
> > +                     } else {
> > +                             if (virt_start != NOT_KV_ADDR &&
> > +                                phys_start != NOT_PADDR &&
> > +                                (virt_start - phys_start) != kimage_voffset) {
> > +                                     info->page_offset = virt_start - phys_start;
> > +                                     DEBUG_MSG("info->page_offset: %lx, VA_BITS: %d\n",
> > +                                                     info->page_offset, va_bits);
> > +                                     return TRUE;
> > +                             }
>
> If something like KCORE_REMAP is added to kcore on arm64, then even
> with vmcoreinfo this calculation can return a wrong page_offset, right?
> Can we calculate info->page_offset without PT_LOAD information?
>
> Originally, info->page_offset (PAGE_OFFSET) is used in the following
> parts on arm64.
>
> arch/arm64.c:
> __pa(unsigned long vaddr)
> {
>         if (kimage_voffset == NOT_FOUND_NUMBER ||
>                         (vaddr >= PAGE_OFFSET))
>                 return (vaddr - PAGE_OFFSET + info->phys_base);
>         else
>                 return (vaddr - kimage_voffset);
> }
>
> elf_info.c:
>         kvstart = (ulong)start + PAGE_OFFSET;
>         kvend = (ulong)end + PAGE_OFFSET;
> --
>         kvaddr = (ulong)vmcoreinfo_addr + PAGE_OFFSET;
>
> I'm wondering about its consistency.  Is it OK to set
> (virt_start - phys_start) to info->page_offset on arm64?
> In other words, on arm64 system with info->phys_base != 0, does it
> work correctly for both /proc/vmcore and --mem-usage /proc/kcore?
>
> Maybe I have some misunderstanding..
>
> Thanks,
> Kazu
>
> >                       }
> >               }
> >       }
> >
> > -     info->page_offset = (0xffffffffffffffffUL) << (va_bits - 1);
> > +     /* Fallback to hard-coded value (equal to PAGE_OFFSET macro
> > +      * defined in 'arch/arm64/include/asm/memory.h'), as the last
> > +      * resort.
> > +      *
> > +      * Note that this will not be a valid value on KASLR enabled
> > +      * kernels as the start address of linear range is also
> > +      * randomized for KASLR boot cases.
> > +      */
> > +     info->page_offset = __PAGE_OFFSET(va_bits);
> >       DEBUG_MSG("page_offset=%lx, va_bits=%d\n", info->page_offset,
> >                       va_bits);
> >
> > diff --git a/makedumpfile.c b/makedumpfile.c
> > index 91c1ab47fca4..a1b8d8ff5a84 100644
> > --- a/makedumpfile.c
> > +++ b/makedumpfile.c
> > @@ -1302,6 +1302,20 @@ error:
> >       return FALSE;
> >  }
> >
> > +static int
> > +check_kcore_contains_vmcoreinfo(int fd, char *name)
> > +{
> > +     if (!get_elf_info(fd, name))
> > +             return FALSE;
> > +
> > +     if (!has_vmcoreinfo())
> > +             return FALSE;
> > +
> > +     DEBUG_MSG("VMCOREINFO PT_NOTE found in %s\n", name);
> > +
> > +     return TRUE;
> > +}
> > +
> >  int
> >  open_dump_memory(void)
> >  {
> > @@ -1314,6 +1328,23 @@ open_dump_memory(void)
> >       }
> >       info->fd_memory = fd;
> >
> > +     /* Since kernel version 4.19, '/proc/kcore' contains a new
> > +      * PT_NOTE which carries the VMCOREINFO information.
> > +      *
> > +      * If the same is available, use it for makedumpfile
> > +      * show_mem_usage() cases.
> > +      */
> > +     if (info->flag_mem_usage &&
> > +         !(strcmp(info->name_memory, "/proc/kcore")) &&
> > +         (info->kernel_version >= KERNEL_VERSION(4, 19, 0))){
> > +             status = check_kcore_contains_vmcoreinfo(fd,
> > +                                             info->name_memory);
> > +             if (status == TRUE) {
> > +                     info->flag_kcore_contains_vmcoreinfo = TRUE;
> > +                     return TRUE;
> > +             }
> > +     }
> > +
> >       status = check_kdump_compressed(info->name_memory);
> >       if (status == TRUE) {
> >               info->flag_refiltering = TRUE;
> > @@ -11195,6 +11226,8 @@ static int get_sys_kernel_vmcoreinfo(uint64_t *addr, uint64_t *len)
> >
> >  int show_mem_usage(void)
> >  {
> > +     off_t offset;
> > +     unsigned long size;
> >       uint64_t vmcoreinfo_addr, vmcoreinfo_len;
> >       struct cycle cycle = {0};
> >
> > @@ -11208,17 +11241,39 @@ int show_mem_usage(void)
> >       if (!open_files_for_creating_dumpfile())
> >               return FALSE;
> >
> > -     if (!get_elf_loads(info->fd_memory, info->name_memory))
> > -             return FALSE;
> > +     /* Since kernel version 4.19, '/proc/kcore' contains a new
> > +      * PT_NOTE which carries the VMCOREINFO information.
> > +      *
> > +      * If the same is available, use it for makedumpfile
> > +      * show_mem_usage(). This is especially useful for architectures
> > +      * like arm64 as we can get symbols like 'VA_BITS' and
> > +      * 'kimage_voffset' before we call get_page_offset().
> > +      */
> > +
> > +     if (!info->flag_kcore_contains_vmcoreinfo) {
> > +             if (!get_elf_loads(info->fd_memory, info->name_memory))
> > +                     return FALSE;
> > +     } else {
> > +             if (has_vmcoreinfo()) {
> > +                     get_vmcoreinfo(&offset, &size);
> > +                     if (!read_vmcoreinfo_from_vmcore(offset, size, FALSE))
> > +                             return FALSE;
> > +             }
> > +     }
> >
> >       if (!get_page_offset())
> >               return FALSE;
> >
> > -     if (!get_sys_kernel_vmcoreinfo(&vmcoreinfo_addr, &vmcoreinfo_len))
> > -             return FALSE;
> > +     /* If flag_kcore_contains_vmcoreinfo is TRUE when we are here,
> > +      * we don't need to read the vmcoreinfo again.
> > +      */
> > +     if (!info->flag_kcore_contains_vmcoreinfo)
> > +             if (!get_sys_kernel_vmcoreinfo(&vmcoreinfo_addr, &vmcoreinfo_len))
> > +                     return FALSE;
> >
> > -     if (!set_kcore_vmcoreinfo(vmcoreinfo_addr, vmcoreinfo_len))
> > -             return FALSE;
> > +     if (!info->flag_kcore_contains_vmcoreinfo)
> > +             if (!set_kcore_vmcoreinfo(vmcoreinfo_addr, vmcoreinfo_len))
> > +                     return FALSE;
> >
> >       if (!initial())
> >               return FALSE;
> > diff --git a/makedumpfile.h b/makedumpfile.h
> > index 46ebe2ee5a1f..77f7ec8e3867 100644
> > --- a/makedumpfile.h
> > +++ b/makedumpfile.h
> > @@ -544,7 +544,6 @@ unsigned long get_kvbase_arm64(void);
> >  #define KVBASE                       get_kvbase_arm64()
> >
> >  #define __START_KERNEL_map   (0xffffffff80000000UL)
> > -#define NOT_PADDR_ARM64              (0x0000000010a80000UL)
> >
> >  #endif /* aarch64 */
> >
> > @@ -1308,6 +1307,7 @@ struct DumpInfo {
> >       int             flag_vmemmap;        /* kernel supports vmemmap address space */
> >       int             flag_excludevm;      /* -e - excluding unused vmemmap pages */
> >       int             flag_use_count;      /* _refcount is named _count in struct page */
> > +     int             flag_kcore_contains_vmcoreinfo;    /* '/proc/kcore' contains a VMCOREINFO PT_NOTE */
> >       unsigned long   vaddr_for_vtop;      /* virtual address for debugging */
> >       long            page_size;           /* size of page */
> >       long            page_shift;
> > --
> > 2.7.4
> >
>
>



* RE: [PATCH v2 1/2] makedumpfile/arm64: Use VMCOREINFO inside '/proc/kcore' (if available)
  2018-11-02 21:00   ` Kazuhito Hagio
  2018-11-04 22:16     ` Bhupesh Sharma
@ 2018-11-06 20:17     ` Kazuhito Hagio
  2018-11-08 21:58       ` Bhupesh Sharma
  1 sibling, 1 reply; 11+ messages in thread
From: Kazuhito Hagio @ 2018-11-06 20:17 UTC (permalink / raw)
  To: Bhupesh Sharma; +Cc: kexec, Baoquan He

On 11/2/2018 5:00 PM, Kazuhito Hagio wrote:
> Originally, info->page_offset (PAGE_OFFSET) is used in the following
> parts on arm64.
> 
> arch/arm64.c:
> __pa(unsigned long vaddr)
> {
>         if (kimage_voffset == NOT_FOUND_NUMBER ||
>                         (vaddr >= PAGE_OFFSET))
>                 return (vaddr - PAGE_OFFSET + info->phys_base);
>         else
>                 return (vaddr - kimage_voffset);
> }
> 
> elf_info.c:
>         kvstart = (ulong)start + PAGE_OFFSET;
>         kvend = (ulong)end + PAGE_OFFSET;
> --
>         kvaddr = (ulong)vmcoreinfo_addr + PAGE_OFFSET;
> 
> I'm wondering about its consistency.  Is it OK to set
> (virt_start - phys_start) to info->page_offset on arm64?
> In other words, on arm64 system with info->phys_base != 0, does it
> work correctly for both /proc/vmcore and --mem-usage /proc/kcore?

I found an available arm64 system, and (virt_start - phys_start)
tested OK for both /proc/vmcore and --mem-usage /proc/kcore,
because the __pa() function is used only for swapper_pg_dir
and the "(vaddr - PAGE_OFFSET + info->phys_base)" line is not used.

The definition of PAGE_OFFSET in kernel is:

  #define PAGE_OFFSET             (UL(0xffffffffffffffff) - \
          (UL(1) << (VA_BITS - 1)) + 1)

The one in crash is:

  #define ARM64_PAGE_OFFSET    ((0xffffffffffffffffUL) \
                                        << (machdep->machspec->VA_BITS - 1))

So I think it would be better to use the same definition also in
makedumpfile to avoid confusion. In other words, I think the following
does not support arm64 now, and ideally should be fixed by introducing
something like phys_to_virt()..

elf_info.c:
        kvstart = (ulong)start + PAGE_OFFSET;
        kvend = (ulong)end + PAGE_OFFSET;
        --
        kvaddr = (ulong)vmcoreinfo_addr + PAGE_OFFSET;

and then,

  if (NUMBER(VA_BITS) != NOT_FOUND_NUMBER)
      va_bits = NUMBER(VA_BITS);
  else
      ... va_bits calculation ...

  info->page_offset = 0xffffffffffffffffUL << (va_bits - 1);

And also we will need to add get_phys_base() to show_mem_usage()
with some additional code for arm64 with old kernels.
I'm considering this approach.

Thanks,
Kazu





* Re: [PATCH v2 1/2] makedumpfile/arm64: Use VMCOREINFO inside '/proc/kcore' (if available)
  2018-11-06 20:17     ` Kazuhito Hagio
@ 2018-11-08 21:58       ` Bhupesh Sharma
  2018-11-12 19:51         ` Kazuhito Hagio
  0 siblings, 1 reply; 11+ messages in thread
From: Bhupesh Sharma @ 2018-11-08 21:58 UTC (permalink / raw)
  To: Kazuhito Hagio; +Cc: kexec mailing list, Baoquan He

Hi Kazu,

On Wed, Nov 7, 2018 at 1:53 AM Kazuhito Hagio <k-hagio@ab.jp.nec.com> wrote:
>
> On 11/2/2018 5:00 PM, Kazuhito Hagio wrote:
> > Originally, info->page_offset (PAGE_OFFSET) is used in the following
> > parts on arm64.
> >
> > arch/arm64.c:
> > __pa(unsigned long vaddr)
> > {
> >         if (kimage_voffset == NOT_FOUND_NUMBER ||
> >                         (vaddr >= PAGE_OFFSET))
> >                 return (vaddr - PAGE_OFFSET + info->phys_base);
> >         else
> >                 return (vaddr - kimage_voffset);
> > }
> >
> > elf_info.c:
> >         kvstart = (ulong)start + PAGE_OFFSET;
> >         kvend = (ulong)end + PAGE_OFFSET;
> > --
> >         kvaddr = (ulong)vmcoreinfo_addr + PAGE_OFFSET;
> >
> > I'm wondering about its consistency.  Is it OK to set
> > (virt_start - phys_start) to info->page_offset on arm64?
> > In other words, on arm64 system with info->phys_base != 0, does it
> > work correctly for both /proc/vmcore and --mem-usage /proc/kcore?
>
> I found an arm64 system available and (virt_start - phys_start)
> tested OK for both /proc/vmcore and --mem-usage /proc/kcore,
> because the __pa() function is used just only for swapper_pg_dir
> and the "(vaddr - PAGE_OFFSET + info->phys_base)" line is not used.
>
> The definition of PAGE_OFFSET in kernel is:
>
>   #define PAGE_OFFSET             (UL(0xffffffffffffffff) - \
>           (UL(1) << (VA_BITS - 1)) + 1)
>
> The one in crash is:
>
>   #define ARM64_PAGE_OFFSET    ((0xffffffffffffffffUL) \
>                                         << (machdep->machspec->VA_BITS - 1))
>
> So I think it would be better to use the same definition also in
> makedumpfile to avoid confusion. In other words, I think the following
> does not support arm64 now, and ideally should be fixed by introducing
> something like phys_to_virt()..
>
> elf_info.c:
>         kvstart = (ulong)start + PAGE_OFFSET;
>         kvend = (ulong)end + PAGE_OFFSET;
>         --
>         kvaddr = (ulong)vmcoreinfo_addr + PAGE_OFFSET;
>
> and then,
>
>   if (NUMBER(VA_BITS) != NOT_FOUND_NUMBER)
>       va_bits = NUMBER(VA_BITS);
>   else
>       ... va_bits calculation ...
>
>   info->page_offset = 0xffffffffffffffffUL << (va_bits - 1);
>
> And also we will need to add get_phys_base() to show_mem_usage()
> with some additional code for arm64 with old kernels.
> I'm considering this approach.

Note that PAGE_OFFSET is not constant on KASLR-enabled arm64 kernels,
as the start of the linear range gets randomized as well (as per the
KASLR seed), which means that PAGE_OFFSET is no longer equal to the
macro:
#define PAGE_OFFSET             (UL(0xffffffffffffffff) - \
           (UL(1) << (VA_BITS - 1)) + 1)

Please have a look at my discussion with the arm64 maintainers, which
started the whole vmcoreinfo-in-kcore work rolling. It talks about
PAGE_OFFSET and how it cannot be a constant macro for KASLR boot
cases:
[1]. https://www.spinics.net/lists/arm-kernel/msg655933.html
[2]. https://www.spinics.net/lists/kexec/msg20842.html

Also implementing __pa() or __phys_to_virt(x) in user-space is not
easy as reported earlier for kexec-tools as well. A typical arm64
__phys_to_virt(x) looks like this (simplified):

| #define __phys_to_virt(x) \
| ((unsigned long)((x) - memstart_addr) | PAGE_OFFSET)

Please have a look at the discussion with the arm64 maintainers
regarding their views on the same here:
[3]. https://www.spinics.net/lists/kexec/msg20867.html

As James suggested in this thread, "If user-space needs this value
to work with /proc/{kcore,vmcore} we should expose something like
'p2v_offset' as an elf-note on those files. (looks like they both have
elf-headers)." This is something I am already working on and plan to
send upstream next week.

My only concern is that we may be heading in a user-land direction
which has already been NAK'ed by the arm64 kernel maintainers. If we
want to avoid a nasty rewrite of a 'p2v' implementation in user-space,
we should probably stick to the standard interfaces exposed by the
kernel, add the missing ones there, and push the kernel community to
accept them as a standard ABI between the kernel and user space.

Thanks,
Bhupesh



* RE: [PATCH v2 1/2] makedumpfile/arm64: Use VMCOREINFO inside '/proc/kcore' (if available)
  2018-11-08 21:58       ` Bhupesh Sharma
@ 2018-11-12 19:51         ` Kazuhito Hagio
  2018-11-15 20:27           ` Bhupesh Sharma
  0 siblings, 1 reply; 11+ messages in thread
From: Kazuhito Hagio @ 2018-11-12 19:51 UTC (permalink / raw)
  To: Bhupesh Sharma; +Cc: kexec mailing list, Baoquan He

Hi Bhupesh,

(I'm trying to send this again without the attached files because
the previous one looks like it is being held by the kexec list.
Sorry for the duplication.)

Thank you for the information.

On 11/8/2018 4:58 PM, Bhupesh Sharma wrote:
> Hi Kazu,
> 
> On Wed, Nov 7, 2018 at 1:53 AM Kazuhito Hagio <k-hagio@ab.jp.nec.com> wrote:
> >
> > On 11/2/2018 5:00 PM, Kazuhito Hagio wrote:
> > > Originally, info->page_offset (PAGE_OFFSET) is used in the following
> > > parts on arm64.
> > >
> > > arch/arm64.c:
> > > __pa(unsigned long vaddr)
> > > {
> > >         if (kimage_voffset == NOT_FOUND_NUMBER ||
> > >                         (vaddr >= PAGE_OFFSET))
> > >                 return (vaddr - PAGE_OFFSET + info->phys_base);
> > >         else
> > >                 return (vaddr - kimage_voffset);
> > > }
> > >
> > > elf_info.c:
> > >         kvstart = (ulong)start + PAGE_OFFSET;
> > >         kvend = (ulong)end + PAGE_OFFSET;
> > > --
> > >         kvaddr = (ulong)vmcoreinfo_addr + PAGE_OFFSET;
> > >
> > > I'm wondering about its consistency.  Is it OK to set
> > > (virt_start - phys_start) to info->page_offset on arm64?
> > > In other words, on arm64 system with info->phys_base != 0, does it
> > > work correctly for both /proc/vmcore and --mem-usage /proc/kcore?
> >
> > I found an arm64 system available and (virt_start - phys_start)
> > tested OK for both /proc/vmcore and --mem-usage /proc/kcore,
> > because the __pa() function is used just only for swapper_pg_dir
> > and the "(vaddr - PAGE_OFFSET + info->phys_base)" line is not used.
> >
> > The definition of PAGE_OFFSET in kernel is:
> >
> >   #define PAGE_OFFSET             (UL(0xffffffffffffffff) - \
> >           (UL(1) << (VA_BITS - 1)) + 1)
> >
> > The one in crash is:
> >
> >   #define ARM64_PAGE_OFFSET    ((0xffffffffffffffffUL) \
> >                                         << (machdep->machspec->VA_BITS - 1))
> >
> > So I think it would be better to use the same definition also in
> > makedumpfile to avoid confusion. In other words, I think the following
> > does not support arm64 now, and ideally should be fixed by introducing
> > something like phys_to_virt()..
> >
> > elf_info.c:
> >         kvstart = (ulong)start + PAGE_OFFSET;
> >         kvend = (ulong)end + PAGE_OFFSET;
> >         --
> >         kvaddr = (ulong)vmcoreinfo_addr + PAGE_OFFSET;
> >
> > and then,
> >
> >   if (NUMBER(VA_BITS) != NOT_FOUND_NUMBER)
> >       va_bits = NUMBER(VA_BITS);
> >   else
> >       ... va_bits calculation ...
> >
> >   info->page_offset = 0xffffffffffffffffUL << (va_bits - 1);
> >
> > And also we will need to add get_phys_base() to show_mem_usage()
> > with some additional code for arm64 with old kernels.
> > I'm considering this approach.
> 
> Note that PAGE_OFFSET is not constant on KASLR enabled arm64 kernels
> as the start of the linear range gets randomized as well (as per the
> KASLR seed) which means that PAGE_OFFSET is no longer equal to the
> macro:
> #define PAGE_OFFSET             (UL(0xffffffffffffffff) - \
>            (UL(1) << (VA_BITS - 1)) + 1)

The PAGE_OFFSET macro itself is still constant in the arm64 kernel; it is
used by the p2v/v2p calculations, and also (like VA_BITS) by the test of
whether a virtual address lies in the linear kernel range:
---
extern s64                      memstart_addr;
/* PHYS_OFFSET - the physical address of the start of memory. */
#define PHYS_OFFSET             ({ VM_BUG_ON(memstart_addr & 1); memstart_addr; })
...
/*
 * The linear kernel range starts in the middle of the virtual adddress
 * space. Testing the top bit for the start of the region is a
 * sufficient check.
 */
#define __is_lm_address(addr)   (!!((addr) & BIT(VA_BITS - 1)))

#define __lm_to_phys(addr)      (((addr) & ~PAGE_OFFSET) + PHYS_OFFSET)
#define __kimg_to_phys(addr)    ((addr) - kimage_voffset)

#define __virt_to_phys_nodebug(x) ({                                    \
        phys_addr_t __x = (phys_addr_t)(x);                             \
        __is_lm_address(__x) ? __lm_to_phys(__x) :                      \
                               __kimg_to_phys(__x);                     \
})
...
#define __phys_to_virt(x)       ((unsigned long)((x) - PHYS_OFFSET) | PAGE_OFFSET)
---

I meant that I would just like to imitate these definitions in
makedumpfile first, for better maintainability and future changes.

Of course, PAGE_OFFSET itself is constant (derived from VA_BITS only),
but PHYS_OFFSET (= memstart_addr) varies with several conditions,
including KASLR. So I think this PAGE_OFFSET differs somewhat in
meaning from x86_64's.
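For reference, the v2p/p2v pairing of the macros quoted above can be mimicked in a few lines of user-space code. This is only a hedged sketch for an assumed VA_BITS=48 configuration (the `phys_base` and `kimage_voffset` values are made up for illustration), not the kernel or makedumpfile implementation:

```python
# Sketch of the arm64 __virt_to_phys()/__phys_to_virt() macros quoted
# above, for an assumed VA_BITS=48 configuration. phys_base stands in
# for PHYS_OFFSET (memstart_addr); kimage_voffset is illustrative only.
VA_BITS = 48
MASK = (1 << 64) - 1
PAGE_OFFSET = (0xFFFFFFFFFFFFFFFF - (1 << (VA_BITS - 1)) + 1) & MASK

phys_base = 0x4000000000            # example PHYS_OFFSET (varies under KASLR)
kimage_voffset = 0xFFFEFFFFC7F80000 # made-up value, for illustration only

def is_lm_address(vaddr):
    # The linear range starts in the middle of the VA space, so testing
    # the top VA bit is a sufficient check (__is_lm_address).
    return bool(vaddr & (1 << (VA_BITS - 1)))

def virt_to_phys(vaddr):
    if is_lm_address(vaddr):
        return ((vaddr & (~PAGE_OFFSET & MASK)) + phys_base) & MASK  # __lm_to_phys
    return (vaddr - kimage_voffset) & MASK                           # __kimg_to_phys

def phys_to_virt(paddr):
    return ((paddr - phys_base) | PAGE_OFFSET) & MASK                # __phys_to_virt
```

Round-tripping a linear-map address through the two helpers returns the original physical address, which is exactly the consistency property being discussed in this thread.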

> Please have a look at my discussion with the arm64 maintainers which
> started the whole vmcoreinfo in kcore stuff rolling :).
> It talks about PAGE_OFFSET and how it cannot be a constant macro for
> KASLR boot cases:
> [1]. https://www.spinics.net/lists/arm-kernel/msg655933.html
> [2]. https://www.spinics.net/lists/kexec/msg20842.html
> 
> Also implementing __pa() or __phys_to_virt(x) in user-space is not
> easy as reported earlier for kexec-tools as well. A typical arm64
> __phys_to_virt(x) looks like this (simplified):
> 
> | #define __phys_to_virt(x) \
> | ((unsigned long)((x) - memstart_addr) | PAGE_OFFSET)
> 
> Please have a look at the discussion with the arm64 maintainers
> regarding their views on the same here:
> [3]. https://www.spinics.net/lists/kexec/msg20867.html
> 
> As James, suggested in this thread, " If user-space needs this value
> to work with
> /proc/{kcore,vmcore} we should expose something like 'p2v_offset' as an elf-note
> on those files. (looks like they both have elf-headers).", something
> which I am working on already and plan to send upstream next week.
> 
> My only concern is regarding whether we are heading in a direction in
> user-land, which has been NAK'ed by arm64 kernel maintainers, which
> means that if we need to avoid the nasty rewrite of a 'p2v'
> implementation in the user-space we should probably stick to using the
> standard interfaces exposed by kernel space and add the missing ones
> in kernel and push the kernel community to consider accepting the same
> as a standard ABI between kernel and user space.

I agree with using standard interfaces and pushing them into the kernel.
I'm planning to add comments to your x86_64 'page_offset_base' patch.

Anyway, first I'd like to address the above definition change and the
problem on arm64 with kernels < 4.19, to simplify the situation.

I wrote the attached patches as the first step:

Patch 1 introduces paddr_to_vaddr() and replaces the few
"paddr + PAGE_OFFSET" lines (the p2v calculation for x86_64) with it.
There is no behavior change on any architecture, although some
architectures may need a fix.

Patch 2 redefines PAGE_OFFSET on arm64 and implements
paddr_to_vaddr_arm64().  The meaning of PAGE_OFFSET changes, but the
cases that use (vaddr - paddr) for the p2v calculation do not.
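In rough terms, the difference between the generic helper from patch 1 and the arm64 one from patch 2 is just where phys_base enters the calculation. A sketch with made-up values (not the patch code itself):

```python
# Sketch of the two paddr_to_vaddr() flavors described above; the
# PAGE_OFFSET and phys_base values here are assumptions for illustration.
MASK = (1 << 64) - 1
PAGE_OFFSET = (0xFFFFFFFFFFFFFFFF << 47) & MASK  # arm64, VA_BITS=48
phys_base = 0x4000000000                         # example info->phys_base

def paddr_to_vaddr_general(paddr):
    # Patch 1: the existing x86_64-style direct-map calculation.
    return (paddr + PAGE_OFFSET) & MASK

def paddr_to_vaddr_arm64(paddr):
    # Patch 2: subtract phys_base first, then OR in PAGE_OFFSET,
    # mirroring the kernel's __phys_to_virt().
    return ((paddr - phys_base) | PAGE_OFFSET) & MASK
```

With phys_base == 0 the two agree for physical addresses below 2^47; with a nonzero phys_base only the arm64 variant lands at the right spot in the linear map.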

I tested them on two arm64 systems (including one with KASLR enabled,
judging by its KERNELOFFSET=2eff90ae0000 in vmcoreinfo) and they
tested OK, but could you test them on your arm64 boards as well?

If it works correctly, then let's think about using vmcoreinfo in PT_NOTE in
/proc/kcore, which kernels >= 4.19 have.
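Finding that vmcoreinfo means walking the ELF64 note entries of the PT_NOTE segment. As a rough sketch of just the note-walking step (a synthetic buffer is parsed here instead of /proc/kcore, and this is not the makedumpfile code):

```python
import struct

def parse_elf64_notes(data):
    """Yield (name, n_type, desc) from a buffer of ELF note entries.

    Each entry is three uint32s (namesz, descsz, type) followed by the
    NUL-terminated name and the descriptor, each padded to 4 bytes.
    """
    off = 0
    while off + 12 <= len(data):
        namesz, descsz, n_type = struct.unpack_from("<III", data, off)
        off += 12
        name = data[off:off + namesz].rstrip(b"\0").decode()
        off += (namesz + 3) & ~3          # names are 4-byte aligned
        desc = data[off:off + descsz]
        off += (descsz + 3) & ~3          # descriptors too
        yield name, n_type, desc

# Build a synthetic VMCOREINFO note to demonstrate the walk; with
# /proc/kcore this buffer would come from the PT_NOTE segment instead.
body = b"PAGE_OFFSET=ffff800000000000\n"
name = b"VMCOREINFO\0"
note = struct.pack("<III", len(name), len(body), 0)
note += name + b"\0" * (-len(name) % 4)
note += body + b"\0" * (-len(body) % 4)

found = {n: d for n, _, d in parse_elf64_notes(note)}
```

A real tool would first read the ELF header and program headers of /proc/kcore to locate the PT_NOTE segment, then apply a walk like this to it.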

Thanks,
Kazu

--
From 7888baf1386728c9c3132d3e24fc8fdf48f55f03 Mon Sep 17 00:00:00 2001
From: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Date: Thu, 8 Nov 2018 14:26:43 -0500
Subject: [PATCH 1/2] Prepare paddr_to_vaddr() for architectures other than
 x86_64 to support --mem-usage option

Signed-off-by: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
---
 elf_info.c     |  7 ++++---
 makedumpfile.h | 11 +++++++++++
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/elf_info.c b/elf_info.c
index 711601a..1d33e96 100644
--- a/elf_info.c
+++ b/elf_info.c
@@ -372,7 +372,7 @@ int set_kcore_vmcoreinfo(uint64_t vmcoreinfo_addr, uint64_t vmcoreinfo_len)
 	off_t offset_desc;
 
 	offset = UNINITIALIZED;
-	kvaddr = (ulong)vmcoreinfo_addr + PAGE_OFFSET;
+	kvaddr = paddr_to_vaddr(vmcoreinfo_addr);
 
 	for (i = 0; i < num_pt_loads; ++i) {
 		struct pt_load_segment *p = &pt_loads[i];
@@ -810,10 +810,11 @@ static int exclude_segment(struct pt_load_segment **pt_loads,
 	int i, j, tidx = -1;
 	unsigned long long	vstart, vend, kvstart, kvend;
 	struct pt_load_segment temp_seg = {0};
-	kvstart = (ulong)start + PAGE_OFFSET;
-	kvend = (ulong)end + PAGE_OFFSET;
 	unsigned long size;
 
+	kvstart = paddr_to_vaddr(start);
+	kvend = paddr_to_vaddr(end);
+
 	for (i = 0; i < (*num_pt_loads); i++) {
 		vstart = (*pt_loads)[i].virt_start;
 		vend = (*pt_loads)[i].virt_end;
diff --git a/makedumpfile.h b/makedumpfile.h
index 46ebe2e..574a335 100644
--- a/makedumpfile.h
+++ b/makedumpfile.h
@@ -965,6 +965,8 @@ typedef unsigned long pgd_t;
 static inline int stub_true() { return TRUE; }
 static inline int stub_true_ul(unsigned long x) { return TRUE; }
 static inline int stub_false() { return FALSE; }
+#define paddr_to_vaddr_general(X) ((X) + PAGE_OFFSET)
+
 #ifdef __aarch64__
 int get_phys_base_arm64(void);
 int get_machdep_info_arm64(void);
@@ -975,6 +977,7 @@ int get_xen_info_arm64(void);
 unsigned long get_kaslr_offset_arm64(unsigned long vaddr);
 #define find_vmemmap()		stub_false()
 #define vaddr_to_paddr(X)	vaddr_to_paddr_arm64(X)
+#define paddr_to_vaddr(X)	paddr_to_vaddr_general(X)
 #define get_phys_base()		get_phys_base_arm64()
 #define get_machdep_info()	get_machdep_info_arm64()
 #define get_versiondep_info()	get_versiondep_info_arm64()
@@ -995,6 +998,7 @@ unsigned long long vaddr_to_paddr_arm(unsigned long vaddr);
 #define get_versiondep_info()	stub_true()
 #define get_kaslr_offset(X)	stub_false()
 #define vaddr_to_paddr(X)	vaddr_to_paddr_arm(X)
+#define paddr_to_vaddr(X)	paddr_to_vaddr_general(X)
 #define is_phys_addr(X)		stub_true_ul(X)
 #define arch_crashkernel_mem_size()	stub_false()
 #endif /* arm */
@@ -1009,6 +1013,7 @@ unsigned long long vaddr_to_paddr_x86(unsigned long vaddr);
 #define get_versiondep_info()	get_versiondep_info_x86()
 #define get_kaslr_offset(X)	stub_false()
 #define vaddr_to_paddr(X)	vaddr_to_paddr_x86(X)
+#define paddr_to_vaddr(X)	paddr_to_vaddr_general(X)
 #define is_phys_addr(X)		stub_true_ul(X)
 #define arch_crashkernel_mem_size()	stub_false()
 #endif /* x86 */
@@ -1026,6 +1031,7 @@ unsigned long long vtop4_x86_64_pagetable(unsigned long vaddr, unsigned long pag
 #define get_versiondep_info()	get_versiondep_info_x86_64()
 #define get_kaslr_offset(X)	get_kaslr_offset_x86_64(X)
 #define vaddr_to_paddr(X)	vtop4_x86_64(X)
+#define paddr_to_vaddr(X)	paddr_to_vaddr_general(X)
 #define is_phys_addr(X)		stub_true_ul(X)
 #define arch_crashkernel_mem_size()	stub_false()
 #endif /* x86_64 */
@@ -1041,6 +1047,7 @@ int arch_crashkernel_mem_size_ppc64(void);
 #define get_versiondep_info()	get_versiondep_info_ppc64()
 #define get_kaslr_offset(X)	stub_false()
 #define vaddr_to_paddr(X)	vaddr_to_paddr_ppc64(X)
+#define paddr_to_vaddr(X)	paddr_to_vaddr_general(X)
 #define is_phys_addr(X)		stub_true_ul(X)
 #define arch_crashkernel_mem_size()	arch_crashkernel_mem_size_ppc64()
 #endif          /* powerpc64 */
@@ -1054,6 +1061,7 @@ unsigned long long vaddr_to_paddr_ppc(unsigned long vaddr);
 #define get_versiondep_info()	stub_true()
 #define get_kaslr_offset(X)	stub_false()
 #define vaddr_to_paddr(X)	vaddr_to_paddr_ppc(X)
+#define paddr_to_vaddr(X)	paddr_to_vaddr_general(X)
 #define is_phys_addr(X)		stub_true_ul(X)
 #define arch_crashkernel_mem_size()	stub_false()
 #endif          /* powerpc32 */
@@ -1068,6 +1076,7 @@ int is_iomem_phys_addr_s390x(unsigned long addr);
 #define get_versiondep_info()	stub_true()
 #define get_kaslr_offset(X)	stub_false()
 #define vaddr_to_paddr(X)	vaddr_to_paddr_s390x(X)
+#define paddr_to_vaddr(X)	paddr_to_vaddr_general(X)
 #define is_phys_addr(X)		is_iomem_phys_addr_s390x(X)
 #define arch_crashkernel_mem_size()	stub_false()
 #endif          /* s390x */
@@ -1082,6 +1091,7 @@ unsigned long long vaddr_to_paddr_ia64(unsigned long vaddr);
 #define get_versiondep_info()	stub_true()
 #define get_kaslr_offset(X)	stub_false()
 #define vaddr_to_paddr(X)	vaddr_to_paddr_ia64(X)
+#define paddr_to_vaddr(X)	paddr_to_vaddr_general(X)
 #define VADDR_REGION(X)		(((unsigned long)(X)) >> REGION_SHIFT)
 #define is_phys_addr(X)		stub_true_ul(X)
 #define arch_crashkernel_mem_size()	stub_false()
@@ -1096,6 +1106,7 @@ unsigned long long vaddr_to_paddr_sparc64(unsigned long vaddr);
 #define get_phys_base()         get_phys_base_sparc64()
 #define get_versiondep_info()   get_versiondep_info_sparc64()
 #define vaddr_to_paddr(X)       vaddr_to_paddr_sparc64(X)
+#define paddr_to_vaddr(X)	paddr_to_vaddr_general(X)
 #define is_phys_addr(X)		stub_true_ul(X)
 #define arch_crashkernel_mem_size()	stub_false()
 #endif		/* sparc64 */
-- 
2.18.1


From 2ce3e43db15e882e8b2746dc3d55a274d364a2f7 Mon Sep 17 00:00:00 2001
From: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Date: Thu, 8 Nov 2018 15:18:45 -0500
Subject: [PATCH 2/2] arm64: implement paddr_to_vaddr_arm64() to support
 --mem-usage option

Signed-off-by: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
---
 arch/arm64.c   | 54 ++++++++++++++++++++++++++++----------------------
 makedumpfile.c |  4 ++++
 makedumpfile.h |  6 +++---
 3 files changed, 37 insertions(+), 27 deletions(-)

diff --git a/arch/arm64.c b/arch/arm64.c
index 3626096..43c6b83 100644
--- a/arch/arm64.c
+++ b/arch/arm64.c
@@ -174,11 +174,35 @@ get_kvbase_arm64(void)
 int
 get_phys_base_arm64(void)
 {
-	info->phys_base = NUMBER(PHYS_OFFSET);
+	int i;
+	unsigned long long phys_start;
+	unsigned long long virt_start;
 
-	DEBUG_MSG("phys_base    : %lx\n", info->phys_base);
+	if (NUMBER(PHYS_OFFSET) != NOT_FOUND_NUMBER) {
+		info->phys_base = NUMBER(PHYS_OFFSET);
+		DEBUG_MSG("phys_base    : %lx (vmcoreinfo)\n",
+				info->phys_base);
+		return TRUE;
+	}
 
-	return TRUE;
+	if (get_num_pt_loads() && PAGE_OFFSET) {
+		for (i = 0;
+		    get_pt_load(i, &phys_start, NULL, &virt_start, NULL);
+		    i++) {
+			if (virt_start != NOT_KV_ADDR
+			    && virt_start >= PAGE_OFFSET
+			    && phys_start != NOT_PADDR) {
+				info->phys_base = phys_start -
+					(virt_start & ~PAGE_OFFSET);
+				DEBUG_MSG("phys_base    : %lx (pt_load)\n",
+						info->phys_base);
+				return TRUE;
+			}
+		}
+	}
+
+	DEBUG_MSG("Cannot determine phys_base\n");
+	return FALSE;
 }
 
 unsigned long
@@ -306,9 +330,6 @@ get_xen_info_arm64(void)
 int
 get_versiondep_info_arm64(void)
 {
-	int i;
-	unsigned long long phys_start;
-	unsigned long long virt_start;
 	ulong _stext;
 
 	_stext = get_stext_symbol();
@@ -333,25 +354,10 @@ get_versiondep_info_arm64(void)
 		return FALSE;
 	}
 
-	if (get_num_pt_loads()) {
-		for (i = 0;
-		    get_pt_load(i, &phys_start, NULL, &virt_start, NULL);
-		    i++) {
-			if (virt_start != NOT_KV_ADDR
-			    && virt_start < __START_KERNEL_map
-			    && phys_start != NOT_PADDR
-			    && phys_start != NOT_PADDR_ARM64) {
-				info->page_offset = virt_start - phys_start;
-				DEBUG_MSG("info->page_offset: %lx, VA_BITS: %d\n",
-						info->page_offset, va_bits);
-				return TRUE;
-			}
-		}
-	}
-
 	info->page_offset = (0xffffffffffffffffUL) << (va_bits - 1);
-	DEBUG_MSG("page_offset=%lx, va_bits=%d\n", info->page_offset,
-			va_bits);
+
+	DEBUG_MSG("va_bits      : %d\n", va_bits);
+	DEBUG_MSG("page_offset  : %lx\n", info->page_offset);
 
 	return TRUE;
 }
diff --git a/makedumpfile.c b/makedumpfile.c
index 91c1ab4..8923538 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -11214,6 +11214,10 @@ int show_mem_usage(void)
 	if (!get_page_offset())
 		return FALSE;
 
+	/* paddr_to_vaddr() on arm64 needs phys_base. */
+	if (!get_phys_base())
+		return FALSE;
+
 	if (!get_sys_kernel_vmcoreinfo(&vmcoreinfo_addr, &vmcoreinfo_len))
 		return FALSE;
 
diff --git a/makedumpfile.h b/makedumpfile.h
index 574a335..a76b499 100644
--- a/makedumpfile.h
+++ b/makedumpfile.h
@@ -542,9 +542,7 @@ do { \
 #ifdef __aarch64__
 unsigned long get_kvbase_arm64(void);
 #define KVBASE			get_kvbase_arm64()
-
 #define __START_KERNEL_map	(0xffffffff80000000UL)
-#define NOT_PADDR_ARM64		(0x0000000010a80000UL)
 
 #endif /* aarch64 */
 
@@ -975,9 +973,11 @@ int get_versiondep_info_arm64(void);
 int get_xen_basic_info_arm64(void);
 int get_xen_info_arm64(void);
 unsigned long get_kaslr_offset_arm64(unsigned long vaddr);
+#define paddr_to_vaddr_arm64(X)	(((X) - info->phys_base) | PAGE_OFFSET)
+
 #define find_vmemmap()		stub_false()
 #define vaddr_to_paddr(X)	vaddr_to_paddr_arm64(X)
-#define paddr_to_vaddr(X)	paddr_to_vaddr_general(X)
+#define paddr_to_vaddr(X)	paddr_to_vaddr_arm64(X)
 #define get_phys_base()		get_phys_base_arm64()
 #define get_machdep_info()	get_machdep_info_arm64()
 #define get_versiondep_info()	get_versiondep_info_arm64()
-- 
2.18.1


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/2] makedumpfile/arm64: Use VMCOREINFO inside '/proc/kcore' (if available)
  2018-11-12 19:51         ` Kazuhito Hagio
@ 2018-11-15 20:27           ` Bhupesh Sharma
  2018-11-16 15:50             ` Kazuhito Hagio
  0 siblings, 1 reply; 11+ messages in thread
From: Bhupesh Sharma @ 2018-11-15 20:27 UTC (permalink / raw)
  To: Kazuhito Hagio; +Cc: kexec mailing list, Baoquan He

Hi Kazu,

Thanks for the patches. Some comments inline:

On Tue, Nov 13, 2018 at 1:24 AM Kazuhito Hagio <k-hagio@ab.jp.nec.com> wrote:
>
> Hi Bhupesh,
>
> (I'm trying to send this again without the attached files
> because the previous one looks being held by kexec list.
> sorry for the duplication.)
>
> Thank you for the information.
>
> On 11/8/2018 4:58 PM, Bhupesh Sharma wrote:
> > Hi Kazu,
> >
> > On Wed, Nov 7, 2018 at 1:53 AM Kazuhito Hagio <k-hagio@ab.jp.nec.com> wrote:
> > >
> > > On 11/2/2018 5:00 PM, Kazuhito Hagio wrote:
> > > > Originally, info->page_offset (PAGE_OFFSET) is used in the following
> > > > parts on arm64.
> > > >
> > > > arch/arm64.c:
> > > > __pa(unsigned long vaddr)
> > > > {
> > > >         if (kimage_voffset == NOT_FOUND_NUMBER ||
> > > >                         (vaddr >= PAGE_OFFSET))
> > > >                 return (vaddr - PAGE_OFFSET + info->phys_base);
> > > >         else
> > > >                 return (vaddr - kimage_voffset);
> > > > }
> > > >
> > > > elf_info.c:
> > > >         kvstart = (ulong)start + PAGE_OFFSET;
> > > >         kvend = (ulong)end + PAGE_OFFSET;
> > > > --
> > > >         kvaddr = (ulong)vmcoreinfo_addr + PAGE_OFFSET;
> > > >
> > > > I'm wondering about its consistency.  Is it OK to set
> > > > (virt_start - phys_start) to info->page_offset on arm64?
> > > > In other words, on arm64 system with info->phys_base != 0, does it
> > > > work correctly for both /proc/vmcore and --mem-usage /proc/kcore?
> > >
> > > I found an arm64 system available and (virt_start - phys_start)
> > > tested OK for both /proc/vmcore and --mem-usage /proc/kcore,
> > > because the __pa() function is used just only for swapper_pg_dir
> > > and the "(vaddr - PAGE_OFFSET + info->phys_base)" line is not used.
> > >
> > > The definition of PAGE_OFFSET in kernel is:
> > >
> > >   #define PAGE_OFFSET             (UL(0xffffffffffffffff) - \
> > >           (UL(1) << (VA_BITS - 1)) + 1)
> > >
> > > The one in crash is:
> > >
> > >   #define ARM64_PAGE_OFFSET    ((0xffffffffffffffffUL) \
> > >                                         << (machdep->machspec->VA_BITS - 1))
> > >
> > > So I think it would be better to use the same definition also in
> > > makedumpfile to avoid confusion. In other words, I think the following
> > > does not support arm64 now, and ideally should be fixed by introducing
> > > something like phys_to_virt()..
> > >
> > > elf_info.c:
> > >         kvstart = (ulong)start + PAGE_OFFSET;
> > >         kvend = (ulong)end + PAGE_OFFSET;
> > >         --
> > >         kvaddr = (ulong)vmcoreinfo_addr + PAGE_OFFSET;
> > >
> > > and then,
> > >
> > >   if (NUMBER(VA_BITS) != NOT_FOUND_NUMBER)
> > >       va_bits = NUMBER(VA_BITS);
> > >   else
> > >       ... va_bits calculation ...
> > >
> > >   info->page_offset = 0xffffffffffffffffUL << (va_bits - 1);
> > >
> > > And also we will need to add get_phys_base() to show_mem_usage()
> > > with some additional code for arm64 with old kernels.
> > > I'm considering this approach.
> >
> > Note that PAGE_OFFSET is not constant on KASLR enabled arm64 kernels
> > as the start of the linear range gets randomized as well (as per the
> > KASLR seed) which means that PAGE_OFFSET is no longer equal to the
> > macro:
> > #define PAGE_OFFSET             (UL(0xffffffffffffffff) - \
> >            (UL(1) << (VA_BITS - 1)) + 1)
>
> The PAGE_OFFSET macro itself is still constant in arm64 kernel and used by
> p2v/v2p calculation, and also used by the test whether a virtual address is
> in linear kernel range (as VA_BITS):
> ---
> extern s64                      memstart_addr;
> /* PHYS_OFFSET - the physical address of the start of memory. */
> #define PHYS_OFFSET             ({ VM_BUG_ON(memstart_addr & 1); memstart_addr; })
> ...
> /*
>  * The linear kernel range starts in the middle of the virtual adddress
>  * space. Testing the top bit for the start of the region is a
>  * sufficient check.
>  */
> #define __is_lm_address(addr)   (!!((addr) & BIT(VA_BITS - 1)))
>
> #define __lm_to_phys(addr)      (((addr) & ~PAGE_OFFSET) + PHYS_OFFSET)
> #define __kimg_to_phys(addr)    ((addr) - kimage_voffset)
>
> #define __virt_to_phys_nodebug(x) ({                                    \
>         phys_addr_t __x = (phys_addr_t)(x);                             \
>         __is_lm_address(__x) ? __lm_to_phys(__x) :                      \
>                                __kimg_to_phys(__x);                     \
> })
> ...
> #define __phys_to_virt(x)       ((unsigned long)((x) - PHYS_OFFSET) | PAGE_OFFSET)
> ---

I understand, but for me (and, as I understand it, for other arm64
users as well), the kernel virtual address layout and the comments we
have in place are probably broken after KASLR was enabled on arm64.
This is something I shared with the arm64 maintainers as well:
[1]. https://www.spinics.net/lists/arm-kernel/msg655933.html

Let's go through some details first.
Please note that PAGE_OFFSET is defined in
'arch/arm64/include/asm/memory.h' as:

/*
 * PAGE_OFFSET - the virtual address of the start of the linear map (top
 *         (VA_BITS - 1))
 */

So, basically, this points to the virtual address at which the linear
mapping starts, and the linear region takes up exactly half of the
kernel virtual address space.

So, with a normal non-KASLR boot (I am sharing the logs below from a
system booted with 'nokaslr' in the bootargs), the PAGE_OFFSET macro
normally points to the half-way mark of the kernel virtual address
space. [I needed to revert kernel commit
071929dbdd865f779a89ba3f1e06ba8d17dd3743 ("arm64: Stop printing the
virtual memory layout") to get these logs.]

[    0.000000] Virtual kernel memory layout:
[    0.000000]     modules : 0xffff000000000000 - 0xffff000008000000   (   128 MB)
[    0.000000]     vmalloc : 0xffff000008000000 - 0xffff7bdfffff0000   (126847 GB)
[    0.000000]       .text : 0x(____ptrval____) - 0x(____ptrval____)   (  8768 KB)
[    0.000000]     .rodata : 0x(____ptrval____) - 0x(____ptrval____)   (  5696 KB)
[    0.000000]       .init : 0x(____ptrval____) - 0x(____ptrval____)   (  4224 KB)
[    0.000000]       .data : 0x(____ptrval____) - 0x(____ptrval____)   (  1775 KB)
[    0.000000]        .bss : 0x(____ptrval____) - 0x(____ptrval____)   (  9193 KB)
[    0.000000]     fixed   : 0xffff7fdffe790000 - 0xffff7fdffec00000   (  4544 KB)
[    0.000000]     PCI I/O : 0xffff7fdffee00000 - 0xffff7fdfffe00000   (    16 MB)
[    0.000000]     vmemmap : 0xffff7fe000000000 - 0xffff800000000000   (   128 GB maximum)
[    0.000000]               0xffff7fe000000000 - 0xffff7fe004000000   (    64 MB actual)
[    0.000000]     memory  : 0xffff800000000000 - 0xffff801000000000   ( 65536 MB)

This is on a system with 64K page size and VA_BITS=48.
For such a configuration, PAGE_OFFSET is calculated as
0xffff800000000000, which is the same as the start address of the
'memory' range (i.e. memory : 0xffff800000000000 - 0xffff801000000000)
that depicts the linear region from which 'kmalloc' allocations are
served.
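The two PAGE_OFFSET formulas quoted earlier in this thread (the kernel's and crash's) can be checked to agree for this configuration; a quick sanity sketch:

```python
# Both PAGE_OFFSET definitions quoted in this thread, evaluated for the
# 64K-page / VA_BITS=48 configuration discussed above.
MASK = (1 << 64) - 1

def page_offset_kernel(va_bits):
    # arch/arm64/include/asm/memory.h definition
    return (0xFFFFFFFFFFFFFFFF - (1 << (va_bits - 1)) + 1) & MASK

def page_offset_crash(va_bits):
    # crash's ARM64_PAGE_OFFSET definition
    return (0xFFFFFFFFFFFFFFFF << (va_bits - 1)) & MASK
```

For any VA_BITS the two forms give the same value; for VA_BITS=48 they both give 0xffff800000000000, matching the start of the 'memory' line in the non-KASLR boot log above.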

Now, let's look at the boot logs on the same system with 'nokaslr'
removed from the bootargs (i.e. a KASLR boot case):

[    0.000000] Virtual kernel memory layout:
[    0.000000]     modules : 0xffff000000000000 - 0xffff000008000000   (   128 MB)
[    0.000000]     vmalloc : 0xffff000008000000 - 0xffff7bdfffff0000   (126847 GB)
[    0.000000]       .text : 0x(____ptrval____) - 0x(____ptrval____)   (  8768 KB)
[    0.000000]     .rodata : 0x(____ptrval____) - 0x(____ptrval____)   (  5696 KB)
[    0.000000]       .init : 0x(____ptrval____) - 0x(____ptrval____)   (  4224 KB)
[    0.000000]       .data : 0x(____ptrval____) - 0x(____ptrval____)   (  1775 KB)
[    0.000000]        .bss : 0x(____ptrval____) - 0x(____ptrval____)   (  9193 KB)
[    0.000000]     fixed   : 0xffff7fdffe790000 - 0xffff7fdffec00000   (  4544 KB)
[    0.000000]     PCI I/O : 0xffff7fdffee00000 - 0xffff7fdfffe00000   (    16 MB)
[    0.000000]     vmemmap : 0xffff7fe000000000 - 0xffff800000000000   (   128 GB maximum)
[    0.000000]               0xffff7fea5b300000 - 0xffff7fea5f300000   (    64 MB actual)
[    0.000000]     memory  : 0xffffa96cc0000000 - 0xffffa97cc0000000   ( 65536 MB)

As you will note here, the start of the linear range as indicated by
the 'memory' line is 0xffffa96cc0000000 (this value is randomized on
each KASLR boot); however, the PAGE_OFFSET macro is still stuck at
0xffff800000000000, which is confusing (and probably incorrect), as
'kmalloc' calls now return addresses in the
0xffffa96cc0000000 - 0xffffa97cc0000000 range and no longer in the
0xffff800000000000 - 0xffff801000000000 range we saw in the non-KASLR
boot case.

> I meant that I just would like to imitate these definitions also in
> makedumpfile first, for better maintenance and future changes.

Sure. I think the arm64 kernel ideally requires a standard ABI to
expose the start of the linear address region to user-space for both
the KASLR and non-KASLR cases (something like 'page_offset_base' on
x86_64), and I plan to submit an RFC patch for the arm64 maintainers'
consideration once my x86_64 kernel patch is accepted.

> Of course, PAGE_OFFSET becomes constant (from VA_BITS only), but PHYS_OFFSET
> (= memstart_addr) will be variable by some conditions including KASLR.
> I think that this PAGE_OFFSET is a bit different from x86_64's one on its
> meaning.

See above, but the point I am trying to make is that if the 'kmalloc'
return addresses are nowhere close to the PAGE_OFFSET macro value, it
causes a lot of confusion while debugging kernel issues.
(See [1] for details, especially the suggestions I made there to the
maintainers, copied below for quick access:

So, my question is: to access the start of the linear region (which
was earlier determinable via the PAGE_OFFSET macro), should I:

- do some back-computation of the start of the linear region from
'memstart_addr' in user-space, or
- use a new global variable in the kernel which is assigned the value
of 'memblock_start_of_DRAM()' and exposed via '/proc/kallsyms', so
that it can be read by user-space tools, or
- look at removing the PAGE_OFFSET usage from the kernel and replacing
it with a global variable that is properly updated for the KASLR case
as well?)

> > Please have a look at my discussion with the arm64 maintainers which
> > started the whole vmcoreinfo in kcore stuff rolling: ).
> > It talks about PAGE_OFFSET and how it cannot be a constant macro for
> > KASLR boot cases:
> > [1]. https://www.spinics.net/lists/arm-kernel/msg655933.html
> > [2]. https://www.spinics.net/lists/kexec/msg20842.html
> >
> > Also implementing __pa() or __phys_to_virt(x) in user-space is not
> > easy as reported earlier for kexec-tools as well. A typical arm64
> > __phys_to_virt(x) looks like this (simplified):
> >
> > | #define __phys_to_virt(x) \
> > | ((unsigned long)((x) - memstart_addr) | PAGE_OFFSET)
> >
> > Please have a look at the discussion with the arm64 maintainers
> > regarding their views on the same here:
> > [3]. https://www.spinics.net/lists/kexec/msg20867.html
> >
> > As James, suggested in this thread, " If user-space needs this value
> > to work with
> > /proc/{kcore,vmcore} we should expose something like 'p2v_offset' as an elf-note
> > on those files. (looks like they both have elf-headers).", something
> > which I am working on already and plan to send upstream next week.
> >
> > My only concern is regarding whether we are heading in a direction in
> > user-land, which has been NAK'ed by arm64 kernel maintainers, which
> > means that if we need to avoid the nasty rewrite of a 'p2v'
> > implementation in the user-space we should probably stick to using the
> > standard interfaces exposed by kernel space and add the missing ones
> > in kernel and push the kernel community to consider accepting the same
> > as a standard ABI between kernel and user space.
>
> I agree to using standard interfaces and pushing it to kernel.
> I'm thinking to add comments to your x86_64 'page_offset_base' patch.

Thanks a lot. That would help.

> Anyway, first I'd like to address the above definition change and the problem
> on arm64 with kernels < 4.19 to simplify the situation.
>
> I wrote the attached patches as the first step:
>
> Patch 1 introduces paddr_to_vaddr() and replaces a few "paddr + PAGE_OFFSET"
> lines, which is p2v calculation for x86_64, with it.  There is no behavior
> change on all architectures, although some architectures may need a fix.
>
> Patch 2 redefines PAGE_OFFSET on arm64 and implement paddr_to_vaddr_arm64().
> The meaning of PAGE_OFFSET is changed, but the case that using (vaddr - paddr)
> for p2v calculation is not changed.
>
> I tested them on two arm64 systems (including KASLR enabled, because its
> KERNELOFFSET=2eff90ae0000 in vmcoreinfo) and it tested OK,
> but could you test them on your arm64 boards?
>
> If it works correctly, then let's think about using vmcoreinfo in PT_NOTE in
> /proc/kcore, which kernels >= 4.19 have.

Yes, these patches work fine on my hp-moonshot machine for both the
KASLR and non-KASLR cases, and I agree with you that we need some kind
of solution for kernels < 4.19 as well; this is probably a good fix to
have for now.

So, please feel free to add:
Tested-by: Bhupesh Sharma <bhsharma@redhat.com>

Thanks for your help.

Regards,
Bhupesh

> Thanks,
> Kazu
>
> --
> From 7888baf1386728c9c3132d3e24fc8fdf48f55f03 Mon Sep 17 00:00:00 2001
> From: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
> Date: Thu, 8 Nov 2018 14:26:43 -0500
> Subject: [PATCH 1/2] Prepare paddr_to_vaddr() for architectures other than
>  x86_64 to support --mem-usage option
>
> Signed-off-by: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
> ---
>  elf_info.c     |  7 ++++---
>  makedumpfile.h | 11 +++++++++++
>  2 files changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/elf_info.c b/elf_info.c
> index 711601a..1d33e96 100644
> --- a/elf_info.c
> +++ b/elf_info.c
> @@ -372,7 +372,7 @@ int set_kcore_vmcoreinfo(uint64_t vmcoreinfo_addr, uint64_t vmcoreinfo_len)
>         off_t offset_desc;
>
>         offset = UNINITIALIZED;
> -       kvaddr = (ulong)vmcoreinfo_addr + PAGE_OFFSET;
> +       kvaddr = paddr_to_vaddr(vmcoreinfo_addr);
>
>         for (i = 0; i < num_pt_loads; ++i) {
>                 struct pt_load_segment *p = &pt_loads[i];
> @@ -810,10 +810,11 @@ static int exclude_segment(struct pt_load_segment **pt_loads,
>         int i, j, tidx = -1;
>         unsigned long long      vstart, vend, kvstart, kvend;
>         struct pt_load_segment temp_seg = {0};
> -       kvstart = (ulong)start + PAGE_OFFSET;
> -       kvend = (ulong)end + PAGE_OFFSET;
>         unsigned long size;
>
> +       kvstart = paddr_to_vaddr(start);
> +       kvend = paddr_to_vaddr(end);
> +
>         for (i = 0; i < (*num_pt_loads); i++) {
>                 vstart = (*pt_loads)[i].virt_start;
>                 vend = (*pt_loads)[i].virt_end;
> diff --git a/makedumpfile.h b/makedumpfile.h
> index 46ebe2e..574a335 100644
> --- a/makedumpfile.h
> +++ b/makedumpfile.h
> @@ -965,6 +965,8 @@ typedef unsigned long pgd_t;
>  static inline int stub_true() { return TRUE; }
>  static inline int stub_true_ul(unsigned long x) { return TRUE; }
>  static inline int stub_false() { return FALSE; }
> +#define paddr_to_vaddr_general(X) ((X) + PAGE_OFFSET)
> +
>  #ifdef __aarch64__
>  int get_phys_base_arm64(void);
>  int get_machdep_info_arm64(void);
> @@ -975,6 +977,7 @@ int get_xen_info_arm64(void);
>  unsigned long get_kaslr_offset_arm64(unsigned long vaddr);
>  #define find_vmemmap()         stub_false()
>  #define vaddr_to_paddr(X)      vaddr_to_paddr_arm64(X)
> +#define paddr_to_vaddr(X)      paddr_to_vaddr_general(X)
>  #define get_phys_base()                get_phys_base_arm64()
>  #define get_machdep_info()     get_machdep_info_arm64()
>  #define get_versiondep_info()  get_versiondep_info_arm64()
> @@ -995,6 +998,7 @@ unsigned long long vaddr_to_paddr_arm(unsigned long vaddr);
>  #define get_versiondep_info()  stub_true()
>  #define get_kaslr_offset(X)    stub_false()
>  #define vaddr_to_paddr(X)      vaddr_to_paddr_arm(X)
> +#define paddr_to_vaddr(X)      paddr_to_vaddr_general(X)
>  #define is_phys_addr(X)                stub_true_ul(X)
>  #define arch_crashkernel_mem_size()    stub_false()
>  #endif /* arm */
> @@ -1009,6 +1013,7 @@ unsigned long long vaddr_to_paddr_x86(unsigned long vaddr);
>  #define get_versiondep_info()  get_versiondep_info_x86()
>  #define get_kaslr_offset(X)    stub_false()
>  #define vaddr_to_paddr(X)      vaddr_to_paddr_x86(X)
> +#define paddr_to_vaddr(X)      paddr_to_vaddr_general(X)
>  #define is_phys_addr(X)                stub_true_ul(X)
>  #define arch_crashkernel_mem_size()    stub_false()
>  #endif /* x86 */
> @@ -1026,6 +1031,7 @@ unsigned long long vtop4_x86_64_pagetable(unsigned long vaddr, unsigned long pag
>  #define get_versiondep_info()  get_versiondep_info_x86_64()
>  #define get_kaslr_offset(X)    get_kaslr_offset_x86_64(X)
>  #define vaddr_to_paddr(X)      vtop4_x86_64(X)
> +#define paddr_to_vaddr(X)      paddr_to_vaddr_general(X)
>  #define is_phys_addr(X)                stub_true_ul(X)
>  #define arch_crashkernel_mem_size()    stub_false()
>  #endif /* x86_64 */
> @@ -1041,6 +1047,7 @@ int arch_crashkernel_mem_size_ppc64(void);
>  #define get_versiondep_info()  get_versiondep_info_ppc64()
>  #define get_kaslr_offset(X)    stub_false()
>  #define vaddr_to_paddr(X)      vaddr_to_paddr_ppc64(X)
> +#define paddr_to_vaddr(X)      paddr_to_vaddr_general(X)
>  #define is_phys_addr(X)                stub_true_ul(X)
>  #define arch_crashkernel_mem_size()    arch_crashkernel_mem_size_ppc64()
>  #endif          /* powerpc64 */
> @@ -1054,6 +1061,7 @@ unsigned long long vaddr_to_paddr_ppc(unsigned long vaddr);
>  #define get_versiondep_info()  stub_true()
>  #define get_kaslr_offset(X)    stub_false()
>  #define vaddr_to_paddr(X)      vaddr_to_paddr_ppc(X)
> +#define paddr_to_vaddr(X)      paddr_to_vaddr_general(X)
>  #define is_phys_addr(X)                stub_true_ul(X)
>  #define arch_crashkernel_mem_size()    stub_false()
>  #endif          /* powerpc32 */
> @@ -1068,6 +1076,7 @@ int is_iomem_phys_addr_s390x(unsigned long addr);
>  #define get_versiondep_info()  stub_true()
>  #define get_kaslr_offset(X)    stub_false()
>  #define vaddr_to_paddr(X)      vaddr_to_paddr_s390x(X)
> +#define paddr_to_vaddr(X)      paddr_to_vaddr_general(X)
>  #define is_phys_addr(X)                is_iomem_phys_addr_s390x(X)
>  #define arch_crashkernel_mem_size()    stub_false()
>  #endif          /* s390x */
> @@ -1082,6 +1091,7 @@ unsigned long long vaddr_to_paddr_ia64(unsigned long vaddr);
>  #define get_versiondep_info()  stub_true()
>  #define get_kaslr_offset(X)    stub_false()
>  #define vaddr_to_paddr(X)      vaddr_to_paddr_ia64(X)
> +#define paddr_to_vaddr(X)      paddr_to_vaddr_general(X)
>  #define VADDR_REGION(X)                (((unsigned long)(X)) >> REGION_SHIFT)
>  #define is_phys_addr(X)                stub_true_ul(X)
>  #define arch_crashkernel_mem_size()    stub_false()
> @@ -1096,6 +1106,7 @@ unsigned long long vaddr_to_paddr_sparc64(unsigned long vaddr);
>  #define get_phys_base()         get_phys_base_sparc64()
>  #define get_versiondep_info()   get_versiondep_info_sparc64()
>  #define vaddr_to_paddr(X)       vaddr_to_paddr_sparc64(X)
> +#define paddr_to_vaddr(X)      paddr_to_vaddr_general(X)
>  #define is_phys_addr(X)                stub_true_ul(X)
>  #define arch_crashkernel_mem_size()    stub_false()
>  #endif         /* sparc64 */
> --
> 2.18.1
>
>
> From 2ce3e43db15e882e8b2746dc3d55a274d364a2f7 Mon Sep 17 00:00:00 2001
> From: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
> Date: Thu, 8 Nov 2018 15:18:45 -0500
> Subject: [PATCH 2/2] arm64: implement paddr_to_vaddr_arm64() to support
>  --mem-usage option
>
> Signed-off-by: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
> ---
>  arch/arm64.c   | 54 ++++++++++++++++++++++++++++----------------------
>  makedumpfile.c |  4 ++++
>  makedumpfile.h |  6 +++---
>  3 files changed, 37 insertions(+), 27 deletions(-)
>
> diff --git a/arch/arm64.c b/arch/arm64.c
> index 3626096..43c6b83 100644
> --- a/arch/arm64.c
> +++ b/arch/arm64.c
> @@ -174,11 +174,35 @@ get_kvbase_arm64(void)
>  int
>  get_phys_base_arm64(void)
>  {
> -       info->phys_base = NUMBER(PHYS_OFFSET);
> +       int i;
> +       unsigned long long phys_start;
> +       unsigned long long virt_start;
>
> -       DEBUG_MSG("phys_base    : %lx\n", info->phys_base);
> +       if (NUMBER(PHYS_OFFSET) != NOT_FOUND_NUMBER) {
> +               info->phys_base = NUMBER(PHYS_OFFSET);
> +               DEBUG_MSG("phys_base    : %lx (vmcoreinfo)\n",
> +                               info->phys_base);
> +               return TRUE;
> +       }
>
> -       return TRUE;
> +       if (get_num_pt_loads() && PAGE_OFFSET) {
> +               for (i = 0;
> +                   get_pt_load(i, &phys_start, NULL, &virt_start, NULL);
> +                   i++) {
> +                       if (virt_start != NOT_KV_ADDR
> +                           && virt_start >= PAGE_OFFSET
> +                           && phys_start != NOT_PADDR) {
> +                               info->phys_base = phys_start -
> +                                       (virt_start & ~PAGE_OFFSET);
> +                               DEBUG_MSG("phys_base    : %lx (pt_load)\n",
> +                                               info->phys_base);
> +                               return TRUE;
> +                       }
> +               }
> +       }
> +
> +       DEBUG_MSG("Cannot determine phys_base\n");
> +       return FALSE;
>  }
>
>  unsigned long
> @@ -306,9 +330,6 @@ get_xen_info_arm64(void)
>  int
>  get_versiondep_info_arm64(void)
>  {
> -       int i;
> -       unsigned long long phys_start;
> -       unsigned long long virt_start;
>         ulong _stext;
>
>         _stext = get_stext_symbol();
> @@ -333,25 +354,10 @@ get_versiondep_info_arm64(void)
>                 return FALSE;
>         }
>
> -       if (get_num_pt_loads()) {
> -               for (i = 0;
> -                   get_pt_load(i, &phys_start, NULL, &virt_start, NULL);
> -                   i++) {
> -                       if (virt_start != NOT_KV_ADDR
> -                           && virt_start < __START_KERNEL_map
> -                           && phys_start != NOT_PADDR
> -                           && phys_start != NOT_PADDR_ARM64) {
> -                               info->page_offset = virt_start - phys_start;
> -                               DEBUG_MSG("info->page_offset: %lx, VA_BITS: %d\n",
> -                                               info->page_offset, va_bits);
> -                               return TRUE;
> -                       }
> -               }
> -       }
> -
>         info->page_offset = (0xffffffffffffffffUL) << (va_bits - 1);
> -       DEBUG_MSG("page_offset=%lx, va_bits=%d\n", info->page_offset,
> -                       va_bits);
> +
> +       DEBUG_MSG("va_bits      : %d\n", va_bits);
> +       DEBUG_MSG("page_offset  : %lx\n", info->page_offset);
>
>         return TRUE;
>  }
> diff --git a/makedumpfile.c b/makedumpfile.c
> index 91c1ab4..8923538 100644
> --- a/makedumpfile.c
> +++ b/makedumpfile.c
> @@ -11214,6 +11214,10 @@ int show_mem_usage(void)
>         if (!get_page_offset())
>                 return FALSE;
>
> +       /* paddr_to_vaddr() on arm64 needs phys_base. */
> +       if (!get_phys_base())
> +               return FALSE;
> +
>         if (!get_sys_kernel_vmcoreinfo(&vmcoreinfo_addr, &vmcoreinfo_len))
>                 return FALSE;
>
> diff --git a/makedumpfile.h b/makedumpfile.h
> index 574a335..a76b499 100644
> --- a/makedumpfile.h
> +++ b/makedumpfile.h
> @@ -542,9 +542,7 @@ do { \
>  #ifdef __aarch64__
>  unsigned long get_kvbase_arm64(void);
>  #define KVBASE                 get_kvbase_arm64()
> -
>  #define __START_KERNEL_map     (0xffffffff80000000UL)
> -#define NOT_PADDR_ARM64                (0x0000000010a80000UL)
>
>  #endif /* aarch64 */
>
> @@ -975,9 +973,11 @@ int get_versiondep_info_arm64(void);
>  int get_xen_basic_info_arm64(void);
>  int get_xen_info_arm64(void);
>  unsigned long get_kaslr_offset_arm64(unsigned long vaddr);
> +#define paddr_to_vaddr_arm64(X)        (((X) - info->phys_base) | PAGE_OFFSET)
> +
>  #define find_vmemmap()         stub_false()
>  #define vaddr_to_paddr(X)      vaddr_to_paddr_arm64(X)
> -#define paddr_to_vaddr(X)      paddr_to_vaddr_general(X)
> +#define paddr_to_vaddr(X)      paddr_to_vaddr_arm64(X)
>  #define get_phys_base()                get_phys_base_arm64()
>  #define get_machdep_info()     get_machdep_info_arm64()
>  #define get_versiondep_info()  get_versiondep_info_arm64()
> --
> 2.18.1
>
>

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


* RE: [PATCH v2 1/2] makedumpfile/arm64: Use VMCOREINFO inside '/proc/kcore' (if available)
  2018-11-15 20:27           ` Bhupesh Sharma
@ 2018-11-16 15:50             ` Kazuhito Hagio
  0 siblings, 0 replies; 11+ messages in thread
From: Kazuhito Hagio @ 2018-11-16 15:50 UTC (permalink / raw)
  To: Bhupesh Sharma; +Cc: kexec mailing list, Baoquan He

Hi Bhupesh,

On 11/15/2018 3:27 PM, Bhupesh Sharma wrote:
>>> Note that PAGE_OFFSET is not constant on KASLR enabled arm64 kernels
>>> as the start of the linear range gets randomized as well (as per the
>>> KASLR seed) which means that PAGE_OFFSET is no longer equal to the
>>> macro:
>>> #define PAGE_OFFSET             (UL(0xffffffffffffffff) - \
>>>            (UL(1) << (VA_BITS - 1)) + 1)
>>
>> The PAGE_OFFSET macro itself is still constant in the arm64 kernel and is
>> used in the p2v/v2p calculation, and also in the test of whether a virtual
>> address is in the linear kernel range (as is VA_BITS):

> I understand, but for me (and as I understood from other arm64 users
> as well), the kernel
> virtual address layout and the comments we have in place are probably
> broken after KASLR
> was enabled on arm64. This is something I shared with arm64 maintainers as well:
> [1]. https://www.spinics.net/lists/arm-kernel/msg655933.html
> 
> Let's go through some details first.
> Please note that PAGE_OFFSET is defined as follows
> ('arch/arm64/include/asm/memory.h'):
> 
> /*
>  * PAGE_OFFSET - the virtual address of the start of the linear map (top
>  *         (VA_BITS - 1))
> 
> So, basically this points to the virtual address at which the linear
> mapping starts. Now, the linear region takes up exactly half of the
> kernel virtual address space.
> 
> So, with a normal non-KASLR boot (I am sharing logs below from a
> system booted with 'nokaslr' in its bootargs), the PAGE_OFFSET macro
> points to the halfway mark of the kernel virtual address space.
> [I needed to revert kernel commit
> "071929dbdd865f779a89ba3f1e06ba8d17dd3743: arm64: Stop printing the
> virtual memory layout" to get these logs.]
> 
> [    0.000000] Virtual kernel memory layout:
> [    0.000000]     modules : 0xffff000000000000 - 0xffff000008000000
> (   128 MB)
> [    0.000000]     vmalloc : 0xffff000008000000 - 0xffff7bdfffff0000
> (126847 GB)
> [    0.000000]       .text : 0x(____ptrval____) - 0x(____ptrval____)
> (  8768 KB)
> [    0.000000]     .rodata : 0x(____ptrval____) - 0x(____ptrval____)
> (  5696 KB)
> [    0.000000]       .init : 0x(____ptrval____) - 0x(____ptrval____)
> (  4224 KB)
> [    0.000000]       .data : 0x(____ptrval____) - 0x(____ptrval____)
> (  1775 KB)
> [    0.000000]        .bss : 0x(____ptrval____) - 0x(____ptrval____)
> (  9193 KB)
> [    0.000000]     fixed   : 0xffff7fdffe790000 - 0xffff7fdffec00000
> (  4544 KB)
> [    0.000000]     PCI I/O : 0xffff7fdffee00000 - 0xffff7fdfffe00000
> (    16 MB)
> [    0.000000]     vmemmap : 0xffff7fe000000000 - 0xffff800000000000
> (   128 GB maximum)
> [    0.000000]               0xffff7fe000000000 - 0xffff7fe004000000
> (    64 MB actual)
> [    0.000000]     memory  : 0xffff800000000000 - 0xffff801000000000
> ( 65536 MB)
> 
> This is on a system with 64K page size and VA_BITS=48.
> For such a configuration, PAGE_OFFSET is also calculated as
> 0xffff800000000000, which is the same as the start address of the
> memory range (i.e. memory : 0xffff800000000000 - 0xffff801000000000)
> that depicts the linear region from which 'kmalloc' allocations take place.
> 
> Now, let's look at the boot logs on the same system with 'nokaslr'
> removed from the bootargs (i.e. a KASLR boot case):
> 
> [    0.000000] Virtual kernel memory layout:
> [    0.000000]     modules : 0xffff000000000000 - 0xffff000008000000
> (   128 MB)
> [    0.000000]     vmalloc : 0xffff000008000000 - 0xffff7bdfffff0000
> (126847 GB)
> [    0.000000]       .text : 0x(____ptrval____) - 0x(____ptrval____)
> (  8768 KB)
> [    0.000000]     .rodata : 0x(____ptrval____) - 0x(____ptrval____)
> (  5696 KB)
> [    0.000000]       .init : 0x(____ptrval____) - 0x(____ptrval____)
> (  4224 KB)
> [    0.000000]       .data : 0x(____ptrval____) - 0x(____ptrval____)
> (  1775 KB)
> [    0.000000]        .bss : 0x(____ptrval____) - 0x(____ptrval____)
> (  9193 KB)
> [    0.000000]     fixed   : 0xffff7fdffe790000 - 0xffff7fdffec00000
> (  4544 KB)
> [    0.000000]     PCI I/O : 0xffff7fdffee00000 - 0xffff7fdfffe00000
> (    16 MB)
> [    0.000000]     vmemmap : 0xffff7fe000000000 - 0xffff800000000000
> (   128 GB maximum)
> [    0.000000]               0xffff7fea5b300000 - 0xffff7fea5f300000
> (    64 MB actual)
> [    0.000000]     memory  : 0xffffa96cc0000000 - 0xffffa97cc0000000
> ( 65536 MB)
> 
> As you will note here, the start of the linear range, as indicated by
> the 'memory' node, is 0xffffa96cc0000000 (this would be a random value
> for each KASLR boot). However, the PAGE_OFFSET macro is still stuck at
> 0xffff800000000000, which is confusing (and probably incorrect), as
> 'kmalloc' calls now return addresses in the range
> '0xffffa96cc0000000 - 0xffffa97cc0000000' and no longer in the
> '0xffff800000000000 - 0xffff801000000000' range we saw in the
> non-KASLR boot case.

Thanks for the details.
Its name may be confusing, but I think whether it is correct or not depends
on how it is used. If there is a bug that misuses it, that should be fixed,
though.

>> I meant that I would just like to imitate these definitions in
>> makedumpfile too, for better maintainability and easier future changes.
> 
> Sure, I think the arm64 kernel ideally requires a standard ABI to
> expose the start of the linear address region to user-space as well,
> for both the KASLR and non-KASLR cases (something like
> 'page_offset_base' on x86_64), and I plan to submit an RFC patch for
> the arm64 maintainers' consideration once my x86_64 kernel patch is
> accepted.

So you might need a use case for that. As for makedumpfile, this time we
found that we can manage with VA_BITS and PHYS_OFFSET even with KASLR (as
far as we tested), and they are already in vmcoreinfo. If there is another
use case, it would help that patch get applied.

>> I tested them on two arm64 systems (including a KASLR-enabled one,
>> as its KERNELOFFSET=2eff90ae0000 in vmcoreinfo confirms) and they
>> tested OK, but could you test them on your arm64 boards?
>>
>> If it works correctly, then let's think about using vmcoreinfo in PT_NOTE in
>> /proc/kcore, which kernels >= 4.19 have.
> 
> Yes, these patches work fine on my hp-moonshot machine, both for the
> KASLR and non-KASLR cases. I agree with you that we need some kind of
> solution for <4.19 kernels as well, and this is probably a good fix to
> have for now.
> 
> So, please feel free to add:
> Tested-by: Bhupesh Sharma <bhsharma@redhat.com>

Thank you for testing them! I really appreciate it.
I'll merge them with your tag.

Kazu



* RE: [PATCH v2 2/2] makedumpfile/x86_64: Add 'page_offset_base' reading capability from VMCOREINFO
  2018-10-29  7:50 ` [PATCH v2 2/2] makedumpfile/x86_64: Add 'page_offset_base' reading capability from VMCOREINFO Bhupesh Sharma
@ 2018-11-16 16:32   ` Kazuhito Hagio
  0 siblings, 0 replies; 11+ messages in thread
From: Kazuhito Hagio @ 2018-11-16 16:32 UTC (permalink / raw)
  To: Bhupesh Sharma, kexec; +Cc: Baoquan He

Hi Bhupesh,

On 10/29/2018 3:50 AM, Bhupesh Sharma wrote:
> The makedumpfile '--mem-usage' use case is currently broken on x86_64
> with the latest kernels (see a sample log below with a 4.19 kernel):
> 
> [root@hp-dl380pgen8-02-vm-15 ~]# makedumpfile -f --mem-usage /proc/kcore
>    The kernel version is not supported.
>    The makedumpfile operation may be incomplete.
>    <snip..>
>    set_kcore_vmcoreinfo: Can't get the offset of VMCOREINFO(/proc/kcore).
>    Success
> 
>    makedumpfile Failed.
> 
> It seems that the issue occurs because of KCORE_REMAP, which was added
> to recent kernels and appears as a PT_LOAD in '/proc/kcore'.

I'm a bit late, but as for this problem:

* Since kernel 4.19 itself doesn't have NUMBER(page_offset_base) in vmcoreinfo,
we need to merge the patch I sent in order to support kernel 4.19.
http://lists.infradead.org/pipermail/kexec/2018-October/021749.html

* It looks like the following commit already removed the KCORE_REMAP PT_LOAD
from /proc/kcore at 4.20-rc1, so the problem doesn't occur with the mainline
kernel.

commit bf904d2762ee6fc1e4acfcb0772bbfb4a27ad8a6
Author: Andy Lutomirski <luto@kernel.org>
Date:   Mon Sep 3 15:59:44 2018 -0700

    x86/pti/64: Remove the SYSCALL64 entry trampoline
...
-       /*
-        * The cpu_entry_area alias addresses are not in the kernel binary
-        * so they do not show up in /proc/kcore normally.  This adds entries
-        * for them manually.
-        */
-       kclist_add_remap(&per_cpu(kcore_entry_trampoline, cpu),
-                        _entry_trampoline,
-                        &get_cpu_entry_area(cpu)->entry_trampoline, PAGE_SIZE);

But yes, the kcore PT_LOADs do seem subject to change like this.

Thanks,
Kazu





end of thread, other threads:[~2018-11-16 16:36 UTC | newest]

Thread overview: 11+ messages
2018-10-29  7:50 [PATCH v2 0/2] makedumpfile: Allow 'info->page_offset' calc via VMCOREINFO in '/proc/kcore' Bhupesh Sharma
2018-10-29  7:50 ` [PATCH v2 1/2] makedumpfile/arm64: Use VMCOREINFO inside '/proc/kcore' (if available) Bhupesh Sharma
2018-11-02 21:00   ` Kazuhito Hagio
2018-11-04 22:16     ` Bhupesh Sharma
2018-11-06 20:17     ` Kazuhito Hagio
2018-11-08 21:58       ` Bhupesh Sharma
2018-11-12 19:51         ` Kazuhito Hagio
2018-11-15 20:27           ` Bhupesh Sharma
2018-11-16 15:50             ` Kazuhito Hagio
2018-10-29  7:50 ` [PATCH v2 2/2] makedumpfile/x86_64: Add 'page_offset_base' reading capability from VMCOREINFO Bhupesh Sharma
2018-11-16 16:32   ` Kazuhito Hagio
